Coffee Lake inherit most of Kabylake production
workarounds.
v2: Fix typo on commit message and remove
WaDisableKillLogic and GEN9_DISABLE_OCL_OOB_SUPPRESS_LOGIC,
since as Mika pointed out they shouldn't be here for cfl
according to BSpec.
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497653398-15722-1-git-send-email-rodrigo.vivi@intel.com
This creates a new command submission chunk for amdgpu
to add in and out sync objects around the submission.
Sync objects are managed via the drm syncobj ioctls.
The command submission interface is enhanced with two new
chunks, one for syncobj pre submission dependencies,
and one for post submission sync obj signalling,
and just takes a list of handles for each.
This is based on work originally done by David Zhou at AMD,
with input from Christian Konig on what things should look like.
In theory VkFences could be backed with sync objects and
just get passed into the cs as syncobj handles as well.
NOTE: this interface addition needs a version bump to expose
it to userspace.
TODO: update to dep_sync when rebasing onto amdgpu master.
(with this - r-b from Christian)
v1.1: keep file reference on import.
v2: move to using syncobjs
v2.1: change some APIs to just use p pointer.
v3: make more robust against CS failures, we now add the
wait sems but only remove them once the CS job has been
submitted.
v4: rewrite names of API and base on new syncobj code.
v5: move post deps earlier, rename some apis
v6: lookup post deps earlier, and just replace fences
in post deps stage (Christian)
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This just splits out the fence depenency checking into it's
own function to make it easier to add semaphore dependencies.
v2: rebase onto other changes.
v1-Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Avoids printing spurious messages like this:
[ 3.102059] amdgpu 0000:01:00.0: VM size (-1) must be a power of 2
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Although we use 9 bits of Device ID for identifying PCH, only 8 bits are
stored in dev_priv->pch_id. This makes HAS_PCH_CNP_LP() and
HAS_PCH_SPT_LP() incorrect. Fix this by storing all the 9 bits for the
platforms with LP PCH.
v2: Drop PCH_LPT_LP change (Imre)
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Fixes: commit ec7e0bb35f ("drm/i915/cnp: Add PCI ID for Cannonpoint LP PCH")
Reported-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497641774-29104-1-git-send-email-dhinakaran.pandiyan@intel.com
During execbuf, a mandatory step is that we add this request (this
fence) to each object's reservation_object. Inside execbuf, we track the
vma, and to add the fence to the reservation_object then means having to
first chase the obj, incurring another cache miss. We can reduce the
number of cache misses by stashing a pointer to the reservation_object
in the vma itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170616140525.6394-1-chris@chris-wilson.co.uk
If the user requires patching of their batch or auxiliary buffers, we
currently make the alterations on the cpu. If they are active on the GPU
at the time, we wait under the struct_mutex for them to finish executing
before we rewrite the contents. This happens if shared relocation trees
are used between different contexts with separate address space (and the
buffers then have different addresses in each), the 3D state will need
to be adjusted between execution on each context. However, we don't need
to use the CPU to do the relocation patching, as we could queue commands
to the GPU to perform it and use fences to serialise the operation with
the current activity and future - so the operation on the GPU appears
just as atomic as performing it immediately. Performing the relocation
rewrites on the GPU is not free, in terms of pure throughput, the number
of relocations/s is about halved - but more importantly so is the time
under the struct_mutex.
v2: Break out the request/batch allocation for clearer error flow.
v3: A few asserts to ensure rq ordering is maintained
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Currently, the last object in the execlist is the always the batch.
However, when building the batch buffer we often know the batch object
first and if we can use the first slot in the execlist we can emit
relocation instructions relative to it immediately and avoid a separate
pass to adjust the relocations to point to the last execlist slot.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
This simply hides the EAGAIN caused by userptr when userspace causes
resource contention. However, it is quite beneficial with highly
contended userptr users as we avoid repeating the setup costs and
kernel-user context switches.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
When choosing a slot for an execbuffer, we ideally want to use the same
address as last time (so that we don't have to rebind it) and the same
address as expected by the user (so that we don't have to fixup any
relocations pointing to it). If we first try to bind the incoming
execbuffer->offset from the user, or the currently bound offset that
should hopefully achieve the goal of avoiding the rebind cost and the
relocation penalty. However, if the object is not currently bound there
we don't want to arbitrarily unbind an object in our chosen position and
so choose to rebind/relocate the incoming object instead. After we
report the new position back to the user, on the next pass the
relocations should have settled down.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtien@linux.intel.com>
If we take a reference to the object/vma when it is first used in an
execbuf, we can keep that reference until the object's file-local handle
is closed. Thereby saving a frequent ref/unref pair.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The major scaling bottleneck in execbuffer is the processing of the
execobjects. Creating an auxiliary list is inefficient when compared to
using the execobject array we already have allocated.
Reservation is then split into phases. As we lookup up the VMA, we
try and bind it back into active location. Only if that fails, do we add
it to the unbound list for phase 2. In phase 2, we try and add all those
objects that could not fit into their previous location, with fallback
to retrying all objects and evicting the VM in case of severe
fragmentation. (This is the same as before, except that phase 1 is now
done inline with looking up the VMA to avoid an iteration over the
execobject array. In the ideal case, we eliminate the separate reservation
phase). During the reservation phase, we only evict from the VM between
passes (rather than currently as we try to fit every new VMA). In
testing with Unreal Engine's Atlantis demo which stresses the eviction
logic on gen7 class hardware, this speed up the framerate by a factor of
2.
The second loop amalgamation is between move_to_gpu and move_to_active.
As we always submit the request, even if incomplete, we can use the
current request to track active VMA as we perform the flushes and
synchronisation required.
The next big advancement is to avoid copying back to the user any
execobjects and relocations that are not changed.
v2: Add a Theory of Operation spiel.
v3: Fall back to slow relocations in preparation for flushing userptrs.
v4: Document struct members, factor out eb_validate_vma(), add a few
more comments to explain some magic and hide other magic behind macros.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
If we write a relocation into the buffer, we require our own implicit
synchronisation added after the start of the execbuf, outside of the
user's control. As we may end up clflushing, or doing the patch itself
on the GPU, asynchronously we need to look at the implicit serialisation
on obj->resv and hence need to disable EXEC_OBJECT_ASYNC for this
object.
If the user does trigger a stall for relocations, we make sure the stall
is complete enough so that the batch is not submitted before we complete
those relocations.
Fixes: 77ae995789 ("drm/i915: Enable userspace to opt-out of implicit fencing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
We can simplify our tracking of pending writes in an execbuf to the
single bit in the vma->exec_entry->flags, but that requires the
relocation function knowing the object's vma. Pass it along.
Note we have only been using a single bit to track flushing since
commit cc889e0f6c
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Wed Jun 13 20:45:19 2012 +0200
drm/i915: disable flushing_list/gpu_write_list
unconditionally flushed all render caches before the breadcrumb and
commit 6ac42f4148
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Sat Jul 21 12:25:01 2012 +0200
drm/i915: Replace the complex flushing logic with simple invalidate/flush all
did away with the explicit GPU domain tracking. This was then codified
into the ABI with NO_RELOC in
commit ed5982e6ce
Author: Daniel Vetter <daniel.vetter@ffwll.ch> # Oi! Patch stealer!
Date: Thu Jan 17 22:23:36 2013 +0100
drm/i915: Allow userspace to hint that the relocations were known
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The advent of full-ppgtt lead to an extra indirection between the object
and its binding. That extra indirection has a noticeable impact on how
fast we can convert from the user handles to our internal vma for
execbuffer. In order to bypass the extra indirection, we use a
resizable hashtable to jump from the object to the per-ctx vma.
rhashtable was considered but we don't need the online resizing feature
and the extra complexity proved to undermine its usefulness. Instead, we
simply reallocate the hastable on demand in a background task and
serialize it before iterating.
In non-full-ppgtt modes, multiple files and multiple contexts can share
the same vma. This leads to having multiple possible handle->vma links,
so we only use the first to establish the fast path. The majority of
buffers are not shared and so we should still be able to realise
speedups with multiple clients.
v2: Prettier names, more magic.
v3: Many style tweaks, most notably hiding the misuse of execobj[].rsvd2
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
A 2 pixel wide pink strip was observed on the left end of some HDMI
monitors configured in a HDMI mode.
It turned out that we were missing out on configuring AVI infoframes, and
unlike APQ8064, the 8x96 HDMI H/W seems to be sensitive to that.
Add configuration of AVI infoframes. While at it, make sure that
hdmi_audio_update is only called when we've detected that the monitor
supports HDMI.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Without doing anything in unprepare, the HDMI driver isn't able to
switch modes successfully. Calling set_rate with a new rate results
in an un-locked PLL.
If we reset the PLL in unprepare, the PLL is able to lock with the
new rate.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Commit c0c0d9eeeb ("drm/msm: hdmi audio support") uses logical
OR operators to build up a value to be written in the
REG_HDMI_AUDIO_INFO0 and REG_HDMI_AUDIO_INFO1 registers when it
should have used bitwise operators.
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
Fixes: c0c0d9eeeb ("drm/msm: hdmi audio support")
Signed-off-by: Rob Clark <robdclark@gmail.com>
Now that the msm_gem supports an arbitrary number of vma's, we no longer
need to assign an id (index) to each address space. So rip out the
associated code.
Signed-off-by: Rob Clark <robdclark@gmail.com>
It means we have to do a list traversal where we once had an index into
a table. But the list will normally have one or two entries.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Pull some of the logic out into msm_gem_new() (since we don't need to
care about the imported-bo case), and don't defer allocating pages. The
latter is generally a good idea, since if we are using VRAM carveout to
allocate contiguous buffers (ie. no IOMMU), the allocation is more
likely to fail. So failing at allocation time is a more sane option.
Plus this simplifies things in the next patch.
Signed-off-by: Rob Clark <robdclark@gmail.com>
No functional change, that will come later. But this will make it
easier to deal with dynamically created address spaces (ie. per-
process pagetables for gpu).
Signed-off-by: Rob Clark <robdclark@gmail.com>
Before we can shift to passing the address-space object to _get_iova(),
we need to fix a few places (dsi+fbdev) that were hard-coding the adress
space id. That gets somewhat easier if we just move these to the kms
base class.
Prep work for next patch.
Signed-off-by: Rob Clark <robdclark@gmail.com>
It serves no purpose, things should be sufficiently synchronized already
by atomic framework. And it is somewhat awkward to be holding a spinlock
when msm_gem_iova() is going to start needing to grab a mutex.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Most, but not all, paths where calling the with struct_mutex held. The
fast-path in msm_gem_get_iova() (plus some sub-code-paths that only run
the first time) was masking this issue.
So lets just always hold struct_mutex for hw_init(). And sprinkle some
WARN_ON()'s and might_lock() to avoid this sort of problem in the
future.
Signed-off-by: Rob Clark <robdclark@gmail.com>
memptrs->wptr seems to be unused. Remove it to avoid
confusing the upcoming preemption code.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The amount of information that we need to pass into msm_gpu_init()
is steadily increasing, so add a new struct to stabilize the function
call and make it easier to add new configuration down the line.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Modify the 'pad' member of struct drm_msm_gem_info to 'flags'. If the
user sets 'flags' to non-zero it means that they want a IOVA for the
GEM object instead of a mmap() offset. Return the iova in the 'offset'
member.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
[robclark: s/hint/flags in commit msg]
Signed-off-by: Rob Clark <robdclark@gmail.com>
There isn't any generic code that uses ->idle so remove it.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The ioctl array is sparsely populated but the compiler will make sure
that it is sufficiently sized for all the values that we have so we
can safely use ARRAY_SIZE() instead of having a constantly changing
#define in the uapi header.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The A5XX GPU powers on in "secure" mode. In secure mode the GPU can
only render to buffers that are marked as secure and inaccessible
to the kernel and user through a series of hardware protections. In
practice secure mode is used to draw things like a UI on a secure
video frame.
In order to switch out of secure mode the GPU executes a special
shader that clears out the GMEM and other sensitve registers and
then writes a register. Because the kernel can't be trusted the
shader binary is signed and verified and programmed by the
secure world. To do this we need to read the MDT header and the
segments from the firmware location and put them in memory and
present them for approval.
For targets without secure support there is an out: if the
secure world doesn't support secure then there are no hardware
protections and we can freely write the SECVID_TRUST register from
the CPU. We don't have 100% confidence that we can query the
secure capabilities at run time but we have enough calls that
need to go right to give us some confidence that we're at least doing
something useful.
Of course if we guess wrong you trigger a permissions violation
which usually ends up in a system crash but thats a problem
that shows up immediately.
[v2: use child device per Bjorn]
[v3: use generic MDT loader per Bjorn]
[v4: use managed dma functions and ifdefs for the MDT loader]
[v5: Add depends for QCOM_MDT_LOADER]
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
[robclark: fix Kconfig to use select instead of depends + #if IS_ENABLED()]
Signed-off-by: Rob Clark <robdclark@gmail.com>
For ease of use (i.e. avoiding a few checks and function calls), store
the object's cache coherency next to the cache is dirty bit.
Specifically this patch aims to reduce the frequency of no-op calls to
i915_gem_object_clflush() to counter-act the increase of such calls for
GPU only objects in the previous patch.
v2: Replace cache_dirty & ~cache_coherent with cache_dirty &&
!cache_coherent as gcc generates much better code for the latter
(Tvrtko)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Tested-by: Dongwon Kim <dongwon.kim@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170616105455.16977-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Currently, we only mark the CPU cache as dirty if we skip a clflush.
This leads to some confusion where we have to ask if the object is in
the write domain or missed a clflush. If we always mark the cache as
dirty, this becomes a much simply question to answer.
The goal remains to do as few clflushes as required and to do them as
late as possible, in the hope of deferring the work to a kthread and not
block the caller (e.g. execbuf, flips).
v2: Always call clflush before GPU execution when the cache_dirty flag
is set. This may cause some extra work on llc systems that migrate dirty
buffers back and forth - but we do try to limit that by only setting
cache_dirty at the end of the gpu sequence.
v3: Always mark the cache as dirty upon a level change, as we need to
invalidate any stale cachelines due to external writes.
Reported-by: Dongwon Kim <dongwon.kim@intel.com>
Fixes: a6a7cc4b7d ("drm/i915: Always flush the dirty CPU cache when pinning the scanout")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Tested-by: Dongwon Kim <dongwon.kim@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615123850.26843-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
i915_vma_destroy() is now not used outside of i915_vma.c so we can
remove the export and make the function static.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170616123508.12673-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Attach the tv_format property to the SDVO connector instead of passing
a '0' in place of the pointer to the property. This got broken when
the SDVO connector properties were converted to atomic.
We can thank sparse for catching this:
drivers/gpu/drm/i915/intel_sdvo.c:2742:75: warning: Using plain integer as NULL pointer
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Fixes: 630d30a4ee ("drm/i915: Convert intel_sdvo connector properties to atomic.")
Link: http://patchwork.freedesktop.org/patch/msgid/20170615172308.10121-1-ville.syrjala@linux.intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
CMA has gained a recent helper function for calculating the start
of a plane buffer's physical address. Use that instead of the
hand rolled version.
Cc: Brian Starkey <brian.starkey@arm.com>
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
If an instance of Mali DP hardware shares the interrupt line with
another hardware (usually another instance of the Mali DP) its
interrupt handler can get called when the device is suspended.
Check the PM status before making access to the hardware registers
to avoid deadlocks.
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
Now that we have a callback to check if crtc supports a given mode
we can use it in malidp so that we restrict the number of probbed
modes to the ones we can actually display.
Also, remove the mode_fixup() callback as this is no longer needed
because mode_valid() will be called before.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Carlos Palminha <palminha@synopsys.com>
Cc: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Cc: Archit Taneja <architt@codeaurora.org>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Liviu Dudau <liviu@dudau.co.uk>
gvt-next-2017-06-08
First gvt-next pull for 4.13:
- optimization for per-VM mmio save/restore (Changbin)
- optimization for mmio hash table (Changbin)
- scheduler optimization with event (Ping)
- vGPU reset refinement (Fred)
- other misc refactor and cleanups, etc.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170608093547.bjgs436e3iokrzdm@zhen-hp.sh.intel.com
There is a prototype for this function in the header, but the function
itself lacks a 'void' in the argument list, causing a harmless warning
when building with 'make W=1':
drivers/gpu/drm/nouveau/nouveau_drm.c: In function 'nouveau_pmops_runtime':
drivers/gpu/drm/nouveau/nouveau_drm.c:730:1: error: old-style function definition [-Werror=old-style-definition]
Fixes: 321f5c5f2c ("drm/nouveau: replace multiple open-coded runpm support checks with function")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
As with vga_init, this function doesn't make sense on non-PCI devices,
and the Thunderbolt check in it dereferences a NULL pointer in that
case. Add some code to skip this function when the device is not a PCI
device.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
On Tegra186 systems with certain firmware revisions, leaving the GPU in
reset can cause a hang. To prevent this, don't leave the GPU in reset.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
On Tegra186, powergating is handled by the BPMP power domain provider
and the "legacy" powergating API is not available. Therefore skip
these calls if we are attached to a power domain.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This patch replaces the symbolic permissions with the numeric ones.
Signed-off-by: Oscar Salvador <osalvador.vilardaga@gmail.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This patch creates a special group attributes for attrs like "*auto_point*".
We check if we have support for them, and if we do, we gather them all in
an attribute_group's structure which is the parameter regarding special groups
of hwmon_device_register_with_info
We also do the same for pwm_min/max attrs.
Signed-off-by: Oscar Salvador <osalvador.vilardaga@gmail.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This patch removes old code related to the old api and transforms the
functions for the new api. It also adds the .write and .read operations.
Signed-off-by: Oscar Salvador <osalvador.vilardaga@gmail.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This patch introduces the nouveau_hwmon_ops structure, sets up
.is_visible and .read_string operations and adds all the functions
for these operations.
This is also a preparation for the next patches, where most of the
work is being done.
This code doesn't interacture with the old one.
It's just to make easier the review of all patches.
Signed-off-by: Oscar Salvador <osalvador.vilardaga@gmail.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is a preparation for the next patches. It just adds the sensors with
their possible configurable settings and then fills the struct hwmon_channel_info
with all this information.
Signed-off-by: Oscar Salvador <osalvador.vilardaga@gmail.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This makes use of all the additional routing and state added in previous
commits, making it possible to deal with GM20x macro link routing, while
also sharing code between the NV50 and GF119 implementations.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This makes use of all the additional routing and state added in previous
commits, making it possible to deal with GM20x macro link routing, while
also sharing code between the NV50 and GF119 implementations.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This makes use of all the additional routing and state added in previous
commits, making it possible to deal with GM20x macro link routing, while
also sharing code between the NV50 and GF119 implementations.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This makes use of all the additional routing and state added in previous
commits, making it possible to deal with GM20x macro link routing, while
also sharing code between the NV50 and GF119 implementations.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This makes use of all the additional routing and state added in previous
commits, making it possible to deal with GM20x macro link routing, while
also sharing code between the NV50 and GF119 implementations.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This shouldn't have been needed ever since we started executing the
DisableLT script when shutting down heads.
Testing of the board this was originally written for seems to agree.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Training/Untraining will be hooked up to the routing logic, which
doesn't allow us to pass in a data rate.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
These exist to give NVKM information on the set of display paths that
the DD needs to be active at any given time.
Previously, the supervisor attempted to determine this solely from OR
state, but there's a few configurations where this information on its
own isn't enough to determine the specific display paths in question:
- ANX9805, where the PIOR protocol for both DP and TMDS is TMDS.
- On a device using DCB Switched Outputs.
- On GM20x and newer, with a crossbar between the SOR and macro links.
After this commit, the DD tells NVKM *exactly* which display path it's
attempting a modeset on.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
All of the necessary hw-specific logic is now handled at the output
resource level, so all of this can go away.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Also removes the user-facing methods to these controls, as they're not
currently utilised by the DD anyway.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This essentially (unless the link becomes unstable and needs to be
re-trained) gives us a single entry-point to link training, during
supervisor handling, where we can ensure all routing is up to date.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
An upcoming commit will limit link training to only when the sink is
meant to be displaying an image.
We still need IRQs enabled even when the link isn't trained (for MST
messages), but don't want to train the link unnecessarily.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The aim here is to protect the OR against locking up when something
unexpected happens (such as the display disappearing during modeset,
or the DD misbehaving).
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This struct doesn't hold link configuration data anymore, so we can
limit its use to internal DP training (anx9805 handles training for
external DP).
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This hasn't been used since atomic.
We may want to re-implement "fast" DPMS at some point, but for now,
this just gets in the way.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This essentially replicates our current behaviour in a way that's
compatible with the new model that's emerging, so that we're able
to start porting the hw-specific functions to it.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Upcoming commits make supervisor handling share code between the NV50
and GF119 implementations. Because of this, and a few other cleanups,
we need to allow some additional customisation.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
In order to properly support the SOR -> SOR + pad macro separation
that occurred with GM20x GPUs, we need to separate OR handling out
of the output path code.
This will be used as the base to support ORs (DAC, SOR, PIOR).
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Primarily intended as a way to pass per-head state around during
supervisor handling, and share logic between NV50/GF119.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is to allow hw-specific code to instantiate output resources first,
so we can cull unsupported output paths based on them.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Not all users of nvkm_output_dp have been changed here. The remaining
ones belong to code that's disappearing in upcoming commits.
This also modifies the debug level of some messages.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This isn't technically "output", but, "display/output path".
Not all users of nvkm_output have been changed here. The remaining
ones belong to code that's disappearing in upcoming commits.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Upcoming changes to split OR from output path drastically change the
placement of various operations.
In order to make the real changes clearer, do the moving around part
ahead of time.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
As of DCB 4.1, these are not the same thing.
Compatibility temporarily in place until callers have been updated.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We already have a subdev pointer, from which we can locate the device's
BIOS subdev. No need for a separate pointer.
Structure/callers not updated yet, as I want to batch more changes and
only touch the callers once.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
nvkm_timer_alarm() already handles this as part of protecting against
callers passing in no timeout value.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
I only saw those values inside the vbios: 0xff, 0xfd, 0xfc, 0xfa for valid
rails.
No idea what the lower value does, but at least we get power readings on
a lot of Fermi GPUs with that.
v2: add missing parentheses
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is according to what we have in nvbios.
Fixes "ERROR: Can't get value of subfeature in0_min: Can't read" errors
in sensors for some GPUs.
Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Enable stereoscopic output for HDMI and DisplayPort connectors on
NV50+ (G80+) hardware. We do not enable stereoscopy on older
hardware in case there is some older board that still has HDMI
output but for which we have no logic for setting the Vendor
InfoFrame.
With this, I get an obvious 3D output when using the "testdisplay"
program from intel-gpu-tools with the "-3" parameter and outputting
to a 3D-capable HDMI display, for all available 3D modes (be they
TB, SBSH, or FP) on all four G80+ DISPs.
Signed-off-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
Frame-packing modes add an extra vtotal raster lines to each frame
above and beyond what the basic mode description calls for.
Account for this during scaler configuration (possibly a bit of a
hack), during CRTC configuration (clearly not a hack), and when
checking that a mode is valid for a given connector (cribbed from
the i915 driver).
Signed-off-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Now that we have the InfoFrame data being provided, for the most
part, program the hardware to use it.
While we're here, and since the functionality will come in handy
for supporting 3D stereoscopy, implement setting the Vendor
("generic"?) InfoFrame.
Also don't enable any InfoFrame that is not provided, and disable
the Vendor InfoFrame when disabling the output.
Signed-off-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Now that we have the InfoFrame data being provided, for the most
part, program the hardware to use it.
While we're here, and since the functionality will come in handy
for supporting 3D stereoscopy, implement setting the Vendor
("generic"?) InfoFrame.
Also don't enable any InfoFrame that is not provided, and disable
the Vendor InfoFrame when disabling the output.
Signed-off-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Now that we have the InfoFrame data being provided, for the most
part, program the hardware to use it.
While we're here, and since the functionality will come in handy
for supporting 3D stereoscopy, implement setting the Vendor
("generic") InfoFrame.
Also don't enable any AVI or Vendor InfoFrame that is not provided,
and disable the Vendor InfoFrame when disabling the output.
Ignore the Audio InfoFrame: We don't supply it, and altering HDMI
audio semantics (for better or worse) on this hardware is out of
scope for me at this time.
Signed-off-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Now that we have the InfoFrame data being provided, for the most
part, program the hardware to use it.
While we're here, and since the functionality will come in handy
for supporting 3D stereoscopy, implement setting the Vendor
("generic"?) InfoFrame.
Also don't enable any AVI or Vendor InfoFrame that is not provided,
and disable the Vendor InfoFrame when disabling the output.
Ignore the Audio InfoFrame: We don't supply it, and altering HDMI
audio semantics (for better or worse) on this hardware is out of
scope for me at this time.
Signed-off-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
HDMI InfoFrames are passed to NVKM as bags of bytes, but the
hardware needs them to be packed into words. Rather than having
four (or more) copies of the packing logic introduce a single copy
now, in a central place.
We currently need these for AVI and Vendor InfoFrames, but we may
also expect to need them for Audio InfoFrames at some point.
Signed-off-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Now that we have mechanism by which to pass mode-dependent HDMI
InfoFrames to the low-level hardware driver, it is incumbent upon
us to do so.
Signed-off-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The nouveau driver, in the Linux 3.7 days, used to try and set the
AVI InfoFrame based on the selected display mode. These days, it
uses a fixed set of InfoFrames. Start to correct that, by
providing a mechanism whereby InfoFrame data may be passed to the
NVKM functions that do the actual configuration.
At this point, only establish the new parameters and their parsing,
don't actually use the data anywhere yet (since it's not supplied
anywhere).
Signed-off-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
drm_mode_set_crtcinfo() does compensation for interlace and
doublescan timing effects already, so do it first and use the
compensated figures instead of the constant "vscan / ilace" terms
that we had before.
And then it turns out that the hardware model for how the timing
parameters are configured is basically the standard model, but
starting one clock before the sync pulse rather than at the start
of the display area, which lets us drastically simplify the
overall timing calculations (verifying the changes by algebraic
operations is left as an exercise for the reader).
Finally, there were a couple of issues with the computation of
m->v.blankus that are addressed here. Interlaced modes would
generate a negative intermediate result. Double scan modes would
generate an overestimate rather than an underestimate. And when
enabling frame-packing modes, a rather extreme overestimate would
be generated. Fixed, by using the timings as adjusted for the
CRTC to find the length of the vertical blanking period instead of
mixing adjusted and pre-adjustment timing parameters.
Signed-off-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJZPdbLAAoJEHm+PkMAQRiGx4wH/1nCjfnl6fE8oJ24/1gEAOUh
biFdqJkYZmlLYHVtYfLm4Ueg4adJdg0wx6qM/4RaAzmQVvLfDV34bc1qBf1+P95G
kVF+osWyXrZo5cTwkwapHW/KNu4VJwAx2D1wrlxKDVG5AOrULH1pYOYGOpApEkZU
4N+q5+M0ce0GJpqtUZX+UnI33ygjdDbBxXoFKsr24B7eA0ouGbAJ7dC88WcaETL+
2/7tT01SvDMo0jBSV0WIqlgXwZ5gp3yPGnklC3F4159Yze6VFrzHMKS/UpPF8o8E
W9EbuzwxsKyXUifX2GY348L1f+47glen/1sedbuKnFhP6E9aqUQQJXvEO7ueQl4=
=m2Gx
-----END PGP SIGNATURE-----
BackMerge tag 'v4.12-rc5' into drm-next
Linux 4.12-rc5 for nouveau fixes
- Remove counter load enable form PRE, which has no effect.
- Add support for setting the double read/write reduction flag in channel
parameter memory. This can be used to save some memory bandwidth when
capturing in YUV 4:2:0 chroma subsampled formats.
- Allocate DMA channel structures as needed, most of the 64 channels are
unused or even reserved.
- Remove unused interrupt busy waiting routine.
- Set VDIC field order for both AUTO and MAN inputs simultaneously as
both can't be active at the same time.
-----BEGIN PGP SIGNATURE-----
iQJLBAABCAA1FiEEBsBxhV1FaKwXuCOBUMKIHHCeYOsFAlk49zAXHHAuemFiZWxA
cGVuZ3V0cm9uaXguZGUACgkQUMKIHHCeYOtX+g/+M5DPkGhcP8fN9m47MOAEa+nP
oDciYC76VrQev/Qys/zLM3/6sWF9h82USJ52H+zw41RuKKkYlcOzVWnSPQd2yN6Y
hIl0fCsFzGxOMnIAhmi6BHFnvJKP1jsfeBdXSHyxI0y5kGoufG7BEHiJ7TTSgy/I
JhccDKTRV9NzAfwpD37EI3a/Nc53DRpw3jrnHPnAaBJ6hYVPZ9YCSrBYbQQbIrDr
x6NB8E1Ga3KRGZMTw45bTBiOs4AbZKSunzrqWQFnTRjbE+aTDs9W5n5wcdr7AoWi
gqnx+b6TkiarNK3taHffjYioYvvn2nbGhuoAtg7hpS0CeUup0gitQquKM0kqsWnk
yBykkP+Z7udASRgXdK6Gtzo6hzdhPeFPmmMbKmSBdIvT26t0ikf9RN1UlEhE6nY3
A68jKC4+gNTu8kF6imzWCfwM9KB4pWn0N0qTY5U9Y7/gWFky6IEDn3V5OM3XXqUa
c/iglYyzO+B7vVu6ZajlH+shemO1mVaxGjVFrfX29syVooZrmo0NVJPdoKwiYx0r
E08FCOdUIEhzFS6h1/FII+mZ6YzAmNkXVz+l+MWaoOW2tIWWX4xSsFTKBka2hDfq
daU3CDcKcg8LuSybHg6lzj8Hw+/CWkqgtv6ESVnvHEUYCHoZsvArJTNoQrx/zJbE
EwgZIM4daufEuv6I5zo=
=m3BY
-----END PGP SIGNATURE-----
Merge tag 'imx-drm-next-2017-06-08' of git://git.pengutronix.de/git/pza/linux into drm-next
imx-drm: cleanups and YUV 4:2:0 memory read/write reduction support
- Remove counter load enable form PRE, which has no effect.
- Add support for setting the double read/write reduction flag in channel
parameter memory. This can be used to save some memory bandwidth when
capturing in YUV 4:2:0 chroma subsampled formats.
- Allocate DMA channel structures as needed, most of the 64 channels are
unused or even reserved.
- Remove unused interrupt busy waiting routine.
- Set VDIC field order for both AUTO and MAN inputs simultaneously as
both can't be active at the same time.
* tag 'imx-drm-next-2017-06-08' of git://git.pengutronix.de/git/pza/linux:
gpu: ipu-v3: vdic: include AUTO field order bit in ipu_vdi_set_field_order
gpu: ipu-v3: remove interrupt busy waiting routine
gpu: ipu-v3: allocate ipuv3_channels as needed
gpu: ipu-v3: Add support for double read/write reduction
gpu: ipu-v3: prg: remove counter load enable
some fsl-dcu cleanups
* tag 'drm-fsl-dcu-for-v4.13' of http://git.agner.ch/git/linux-drm-fsl-dcu:
drm/fsl-dcu: use new drm_atomic_helper_shutdown
drm/fsl-dcu: implement irq_preinstall/uninstall callbacks
drm/fsl: Drop drm_vblank_cleanup
The series interleaves DRM and V4L2 patches due to dependencies between the R-
Car DU and VSP drivers. Mauro has acked all the V4L2 patches to go through
your tree, and they don't conflict with anything queued for v4.13 in his tree.
If I need to send any conflicting patches through Mauro's tree for v4.13, I'll
make sure to base them on this branch.
* 'drm/next/du' of git://linuxtv.org/pinchartl/media:
drm: rcar-du: Map memory through the VSP device
v4l: vsp1: Add API to map and unmap DRM buffers through the VSP
v4l: vsp1: Map the DL and video buffers through the proper bus master
v4l: rcar-fcp: Add an API to retrieve the FCP device
v4l: rcar-fcp: Don't get/put module reference
drm: rcar-du: Register a completion callback with VSP1
v4l: vsp1: Extend VSP1 module API to allow DRM callbacks
v4l: vsp1: Postpone frame end handling in event of display list race
drm: rcar-du: Arm the page flip event after queuing the page flip
An unusually big pull request for this merge window, with three notable
features:
- V3s display engine support. This is especially notable because it uses
a different display engine used on the newer Allwinner SoCs (H3, A64
and the likes) that will be quite easily supported now.
- HDMI support for the old Allwinner SoCs. This is enabled only on the
A10s for now, but should be really easy to extend to deal with A10, A20
and A31
- Preliminary work to deal with dual-pipeline SoCs (A10, A20, A31, H3,
etc.). It currently ignores the second pipeline, but we can use the
dual-pipelines bindings. This will be useful to enable the display
pipeline while we work on the dual-pipeline.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZQZ40AAoJEBx+YmzsjxAgEDgP+wd29aHXZPIp7EGlCR9L1qPS
62Ps5lrbgwxGJx8Nf7T51tQLl0SDBHrI9xgrQLla1bdqItAMyIkJG2NQ+INN2ZSt
Yz2aUZL3Xx9DE3dUhbLdfUejOCQZqopn8pPe2egeMYNXS5hsAhwwLB8AYcX3LLCN
WFpbC3hcd4pih/teDGyJ4bn4I/teyA3qWleq73f+A/eBBbCI4aRZWvx34/G/pzUL
KeHryE35evukgfOJmZ9wAYMbj8BWHV3t6xObk9TqYrgQ/Vzh7IkAWl1wkcd2qRSQ
RVhfCI4s0jkd44EHMQLxqh27LQakSviIqZWsJEf91qvi5g9A4ZyIS/H4/dGusfbP
59QkREjOdnlVqTSfP5E0QzZbzdcbZhV2YoPp73Rv319lif4HqV/9Z7IhDBmNC2IZ
g2O4J3DKvDjtP4A2s3CoGRVgxf1ag/K+Ig6AEQZL+ShrcHwPEBrVqfqp/MnV+Qg2
gRv0YimGMKReHaMYkT8NdDtqlAzaLWUFnEYRO92X0B4m3NZENMeeDZ8wMy26/yxo
TJxwzxxsNON59mZESS0Ci88q/bMnFkyCYvuWIeHol2GgUK7yzN2ePf9p/eUPOziK
gDbJaKhSZxH67Lerqg+zCFbuPGDu7ZGyOHWMgfG6Xh69bCJptJAJNK03TEvoBvpX
CSdBObTExOj5S/m64+SI
=HtFh
-----END PGP SIGNATURE-----
Merge tag 'sunxi-drm-for-4.13' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into drm-next
sun4i-drm changes for 4.13
An unusually big pull request for this merge window, with three notable
features:
- V3s display engine support. This is especially notable because it uses
a different display engine used on the newer Allwinner SoCs (H3, A64
and the likes) that will be quite easily supported now.
- HDMI support for the old Allwinner SoCs. This is enabled only on the
A10s for now, but should be really easy to extend to deal with A10, A20
and A31
- Preliminary work to deal with dual-pipeline SoCs (A10, A20, A31, H3,
etc.). It currently ignores the second pipeline, but we can use the
dual-pipelines bindings. This will be useful to enable the display
pipeline while we work on the dual-pipeline.
* tag 'sunxi-drm-for-4.13' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux: (27 commits)
drm/sun4i: Add compatible for the A10s pipeline
drm/sun4i: Add HDMI support
dt-bindings: display: sun4i: Add allwinner,tcon-channel property
dt-bindings: display: sun4i: Add HDMI display bindings
drm/sun4i: Ignore the generic connectors for components
drm/sun4i: tcon: multiply the vtotal when not in interlace
drm/sun4i: tcon: Change vertical total size computation inconsistency
drm/sun4i: tcon: Fix tcon channel 1 backporch calculation
drm/sun4i: tcon: Switch mux on only for composite
drm/sun4i: tcon: Move the muxing out of the mode set function
drm/sun4i: tcon: Add channel debug
drm/sun4i: tcon: add support for V3s TCON
drm/sun4i: Add compatible string for V3s display engine
drm/sun4i: add support for Allwinner DE2 mixers
drm/sun4i: add a Kconfig option for sun4i-backend
drm/sun4i: abstract a engine type
drm/sun4i: return only planes for layers created
dt-bindings: add bindings for DE2 on V3s SoC
drm/sun4i: backend: Clarify sun4i_backend_layer_enable debug message
drm/sun4i: Set TCON clock inside sun4i_tconX_mode_set
...
Driver Changes:
- dw-hdmi: Fix compilation error if REGMAP_MMIO not selected (Laurent)
- host1x: Fix incorrect return value (Christophe)
- tegra: Shore up idr API usage in tegra staging code (Dmitry)
- mgag200: Always use HiPri mode for G200e4v2 and limit max bandwidth (Mathieu)
- mxsfb: Ensure display can be lit up without bootloader initialization (Fabio)
Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Cc: Dmitry Osipenko <digetx@gmail.com>
Cc: Mathieu Larouche <mathieu.larouche@matrox.com>
Cc: Fabio Estevam <fabio.estevam@nxp.com>
* tag 'drm-misc-fixes-2017-06-15' of git://anongit.freedesktop.org/git/drm-misc:
drm: mxsfb_crtc: Reset the eLCDIF controller
drm/mgag200: Fix to always set HiPri for G200e4 V2
drm/tegra: Correct idr_alloc() minimum id
drm/tegra: Fix lockup on a use of staging API
gpu: host1x: Fix error handling
drm: dw-hdmi: Fix compilation breakage by selecting REGMAP_MMIO
New radeon and amdgpu features for 4.13:
- Lots of Vega10 bug fixes
- Preliminary Raven support
- KIQ support for compute rings
- MEC queue management rework from Andres
- Audio support for DCE6
- SR-IOV improvements
- Improved module parameters for controlling radeon vs amdgpu support
for SI and CIK
- Bug fixes
- General code cleanups
[airlied: dropped drmP.h header from one file was needed and build broke]
* 'drm-next-4.13' of git://people.freedesktop.org/~agd5f/linux: (362 commits)
drm/amdgpu: Fix compiler warnings
drm/amdgpu: vm_update_ptes remove code duplication
drm/amd/amdgpu: Port VCN over to new SOC15 macros
drm/amd/amdgpu: Port PSP v10.0 over to new SOC15 macros
drm/amd/amdgpu: Port PSP v3.1 over to new SOC15 macros
drm/amd/amdgpu: Port NBIO v7.0 driver over to new SOC15 macros
drm/amd/amdgpu: Port NBIO v6.1 driver over to new SOC15 macros
drm/amd/amdgpu: Port UVD 7.0 over to new SOC15 macros
drm/amd/amdgpu: Port MMHUB over to new SOC15 macros
drm/amd/amdgpu: Cleanup gfxhub read-modify-write patterns
drm/amd/amdgpu: Port GFXHUB over to new SOC15 macros
drm/amd/amdgpu: Add offset variant to SOC15 macros
drm/amd/powerplay: add avfs control for Vega10
drm/amdgpu: add virtual display support for raven
drm/amdgpu/gfx9: fix compute ring doorbell index
drm/amd/amdgpu: Rename KIQ ring to avoid spaces
drm/amd/amdgpu: gfx9 tidy ups (v2)
drm/amdgpu: add contiguous flag in ucode bo create
drm/amdgpu: fix missed gpu info firmware when cache firmware during S3
drm/amdgpu: export test ib debugfs interface
...
This allows mesa to set the tiling format for a BO and have that
tiling format be respected by mesa on the other side of an
import/export (and by vc4 scanout in the kernel), without defining a
protocol to pass the tiling through userspace.
Signed-off-by: Eric Anholt <eric@anholt.net>
Link: http://patchwork.freedesktop.org/patch/msgid/20170608001336.12842-2-eric@anholt.net
Acked-by: Dave Airlie <airlied@redhat.com>
The T tiling format is what V3D uses for textures, with no raster
support at all until later revisions of the hardware (and always at a
large 3D performance penalty). If we can't scan out V3D's format,
then we often need to do a relayout at some stage of the pipeline,
either right before texturing from the scanout buffer (common in X11
without a compositor) or between a tiled screen buffer right before
scanout (an option I've considered in trying to resolve this
inconsistency, but which means needing to use the dirty fb ioctl and
having some update policy).
T-format scanout lets us avoid either of those shadow copies, for a
massive, obvious performance improvement to X11 window dragging
without a compositor. Unfortunately, enabling a compositor to work
around the discrepancy has turned out to be too costly in memory
consumption for the Raspbian distribution.
Because the HVS operates a scanline at a time, compositing from T does
increase the memory bandwidth cost of scanout. On my 1920x1080@32bpp
display on a RPi3, we go from about 15% of system memory bandwidth
with linear to about 20% with tiled. However, for X11 this still ends
up being a huge performance win in active usage.
This patch doesn't yet handle src_x/src_y offsetting within the tiled
buffer. However, we fail to do so for untiled buffers already.
Signed-off-by: Eric Anholt <eric@anholt.net>
Link: http://patchwork.freedesktop.org/patch/msgid/20170608001336.12842-1-eric@anholt.net
Reviewed-by: Boris Brezillon <boris.brezillon@free-electrons.com>
This reverts commit bb9d85f6e9.
New ddb allocation algorithm is a show stopper on my SKL system.
Besides not be able to get external DP 4k@60 (through USB type C),
It fully hang my screen when unplugging the USB type C.
Bugzilla: https://patchwork.freedesktop.org/patch/161571/
Fixes: bb9d85f6e9 ("drm/i915/skl: New ddb allocation algorithm")
Cc: Mahesh Kumar <mahesh1.kumar@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497376350-3400-1-git-send-email-rodrigo.vivi@intel.com
As per BSEPC, if device ready bit is '0' in enable IO sequence
then its a cold boot/reset scenario eg: S3/S4 resume. If cold boot
scenario detected in enable IO, then prepare port immediately.
In normal boot scenario, prepare port after glk_dsi_device_ready().
Without cold boot sequence enabled, features like S3/S4 doesn't work.
Signed-off-by: Madhav Chauhan <madhav.chauhan@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497340095-5877-2-git-send-email-madhav.chauhan@intel.com
This patch divides glk_dsi_device_ready() function into
two part. First part will program LP wake and MIPI DSI mode
to MIPI_CTRL reg using newly defined function glk_dsi_enable_io().
glk_dsi_enable_io() will be called from intel_dsi_pre_enable.
Second part will do remaining device ready activities using
the existing function glk_dsi_device_ready().
Signed-off-by: Madhav Chauhan <madhav.chauhan@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497340095-5877-1-git-send-email-madhav.chauhan@intel.com
According to the eLCDIF initialization steps listed in the MX6SX
Reference Manual the eLCDIF block reset is mandatory.
Without performing the eLCDIF reset the display shows garbage content
when the kernel boots.
In earlier tests this issue has not been observed because the bootloader
was previously showing a splash screen and the bootloader display driver
does properly implement the eLCDIF reset.
Add the eLCDIF reset to the driver, so that it can operate correctly
independently of the bootloader.
Tested on a imx6sx-sdb board.
Cc: <stable@vger.kernel.org>
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: http://patchwork.freedesktop.org/patch/msgid/1494007301-14535-1-git-send-email-fabio.estevam@nxp.com
Smaller scope reduces visibility of variable and makes usage of
uninitialized variable less possible.
Changes in v2:
- separate declaration and initialization
Changes in v3:
- add missing signed-off-by tag
Signed-off-by: Dawid Kurek <dawikur@gmail.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615174556.GA8872@gmail.com
According to the eLCDIF initialization steps listed in the MX6SX
Reference Manual the eLCDIF block reset is mandatory.
Without performing the eLCDIF reset the display shows garbage content
when the kernel boots.
In earlier tests this issue has not been observed because the bootloader
was previously showing a splash screen and the bootloader display driver
does properly implement the eLCDIF reset.
Add the eLCDIF reset to the driver, so that it can operate correctly
independently of the bootloader.
Tested on a imx6sx-sdb board.
Cc: <stable@vger.kernel.org>
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: http://patchwork.freedesktop.org/patch/msgid/1494007301-14535-1-git-send-email-fabio.estevam@nxp.com
CPU and GPU paths were mostly the same.
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Allows reading/writing via SOC15 macros with offset for
various register banks.
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Same as other asics. If enabled, exposes a user selectable
number of virtual displays.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This got lost when the code was revamped. Copy/paste bug from
gfx8.
Reported-by: Evan Quan <evan.quan@amd.com>
Fixes: 78c168342 (drm/amdgpu: allow split of queues with kfd at queue granularity v4)
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Swap space for underscore in ring name.
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
A couple of simple tidy ups to register programming.
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
(v2): Avoid using 'data' uninitialized
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Under VF environment, the ucode would be settled to the visible VRAM,
As it would be pinned to the visible VRAM, it's better to add
contiguous flag,otherwise it need to move gpu address during the pin
process. This movement is not necessary.
Signed-off-by: horchen <horace.chen@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
gpu_info firmware is released after data is used. But when system enters into
suspend, upper class driver will cache all firmware names. At that time,
gpu_info will be failing to load. It seems an upper class issue, that we should
not release gpu_info firmware until device finished.
[ 903.236589] cache_firmware: amdgpu/vega10_sdma1.bin
[ 903.236590] fw_set_page_data: fw-amdgpu/vega10_sdma1.bin buf=ffff88041eee10c0 data=ffffc90002561000 size=17408
[ 903.236591] cache_firmware: amdgpu/vega10_sdma1.bin ret=0
[ 903.464160] __allocate_fw_buf: fw-amdgpu/vega10_gpu_info.bin buf=ffff88041eee2c00
[ 903.471815] (NULL device *): loading /lib/firmware/updates/4.11.0-custom/amdgpu/vega10_gpu_info.bin failed with error -2
[ 903.482870] (NULL device *): loading /lib/firmware/updates/amdgpu/vega10_gpu_info.bin failed with error -2
[ 903.492716] (NULL device *): loading /lib/firmware/4.11.0-custom/amdgpu/vega10_gpu_info.bin failed with error -2
[ 903.503156] (NULL device *): direct-loading amdgpu/vega10_gpu_info.bin
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
As Christian and David's suggestion, submit the test ib ring debug interfaces.
It's useful for debugging with the command submission without VM case.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Eric Huang <JinHuiEric.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Eric Huang <JinHuiEric.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No need to clear it. The values are set explicitly.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Maarten and Ville noticed that we are enabling backlight via DP aux very
early in the modeset_init path via the intel_dp_aux_setup_backlight()
function, since commit e7156c8339 ("drm/i915: Add Backlight Control using
DPCD for eDP connectors (v9)"). Looks like all we need to do during
_setup_backlight() is read the current brightness state instead of
modifying it.
v2: Rewrote commit message.
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Yetunde Adebisi <yetundex.adebisi@intel.com>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Tested-by: Puthikorn Voravootivat <puthik@chromium.org>
Fixes: e7156c8339 ("drm/i915: Add Backlight Control using DPCD for eDP connectors (v9)")
Link: http://patchwork.freedesktop.org/patch/msgid/1497384239-2965-1-git-send-email-dhinakaran.pandiyan@intel.com
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
The function cnl_ddi_dp_set_dpll_hw_state does not need to be in global
scope, so make it static.
Cleans up sparse warning:
"symbol 'cnl_ddi_dp_set_dpll_hw_state' was not declared. Should it
be static?"
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170613134751.29196-1-colin.king@canonical.com
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
With 830 the only thing needing pipe quirks, we can just drop the quirk
defines and replace the checks with IS_I830() checks.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170601143619.27840-8-ville.syrjala@linux.intel.com
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
The pipe A force quirk shouldn't needed except on 830. So let's nuke it
for the IBM Thinkpad T60 945 machines. This quirk pre-dates
KMS so it's usefulness is doubtful at best now.
The original bug report [1] describes the symptoms as "system hang on
closing T60 panel lid", and we already dropped a similar quirk for
another 945 machine in
commit 736a69ca8c ("drm/i915: Drop PIPE-A quirk for 945GSE HP Mini")
so I'm hopeful we can drop this one as well.
The quirk was added into xf86-video-intel in
commit 08903abe4dc0 ("Add pipe a force enable quirk for Lenovo T60")
[1] https://bugs.freedesktop.org/show_bug.cgi?id=16494
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170601143619.27840-7-ville.syrjala@linux.intel.com
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
The pipe A force quirk shouldn't needed except on 830. So let's nuke it
for the Toshiba Protege R-205/S-209 945 machines. This quirk pre-dates
KMS so it's usefulness is doubtful at best now.
Unfortunately the original bug report [1] isn't very helpful since it
doesn't describe the symptoms. And the commit message in xf86-video-intel
commit ecdb5963ef68 ("Add pipe A force enable quirk for Toshiba Portege R205-S209")
is not much help either.
However, if we assume the problem was the typical "closing the lid
hangs the box" type of thing, we already nuked the quirk for another
945 machine in
commit 736a69ca8c ("drm/i915: Drop PIPE-A quirk for 945GSE HP Mini")
and so I hope we can drop this one as well.
[1] https://bugs.freedesktop.org/show_bug.cgi?id=14944
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170601143619.27840-6-ville.syrjala@linux.intel.com
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
830 more or less requires both pipes and DPLLs to remain on as long
as either pipe is needed. However, when neither pipe is actually needed,
we can save a bit of power by turning everything off. To do that we add
a new "power well" that turns both pipes and DPLLs on and off in the
right order. Seems to save ~50mW on my Fujitsu-Siemens Lifebook S6010.
This also avoids having to abuse the load detection to force pipe A on
at init time. That was never very robust, and it only worked for one
pipe, whereas 830 really needs both pipes enabled. As a bonus the 830
pipe quirk is now a bit more isolated from the rest of the mode setting
infrastructure, which should mean that it's much less likely someone
will accidentally break it in the future. The extra cost is of course
slight code duplication, but that seems like a worthwile tradeoff here.
v2; s/BIT/BIT_ULL/
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170601143619.27840-5-ville.syrjala@linux.intel.com
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
The magic "enable the DPLL three times" sequence feels like it
deserves a loop.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170601143619.27840-4-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
The blocking gather copy allocation is a major performance downside of the
Host1x firewall, it may take hundreds milliseconds which is unacceptable
for the real-time graphics operations. Let's try a non-blocking allocation
first as a least invasive solution, it makes opentegra (Xorg driver)
performance indistinguishable with/without the firewall.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
This is largely a rewrite of the Host1x channel allocation code, bringing
several changes:
- The previous code could deadlock due to an interaction
between the 'reflock' mutex and CDMA timeout handling.
This gets rid of the mutex.
- Support for more than 32 channels, required for Tegra186
- General refactoring, including better encapsulation
of channel ownership handling into channel.c
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
There is no host1x_cdma_stop() in the code, let's remove its definition
from the header file.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
The struct host1x_cmdbuf is unused, let's remove it.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Check waits in the firewall in a way it is done for relocations.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
If intel_crtc_disable_noatomic() were to ever get called during resume
we'd end up deadlocking since resume has its own acqcuire_ctx but
intel_crtc_disable_noatomic() still tries to use the
mode_config.acquire_ctx. Pass down the correct acquire ctx from the top.
Cc: stable@vger.kernel.org
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Fixes: e2c8b8701e ("drm/i915: Use atomic helpers for suspend, v2.")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170601143619.27840-3-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Several channels could be made to write the same unit concurrently via
the SETCLASS opcode, trusting userspace is a bad idea. It should be
possible to drop the per-client channel reservation and add a per-unit
locking by inserting MLOCK's to the command stream to re-allow the
SETCLASS opcode, but it will be much more work. Let's forbid the
unit-unrelated class changes for now.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
The RESTART opcode terminates the gather and restarts the CDMA fetching
from a specified word << 2 relative to the CDMA start address. That
shouldn't be allowed to be done by userspace.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Incorrectly shifted relocation address will cause a lower memory
corruption and likely a hang on a write or a read of an arbitrary data
in case of IOMMU absence. As of now, there is no known use for the
address shifting and adding a proper shifts / sizes validation is a much
more work. Let's forbid shifts in the firewall till a proper validation
is implemented.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Perform gathers coping before patching them, so that original gathers are
left untouched. That's not as bad as leaking kernel addresses, but still
doesn't feel right.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
In case of relocations / waitchecks patching failure the jobs pins stay
referenced till DRM file get closed, wasting memory. Add the missed
unpinning.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
The commands stream is prepended by the jobs class on the CDMA
submission, so that explicitly setting a module class in the commands
stream isn't necessary. The firewall initializes its class to 0 and the
command stream that doesn't explicitly specify the class effectively
bypasses the firewall.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
On Tegra20 if plane has width or height equal to 0, it will be infinitely
wide or tall. Let's disable the plane if it is invisible on atomic state
committing to fix the issue. The Rockchip DRM driver does the same.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
On Tegra20 an overlay plane should be clipped, otherwise its output is
distorted once plane crosses display boundary.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Pass down the correct acquire context to the pipe A quirk load detect
hack during display resume. Avoids deadlocking the entire thing.
Cc: stable@vger.kernel.org
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Fixes: e2c8b8701e ("drm/i915: Use atomic helpers for suspend, v2.")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170601143619.27840-2-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Commit 33a8eb8d40 ("drm/tegra: dc: Implement runtime PM") introduced
HW reset control. It causes a hang on Tegra20 if both display
controllers are utilized (RGB panel and HDMI). The TRM suggests that
each display controller has its own reset control, apparently it is not
correct.
Fixes: 33a8eb8d40 ("drm/tegra: dc: Implement runtime PM")
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
In case of invalid syncpoint ID, the host1x_syncpt_get() returns NULL and
none of its users perform a check of the returned pointer later. Let's bail
out until it's too late.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
The waitchecks along with multiple syncpoints per submit are not ready
for use yet, let's forbid them for now.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
If commands buffer claims a number of words that is higher than its BO can
fit, a kernel OOPS will be fired on the out-of-bounds BO access. This was
triggered by an opentegra Xorg driver that erroneously pushed too many
commands to the pushbuf.
The CDMA commands buffer address is 4 bytes aligned, so check its
alignment.
The maximum number of the CDMA gather fetches is 16383, add a check for
it.
Add a sanity check for the relocations in a same way.
[ 46.829393] Unable to handle kernel paging request at virtual address f09b2000
...
[<c04a3ba4>] (host1x_job_pin) from [<c04dfcd0>] (tegra_drm_submit+0x474/0x510)
[<c04dfcd0>] (tegra_drm_submit) from [<c04deea0>] (tegra_submit+0x50/0x6c)
[<c04deea0>] (tegra_submit) from [<c04c07c0>] (drm_ioctl+0x1e4/0x3ec)
[<c04c07c0>] (drm_ioctl) from [<c02541a0>] (do_vfs_ioctl+0x9c/0x8e4)
[<c02541a0>] (do_vfs_ioctl) from [<c0254a1c>] (SyS_ioctl+0x34/0x5c)
[<c0254a1c>] (SyS_ioctl) from [<c0107640>] (ret_fast_syscall+0x0/0x3c)
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
The client ID 0 is reserved by the host1x/cdma to mark the timeout timer
work as already been scheduled and context ID is used as the clients one.
This fixes spurious CDMA timeouts.
Fixes: bdd2f9cd10 ("drm/tegra: Don't leak kernel pointer to userspace")
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: http://patchwork.freedesktop.org/patch/msgid/9c19a44219acd988e678cf9abe21363911184625.1497480754.git.digetx@gmail.com
If 'devm_reset_control_get' returns an error, then we erroneously return
success because error code is taken from 'host->clk' instead of
'host->rst'.
Fixes: b386c6b73a ("gpu: host1x: Support module reset")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170410202922.17665-1-christophe.jaillet@wanadoo.fr
Improve kerneldoc for the public parts of the host1x infrastructure in
preparation for adding driver-specific part to the GPU documentation.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Currently the vma has one link member that is used for both holding its
place in the execbuf reservation list, and in any eviction list. This
dual property is quite tricky and error prone.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615081435.17699-3-chris@chris-wilson.co.uk
This has the benefit of not requiring us to manipulate the
vma->exec_link list when tearing down the execbuffer, and is a
marginally cheaper test to detect the user error.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170615081435.17699-2-chris@chris-wilson.co.uk
Add OA support for Kabylake (pretty much identical to Skylake), and
also add the associated OA configurations.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Add macros to detect GT2/GT3 skus so we can apply the proper OA
configuration later.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
In earlier iterations of the i915-perf driver we had a number of
callbacks/hooks from other parts of the i915 driver to e.g. notify us
when a legacy context was pinned and these could run asynchronously with
respect to the stream file operations and might also run in atomic
context.
dev_priv->perf.hook_lock had been for serialising access to state needed
within these callbacks, but as the code has evolved some of the hooks
have gone away or are implemented to avoid needing to lock any state.
The remaining use of this lock was actually redundant considering how
the gen7 oacontrol state used to be updated as part of a context pin
hook.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
An oa_exponent_to_ns() utility and per-gen timebase constants where
recently removed when updating the tail pointer race condition WA, and
this restores those so we can update the _PROP_OA_EXPONENT validation
done in read_properties_unlocked() to not assume we have a 12.5MHz
timebase as we did for Haswell.
Accordingly the oa_sample_rate_hard_limit value that's referenced by
proc_dointvec_minmax defining the absolute limit for the OA sampling
frequency is now initialized to (timestamp_frequency / 2) instead of the
6.25MHz constant for Haswell.
v2:
Specify frequency of 19.2MHz for BXT (Ville)
Initialize oa_sample_rate_hard_limit per-gen too (Lionel)
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
These are auto generated from an XML description of metric sets,
currently maintained in gputop, ref:
https://github.com/rib/gputop
> gputop-data/oa-*.xml
> scripts/i915-perf-kernelgen.py
$ make -C gputop-data -f Makefile.xml
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
share (more-or-less) the same OA unit design.
Of particular note in comparison to Haswell: some OA unit HW config
state has become per-context state and as a consequence it is somewhat
more complicated to manage synchronous state changes from the cpu while
there's no guarantee of what context (if any) is currently actively
running on the gpu.
The periodic sampling frequency which can be particularly useful for
system-wide analysis (as opposed to command stream synchronised
MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
have become per-context save and restored (while the OABUFFER
destination is still a shared, system-wide resource).
This support for gen8+ takes care to consider a number of timing
challenges involved in synchronously updating per-context state
primarily by programming all config state from the cpu and updating all
current and saved contexts synchronously while the OA unit is still
disabled.
The driver intentionally avoids depending on command streamer
programming to update OA state considering the lack of synchronization
between the automatic loading of OACTXCONTROL state (that includes the
periodic sampling state and enable state) on context restore and the
parsing of any general purpose BB the driver can control. I.e. this
implementation is careful to avoid the possibility of a context restore
temporarily enabling any out-of-date periodic sampling state. In
addition to the risk of transiently-out-of-date state being loaded
automatically; there are also internal HW latencies involved in the
loading of MUX configurations which would be difficult to account for
from the command streamer (and we only want to enable the unit when once
the MUX configuration is complete).
Since the Gen8+ OA unit design no longer supports clock gating the unit
off for a single given context (which effectively stopped any progress
of counters while any other context was running) and instead supports
tagging OA reports with a context ID for filtering on the CPU, it means
we can no longer hide the system-wide progress of counters from a
non-privileged application only interested in metrics for its own
context. Although we could theoretically try and subtract the progress
of other contexts before forwarding reports via read() we aren't in a
position to filter reports captured via MI_REPORT_PERF_COUNT commands.
As a result, for Gen8+, we always require the
dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
if not root.
v5: Drain submitted requests when enabling metric set to ensure no
lite-restore erases the context image we just updated (Lionel)
v6: In addition to drain, switch to kernel context & update all
context in place (Chris)
v7: Add missing mutex_unlock() if switching to kernel context fails
(Matthew)
v8: Simplify OA period/flex-eu-counters programming by using the
batchbuffer instead of modifying ctx-image (Lionel)
v9: Back to updating the context image (due to erroneous testing,
batchbuffer programming the OA unit doesn't actually work)
(Lionel)
Pin context before updating context image (Chris)
Drop MMIO programming now that we switch to a kernel context with
right values in initial context image (Chris)
v10: Just pin_map the contexts we want to modify or let the
configuration happen on first use (Chris)
v11: Update kernel context OA config through the batchbuffer rather
than on the fly ctx-image update (Lionel)
v12: Rework OA context registers update again by swithing away from
user contexts and reconfiguring the kernel context through the
batchbuffer and updating all the other contexts' context image.
Also take care to lock slice/subslice configuration when OA is
on. (Lionel)
v13: Request rpcs updates on all engine when updating the OA config
(Lionel)
v14: Drop any kind of rpcs management now that we monitor sseu
configuration changes in a later patch (Lionel)
Remove usleep after programming the NOA configs on Gen8+, this
doesn't seem to be needed (Lionel)
v15: Respect coding style for block comments (Chris)
v16: Add missing i915_add_request() in case we fail to emit OA
configuration (Matthew)
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com> \o/
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Adds a static OA unit, MUX, B Counter + Flex EU configurations for basic
render metrics on Broadwell, Cherryview, Skylake and Broxton. These are
auto generated from an XML description of metric sets, currently
maintained in gputop, ref:
https://github.com/rib/gputop
> gputop-data/oa-*.xml
> scripts/i915-perf-kernelgen.py
$ make -C gputop-data -f Makefile.xml WHITELIST=RenderBasic
v2: add newlines to debug messages + fix comment (Matthew Auld)
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Gen8+ might have mux configurations per slices/subslices. Depending on
whether slices/subslices have been fused off, only part of the
configuration needs to be applied. This change reworks the mux
configurations query mechanism to allow more than one set of registers
to be programmed.
v2: s/n_mux_regs/n_mux_configs/ (Matthew)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Assuming a uniform mask across all slices, this enables userspace to
determine the specific sub slices can be enabled. This information is
required, for example, to be able to analyse some OA counter reports
where the counter configuration depends on the HW sub slice
configuration.
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>