Currently we try to reduce the number of synchronisations (now the
number of requests we need to wait upon) by noting that if we have
earlier waited upon a request, all subsequent requests in the timeline
will be after the wait. This only applies to requests in this timeline,
as other timelines will not be ordered by that waiter.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-30-chris@chris-wilson.co.uk
Move the actual emission of the breadcrumb for closing the request from
i915_add_request() to the submit callback. (It can be moved later when
required.) This allows us to defer the allocation of the global_seqno
from request construction to actual submission, allowing us to emit the
requests out of order (wrt to the order of their construction, they
still will only be executed one all of their dependencies are resolved
including that all earlier requests on their timeline have been
submitted.) We have to specialise how we then emit the request in order
to write into the preallocated space, rather than at the tail of the
ringbuffer (which will have been advanced by the addition of new
requests).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-29-chris@chris-wilson.co.uk
In the next patch, we will use deferred breadcrumb emission. That requires
reserving sufficient space in the ringbuffer to emit the breadcrumb, which
first requires us to know how large the breadcrumb is.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-28-chris@chris-wilson.co.uk
Now that the emission of the request tail and its submission to hardware
are two separate steps, engine->emit_request() is confusing.
engine->emit_request() is called to emit the breadcrumb commands for the
request into the ring, name it such (engine->emit_breadcrumb).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-27-chris@chris-wilson.co.uk
Though we will have multiple timelines, we still have a single timeline
of execution. This we can use to provide an execution and retirement order
of requests. This keeps tracking execution of requests simple, and vital
for preserving a single waiter (i.e. so that we can order the waiters so
that only the earliest to wakeup need be woken). To accomplish this we
distinguish the seqno used to order requests per-context (external) and
that used internally for execution.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-26-chris@chris-wilson.co.uk
In future patches, we will no longer be able to wait on a static global
seqno and instead have to break our wait up into phases. First we wait
for the global seqno assignment (upon submission to hardware), and once
submitted we wait for the hardware to complete.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-25-chris@chris-wilson.co.uk
Before suspend, we wait for the switch to the kernel context. In order
for all the other context images to be complete upon suspend, that
switch must be the last operation by the GPU (i.e. this idling request
must not overtake any pending requests). To make this request execute last,
we make it depend on every other inflight request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-24-chris@chris-wilson.co.uk
Our timelines are more than just a seqno. They also provide an ordered
list of requests to be executed. Due to the restriction of handling
individual address spaces, we are limited to a timeline per address
space but we use a fence context per engine within.
Our first step to introducing independent timelines per context (i.e. to
allow each context to have a queue of requests to execute that have a
defined set of dependencies on other requests) is to provide a timeline
abstraction for the global execution queue.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-23-chris@chris-wilson.co.uk
After combining the dma-buf reservation object and the GEM reservation
object, we lost the ability to do a nonblocking wait on the i915 request
(as we blocked upon the reservation object during prepare_fb). We can
instead convert the reservation object into a fence upon which we can
asynchronously wait (including a forced timeout in case the DMA fence is
never signaled).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-22-chris@chris-wilson.co.uk
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
Having moved the locked phase of freeing an object to a separate worker,
we can now declare to the core that we only need the unlocked variant of
driver->gem_free_object, and can use the simple unreference internally.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-20-chris@chris-wilson.co.uk
We want to hide the latency of releasing objects and their backing
storage from the submission, so we move the actual free to a worker.
This allows us to switch to struct_mutex freeing of the object in the
next patch.
Furthermore, if we know that the object we are dereferencing remains valid
for the duration of our access, we can forgo the usual synchronisation
barriers and atomic reference counting. To ensure this we defer freeing
an object til after an RCU grace period, such that any lookup of the
object within an RCU read critical section will remain valid until
after we exit that critical section. We also employ this delay for
rate-limiting the serialisation on reallocation - we have to slow down
object creation in order to prevent resource starvation (in particular,
files).
v2: Return early in i915_gem_tiling() ioctl to skip over superfluous
work on error.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-19-chris@chris-wilson.co.uk
We only need struct_mutex within pwrite for a brief window where we need
to serialise with rendering and control our cache domains. Elsewhere we
can rely on the backing storage being pinned, and forgive userspace any
races against us.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-17-chris@chris-wilson.co.uk
We only need struct_mutex within pread for a brief window where we need
to serialise with rendering and control our cache domains. Elsewhere we
can rely on the backing storage being pinned, and forgive userspace any
races against us.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-16-chris@chris-wilson.co.uk
Break the allocation of the backing storage away from struct_mutex into
a per-object lock. This allows parallel page allocation, provided we can
do so outside of struct_mutex (i.e. set-domain-ioctl, pwrite, GTT
fault), i.e. before execbuf! The increased cost of the atomic counters
are hidden behind i915_vma_pin() for the typical case of execbuf, i.e.
as the object is typically bound between execbufs, the page_pin_count is
static. The cost will be felt around set-domain and pwrite, but offset
by the improvement from reduced struct_mutex contention.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-14-chris@chris-wilson.co.uk
The plan is to move obj->pages out from under the struct_mutex into its
own per-object lock. We need to prune any assumption of the struct_mutex
from the get_pages/put_pages backends, and to make it easier we pass
around the sg_table to operate on rather than indirectly via the obj.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-13-chris@chris-wilson.co.uk
The plan is to make obtaining the backing storage for the object avoid
struct_mutex (i.e. use its own locking). The first step is to update the
API so that normal users only call pin/unpin whilst working on the
backing storage.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-12-chris@chris-wilson.co.uk
A while ago we switched from a contiguous array of pages into an sglist,
for that was both more convenient for mapping to hardware and avoided
the requirement for a vmalloc array of pages on every object. However,
certain GEM API calls (like pwrite, pread as well as performing
relocations) do desire access to individual struct pages. A quick hack
was to introduce a cache of the last access such that finding the
following page was quick - this works so long as the caller desired
sequential access. Walking backwards, or multiple callers, still hits a
slow linear search for each page. One solution is to store each
successful lookup in a radix tree.
v2: Rewrite building the radixtree for clarity, hopefully.
v3: Rearrange execbuf to avoid calling i915_gem_object_get_sg() from
within an atomic section and so relax the allocation context to a simple
GFP_KERNEL and mutex.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-10-chris@chris-wilson.co.uk
The golden render state is constant, but we recreate the batch setting
it up for every new context. If we keep that batch in a volatile cache
we can safely reuse it whenever we need to initialise a new context. We
mark the pages as purgeable and use the shrinker to recover pages from
the batch whenever we face memory pressues, recreating that batch afresh
on the next new context.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtien@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-8-chris@chris-wilson.co.uk
Quite a few of our objects used for internal hardware programming do not
benefit from being swappable or from being zero initialised. As such
they do not benefit from using a shmemfs backing storage and since they
are internal and never directly exposed to the user, we do not need to
worry about providing a filp. For these we can use an
drm_i915_gem_object wrapper around a sg_table of plain struct page. They
are not swap backed and not automatically pinned. If they are reaped
by the shrinker, the pages are released and the contents discarded. For
the internal use case, this is fine as for example, ringbuffers are
pinned from being written by a request to be read by the hardware. Once
they are idle, they can be discarded entirely. As such they are a good
match for execlist ringbuffers and a small variety of other internal
objects.
In the first iteration, this is limited to the scratch batch buffers we
use (for command parsing and state initialisation).
v2: Allocate physically contiguous pages, where possible.
v3: Reduce maximum order on subsequent requests following an allocation
failure.
v4: Fix up mismatch between swiotlb segment size and page count (it
counts in 2k units, not 4k pages)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-7-chris@chris-wilson.co.uk
Our low-level wait routine has evolved from our generic wait interface
that handled unlocked, RPS boosting, waits with time tracking. If we
push our GEM fence tracking to use reservation_objects (required for
handling multiple timelines), we lose the ability to pass the required
information down to i915_wait_request(). However, if we push the extra
functionality from i915_wait_request() to the individual callsites
(i915_gem_object_wait_rendering and i915_gem_wait_ioctl) that make use
of those extras, we can both simplify our low level wait and prepare for
extending the GEM interface for use of reservation_objects.
v2: Rewrite i915_wait_request() kerneldocs
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-4-chris@chris-wilson.co.uk
We only need the active reference to keep the object alive after the
handle has been deleted (so as to prevent a synchronous gem_close). Why
then pay the price of a kref on every execbuf when we can insert that
final active ref just in time for the handle deletion?
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-6-chris@chris-wilson.co.uk
Since we only use the more generic unlocked variant, just rename it as
the normal i915_gem_active_wait(). The temporary cost is that we need to
always acquire the reference in a RCU safe manner, but the benefit is
that we will combine the common paths.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-5-chris@chris-wilson.co.uk
The throttle-ioctl never touches the struct_mutex. It does, however, as
part of its ABI report whether the hardware is terminally wedged. For
that purposes, it only has to report the current state and not incur the
cost of checking/waiting every invocation, as we do not have to wait for
a reset before waiting on a request to ensure completion (that is baked
into the wait request implementation).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-3-chris@chris-wilson.co.uk
In forthcoming patches, we want to be able to dynamically allocate the
wait_queue_t used whilst awaiting. This is more convenient if we extend
the i915_sw_fence_await_sw_fence() to perform the allocation for us if
we pass in a gfp mask as an alternative than a preallocated struct.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-2-chris@chris-wilson.co.uk
We will need to wait on DMA completion (as signaled via struct fence)
before executing our i915_gem_request. Therefore we want to expose a
method for adding the await on the fence itself to the request.
v2: Add a comment detailing a failure to handle a signal-on-any
fence-array.
v3: Pretend that magic numbers don't exist.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-1-chris@chris-wilson.co.uk
We are not allowed to touch the GTT entries underneath an atomic section,
as they take a rpm wakelock (which is illegal from atomic context) and
in the near future acquiring the DMA address for a page within an object
may sleep for an allocation. This makes the current shortcircuit in
relocation_iomap() for performing a second relocation on an adjacent page
illegal, and we need to release the atomic iomapping, lookup the DMA,
insert it into the GTT before reentering the atomic iomap section.
As it happens, this is precisely what we do on if we are using an
iomapping over the full object and not just a single page and by
removing the shortcut, we do the right thing.
Fixes: 9c870d0367 ("drm/i915: Use RPM as the barrier for controlling...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028142756.3850-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
This macro's name is a bit misleading; it doesn't actually iterate over
all planes since it omits the cursor plane. Its only uses are in gen9
code which is using it to iterate over the universal planes (which we
treat as primary+sprites); in these cases the legacy cursor registers
are programmed independently if necessary. The macro's iterator value
(0 for primary plane, spritenum+1 for each secondary plane) also isn't
meaningful outside the gen9 context where the hardware considers them to
all be "universal" planes that follow this numbering.
This is just a renaming/clarification patch with no functional change.
However it will make the subsequent patches more clear.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477522291-10874-2-git-send-email-matthew.d.roper@intel.com
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
These static helper functions are required to be used during
fallback link rate implemnetation so they need to be placed at the top
of the file.
v3:
* Add cleanup to other patch (Mika Kahola)
v2:
* Dont move around functions declared in intel_drv.h (Rodrigo Vivi)
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Ville Syrjala <ville.syrjala@linux.intel.com>
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477524358-16563-4-git-send-email-manasi.d.navare@intel.com
The port registers related to the phys in broxton map to different
channels and specific phys. Make that mapping explicit.
v2: Pass enum dpio_phy to macros instead of mmio base. (Imre)
v3: Fix typo in macros. (Imre)
v4: Also change variables from u32 to enum dpio_phy. (Imre)
Remove leftovers from previous version. (Imre)
v5: Actually git add the changes.
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476863940-6019-1-git-send-email-ander.conselvan.de.oliveira@intel.com
Use struct bxt_ddi_phy_info to hold information of where the Rcomp
resistor is located, instead of hard coding it in the init sequence.
Note that this moves the enabling of the phy with the Rcomp resistor out
of the power well enable code. That should be safe since
bxt_ddi_phy_init() is called while the power domains lock is held, and
that is the only way that function gets called, so there is no
possibility of a concurrent phy enable caused by a power domain get
call.
v2: Replace comment about lock with lockdep_assert_held() (Imre)
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/62d209950ad48484564f3e793cf247cf62572a39.1475770848.git-series.ander.conselvan.de.oliveira@intel.com
Backmerge latest drm-next to pull in the s/fence/dma_fence/ rework,
needed before we merge more i915 fencing patches.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Karol's work which greatly improves volt/clock changes on a
heap of boards, nothing too exciting beyond a random collection of fixes.
* 'linux-4.9' of git://github.com/skeggsb/linux: (33 commits)
drm/nouveau/fb/nv50: defer DMA mapping of scratch page to oneinit() hook
drm/nouveau/fb/gf100: defer DMA mapping of scratch page to oneinit() hook
drm/nouveau/pci: set streaming DMA mask early
drm/nouveau/kms: add Maxwell to backlight initialization
drm/nouveau/bar/nv50: fix bar2 vm size
drm/nouveau/disp: remove unused function in sorg94.c
drm/nouveau/volt: use kernel's 64-bit signed division function
drm/nouveau/core: add missing header dependencies
drm/nouveau/gr/nv3x: add 0x0597 kelvin 3d class support
drm/nouveau/drm/nouveau: add a LED driver for the NVIDIA logo
drm/nouveau/fb/ram: Use Kepler implementation on Maxwell
drm/nouveau/volt: Make use of cvb coefficients
drm/nouveau/volt/gf100-: Add speedo
drm/nouveau/volt: Add implementation for gf100
drm/nouveau/bios/vmap: unk0 field is the mode
drm/nouveau/volt: Don't require perfect fit
drm/nouveau/clk: Allow boosting only when NvBoost is set
drm/nouveau/bios: Add parsing of VPSTATE table
drm/nouveau/clk: Respect voltage limits in nvkm_cstate_prog
drm/nouveau/clk: Fixup cstate selection
...
Pull request already again to get the s/fence/dma_fence/ stuff in and
allow everyone to resync. Otherwise really just misc stuff all over, and a
new bridge driver.
* tag 'topic/drm-misc-2016-10-27' of git://anongit.freedesktop.org/git/drm-intel:
drm/bridge: fix platform_no_drv_owner.cocci warnings
drm/bridge: fix semicolon.cocci warnings
drm: Print some debug/error info during DP dual mode detect
drm: mark drm_of_component_match_add dummy inline
drm/bridge: add Silicon Image SiI8620 driver
dt-bindings: add Silicon Image SiI8620 bridge bindings
video: add header file for Mobile High-Definition Link (MHL) interface
drm: convert DT component matching to component_match_add_release()
dma-buf: Rename struct fence to dma_fence
dma-buf/fence: add an lockdep_assert_held()
drm/dp: Factor out helper to distinguish between branch and sink devices
drm/edid: Only print the bad edid when aborting
drm/msm: add missing header dependencies
drm/msm/adreno: move function declarations to header file
drm/i2c/tda998x: mark symbol static where possible
doc: add missing docbook parameter for fence-array
drm: RIP mode_config->rotation_property
drm/msm/mdp5: Advertize 180 degree rotation
drm/msm/mdp5: Use per-plane rotation property
v2: move return value check as well
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
fix pm-hibernate bug, when suspend/resume, dpm start failed.
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
gvt-next-2016-10-27
- Resolve current left build issue with ACPI=n and 32bit kernel
- TLB workaround from Arkadiusz
- vGPU reset fix from Ping
- workload scheduler nesting sleep fix from Changbin
- more misc fixes for sparse warnings and cleanups
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
drivers/gpu/drm/bridge/sil-sii8620.c:1556:3-8: No need to set .owner here. The core will do it.
Remove .owner field if calls are used which set it automatically
Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci
CC: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Link: http://patchwork.freedesktop.org/patch/msgid/20161026165836.GA98766@lkp-sb04.lkp.intel.com
We cannot use blocking method mutex_lock inside a wait loop.
Here we invoke pick_next_workload() which needs acquire a
mutex in our "condition" experssion. Then we go into a another
of the going-to-sleep sequence and changing the task state.
This is a dangerous. Let's rewrite the wait sequence to avoid
nested sleeping.
v2: fix do...while loop exit condition (zhenyu)
v3: rebase to gvt-staging branch
Signed-off-by: Du, Changbin <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
throw error message in elsp emulation handler basing on execlist
submit result. guest will trigger tdr process for recovering, gvt
just follow guest's desire.
v2: populate error to top of mmio emulation logic, comments from
zhenyu
Signed-off-by: Bing Niu <bing.niu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Full vGPU reset need to release all the shadow PPGGT pages to avoid
unnecessary write-protect and also should re-initialize pvinfo after
resetting vregs to keep pvinfo correct.
Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Currently, for display there is only one DMC image for KBL.
Remove the stepping_info table for KBL and use the no_stepping_info
array for loading the firmware.
v2: Removed the block of code as pointed out by Rodrigo to make the
loads as generic as possible.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477355301-7035-1-git-send-email-anusha.srivatsa@intel.com
There's at least one LSPCON device that occasionally returns an unexpected
adaptor ID which leads to a failed detect. Print some debug info to help
debugging this and future cases. Also print an error for an unexpected
adaptor ID, so users can report it.
v2:
- s/adapter/adaptor/ and add code comment about incorrect type 1 adaptor
IDs. (Ville)
Cc: dri-devel@lists.freedesktop.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: http://patchwork.freedesktop.org/patch/msgid/1477499359-12001-1-git-send-email-imre.deak@intel.com
Pass the framebuffer size in .16 fixed point coordinates to
drm_rect_rotate() since that's what the source coordinates are as well
at this stage. We used to do this part of the computation in integer
coordinates, but that got changed when moving the computation to
happen in the check phase of the operation. Unfortunately I forgot
to shift up the fb width and height appropriately.
With the bogus size we ended up with some negative fb offset, which when
added to the vma offset caused out scanout to start at an offset earlier
than we inteded. Eg. when testing on my SKL I saw a row of incorrect
tiles at the top of my screen.
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Cc: drm-intel-fixes@lists.freedesktop.org
Fixes: b63a16f6cd ("drm/i915: Compute display surface offset in the plane check hook for SKL+")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477325584-23679-1-git-send-email-ville.syrjala@linux.intel.com
Tested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment mentioned use of intel_uncore_forcewake_irq{unlock, lock}
functions which are nonexistent (and never were).
The description was also incomplete and could cause confusion. Updated
comment is more elaborate on usage and caveats.
v2: mention __locked variant of intel_uncore_forcewake_{get,put} instead
of plain ones
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilsono.c.uk>
[Mika: removed two superfluous lines on comment noted by Chris]
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477399682-3133-1-git-send-email-arkadiusz.hiler@intel.com
On my APL the LSPCON firmware resumes in PCON mode as opposed to the
expected LS mode. It also appears to be in a state where AUX DPCD reads
will succeed but return garbage recovering only after a few hundreds of
milliseconds. After the recovery time DPCD reads will result in the
correct values and things will continue to work. If I2C over AUX is
attempted during this recovery time (implying an AUX write transaction)
the firmware won't recover and will stay in this broken state.
As a workaround check if the firmware is in PCON state after resume and
if so wait until the correct DPCD values are returned. For this we
compare the branch descriptor with the one we cached during init time.
If the firmware was in the LS state, we skip the w/a and continue as
before.
v2:
- Use the DP descriptor value cached in intel_dp. (Jani)
- Get to intel_dp using container_of(), instead of a cached ptr.
(Shashank)
- Use usleep_range() instead of msleep().
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98353
Cc: Shashank Sharma <shashank.sharma@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477326811-30431-9-git-send-email-imre.deak@intel.com
We can use the container_of() magic to get to the DDC adapter, so no
need for caching a pointer to it. We'll also need to get at the intel_dp
ptr in the following patch, so add a helper that can be used for both
purposes.
Cc: Shashank Sharma <shashank.sharma@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477326811-30431-8-git-send-email-imre.deak@intel.com
As for external DP sink and branch devices read and print the DP
descriptor for eDP and LSPCON devices as well to aid debugging.
v2:
- Split out this change to a separate patch. (Jani)
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477326811-30431-7-git-send-email-imre.deak@intel.com
All types of DP devices (eDP, DP sink, DP branch) will fail their probe
if the start of DPCD can't be read. The LSPCON PCON functionality also
depends on accessing this area, so fail the probe if the read fails.
Cc: Shashank Sharma <shashank.sharma@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477326811-30431-6-git-send-email-imre.deak@intel.com
Extend the branch/sink descriptor info with the missing device ID
field. While at it also read out all the descriptor registers in one
transfer and make the debug print more compact.
v2: (Jani)
- Cache the descriptor in intel_dp.
- Split out this change into a separate patch.
v3: (Jani)
- Fix return value check of __intel_dp_read_desc().
- Use %pE instead of %s to print the device ID.
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477401159-15098-1-git-send-email-imre.deak@intel.com
There are two separate sets of DPCD registers for the DP OUI - as well as
for the device ID and HW/SW revision - based on whether the given DP
device is a branch or a sink. Currently we print both branch and sink
OUIs, for consistency print only the one that corresponds to the
probed device.
v2:
- Split out this change into a separate patch. (Jani)
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477326811-30431-4-git-send-email-imre.deak@intel.com
Performing DPCD AUX reads based on debug settings may introduce obscure
bugs in other places that depend on the read being done (or being not
done). To reduce the uncertainty perform the reads unconditionally.
Cc: Mika Kahola <mika.kahola@intel.com>
Suggested-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477326811-30431-3-git-send-email-imre.deak@intel.com
This check is open-coded in a few places, so it makes sense to simplify
things by having a helper for it similar to the rest of DPCD feature
helpers.
v2: (Jani)
- Move the helper to drm_dp_helper.h.
- Split out this change to a separate patch.
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477326811-30431-2-git-send-email-imre.deak@intel.com
When modeset occurs and the LS_CLK is set to some special values in DP
mode, the N/M need to be set manually if audio is playing. Otherwise the
first several seconds may be silent in audio playback.
The relationship of Maud and Naud is expressed in the following
equation:
Maud/Naud = 512 * fs / f_LS_Clk
Please refer VESA DisplayPort Standard spec for details.
v2 by Jani:
- organize Maud/Naud table according to DP 1.4 spec
- add 64k and 128k audio rates
- update HSW_AUD_M_CTS_ENABLE register when Maud not found
- remove extra checks for port clock
- simplify Maud/Naud lookup
- reset patch author back to Libin
Cc: "Zhang, Keqiao" <keqiao.zhang@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: "Lin, Mengdong" <mengdong.lin@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Libin Yang <libin.yang@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477407258-30599-3-git-send-email-jani.nikula@intel.com
The array contains the crtc clock, rely on that. While at it, debug log
the HDMI N value or automatic mode.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: "Lin, Mengdong" <mengdong.lin@intel.com>
Cc: Libin Yang <libin.yang@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477407258-30599-2-git-send-email-jani.nikula@intel.com
Once we've determined that the sink is MST capable we never end up
running through the full detect cycle again, despite getting HPDs.
Fix tht by ripping out the incorrect piece of code responsible.
This got broken when I moved the long HPD handling to the ->detect()
hook, but failed to remove the leftover code.
Cc: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Cc: drm-intel-fixes@lists.freedesktop.org
Cc: Rui Tiago Matos <tiagomatos@gmail.com>
Tested-by: Rui Tiago Matos <tiagomatos@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98323
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Tested-by: Kirill A. Shutemov <kirill@shutemov.name>
References: https://bugs.freedesktop.org/show_bug.cgi?id=98306
Fixes: 27d4efc559 ("drm/i915: Move long hpd handling into the hotplug work")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1477057478-29328-1-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Objects can have multiple VMAs used for display in which
case assertion that objects must not be pinned for display
more times than the current VMA is incorrect.
v2: Commit message update. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 058d88c433 ("drm/i915: Track pinned VMA")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1477413635-3876-1-git-send-email-tvrtko.ursulin@linux.intel.com
We do not need to set up a fence for the rotated view.
Display does not need it and no one can access it.
v2: Move code to __i915_vma_set_map_and_fenceable. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 05a20d098d ("drm/i915: Move map-and-fenceable tracking to the VMA")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
SiI8620 transmitter converts eTMDS/HDMI signal to MHL 3.0.
It is controlled via I2C bus. Its interaction with other
devices in video pipeline is performed mainly on HW level.
The only interaction it does on device driver level is
filtering-out unsupported video modes, it exposes drm_bridge
interface to perform this operation.
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Link: http://patchwork.freedesktop.org/patch/msgid/1476085157-5266-1-git-send-email-a.hajda@samsung.com
The current_vgpu will set to NULL after stopping the scheduler when
the reset is triggered by current vgpu, so here need change the
judgement condition for current vgpu detection.
Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
The emulation handler for MMIO GDRST miss vreg write in it, as result
the vreg cannot update correspondingly.
Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Like other routines, intel_gvt_hypervisor_detect_host returns 0
for success.
Signed-off-by: Xiaoguang Chen <xiaoguang.chen@intel.com>
Signed-off-by: Jike Song <jike.song@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Driver accesses the ringbuffer pages, via GMADR BAR, if the pages are
pinned in mappable aperture portion of GGTT and for ringbuffer pages
allocated from Stolen memory, access can only be done through GMADR BAR.
In case of GuC based submission, updates done in ringbuffer via GMADR
may not get committed to memory by the time the Command streamer starts
reading them, resulting in fetching of stale data.
For Host based submission, such problem is not there as the write to Ring
Tail or ELSP register happens from the Host side prior to submission.
Access to any GFX register from CPU side goes to GTTMMADR BAR and Hw already
enforces the ordering between outstanding GMADR writes & new GTTMADR access.
MMIO writes from GuC side do not go to GTTMMADR BAR as GuC communication to
registers within GT is contained within GT, so ordering is not enforced
resulting in a race, which can manifest in form of a hang.
To ensure the flush of in-flight GMADR writes, a POSTING READ is done to
GuC register prior to doorbell ring.
There is already a similar WA in i915_gem_object_flush_gtt_write_domain(),
which takes care of GMADR writes from User space to GEM buffers, but not the
ringbuffer writes from KMD.
This WA is needed on all recent HW.
v2:
- Use POSTING_READ_FW instead of POSTING_READ as GuC register do not lie
in any forcewake domain range and so the overhead of spinlock & search
in the forcewake table is avoidable. (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1477413323-1880-1-git-send-email-akash.goel@intel.com
This way we can correctly check split VRAM buffers as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This way the driver can decide if it is valuable to evict a BO or not.
The current implementation is added as default to all existing drivers.
v2: fix some typos found during internal testing
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The current default of always using the performance power state leads
to increased power consumption of mobile devices, which have a dedicated
battery power state. Switch between the performance and battery power
state automatically, dpending on the current AC power status, when the
user asked for the balanced power state.
The user can still override this logic by asking for the performance
or battery power state explicitly.
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix trivial spelling mistake cant't -> can't and add KERN_WARNING to
printk messages. Remove redundant spaces before \n too (thanks to
Joe Perches for spotting those).
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We get 2 warnings when building kernel with W=1:
drivers/gpu/drm/amd/amdgpu/si.c:908:5: warning: no previous prototype for 'si_pciep_rreg' [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/si.c:921:6: warning: no previous prototype for 'si_pciep_wreg' [-Wmissing-prototypes]
In fact, both functions are only used in the file in which they are
declared and don't need a declaration, but can be made static.
So this patch marks these functions with 'static'.
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We get a few warnings when building kernel with W=1:
drivers/gpu/drm/amd/amdgpu/atombios_crtc.c:38:6: warning: no previous prototype for 'amdgpu_atombios_crtc_overscan_setup' [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/dce_v8_0.c:661:6: warning: no previous prototype for 'dce_v8_0_disable_dce' [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:40:5: warning: no previous prototype for 'amdgpu_gfx_scratch_get' [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:62:6: warning: no previous prototype for 'amdgpu_gfx_scratch_free' [-Wmissing-prototypes]
....
In fact, these functions are declared in
drivers/gpu/drm/amd/amdgpu/atombios_crtc.h
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
drivers/gpu/drm/amd/amdgpu/dce_v8_0.h
drivers/gpu/drm/amd/amdgpu/dce_v10_0.h
drivers/gpu/drm/amd/amdgpu/dce_v11_0.h
drivers/gpu/drm/amd/powerplay/inc/pp_acpi.h.
So this patch adds missing header dependencies.
By the way, this patch changes declaration of amdgpu_gfx_parse_disable_cu()
to subject to its implement, and clean three function declarations
in pp_acpi.h up.
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix random CamelCase that has annoyed me for a while.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Leftovers from the radeon.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move from asic specific code to common atom code.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rather than open coding it.
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix the problems with killing VCE sessions in VM mode.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-and-Tested by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This way we can use parse_cs and still keep VM mode enabled.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-and-Tested by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
KV/KB/ML was missed these was implemented for other asics.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add the rest of the basic SQ WAVE fields to
finish off the implementation. Eventually,
a separate interface will be needed for GPRs.
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move IP version specific code into a callback.
Also add support for gfx7 devices.
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add PG lock support as well as bank selection to
the MMIO write function.
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Allow any of the se/sh/instance fields to be
specified as a broadcast by submitting 0x3FF.
(v2) Fix broadcast range checking
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On non VI/CZ platforms it would not free
the grbm index lock.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Currently supports CZ/VI. Allows nearly atomic read
of wave data from GPU.
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This makes it easier to replace specific IP blocks on
asics for handling virtual_dce, DAL, etc. and for building
IP lists for hw or tables. This also stored the status
information in the same structure.
v2: split out spelling fix into a separate patch
add a function to add IPs to the list
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Returns the vce clock table for the user mode driver.
The user mode driver can fill this data into vce clock
data packet for optimal VCE DPM.
v2: update to the new API
Reviewed-by: Rex Zhu <Rex.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Used by the powerplay dpm code.
v2: update to the new API
v3: drop old include
Reviewed-by: Rex Zhu <Rex.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Used by the non-powerplay dpm code.
v2: update to the new API
Reviewed-by: Rex Zhu <Rex.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Will be used by the new info ioctl query.
v2: fetch a single state per request
Reviewed-by: Rex Zhu <Rex.Zhu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
They are constant as well.
v2: update uvd and vce phys ring structures as well
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's constant, so it doesn't make to much sense to keep it
with the variable data.
v2: update vce and uvd phys mode ring structures as well
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
I should have suggested that on the initial patchset. This saves us a
few CPU cycles during CS and a bunch of loc.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
sed -i "/\.parse_cs = NULL,/d" drivers/gpu/drm/amd/amdgpu/*.c
That's just a leftover from radeon.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
With the padding raised to 256 DW that shouldn't be needed any more.
v2: reduce estimation as well
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If a ring doesn't support that it shouldn't implement the function.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The same as on windows to avoid further problems with CE/DE
command submission overlaps.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update the comment to explain why we do this.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update the comment to explain why we do this.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Using the cached values has less latency for bare metal
and SR-IOV, and prevents reading back bogus values if the
engine is powergated.
v2: fix typo in tile idx calculation
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Simplify the code and properly set the csb for harvest values.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Needed when for SR-IOV and when PG is enabled.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We need to cache some additional values to handle SR-IOV
and PG.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Instead of messing with the PD directly.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Only cleanup, no intended functional change.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Only cleanup, no intended functional change.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Only cleanup, no intended functional change.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Saves us a bit of memory.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Saves a bunch of CPU cycles when swapping things back in and
allows us to split the VM headers into a separate file.
v2: rename parameters
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's completely pointless to have two pointers to the
device in the same structure.
v2: rename function to amdgpu_ttm_adev, fix typos
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tested by reading tile/clk bits during load/idle.
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch enables detecting VCE/UVD PG features and fixes the
UVD powergate function.
Tested on a Tonga (by reading UVD tile/clk bits during playback/idle).
Signed-off-by: Tom St Denis <tom.stdenis@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Far less CPU cycles needed for this approach.
v2: fix typo
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
for GFX8, gfx ring's wptr_addr is needed by SRIOV & CP for polling.
Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
we found some MEC ucode leads to IB test fail or even
ring test fail if Jump Table of it is not start in
FW bo with page aligned address, fixed by always make
JT address page aligned.
we don't need to patch JT2 for MEC2, because for VI,
MEC2 is a copy of MEC1, thus when converting fw_type
for MEC_JT2 we just return MEC1,hw can use the same
JT for both MEC1 & MEC2.
above two change fixed some ring/ib test failure issue
for some version of MEC ucode.
Signed-off-by: Frank Min <Frank.Min@amd.com>
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
for sriov, SMC need MEC_STORAGE reserved in fw bo.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Signed-off-by: Frank Min <frank.min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
for GTT memory SMC can only access it within PF space, which is not
used for SRIOV case, thus for SRIOV case, we let SMC use FB space for
ucode bo.
Signed-off-by: Frank Min <frank.min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
for VI smc, index_0 to index_8 are all not safe,
they may used by BIOS/FW, and index_11 is reserved
only for driver.
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We get a few warnings when building kernel with W=1:
drivers/gpu/drm/amd/amdgpu/../powerplay/smumgr/fiji_smumgr.c:162:5: warning: no previous prototype for 'fiji_setup_pwr_virus' [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../powerplay/smumgr/fiji_smc.c:2052:5: warning: no previous prototype for 'fiji_program_mem_timing_parameters' [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../powerplay/smumgr/polaris10_smumgr.c:175:5: warning: no previous prototype for 'polaris10_avfs_event_mgr' [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../powerplay/hwmgr/cz_hwmgr.c:69:10: warning: no previous prototype for 'cz_get_eclk_level' [-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../powerplay/hwmgr/smu7_hwmgr.c:92:26: warning: no previous prototype for 'cast_phw_smu7_power_state' [-Wmissing-prototypes]
....
In fact, these functions are only used in the file in which they are
declared and don't need a declaration, but can be made static.
So this patch marks these functions with 'static'.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We get 4 warnings when building kernel with W=1:
drivers/gpu/drm/radeon/si.c:7850:5: warning: no previous prototype for 'si_vce_send_vcepll_ctlreq' [-Wmissing-prototypes]
drivers/gpu/drm/radeon/radeon_dp_mst.c:226:21: warning: no previous prototype for 'radeon_mst_best_encoder' [-Wmissing-prototypes]
drivers/gpu/drm/radeon/radeon_dp_mst.c:344:26: warning: no previous prototype for 'radeon_mst_find_connector' [-Wmissing-prototypes]
drivers/gpu/drm/radeon/radeon_dp_mst.c:600:6: warning: no previous prototype for 'radeon_dp_mst_encoder_destroy' [-Wmissing-prototypes]
In fact, these functions are only used in the file in which they are
declared and don't need a declaration, but can be made static.
So this patch marks these functions with 'static'.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We get a few warnings when building kernel with W=1:
drivers/gpu/drm/radeon/radeon_clocks.c:35:10: warning: no previous prototype for 'radeon_legacy_get_engine_clock' [-Wmissing-prototypes]
drivers/gpu/drm/radeon/atombios_encoders.c:75:1: warning: no previous prototype for 'atombios_get_backlight_level' [-Wmissing-prototypes]
drivers/gpu/drm/radeon/r600_cs.c:2268:5: warning: no previous prototype for 'r600_cs_parse' [-Wmissing-prototypes]
drivers/gpu/drm/radeon/evergreen_cs.c:2671:5: warning: no previous prototype for 'evergreen_cs_parse' [-Wmissing-prototypes]
....
In fact, these functions are declared
in drivers/gpu/drm/radeon/radeon_asic.h,
so this patch adds missing header dependencies.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Split VRAM allocations into 4MB blocks.
v2: fix typo in comment, some suggested cleanups
v3: document how to disable the feature, fix rebase issue
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This allows us to move scattered buffers around.
v2: fix a couple of typos, handle scattered to scattered moves as well.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This allows us to map scattered VRAM BOs to the VMs.
v2: fix offset handling, use pfn instead of offset,
fix PAGE_SIZE != AMDGPU_GPU_PAGE_SIZE case
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise the new VM code becomes confused.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Split VRAM won't have a valid offset, so just set an explicit limit
when the flag is given to trigger reallocation if necessary.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add a flag noting that a BO must be created using linear VRAM
and set this flag on all in kernel users where appropriate.
Hopefully I haven't missed anything.
v2: add it in a few more places, fix CPU mapping.
v3: rename to VRAM_CONTIGUOUS, fix typo in CS code.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No need to hard code the entire register to just
set/clear one bit.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use an address offset like other dce code.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No need to hard code the entire register to just
set/clear one bit.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use an address offset like other dce code.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Enable multi crtcs for virtual display, user can set the number of crtcs
by amdgpu module parameter virtual_display.
v2: make timers per crtc
v3: agd: simplify implementation
Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Reviewed-By: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
No need to emulate all of the stuff for real hw.
v2: warning fix
Reviewed-By: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We handle the virtual interrupts from a timer so no
need to try an look like we are handling IV ring events.
Reviewed-By: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Virtual crtcs interrupts do not show up in the IV ring,
so it will never be called.
Reviewed-By: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Never used.
Reviewed-By: Emily Deng <Emily.Deng@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Convert DT component matching to use component_match_add_release().
Acked-by: Jyri Sarha <jsarha@ti.com>
Reviewed-by: Jyri Sarha <jsarha@ti.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: http://patchwork.freedesktop.org/patch/msgid/E1bwo6l-0005Io-Q1@rmk-PC.armlinux.org.uk
As well as knowing when the error occurred, it is more interesting to me
to know how long after booting the error occurred, and for good measure
record the time since last hw initialisation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161025121602.1457-1-chris@chris-wilson.co.uk
The GuC log buffer flush work item has to do a register access to send the
ack to GuC and this work item, if not synced before suspend, can potentially
get executed after the GFX device is suspended. This work item function uses
rpm get/put calls around the Hw access, which covers the rpm suspend case
but for system suspend a sync would be required as kernel can potentially
schedule the work items even after some devices, including GFX, have been
put to suspend. But sync has to be done only for the system suspend case,
as sync along with rpm get/put can cause a deadlock for rpm suspend path.
To have the sync, but like a NOOP, for rpm suspend path also this work
item could have been queued from the irq handler only when the device is
runtime active & kept active while that work item is pending or getting
executed but an interrupt can come even after the device is out of use and
so can potentially lead to missing of this work item.
By marking the workqueue, dedicated for handling GuC log buffer flush
interrupts, as freezable we don't have to bother about flushing of this
work item from the suspend hooks, the pending work item if any will be
either executed before the suspend or scheduled later on resume. This way
the handling of log buffer flush work item can be kept same between system
suspend & rpm suspend.
Suggested-by: Imre Deak <imre.deak@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
As per the current i915 Driver load sequence, debugfs registration is done
at the end and so the relay channel debugfs file is also created after that
but the GuC firmware is loaded much earlier in the sequence.
As a result Driver could miss capturing the boot-time logs of GuC firmware
if there are flush interrupts from the GuC side.
Relay has a provision to support early logging where initially only relay
channel can be created, to have buffers for storing logs, and later on
channel can be associated with a debugfs file at appropriate time.
Have availed that, which allows Driver to capture boot time logs also,
which can be collected once Userspace comes up.
v2:
- Remove the couple of FIXMEs, as now the relay channel will be created
early before enabling the flush interrupts, so no possibility of relay
channel pointer being modified & read at the same time from 2 different
execution contexts.
- Rebase.
v3:
- Add a comment to justiy setting 'is_global' before the NULL check on the
parent directory dentry pointer.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
To ensure that we always get the up-to-date data from log buffer, its
better to access the buffer through an uncached CPU mapping. Also the way
buffer is accessed from GuC & Host side, manually doing cache flush may
not be effective always if cached CPU mapping is used. In order to avoid
any performance drop & have fast reads from the GuC log buffer, used SSE4.1
movntdqa based memcpy function i915_memcpy_from_wc, as copying using
movntqda from WC type memory is almost as fast as reading from WB memory.
This way log buffer sampling time will not get increased and so would be
able to deal with the flush interrupt storm when GuC is generating logs at
a very high rate.
Ideally SSE 4.1 should be present on all chipsets supporting GuC based
submisssions, but if not then logging will not be enabled.
v2: Rebase.
v3: Squash the WC type vmalloc mapping patch with this patch. (Chris)
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
This patch provides debugfs interface i915_guc_output_control for
on the fly enabling/disabling of logging in GuC firmware and controlling
the verbosity level of logs.
The value written to the file, should have bit 0 set to enable logging and
bits 4-7 should contain the verbosity info.
v2: Add a forceful flush, to collect left over logs, on disabling logging.
Useful for Validation.
v3: Besides minor cleanup, implement read method for the debugfs file and
set the guc_log_level to -1 when logging is disabled. (Tvrtko)
v4: Minor cleanup & rebase. (Tvrtko)
v5:
- Lock struct_mutex after the NULL check for guc log buffer vma. (Chris)
- Rebase.
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
GuC firmware sends a flush interrupt to Host when the log buffer is half
full and at that time only it updates the log buffer state.
But in certain cases, as described below, it could be useful to have all
that even when log buffer is only partially full. For that there is a force
log buffer flush Host2GuC action supported by GuC firmware.
For Validation requirements, a forceful flush is needed to collect the
left over logs on disabling logging. The same can be done before proceeding
with GPU/GuC reset as there could be some data in log buffer which is yet
to be captured and those logs would be particularly useful to understand
that why the reset was initiated.
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Added the dump of GuC log buffer to i915 error state, as the contents of
GuC log buffer would also be useful to determine that why the GPU reset
was triggered.
v2:
- For uniformity use existing helper function print_error_obj() to
dump out contents of GuC log buffer, pretty printing is better left
to userspace. (Chris)
- Skip the dumping of GuC log buffer when logging is disabled as it
won't be of any use.
- Rebase.
v3: Rebase.
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
In cases where GuC generate logs at a very high rate, correspondingly
the rate of flush interrupts is also very high.
So far total 8 pages were allocated for storing both ISR & DPC logs.
As per the half-full draining protocol followed by GuC, by doubling
the number of pages, the frequency of flush interrupts can be cut down
to almost half, which then helps in reducing the logging overhead.
So now allocating 8 pages apiece for ISR & DPC logs.
This also helps in reducing the output log file size, apart from
reducing the flush interrupt count. With the original settings,
44 KB was needed for one snapshot. With modified settings, 76 KB is
needed for a snapshot which will be equivalent to 2 snapshots of the
original setting. So 12KB saving, every 88 KB, over the original setting.
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
GuC firmware sends an interrupt to flush the log buffer when it becomes
half full, so Driver doesn't really need to sample the complete buffer
and can just copy only the newly written data by GuC into the local
buffer, i.e. as per the read & write pointer values.
Moreover the flush interrupt would generally come for one type of log
buffer, when it becomes half full, so at that time the other 2 types of
log buffer would comparatively have much lesser unread data in them.
In case of overflow reported by GuC, Driver do need to copy the entire
buffer as the whole buffer would contain the unread data.
v2: Rebase.
v3: Fix the blooper of doing the copy twice. (Tvrtko)
v4: Add curlies for 'else' case also, matching the 'if'. (Tvrtko)
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
GuC firmware sends an interrupt to flush the log buffer when it
becomes half full. GuC firmware also tracks how many times the
buffer overflowed.
It would be useful to maintain a statistics of how many flush
interrupts were received and for which type of log buffer,
along with the overflow count of each buffer type.
Augmented i915_log_info debugfs to report back these statistics.
v2:
- Update the logic to detect multiple overflows between the 2
flush interrupts and also log a message for overflow (Tvrtko)
- Track the number of times there was no free sub buffer to capture
the GuC log buffer. (Tvrtko)
v3:
- Fix the printf field width for overflow counter, set it to 10 as per the
max value of u32, which takes 10 digits in decimal form. (Tvrtko)
v4:
- Move the log buffer overflow handling to a new function for better
readability. (Tvrtko)
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
With the addition of new Host2GuC actions related to GuC logging, there
is a need of a lock to serialize them, as they can execute concurrently
with each other and also with other existing actions.
v2: Use mutex in place of spinlock to serialize, as sleep can happen
while waiting for the action's response from GuC. (Tvrtko)
v3: To conform to the general rules, acquire mutex before taking the
forcewake. (Tvrtko)
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Added a new debugfs interface '/sys/kernel/debug/dri/guc_log' for the
User to capture GuC firmware logs. Availed relay framework to implement
the interface, where Driver will have to just use a relay API to store
snapshots of the GuC log buffer in the buffer managed by relay.
The snapshot will be taken when GuC firmware sends a log buffer flush
interrupt and up to four snapshots could be stored in the relay buffer.
The relay buffer will be operated in a mode where it will overwrite the
data not yet collected by User.
Besides mmap method, through which User can directly access the relay
buffer contents, relay also supports the 'poll' method. Through the 'poll'
call on log file, User can come to know whenever a new snapshot of the
log buffer is taken by Driver, so can run in tandem with the Driver and
capture the logs in a sustained/streaming manner, without any loss of data.
v2: Defer the creation of relay channel & associated debugfs file, as
debugfs setup is now done at the end of i915 Driver load. (Chris)
v3:
- Switch to no-overwrite mode for relay.
- Fix the relay sub buffer switching sequence.
v4:
- Update i915 Kconfig to select RELAY config. (TvrtKo)
- Log a message when there is no sub buffer available to capture
the GuC log buffer. (Tvrtko)
- Increase the number of relay sub buffers to 8 from 4, to have
sufficient buffering for boot time logs
v5:
- Fix the alignment, indentation issues and some minor cleanup. (Tvrtko)
- Update the comment to elaborate on why a relay channel has to be
associated with the debugfs file. (Tvrtko)
v6:
- Move the write to 'is_global' after the NULL check on parent directory
dentry pointer. (Tvrtko)
v7: Add a BUG_ON to validate relay buffer allocation size. (Chris)
Testcase: igt/tools/intel_guc_logger
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Sourab Gupta <sourab.gupta@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
GuC ukernel sends an interrupt to Host to flush the log buffer
and expects Host to correspondingly update the read pointer
information in the state structure, once it has consumed the
log buffer contents by copying them to a file or buffer.
Even if Host couldn't copy the contents, it can still update the
read pointer so that logging state is not disturbed on GuC side.
v2:
- Use a dedicated workqueue for handling flush interrupt. (Tvrtko)
- Reduce the overall log buffer copying time by skipping the copy of
crash buffer area for regular cases and copying only the state
structure data in first page.
v3:
- Create a vmalloc mapping of log buffer. (Chris)
- Cover the flush acknowledgment under rpm get & put.(Chris)
- Revert the change of skipping the copy of crash dump area, as
not really needed, will be covered by subsequent patch.
v4:
- Destroy the wq under the same condition in which it was created,
pass dev_piv pointer instead of dev to newly added GuC function,
add more comments & rename variable for clarity. (Tvrtko)
v5:
- Allocate & destroy the dedicated wq, for handling flush interrupt,
from the setup/teardown routines of GuC logging. (Chris)
- Validate the log buffer size value retrieved from state structure
and do some minor cleanup. (Tvrtko)
- Fix error/warnings reported by checkpatch. (Tvrtko)
- Rebase.
v6:
- Remove the interrupts_enabled check from guc_capture_logs_work, need
to process that last work item also, queued just before disabling the
interrupt as log buffer flush interrupt handling is a bit different
case where GuC is actually expecting an ACK from host, which should be
provided to keep the logging going.
Sync against the work will be done by caller disabling the interrupt.
- Don't sample the log buffer size value from state structure, directly
use the expected value to move the pointer & do the copy and that cannot
go wrong (out of bounds) as Driver only allocated the log buffer and the
relay buffers. Driver should refrain from interpreting the log packet,
as much possible and let Userspace parser detect the anomaly. (Chris)
v7:
- Use switch statement instead of 'if else' for retrieving the GuC log
buffer size. (Tvrtko)
- Refactored the log buffer copying function and shortended the name of
couple of variables for better readability. (Tvrtko)
v8:
- Make the dedicated wq as a high priority one to further reduce the
turnaround time of handing log buffer flush event from GuC.
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
There are certain types of interrupts which Host can receive from GuC.
GuC ukernel sends an interrupt to Host for certain events, like for
example retrieve/consume the logs generated by ukernel.
This patch adds support to receive interrupts from GuC but currently
enables & partially handles only the interrupt sent by GuC ukernel.
Future patches will add support for handling other interrupt types.
v2:
- Use common low level routines for PM IER/IIR programming (Chris)
- Rename interrupt functions to gen9_xxx from gen8_xxx (Chris)
- Replace disabling of wake ref asserts with rpm get/put (Chris)
v3:
- Update comments for more clarity. (Tvrtko)
- Remove the masking of GuC interrupt, which was kept masked till the
start of bottom half, its not really needed as there is only a
single instance of work item & wq is ordered. (Tvrtko)
v4:
- Rebase.
- Rename guc_events to pm_guc_events so as to be indicative of the
register/control block it is associated with. (Chris)
- Add handling for back to back log buffer flush interrupts.
v5:
- Move the read & clearing of register, containing Guc2Host message
bits, outside the irq spinlock. (Tvrtko)
v6:
- Move the log buffer flush interrupt related stuff to the following
patch so as to do only generic bits in this patch. (Tvrtko)
- Rebase.
v7:
- Remove the interrupts_enabled check from gen9_guc_irq_handler, want to
process that last interrupt also before disabling the interrupt, sync
against the work queued by irq handler will be done by caller disabling
the interrupt.
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
So far PM IER/IIR/IMR registers were being used only for Turbo related
interrupts. But interrupts coming from GuC also use the same set.
As a precursor to supporting GuC interrupts, added new low level routines
so as to allow sharing the programming of PM IER/IIR/IMR registers between
Turbo & GuC.
Also similar to PM IMR, maintaining a bitmask for PM IER register, to allow
easy sharing of it between Turbo & GuC without involving a rmw operation.
v2:
- For appropriateness & avoid any ambiguity, rename old functions
enable/disable pm_irq to mask/unmask pm_irq and rename new functions
enable/disable pm_interrupts to enable/disable pm_irq. (Tvrtko)
- Use u32 in place of uint32_t. (Tvrtko)
v3:
- Rename the fields pm_irq_mask & pm_ier_mask and do some cleanup. (Chris)
- Rebase.
v4: Fix the inadvertent disabling of User interrupt for VECS ring causing
failure for certain IGTs.
v5: Use dev_priv with HAS_VEBOX macro. (Tvrtko)
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
So far there were 2 fields related to GuC logs in 'intel_guc' structure.
For the support of capturing GuC logs & storing them in a local buffer,
multiple new fields would have to be added. This warrants a separate
structure to contain the fields related to GuC logging state.
Added a new structure 'intel_guc_log' and instance of it inside
'intel_guc' structure.
v2: Rebase.
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
The first page of the GuC log buffer contains state info or meta data
which is required to parse the logs contained in the subsequent pages.
The structure representing the state info is added to interface file
as Driver would need to handle log buffer flush interrupts from GuC.
Added an enum for the different message/event types that can be send
by the GuC ukernel to Host.
Also added 2 new Host to GuC action types to inform GuC when Host has
flushed the log buffer and forcefuly cause the GuC to send a new
log buffer flush interrupt.
v2:
- Make documentation of log buffer state structure more elaborate &
rename LOGBUFFERFLUSH action to LOG_BUFFER_FLUSH for consistency.(Tvrtko)
v3: Add GuC log buffer layout diagram for more clarity.
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>