If the engine is continually completing nops, we can saturate the
signaler and keep it working indefinitely. This angers the NMI watchdog!
A good example is to disable semaphores on snb and run igt/gem_exec_nop -
the parallel, multi-engine workloads are more than sufficient to hog the
CPU, preventing the system from even processing ICMP echo replies.
v2: Tvrtko dug into cond_resched() on x86 and found that it only
depended upon preempt_count and not tif_need_resched() - which means
that we would always call schedule() at that point.
Fixes: c81d46138d ("drm/i915: Convert trace-irq to the breadcrumb waiter")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170404120531.10737-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
If the signal to park arrives before we sleep, then we need to check
kthread_should_park() before sleeping to avoid missing the signal.
Otherwise, if the signal arrives whilst we are processing completed
requests, we will reset the current->state back to TASK_INTERRUPTIBLE
and so miss the wakeup.
Fixes: fe3288b5da ("drm/i915: Park the breadcrumbs signaler across a GPU reset")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170403105124.8969-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Rather than call intel_engine_cleanup() with a partially constructed
engine, unwind the error during intel_init_ring_common().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170403113426.25707-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Or rather it is used only by intel_ring_pin() to extract the
drm_i915_private which we can easily pass in. As this is a relatively
rare operation, save the space in the struct, and as such it is even
break even in the extra code for passing around the parameter:
add/remove: 0/0 grow/shrink: 2/3 up/down: 15/-15 (0)
function old new delta
intel_init_ring_buffer 906 918 +12
execlists_context_pin 1308 1311 +3
mock_engine 407 403 -4
intel_engine_create_ring 367 363 -4
intel_ring_pin 326 319 -7
Total: Before=1261794, After=1261794, chg +0.00%
v2: Reorder intel_init_ring_buffer to keep the ring setup together:
add/remove: 0/0 grow/shrink: 2/3 up/down: 9/-15 (-6)
function old new delta
intel_init_ring_buffer 906 912 +6
execlists_context_pin 1308 1311 +3
mock_engine 407 403 -4
intel_engine_create_ring 367 363 -4
intel_ring_pin 326 319 -7
Total: Before=1261794, After=1261788, chg -0.00%
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170403113426.25707-1-chris@chris-wilson.co.uk
commit 8490ae207f ("drm/i915: Suppress busy status for engines if
wedged") moved the check for inflight requests to the
intel_engines_are_idle() check to protect the idle worker. However, the
request selftests were also checking the engine idle status and erroring
out if they did not become idle within a short period of time after the
final wait. In order to accommodate the new check, call retire requests
prior to the engine check so that we flush all the waits.
Fixes: 8490ae207f ("drm/i915: Suppress busy status for engines if wedged")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170331192121.10024-1-chris@chris-wilson.co.uk
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
We can rely on compiler to notify us if we miss any case.
This approach may also reduce driver size (reported ~4K).
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170331102652.177664-1-michal.wajdeczko@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We can merge the pair of loops over the engines and their timelines into
a single loop, making it easier to read and more consistent with the
commentary.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-6-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Having added the wait upon each engine to idle into the central
i915_gem_wait_for_idle(), we can remove the now redundant wait from
reset_all_global_seqno(). This has the advantage of removing the late
detection of an error (an engine still busy) which left the seqno reset
only partially complete (though it should be safe enough!).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-5-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Make i915_gem_wait_for_idle() be a little heavier in order to try and
guarantee that the GPU is indeed idle (by checking each engine
individually is idle, i.e. all writes are complete and the rings
stopped) after waiting for in-flight requests to be completed.
v2: And return the final error.
v3: Break the wait_for() out from under the WARN -- the macro expansion
is hideous and unreadable in the warning message
v4: If wait_for_engine() fails the result is catastrophic, mark the
device as wedged and wait for the repair team.
References: https://bugs.freedesktop.org/show_bug.cgi?id=98836
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-4-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
As we now distinguish everywhere that can call
i915_gem_retire_requests() following a successful wait_for_idle, we can
remove the duplication by moving that call into i915_gem_wait_for_idle()
itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
There is no reason to separately check for valid fw path before
we try to fetch it. Let the fetch function take care of this.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330112115.120240-2-michal.wajdeczko@intel.com
This function is no longer used. Its functionality is covered
by intel_uc_fini_fw().
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cleanups of uc firmware structs from GuC and Huc are the same for both.
Move common code to the helper function to avoid duplication.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Some of the DRM_NOTE messages are just using "uC" without specifying
which uc they are related to. We can be more user friendly.
v2: moved to the header (Joonas)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Add description for existing parameter 'pipe' to fix the build
warning: ./drivers/gpu/drm/i915/intel_lpe_audio.c:342: warning: No
description found for parameter 'pipe'.
Signed-off-by: Tamara Diaconita <diaconita.tamara@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330115510.14054-1-diaconita.tamara@gmail.com
If the driver is wedged, HW state may be very inconsistent and
report that it is still busy, even though we have stopped using it. This
can lead to a double *ERROR* rather than a graceful cleanup after
wedging.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-2-chris@chris-wilson.co.uk
As we declare an engine as wedged, we mark all of its active requests as
in error. However, we don't want to mark successfully completed requests
as in error, which requires us to retire those requests first.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330145041.9005-1-chris@chris-wilson.co.uk
We pretty print the name of an engine in several places, mostly for
debug, but also in the GPU hang report. Using "ring" in the name is
archaic (we call those engines now to differentiate them from the
multiple rings of commands we execute on each engine), quite verbose and
often tautological. We run out of room in our GPU hang report for
instance if we have more than a couple of engines hung simultaneously.
Bit the bullet and update the strings to reflect the common internal names.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170330134820.12273-1-chris@chris-wilson.co.uk
Michał Winiarski pointed out that the debugging infrastructure (such as
trace_dma_fence_release) likes to pretty print the timeline name, long
after we have freed the timeline. Our timelines currently live as part of
the GTT (due to the strict ordering we currently use through each) which
belong to the context. We aim to free the context and release its
hardware resources as soon as we able to (i.e. when the last
fence/request using it has been signaled and retired). As the
.get_timeline_name is purely a debug feature, rather than extending the
lifetime of the context, or splitting it into many different release
phases just to keep the name around, replace the timeline name with a
constant after the fence has been signaled. This avoids the potential
use-after-free.
Reported-by: Krzysztof Olinski <krzysztof.e.olinski@intel.com>
Fixes: 80b204bce8 ("drm/i915: Enable multiple timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Link: http://patchwork.freedesktop.org/patch/msgid/20170330111614.29757-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Since commit 1233e2db19 ("drm/i915: Move object backing storage
manipulation to its own locking"), i915_gem_object_put_pages() and
specifically the i915_gem_gtt_finish_pages() may be called from outside
of the struct_mutex and so we can no longer pass I915_WAIT_LOCKED to
i915_gem_wait_for_idle.
Fixes: 1233e2db19 ("drm/i915: Move object backing storage manipulation to its own locking")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.10+
Link: http://patchwork.freedesktop.org/patch/msgid/20170330085341.20311-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The should happen as soon as possible, but always within the logic that
depends on it (and not interrupting the top-level driver control flow).
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1490720027-23234-1-git-send-email-oscar.mateo@intel.com
The timeslice usage will determine vGPU whether has chance to
schedule or not at every vGPU switch checkpoint.
Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
vGPU resource is allocated by scheduler. To account for non-allocated
free cycles, we create an idle vGPU as the placeholder similar to idle task
concept, which is useful to handle some corner cases in scheduling policy.
Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
This method tries to guarantee precision in second level, with the
adjustment conducted in every 100ms. At the end of each vGPU switch
calculate the sched time and subtract it from the time slice
allocated; the allocated time slice for every 100ms together with
remaining timeslice, will be used to decide how much timeslice
allocated to this vGPU in the next 100ms slice, with the end goal
to guarantee weight ratio in second level.
Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
The weight defines proportional control of physical GPU resource
shared between vGPUs. So far the weight is tied to a specific vGPU
type, i.e when creating multiple vGPUs with different types, they
will inherit different weights.
e.g. The weight of type GVTg_V5_2 is 8, the weight of type GVTg_V5_4
is 4, so vGPU of type GVTg_V5_2 has double vGPU resource of vGPU type
GVTg_V5_4.
TODO: allow user control the weight setting in the future.
Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Factor out the scheduler to a more clear structure, the basic
logic is to find out next vGPU first and then schedule it.
vGPUs were ordered in a LRU list, scheduler scan from the LRU
list head and choose the first vGPU who has pending workload.
Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Add some statistic routine to collect the time when vGPU is
scheduled in/out and the time of the last ctx submission.
Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Currently the scheduler is triggered by delayed_work, which doesn't
provide precision at microsecond level. Move to hrtimer instead for
more accurate control.
Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
The variable info is never NULL, which is checked by the caller. This
patch removes the redundant info NULL check logic.
Fixes: 695fbc08d8 ("drm/i915/gvt: replace the gvt_err with gvt_vgpu_err")
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
From commit d1a513be1f ("drm/i915/gvt: add resolution definition for vGPU
type"), small type has been restricted to small resolution, so not
require larger high GM size any more. Change to smaller 384M for more
VM creation with vGPU enabled which still perform reasonable workload.
Fixes: d1a513be1f ("drm/i915/gvt: add resolution definition for vGPU type")
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
This reverts commit 6c943de668 ("drm/i915: Skip execlists_dequeue()
early if the list is empty").
The validity of using READ_ONCE there depends upon having a mb to
coordinate the assignment of engine->execlist_first inside
submit_request() and checking prior to taking the spinlock in
execlists_dequeue(). We wrote "the update to TASKLET_SCHED incurs a
memory barrier making this cross-cpu checking safe", but failed to
notice that this mb was *conditional* on the execlists being ready, i.e.
there wasn't the required mb when it was most necessary!
We could install an unconditional memory barrier to fixup the
READ_ONCE():
diff --git a/drivers/gpu/drm/i915/intel_lrc.c
b/drivers/gpu/drm/i915/intel_lrc.c
index 7dd732cb9f57..1ed164b16d44 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -616,6 +616,7 @@ static void execlists_submit_request(struct
drm_i915_gem_request *request)
if (insert_request(&request->priotree, &engine->execlist_queue))
{
engine->execlist_first = &request->priotree.node;
+ smp_wmb();
if (execlists_elsp_ready(engine))
But we have opted to remove the race as it should be rarely effective,
and saves us having to explain the necessary memory barriers which we
quite clearly failed at.
Reported-and-tested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Fixes: 6c943de668 ("drm/i915: Skip execlists_dequeue() early if the list is empty")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170329100052.29505-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Unlocking is dangerous. In this case we combine an early update to the
out-of-queue request, because we know that it will be inserted into the
correct FIFO priority-ordered slot when it becomes ready in the future.
However, given sufficient enthusiasm, it may become ready as we are
continuing to reschedule, and so may gazump the FIFO if we have since
dropped its spinlock. The result is that it may be executed too early,
before its dependencies.
v2: Move all work into the second phase over the topological sort. This
removes the shortcut on the out-of-rbtree request to ensure that we only
adjust its priority after adjusting all of its dependencies.
Fixes: 20311bd350 ("drm/i915/scheduler: Execute requests in order of priorities")
Testcase: igt/gem_exec_whisper
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Link: http://patchwork.freedesktop.org/patch/msgid/20170327202143.7972-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
We can't sometimes use these macros in other headers due to
include and definition order. As i915_utils.h already contains
other helper macros move these macros there.
v2: checkpatch cleanup for WARN() macro.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170328084513.174200-1-michal.wajdeczko@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
intel_shadow_wa_ctx is a field of intel_vgpu_workload. container_of() can
be used to refine the relation-ship between intel_shadow_wa_ctx and
intel_vgpu_workload. This patch removes the useless dereference.
v2. add "drm/i915/gvt" prefix. (Zhenyu)
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Turn on KBL WS platform support in gvt-g. More platforms would be
enabled, after validate.
Signed-off-by: Xu Han <xu.han@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
This adds initial attribute group for mdev to hold vGPU related
for each mdev device, currently just vGPU id is shown.
v2: rename group name as "intel_vgpu"
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
GVT-g will emulate a fixed DPCD data to VM for DP/eDP panel. Update
this data to latest DP1.2 with the maximum lane bandwidth of 5.4G/s
to support 4K resolution in VM.
V3: modify patch comment
V2: add inline comment to describe the dpcd_fix_data.
Signed-off-by: Pei Zhang <pei.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>