/*
 * Copyright © 2016 Intel Corporation
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice (including the next
 * paragraph) shall be included in all copies or substantial portions of the
 * Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 * IN THE SOFTWARE.
 *
 */

#include <linux/prime_numbers.h>

#include "../i915_selftest.h"
#include "i915_random.h"
#include "igt_live_test.h"
#include "lib_sw_fence.h"

#include "mock_context.h"
#include "mock_drm.h"
#include "mock_gem_device.h"

static int igt_add_request(void *arg)
{
	struct drm_i915_private *i915 = arg;
	struct i915_request *request;
	int err = -ENOMEM;

	/* Basic preliminary test to create a request and let it loose! */

	mutex_lock(&i915->drm.struct_mutex);
	request = mock_request(i915->engine[RCS0],
			       i915->kernel_context,
			       HZ / 10);
	if (!request)
		goto out_unlock;

	i915_request_add(request);

	err = 0;
out_unlock:
	mutex_unlock(&i915->drm.struct_mutex);
	return err;
}

static int igt_wait_request(void *arg)
{
	const long T = HZ / 4;
	struct drm_i915_private *i915 = arg;
	struct i915_request *request;
	int err = -EINVAL;

	/* Submit a request, then wait upon it */

	mutex_lock(&i915->drm.struct_mutex);
	request = mock_request(i915->engine[RCS0], i915->kernel_context, T);
	if (!request) {
		err = -ENOMEM;
		goto out_unlock;
	}

	if (i915_request_wait(request, I915_WAIT_LOCKED, 0) != -ETIME) {
		pr_err("request wait (busy query) succeeded (expected timeout before submit!)\n");
		goto out_unlock;
	}

	if (i915_request_wait(request, I915_WAIT_LOCKED, T) != -ETIME) {
		pr_err("request wait succeeded (expected timeout before submit!)\n");
		goto out_unlock;
	}

	if (i915_request_completed(request)) {
		pr_err("request completed before submit!!\n");
		goto out_unlock;
	}

	i915_request_add(request);

	if (i915_request_wait(request, I915_WAIT_LOCKED, 0) != -ETIME) {
		pr_err("request wait (busy query) succeeded (expected timeout after submit!)\n");
		goto out_unlock;
	}

	if (i915_request_completed(request)) {
		pr_err("request completed immediately!\n");
		goto out_unlock;
	}

	if (i915_request_wait(request, I915_WAIT_LOCKED, T / 2) != -ETIME) {
		pr_err("request wait succeeded (expected timeout!)\n");
		goto out_unlock;
	}

	if (i915_request_wait(request, I915_WAIT_LOCKED, T) == -ETIME) {
		pr_err("request wait timed out!\n");
		goto out_unlock;
	}

	if (!i915_request_completed(request)) {
		pr_err("request not complete after waiting!\n");
		goto out_unlock;
	}

	if (i915_request_wait(request, I915_WAIT_LOCKED, T) == -ETIME) {
		pr_err("request wait timed out when already complete!\n");
		goto out_unlock;
	}

	err = 0;
out_unlock:
	mock_device_flush(i915);
	mutex_unlock(&i915->drm.struct_mutex);
	return err;
}

static int igt_fence_wait(void *arg)
{
	const long T = HZ / 4;
	struct drm_i915_private *i915 = arg;
	struct i915_request *request;
	int err = -EINVAL;

	/* Submit a request, treat it as a fence and wait upon it */

	mutex_lock(&i915->drm.struct_mutex);
	request = mock_request(i915->engine[RCS0], i915->kernel_context, T);
	if (!request) {
		err = -ENOMEM;
		goto out_locked;
	}

	if (dma_fence_wait_timeout(&request->fence, false, T) != -ETIME) {
		pr_err("fence wait success before submit (expected timeout)!\n");
		goto out_locked;
	}

	i915_request_add(request);
	mutex_unlock(&i915->drm.struct_mutex);

	if (dma_fence_is_signaled(&request->fence)) {
		pr_err("fence signaled immediately!\n");
		goto out_device;
	}

	if (dma_fence_wait_timeout(&request->fence, false, T / 2) != -ETIME) {
		pr_err("fence wait success after submit (expected timeout)!\n");
		goto out_device;
	}

	if (dma_fence_wait_timeout(&request->fence, false, T) <= 0) {
		pr_err("fence wait timed out (expected success)!\n");
		goto out_device;
	}

	if (!dma_fence_is_signaled(&request->fence)) {
		pr_err("fence unsignaled after waiting!\n");
		goto out_device;
	}

	if (dma_fence_wait_timeout(&request->fence, false, T) <= 0) {
		pr_err("fence wait timed out when complete (expected success)!\n");
		goto out_device;
	}

	err = 0;
out_device:
	mutex_lock(&i915->drm.struct_mutex);
out_locked:
	mock_device_flush(i915);
	mutex_unlock(&i915->drm.struct_mutex);
	return err;
}

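/*
 * Submit a slow request, then cancel and resubmit it behind a second
 * "vip" request, checking that the vip completes while the original
 * is still pending.
 */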
static int igt_request_rewind(void *arg)
{
	struct drm_i915_private *i915 = arg;
	struct i915_request *request, *vip;
	struct i915_gem_context *ctx[2];
	int err = -EINVAL;

	mutex_lock(&i915->drm.struct_mutex);
	ctx[0] = mock_context(i915, "A");
	request = mock_request(i915->engine[RCS0], ctx[0], 2 * HZ);
	if (!request) {
		err = -ENOMEM;
		goto err_context_0;
	}

	i915_request_get(request);
	i915_request_add(request);

	ctx[1] = mock_context(i915, "B");
	vip = mock_request(i915->engine[RCS0], ctx[1], 0);
	if (!vip) {
		err = -ENOMEM;
		goto err_context_1;
	}

	/* Simulate preemption by manual reordering */
	if (!mock_cancel_request(request)) {
		pr_err("failed to cancel request (already executed)!\n");
		i915_request_add(vip);
		goto err_context_1;
	}
	i915_request_get(vip);
	i915_request_add(vip);
	rcu_read_lock();
	request->engine->submit_request(request);
	rcu_read_unlock();

	mutex_unlock(&i915->drm.struct_mutex);

	if (i915_request_wait(vip, 0, HZ) == -ETIME) {
		pr_err("timed out waiting for high priority request\n");
		goto err;
	}

	if (i915_request_completed(request)) {
		pr_err("low priority request already completed\n");
		goto err;
	}

	err = 0;
err:
	i915_request_put(vip);
	mutex_lock(&i915->drm.struct_mutex);
err_context_1:
	mock_context_close(ctx[1]);
	i915_request_put(request);
err_context_0:
	mock_context_close(ctx[0]);
	mock_device_flush(i915);
	mutex_unlock(&i915->drm.struct_mutex);
	return err;
}

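/*
 * Parameters and counters shared by the breadcrumbs smoketest threads:
 * each thread allocates requests via request_alloc() on the given engine
 * and accumulates how many waits and fences it completed.
 */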
struct smoketest {
	struct intel_engine_cs *engine;
	struct i915_gem_context **contexts;
	atomic_long_t num_waits, num_fences;
	int ncontexts, max_batch;
	struct i915_request *(*request_alloc)(struct i915_gem_context *,
					      struct intel_engine_cs *);
};

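/* Allocate a request against the mock engine for the smoketest */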
static struct i915_request *
__mock_request_alloc(struct i915_gem_context *ctx,
		     struct intel_engine_cs *engine)
{
	return mock_request(engine, ctx, 0);
}

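/* Allocate a real request on the given engine for the live smoketest */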
static struct i915_request *
__live_request_alloc(struct i915_gem_context *ctx,
		     struct intel_engine_cs *engine)
{
	return i915_request_alloc(engine, ctx);
}

static int __igt_breadcrumbs_smoketest(void *arg)
{
	struct smoketest *t = arg;
	struct mutex * const BKL = &t->engine->i915->drm.struct_mutex;
	const unsigned int max_batch = min(t->ncontexts, t->max_batch) - 1;
	const unsigned int total = 4 * t->ncontexts + 1;
	unsigned int num_waits = 0, num_fences = 0;
	struct i915_request **requests;
	I915_RND_STATE(prng);
	unsigned int *order;
	int err = 0;

	/*
	 * A very simple test to catch the most egregious of list handling bugs.
	 *
	 * At its heart, we simply create oodles of requests running across
	 * multiple kthreads and enable signaling on them, for the sole purpose
	 * of stressing our breadcrumb handling. The only inspection we do is
	 * that the fences were marked as signaled.
	 */

	requests = kmalloc_array(total, sizeof(*requests), GFP_KERNEL);
	if (!requests)
		return -ENOMEM;

	order = i915_random_order(total, &prng);
	if (!order) {
		err = -ENOMEM;
		goto out_requests;
	}

	while (!kthread_should_stop()) {
		struct i915_sw_fence *submit, *wait;
		unsigned int n, count;

		submit = heap_fence_create(GFP_KERNEL);
		if (!submit) {
			err = -ENOMEM;
			break;
		}

		wait = heap_fence_create(GFP_KERNEL);
		if (!wait) {
			i915_sw_fence_commit(submit);
			heap_fence_put(submit);
			err = -ENOMEM;
			break;
		}

		i915_random_reorder(order, total, &prng);
		count = 1 + i915_prandom_u32_max_state(max_batch, &prng);

		for (n = 0; n < count; n++) {
			struct i915_gem_context *ctx =
				t->contexts[order[n] % t->ncontexts];
			struct i915_request *rq;

			mutex_lock(BKL);

			rq = t->request_alloc(ctx, t->engine);
			if (IS_ERR(rq)) {
				mutex_unlock(BKL);
				err = PTR_ERR(rq);
				count = n;
				break;
			}

			err = i915_sw_fence_await_sw_fence_gfp(&rq->submit,
							       submit,
							       GFP_KERNEL);

			requests[n] = i915_request_get(rq);
			i915_request_add(rq);

			mutex_unlock(BKL);

			if (err >= 0)
				err = i915_sw_fence_await_dma_fence(wait,
								    &rq->fence,
								    0,
								    GFP_KERNEL);

			if (err < 0) {
				i915_request_put(rq);
				count = n;
				break;
			}
		}

		i915_sw_fence_commit(submit);
		i915_sw_fence_commit(wait);

		if (!wait_event_timeout(wait->wait,
					i915_sw_fence_done(wait),
					HZ / 2)) {
			struct i915_request *rq = requests[count - 1];

			pr_err("waiting for %d fences (last %llx:%lld) on %s timed out!\n",
			       count,
			       rq->fence.context, rq->fence.seqno,
			       t->engine->name);
			i915_gem_set_wedged(t->engine->i915);
			GEM_BUG_ON(!i915_request_completed(rq));
			i915_sw_fence_wait(wait);
			err = -EIO;
		}

		for (n = 0; n < count; n++) {
			struct i915_request *rq = requests[n];

			if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
				      &rq->fence.flags)) {
				pr_err("%llu:%llu was not signaled!\n",
				       rq->fence.context, rq->fence.seqno);
				err = -EINVAL;
			}

			i915_request_put(rq);
		}

		heap_fence_put(wait);
		heap_fence_put(submit);

		if (err < 0)
			break;

		num_fences += count;
		num_waits++;

		cond_resched();
	}

	atomic_long_add(num_fences, &t->num_fences);
	atomic_long_add(num_waits, &t->num_waits);

	kfree(order);
out_requests:
	kfree(requests);
	return err;
}

static int mock_breadcrumbs_smoketest(void *arg)
{
	struct drm_i915_private *i915 = arg;
	struct smoketest t = {
		.engine = i915->engine[RCS0],
		.ncontexts = 1024,
		.max_batch = 1024,
		.request_alloc = __mock_request_alloc
	};
	unsigned int ncpus = num_online_cpus();
	struct task_struct **threads;
	unsigned int n;
	int ret = 0;

	/*
	 * Smoketest our breadcrumb/signal handling for requests across multiple
	 * threads. A very simple test to only catch the most egregious of bugs.
	 * See __igt_breadcrumbs_smoketest();
	 */

	threads = kmalloc_array(ncpus, sizeof(*threads), GFP_KERNEL);
	if (!threads)
		return -ENOMEM;

	t.contexts =
		kmalloc_array(t.ncontexts, sizeof(*t.contexts), GFP_KERNEL);
	if (!t.contexts) {
		ret = -ENOMEM;
		goto out_threads;
	}

	mutex_lock(&t.engine->i915->drm.struct_mutex);
	for (n = 0; n < t.ncontexts; n++) {
		t.contexts[n] = mock_context(t.engine->i915, "mock");
		if (!t.contexts[n]) {
			ret = -ENOMEM;
			goto out_contexts;
		}
	}
	mutex_unlock(&t.engine->i915->drm.struct_mutex);

	for (n = 0; n < ncpus; n++) {
		threads[n] = kthread_run(__igt_breadcrumbs_smoketest,
					 &t, "igt/%d", n);
		if (IS_ERR(threads[n])) {
			ret = PTR_ERR(threads[n]);
			ncpus = n;
			break;
		}

		get_task_struct(threads[n]);
	}

	msleep(jiffies_to_msecs(i915_selftest.timeout_jiffies));

	for (n = 0; n < ncpus; n++) {
		int err;

		err = kthread_stop(threads[n]);
		if (err < 0 && !ret)
			ret = err;

		put_task_struct(threads[n]);
	}
	pr_info("Completed %lu waits for %lu fences across %d cpus\n",
		atomic_long_read(&t.num_waits),
		atomic_long_read(&t.num_fences),
		ncpus);

	mutex_lock(&t.engine->i915->drm.struct_mutex);
out_contexts:
	for (n = 0; n < t.ncontexts; n++) {
		if (!t.contexts[n])
			break;
		mock_context_close(t.contexts[n]);
	}
	mutex_unlock(&t.engine->i915->drm.struct_mutex);
	kfree(t.contexts);
out_threads:
	kfree(threads);

	return ret;
}

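/* Entry point for the request selftests run against the mock GEM device */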
int i915_request_mock_selftests(void)
{
	static const struct i915_subtest tests[] = {
		SUBTEST(igt_add_request),
		SUBTEST(igt_wait_request),
		SUBTEST(igt_fence_wait),
		SUBTEST(igt_request_rewind),
		SUBTEST(mock_breadcrumbs_smoketest),
	};
	struct drm_i915_private *i915;
	intel_wakeref_t wakeref;
	int err = 0;

	i915 = mock_gem_device();
	if (!i915)
		return -ENOMEM;

	with_intel_runtime_pm(i915, wakeref)
		err = i915_subtests(tests, i915);

	drm_dev_put(&i915->drm);

	return err;
}

static int live_nop_request(void *arg)
|
|
|
|
{
|
|
|
|
struct drm_i915_private *i915 = arg;
|
|
|
|
struct intel_engine_cs *engine;
|
2019-01-14 21:21:22 +07:00
|
|
|
intel_wakeref_t wakeref;
|
2019-01-22 05:20:47 +07:00
|
|
|
struct igt_live_test t;
|
2017-02-14 00:15:24 +07:00
|
|
|
unsigned int id;
|
2017-11-15 05:33:46 +07:00
|
|
|
int err = -ENODEV;
|
2017-02-14 00:15:24 +07:00
|
|
|
|
|
|
|
/* Submit various sized batches of empty requests, to each engine
|
|
|
|
* (individually), and wait for the batch to complete. We can check
|
|
|
|
* the overhead of submitting requests to the hardware.
|
|
|
|
*/
|
|
|
|
|
|
|
|
mutex_lock(&i915->drm.struct_mutex);
|
2019-01-14 21:21:22 +07:00
|
|
|
wakeref = intel_runtime_pm_get(i915);
|
2017-02-14 00:15:24 +07:00
|
|
|
|
|
|
|
for_each_engine(engine, i915, id) {
|
2018-06-14 19:49:23 +07:00
|
|
|
struct i915_request *request = NULL;
|
2017-02-14 00:15:24 +07:00
|
|
|
unsigned long n, prime;
|
2018-06-14 19:49:23 +07:00
|
|
|
IGT_TIMEOUT(end_time);
|
2017-02-14 00:15:24 +07:00
|
|
|
ktime_t times[2] = {};
|
|
|
|
|
2019-01-22 05:20:47 +07:00
|
|
|
err = igt_live_test_begin(&t, i915, __func__, engine->name);
|
2017-02-14 00:15:24 +07:00
|
|
|
if (err)
|
|
|
|
goto out_unlock;
|
|
|
|
|
|
|
|
for_each_prime_number_from(prime, 1, 8192) {
|
|
|
|
times[1] = ktime_get_raw();
|
|
|
|
|
|
|
|
for (n = 0; n < prime; n++) {
|
2018-02-21 16:56:36 +07:00
|
|
|
request = i915_request_alloc(engine,
|
|
|
|
i915->kernel_context);
|
2017-02-14 00:15:24 +07:00
|
|
|
if (IS_ERR(request)) {
|
|
|
|
err = PTR_ERR(request);
|
|
|
|
goto out_unlock;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* This space is left intentionally blank.
|
|
|
|
*
|
|
|
|
* We do not actually want to perform any
|
|
|
|
* action with this request, we just want
|
|
|
|
* to measure the latency in allocation
|
|
|
|
* and submission of our breadcrumbs -
|
|
|
|
* ensuring that the bare request is sufficient
|
|
|
|
* for the system to work (i.e. proper HEAD
|
|
|
|
* tracking of the rings, interrupt handling,
|
|
|
|
* etc). It also gives us the lowest bounds
|
|
|
|
* for latency.
|
|
|
|
*/
|
|
|
|
|
2018-02-21 16:56:36 +07:00
|
|
|
i915_request_add(request);
|
2017-02-14 00:15:24 +07:00
|
|
|
}
|
2018-02-21 16:56:36 +07:00
|
|
|
i915_request_wait(request,
|
2017-02-14 00:15:24 +07:00
|
|
|
I915_WAIT_LOCKED,
|
|
|
|
MAX_SCHEDULE_TIMEOUT);
|
|
|
|
|
|
|
|
times[1] = ktime_sub(ktime_get_raw(), times[1]);
|
|
|
|
if (prime == 1)
|
|
|
|
times[0] = times[1];
|
|
|
|
|
|
|
|
if (__igt_timeout(end_time, NULL))
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2019-01-22 05:20:47 +07:00
|
|
|
err = igt_live_test_end(&t);
|
2017-02-14 00:15:24 +07:00
|
|
|
if (err)
|
|
|
|
goto out_unlock;
|
|
|
|
|
|
|
|
pr_info("Request latencies on %s: 1 = %lluns, %lu = %lluns\n",
|
|
|
|
engine->name,
|
|
|
|
ktime_to_ns(times[0]),
|
|
|
|
prime, div64_u64(ktime_to_ns(times[1]), prime));
|
|
|
|
}
|
|
|
|
|
|
|
|
out_unlock:
|
2019-01-14 21:21:22 +07:00
|
|
|
intel_runtime_pm_put(i915, wakeref);
|
2017-02-14 00:15:24 +07:00
|
|
|
mutex_unlock(&i915->drm.struct_mutex);
|
|
|
|
return err;
|
|
|
|
}
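/*
 * empty_batch() builds the smallest possible batch buffer: a single page in
 * the GGTT whose first (and only) command is MI_BATCH_BUFFER_END, so
 * executing it costs little more than the dispatch itself.
 */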
static struct i915_vma *empty_batch(struct drm_i915_private *i915)
{
	struct drm_i915_gem_object *obj;
	struct i915_vma *vma;
	u32 *cmd;
	int err;

	obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
	if (IS_ERR(obj))
		return ERR_CAST(obj);

	cmd = i915_gem_object_pin_map(obj, I915_MAP_WB);
	if (IS_ERR(cmd)) {
		err = PTR_ERR(cmd);
		goto err;
	}

	*cmd = MI_BATCH_BUFFER_END;
	i915_gem_chipset_flush(i915);

	i915_gem_object_unpin_map(obj);

	err = i915_gem_object_set_to_gtt_domain(obj, false);
	if (err)
		goto err;

	vma = i915_vma_instance(obj, &i915->ggtt.vm, NULL);
	if (IS_ERR(vma)) {
		err = PTR_ERR(vma);
		goto err;
	}

	err = i915_vma_pin(vma, 0, 0, PIN_USER | PIN_GLOBAL);
	if (err)
		goto err;

	return vma;

err:
	i915_gem_object_put(obj);
	return ERR_PTR(err);
}

static struct i915_request *
empty_request(struct intel_engine_cs *engine,
	      struct i915_vma *batch)
{
	struct i915_request *request;
	int err;

	request = i915_request_alloc(engine, engine->i915->kernel_context);
	if (IS_ERR(request))
		return request;

	err = engine->emit_bb_start(request,
				    batch->node.start,
				    batch->node.size,
				    I915_DISPATCH_SECURE);
	if (err)
		goto out_request;

out_request:
	i915_request_add(request);
	return err ? ERR_PTR(err) : request;
}

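/*
 * live_empty_request() mirrors live_nop_request(), but every request also
 * dispatches the MI_BATCH_BUFFER_END batch above via emit_bb_start(), so the
 * reported latencies include the cost of executing an (empty) batch buffer.
 */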
static int live_empty_request(void *arg)
{
	struct drm_i915_private *i915 = arg;
	struct intel_engine_cs *engine;
	intel_wakeref_t wakeref;
	struct igt_live_test t;
	struct i915_vma *batch;
	unsigned int id;
	int err = 0;

	/* Submit various sized batches of empty requests, to each engine
	 * (individually), and wait for the batch to complete. We can check
	 * the overhead of submitting requests to the hardware.
	 */

	mutex_lock(&i915->drm.struct_mutex);
	wakeref = intel_runtime_pm_get(i915);

	batch = empty_batch(i915);
	if (IS_ERR(batch)) {
		err = PTR_ERR(batch);
		goto out_unlock;
	}

	for_each_engine(engine, i915, id) {
		IGT_TIMEOUT(end_time);
		struct i915_request *request;
		unsigned long n, prime;
		ktime_t times[2] = {};

		err = igt_live_test_begin(&t, i915, __func__, engine->name);
		if (err)
			goto out_batch;

		/* Warmup / preload */
		request = empty_request(engine, batch);
		if (IS_ERR(request)) {
			err = PTR_ERR(request);
			goto out_batch;
		}
		i915_request_wait(request,
				  I915_WAIT_LOCKED,
				  MAX_SCHEDULE_TIMEOUT);

		for_each_prime_number_from(prime, 1, 8192) {
			times[1] = ktime_get_raw();

			for (n = 0; n < prime; n++) {
				request = empty_request(engine, batch);
				if (IS_ERR(request)) {
					err = PTR_ERR(request);
					goto out_batch;
				}
			}
			i915_request_wait(request,
					  I915_WAIT_LOCKED,
					  MAX_SCHEDULE_TIMEOUT);

			times[1] = ktime_sub(ktime_get_raw(), times[1]);
			if (prime == 1)
				times[0] = times[1];

			if (__igt_timeout(end_time, NULL))
				break;
		}

		err = igt_live_test_end(&t);
		if (err)
			goto out_batch;

		pr_info("Batch latencies on %s: 1 = %lluns, %lu = %lluns\n",
			engine->name,
			ktime_to_ns(times[0]),
			prime, div64_u64(ktime_to_ns(times[1]), prime));
	}

out_batch:
	i915_vma_unpin(batch);
	i915_vma_put(batch);
out_unlock:
	intel_runtime_pm_put(i915, wakeref);
	mutex_unlock(&i915->drm.struct_mutex);
	return err;
}

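/*
 * recursive_batch() constructs a batch whose first instruction is an
 * MI_BATCH_BUFFER_START pointing back at the batch itself, so once submitted
 * it spins on the GPU until recursive_batch_resolve() overwrites that first
 * dword with MI_BATCH_BUFFER_END. This lets the tests below keep requests
 * in flight and complete them on demand.
 */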
static struct i915_vma *recursive_batch(struct drm_i915_private *i915)
{
	struct i915_gem_context *ctx = i915->kernel_context;
	struct i915_address_space *vm =
		ctx->ppgtt ? &ctx->ppgtt->vm : &i915->ggtt.vm;
	struct drm_i915_gem_object *obj;
	const int gen = INTEL_GEN(i915);
	struct i915_vma *vma;
	u32 *cmd;
	int err;

	obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
	if (IS_ERR(obj))
		return ERR_CAST(obj);

	vma = i915_vma_instance(obj, vm, NULL);
	if (IS_ERR(vma)) {
		err = PTR_ERR(vma);
		goto err;
	}

	err = i915_vma_pin(vma, 0, 0, PIN_USER);
	if (err)
		goto err;

	err = i915_gem_object_set_to_wc_domain(obj, true);
	if (err)
		goto err;

	cmd = i915_gem_object_pin_map(obj, I915_MAP_WC);
	if (IS_ERR(cmd)) {
		err = PTR_ERR(cmd);
		goto err;
	}

	if (gen >= 8) {
		*cmd++ = MI_BATCH_BUFFER_START | 1 << 8 | 1;
		*cmd++ = lower_32_bits(vma->node.start);
		*cmd++ = upper_32_bits(vma->node.start);
	} else if (gen >= 6) {
		*cmd++ = MI_BATCH_BUFFER_START | 1 << 8;
		*cmd++ = lower_32_bits(vma->node.start);
	} else {
		*cmd++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT;
		*cmd++ = lower_32_bits(vma->node.start);
	}
	*cmd++ = MI_BATCH_BUFFER_END; /* terminate early in case of error */
	i915_gem_chipset_flush(i915);

	i915_gem_object_unpin_map(obj);

	return vma;

err:
	i915_gem_object_put(obj);
	return ERR_PTR(err);
}

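/* Terminate the self-referencing batch, allowing its request to complete. */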
static int recursive_batch_resolve(struct i915_vma *batch)
{
	u32 *cmd;

	cmd = i915_gem_object_pin_map(batch->obj, I915_MAP_WC);
	if (IS_ERR(cmd))
		return PTR_ERR(cmd);

	*cmd = MI_BATCH_BUFFER_END;
	i915_gem_chipset_flush(batch->vm->i915);

	i915_gem_object_unpin_map(batch->obj);

	return 0;
}

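/*
 * live_all_engines() submits one spinning batch to every engine before any
 * of them is resolved: submission must not block, and no request may be
 * reported complete while its batch is still looping.
 */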
static int live_all_engines(void *arg)
{
	struct drm_i915_private *i915 = arg;
	struct intel_engine_cs *engine;
	struct i915_request *request[I915_NUM_ENGINES];
	intel_wakeref_t wakeref;
	struct igt_live_test t;
	struct i915_vma *batch;
	unsigned int id;
	int err;

	/* Check we can submit requests to all engines simultaneously. We
	 * send a recursive batch to each engine - checking that we don't
	 * block doing so, and that they don't complete too soon.
	 */

	mutex_lock(&i915->drm.struct_mutex);
	wakeref = intel_runtime_pm_get(i915);

	err = igt_live_test_begin(&t, i915, __func__, "");
	if (err)
		goto out_unlock;

	batch = recursive_batch(i915);
	if (IS_ERR(batch)) {
		err = PTR_ERR(batch);
		pr_err("%s: Unable to create batch, err=%d\n", __func__, err);
		goto out_unlock;
	}

	for_each_engine(engine, i915, id) {
		request[id] = i915_request_alloc(engine, i915->kernel_context);
		if (IS_ERR(request[id])) {
			err = PTR_ERR(request[id]);
			pr_err("%s: Request allocation failed with err=%d\n",
			       __func__, err);
			goto out_request;
		}

		err = engine->emit_bb_start(request[id],
					    batch->node.start,
					    batch->node.size,
					    0);
		GEM_BUG_ON(err);
		request[id]->batch = batch;

		if (!i915_gem_object_has_active_reference(batch->obj)) {
			i915_gem_object_get(batch->obj);
			i915_gem_object_set_active_reference(batch->obj);
		}

		err = i915_vma_move_to_active(batch, request[id], 0);
		GEM_BUG_ON(err);

		i915_request_get(request[id]);
		i915_request_add(request[id]);
	}

	for_each_engine(engine, i915, id) {
		if (i915_request_completed(request[id])) {
			pr_err("%s(%s): request completed too early!\n",
			       __func__, engine->name);
			err = -EINVAL;
			goto out_request;
		}
	}

	err = recursive_batch_resolve(batch);
	if (err) {
		pr_err("%s: failed to resolve batch, err=%d\n", __func__, err);
		goto out_request;
	}

	for_each_engine(engine, i915, id) {
		long timeout;

		timeout = i915_request_wait(request[id],
					    I915_WAIT_LOCKED,
					    MAX_SCHEDULE_TIMEOUT);
		if (timeout < 0) {
			err = timeout;
			pr_err("%s: error waiting for request on %s, err=%d\n",
			       __func__, engine->name, err);
			goto out_request;
		}

		GEM_BUG_ON(!i915_request_completed(request[id]));
		i915_request_put(request[id]);
		request[id] = NULL;
	}

	err = igt_live_test_end(&t);

out_request:
	for_each_engine(engine, i915, id)
		if (request[id])
			i915_request_put(request[id]);
	i915_vma_unpin(batch);
	i915_vma_put(batch);
out_unlock:
	intel_runtime_pm_put(i915, wakeref);
	mutex_unlock(&i915->drm.struct_mutex);
	return err;
}

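/*
 * live_sequential_engines() gives every engine its own spinning batch and
 * chains the requests together with i915_request_await_dma_fence(), so each
 * request may only execute after the previous engine's batch has been
 * resolved and its request has completed.
 */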
static int live_sequential_engines(void *arg)
{
	struct drm_i915_private *i915 = arg;
	struct i915_request *request[I915_NUM_ENGINES] = {};
	struct i915_request *prev = NULL;
	struct intel_engine_cs *engine;
	intel_wakeref_t wakeref;
	struct igt_live_test t;
	unsigned int id;
	int err;

	/* Check we can submit requests to all engines sequentially, such
	 * that each successive request waits for the earlier ones. This
	 * tests that we don't execute requests out of order, even though
	 * they are running on independent engines.
	 */

	mutex_lock(&i915->drm.struct_mutex);
	wakeref = intel_runtime_pm_get(i915);

	err = igt_live_test_begin(&t, i915, __func__, "");
	if (err)
		goto out_unlock;

	for_each_engine(engine, i915, id) {
		struct i915_vma *batch;

		batch = recursive_batch(i915);
		if (IS_ERR(batch)) {
			err = PTR_ERR(batch);
			pr_err("%s: Unable to create batch for %s, err=%d\n",
			       __func__, engine->name, err);
			goto out_unlock;
		}

		request[id] = i915_request_alloc(engine, i915->kernel_context);
		if (IS_ERR(request[id])) {
			err = PTR_ERR(request[id]);
			pr_err("%s: Request allocation failed for %s with err=%d\n",
			       __func__, engine->name, err);
			goto out_request;
		}

		if (prev) {
			err = i915_request_await_dma_fence(request[id],
							   &prev->fence);
			if (err) {
				i915_request_add(request[id]);
				pr_err("%s: Request await failed for %s with err=%d\n",
				       __func__, engine->name, err);
				goto out_request;
			}
		}

		err = engine->emit_bb_start(request[id],
					    batch->node.start,
					    batch->node.size,
					    0);
		GEM_BUG_ON(err);
		request[id]->batch = batch;

		err = i915_vma_move_to_active(batch, request[id], 0);
		GEM_BUG_ON(err);

		i915_gem_object_set_active_reference(batch->obj);
		i915_vma_get(batch);

		i915_request_get(request[id]);
		i915_request_add(request[id]);

		prev = request[id];
	}

	for_each_engine(engine, i915, id) {
		long timeout;

		if (i915_request_completed(request[id])) {
			pr_err("%s(%s): request completed too early!\n",
			       __func__, engine->name);
			err = -EINVAL;
			goto out_request;
		}

		err = recursive_batch_resolve(request[id]->batch);
		if (err) {
			pr_err("%s: failed to resolve batch, err=%d\n",
			       __func__, err);
			goto out_request;
		}

		timeout = i915_request_wait(request[id],
					    I915_WAIT_LOCKED,
					    MAX_SCHEDULE_TIMEOUT);
		if (timeout < 0) {
			err = timeout;
			pr_err("%s: error waiting for request on %s, err=%d\n",
			       __func__, engine->name, err);
			goto out_request;
		}

		GEM_BUG_ON(!i915_request_completed(request[id]));
	}

	err = igt_live_test_end(&t);

out_request:
	for_each_engine(engine, i915, id) {
		u32 *cmd;

		if (!request[id])
			break;

		cmd = i915_gem_object_pin_map(request[id]->batch->obj,
					      I915_MAP_WC);
		if (!IS_ERR(cmd)) {
			*cmd = MI_BATCH_BUFFER_END;
			i915_gem_chipset_flush(i915);

			i915_gem_object_unpin_map(request[id]->batch->obj);
		}

		i915_vma_put(request[id]->batch);
		i915_request_put(request[id]);
	}
out_unlock:
	intel_runtime_pm_put(i915, wakeref);
	mutex_unlock(&i915->drm.struct_mutex);
	return err;
}

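/*
 * Estimate how many requests we can queue on a legacy (shared) ringbuffer
 * without wrapping: take the ring size less its reserved space, divide by
 * the footprint emitted by a sample request, and keep only half the result
 * as headroom.
 */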
static int
max_batches(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
{
	struct i915_request *rq;
	int ret;

	/*
	 * Before execlists, all contexts share the same ringbuffer. With
	 * execlists, each context/engine has a separate ringbuffer and
	 * for the purposes of this test, inexhaustible.
	 *
	 * For the global ringbuffer though, we have to be very careful
	 * that we do not wrap while preventing the execution of requests
	 * with an unsignaled fence.
	 */
	if (HAS_EXECLISTS(ctx->i915))
		return INT_MAX;

	rq = i915_request_alloc(engine, ctx);
	if (IS_ERR(rq)) {
		ret = PTR_ERR(rq);
	} else {
		int sz;

		ret = rq->ring->size - rq->reserved_space;
		i915_request_add(rq);

		sz = rq->ring->emit - rq->head;
		if (sz < 0)
			sz += rq->ring->size;
		ret /= sz;
		ret /= 2; /* leave half spare, in case of emergency! */
	}

	return ret;
}

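/*
 * live_breadcrumbs_smoketest() spawns one kthread per online cpu for every
 * engine, each hammering requests from a shared pool of contexts until the
 * selftest timeout expires, then totals the waits and fences observed.
 */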
static int live_breadcrumbs_smoketest(void *arg)
{
	struct drm_i915_private *i915 = arg;
	struct smoketest t[I915_NUM_ENGINES];
	unsigned int ncpus = num_online_cpus();
	unsigned long num_waits, num_fences;
	struct intel_engine_cs *engine;
	struct task_struct **threads;
	struct igt_live_test live;
	enum intel_engine_id id;
	intel_wakeref_t wakeref;
	struct drm_file *file;
	unsigned int n;
	int ret = 0;

	/*
	 * Smoketest our breadcrumb/signal handling for requests across multiple
	 * threads. A very simple test to only catch the most egregious of bugs.
	 * See __igt_breadcrumbs_smoketest();
	 *
	 * On real hardware this time.
	 */

	wakeref = intel_runtime_pm_get(i915);

	file = mock_file(i915);
	if (IS_ERR(file)) {
		ret = PTR_ERR(file);
		goto out_rpm;
	}

	threads = kcalloc(ncpus * I915_NUM_ENGINES,
			  sizeof(*threads),
			  GFP_KERNEL);
	if (!threads) {
		ret = -ENOMEM;
		goto out_file;
	}

	memset(&t[0], 0, sizeof(t[0]));
	t[0].request_alloc = __live_request_alloc;
	t[0].ncontexts = 64;
	t[0].contexts = kmalloc_array(t[0].ncontexts,
				      sizeof(*t[0].contexts),
				      GFP_KERNEL);
	if (!t[0].contexts) {
		ret = -ENOMEM;
		goto out_threads;
	}

	mutex_lock(&i915->drm.struct_mutex);
	for (n = 0; n < t[0].ncontexts; n++) {
		t[0].contexts[n] = live_context(i915, file);
		if (!t[0].contexts[n]) {
			ret = -ENOMEM;
			goto out_contexts;
		}
	}

	ret = igt_live_test_begin(&live, i915, __func__, "");
	if (ret)
		goto out_contexts;

	for_each_engine(engine, i915, id) {
		t[id] = t[0];
		t[id].engine = engine;
		t[id].max_batch = max_batches(t[0].contexts[0], engine);
		if (t[id].max_batch < 0) {
			ret = t[id].max_batch;
			mutex_unlock(&i915->drm.struct_mutex);
			goto out_flush;
		}
		/* One ring interleaved between requests from all cpus */
		t[id].max_batch /= num_online_cpus() + 1;
		pr_debug("Limiting batches to %d requests on %s\n",
			 t[id].max_batch, engine->name);

		for (n = 0; n < ncpus; n++) {
			struct task_struct *tsk;

			tsk = kthread_run(__igt_breadcrumbs_smoketest,
					  &t[id], "igt/%d.%d", id, n);
			if (IS_ERR(tsk)) {
				ret = PTR_ERR(tsk);
				mutex_unlock(&i915->drm.struct_mutex);
				goto out_flush;
			}

			get_task_struct(tsk);
			threads[id * ncpus + n] = tsk;
		}
	}
	mutex_unlock(&i915->drm.struct_mutex);

	msleep(jiffies_to_msecs(i915_selftest.timeout_jiffies));

out_flush:
	num_waits = 0;
	num_fences = 0;
	for_each_engine(engine, i915, id) {
		for (n = 0; n < ncpus; n++) {
			struct task_struct *tsk = threads[id * ncpus + n];
			int err;

			if (!tsk)
				continue;

			err = kthread_stop(tsk);
			if (err < 0 && !ret)
				ret = err;

			put_task_struct(tsk);
		}

		num_waits += atomic_long_read(&t[id].num_waits);
		num_fences += atomic_long_read(&t[id].num_fences);
	}
	pr_info("Completed %lu waits for %lu fences across %d engines and %d cpus\n",
		num_waits, num_fences, RUNTIME_INFO(i915)->num_engines, ncpus);

	mutex_lock(&i915->drm.struct_mutex);
	ret = igt_live_test_end(&live) ?: ret;
out_contexts:
	mutex_unlock(&i915->drm.struct_mutex);
	kfree(t[0].contexts);
out_threads:
	kfree(threads);
out_file:
	mock_file_free(i915, file);
out_rpm:
	intel_runtime_pm_put(i915, wakeref);

	return ret;
}

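/* Live selftest entry point; skipped if the GPU is already terminally wedged. */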
int i915_request_live_selftests(struct drm_i915_private *i915)
{
	static const struct i915_subtest tests[] = {
		SUBTEST(live_nop_request),
		SUBTEST(live_all_engines),
		SUBTEST(live_sequential_engines),
		SUBTEST(live_empty_request),
		SUBTEST(live_breadcrumbs_smoketest),
	};

	if (i915_terminally_wedged(i915))
		return 0;

	return i915_subtests(tests, i915);
}