2008-07-31 02:06:12 +07:00
|
|
|
/*
|
2015-03-18 16:46:04 +07:00
|
|
|
* Copyright © 2008-2015 Intel Corporation
|
2008-07-31 02:06:12 +07:00
|
|
|
*
|
|
|
|
* Permission is hereby granted, free of charge, to any person obtaining a
|
|
|
|
* copy of this software and associated documentation files (the "Software"),
|
|
|
|
* to deal in the Software without restriction, including without limitation
|
|
|
|
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
|
|
|
* and/or sell copies of the Software, and to permit persons to whom the
|
|
|
|
* Software is furnished to do so, subject to the following conditions:
|
|
|
|
*
|
|
|
|
* The above copyright notice and this permission notice (including the next
|
|
|
|
* paragraph) shall be included in all copies or substantial portions of the
|
|
|
|
* Software.
|
|
|
|
*
|
|
|
|
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
|
|
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
|
|
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
|
|
|
|
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
|
|
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
|
|
|
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
|
|
|
* IN THE SOFTWARE.
|
|
|
|
*
|
|
|
|
* Authors:
|
|
|
|
* Eric Anholt <eric@anholt.net>
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
|
2012-10-03 00:01:07 +07:00
|
|
|
#include <drm/drmP.h>
|
2013-07-25 02:07:52 +07:00
|
|
|
#include <drm/drm_vma_manager.h>
|
2012-10-03 00:01:07 +07:00
|
|
|
#include <drm/i915_drm.h>
|
2008-07-31 02:06:12 +07:00
|
|
|
#include "i915_drv.h"
|
2015-02-10 18:05:49 +07:00
|
|
|
#include "i915_vgpu.h"
|
2009-08-25 17:15:50 +07:00
|
|
|
#include "i915_trace.h"
|
2009-08-18 03:31:43 +07:00
|
|
|
#include "intel_drv.h"
|
2016-08-04 22:32:35 +07:00
|
|
|
#include "intel_frontbuffer.h"
|
2016-04-13 21:03:25 +07:00
|
|
|
#include "intel_mocs.h"
|
2016-11-15 03:41:05 +07:00
|
|
|
#include <linux/dma-fence-array.h>
|
2016-07-20 15:21:15 +07:00
|
|
|
#include <linux/reservation.h>
|
2011-06-28 06:18:18 +07:00
|
|
|
#include <linux/shmem_fs.h>
|
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 15:04:11 +07:00
|
|
|
#include <linux/slab.h>
|
2016-11-22 21:41:21 +07:00
|
|
|
#include <linux/stop_machine.h>
|
2008-07-31 02:06:12 +07:00
|
|
|
#include <linux/swap.h>
|
DRM: i915: add mode setting support
This commit adds i915 driver support for the DRM mode setting APIs.
Currently, VGA, LVDS, SDVO DVI & VGA, TV and DVO LVDS outputs are
supported. HDMI, DisplayPort and additional SDVO output support will
follow.
Support for the mode setting code is controlled by the new 'modeset'
module option. A new config option, CONFIG_DRM_I915_KMS controls the
default behavior, and whether a PCI ID list is built into the module for
use by user level module utilities.
Note that if mode setting is enabled, user level drivers that access
display registers directly or that don't use the kernel graphics memory
manager will likely corrupt kernel graphics memory, disrupt output
configuration (possibly leading to hangs and/or blank displays), and
prevent panic/oops messages from appearing. So use caution when
enabling this code; be sure your user level code supports the new
interfaces.
A new SysRq key, 'g', provides emergency support for switching back to
the kernel's framebuffer console; which is useful for testing.
Co-authors: Dave Airlie <airlied@linux.ie>, Hong Liu <hong.liu@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-11-08 05:24:08 +07:00
|
|
|
#include <linux/pci.h>
|
2012-05-10 20:25:09 +07:00
|
|
|
#include <linux/dma-buf.h>
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2016-10-28 19:58:42 +07:00
|
|
|
static void i915_gem_flush_free_objects(struct drm_i915_private *i915);
|
2010-11-09 02:18:58 +07:00
|
|
|
static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
|
2015-01-21 20:53:48 +07:00
|
|
|
static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
|
2012-04-17 21:31:31 +07:00
|
|
|
|
2013-08-08 20:41:03 +07:00
|
|
|
static bool cpu_cache_is_coherent(struct drm_device *dev,
|
|
|
|
enum i915_cache_level level)
|
|
|
|
{
|
2016-11-04 21:42:44 +07:00
|
|
|
return HAS_LLC(to_i915(dev)) || level != I915_CACHE_NONE;
|
2013-08-08 20:41:03 +07:00
|
|
|
}
|
|
|
|
|
2013-08-09 18:26:45 +07:00
|
|
|
static bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 15:53:03 +07:00
|
|
|
if (obj->base.write_domain == I915_GEM_DOMAIN_CPU)
|
|
|
|
return false;
|
|
|
|
|
2013-08-09 18:26:45 +07:00
|
|
|
if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
|
|
|
|
return true;
|
|
|
|
|
|
|
|
return obj->pin_display;
|
|
|
|
}
|
|
|
|
|
2016-06-10 15:53:01 +07:00
|
|
|
static int
|
2016-10-28 19:58:39 +07:00
|
|
|
insert_mappable_node(struct i915_ggtt *ggtt,
|
2016-06-10 15:53:01 +07:00
|
|
|
struct drm_mm_node *node, u32 size)
|
|
|
|
{
|
|
|
|
memset(node, 0, sizeof(*node));
|
2016-10-28 19:58:39 +07:00
|
|
|
return drm_mm_insert_node_in_range_generic(&ggtt->base.mm, node,
|
2016-12-05 21:29:36 +07:00
|
|
|
size, 0,
|
|
|
|
I915_COLOR_UNEVICTABLE,
|
2016-10-28 19:58:39 +07:00
|
|
|
0, ggtt->mappable_end,
|
2016-06-10 15:53:01 +07:00
|
|
|
DRM_MM_SEARCH_DEFAULT,
|
|
|
|
DRM_MM_CREATE_DEFAULT);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
remove_mappable_node(struct drm_mm_node *node)
|
|
|
|
{
|
|
|
|
drm_mm_remove_node(node);
|
|
|
|
}
|
|
|
|
|
2010-09-30 17:46:12 +07:00
|
|
|
/* some bookkeeping */
|
|
|
|
static void i915_gem_info_add_obj(struct drm_i915_private *dev_priv,
|
2016-10-18 19:02:48 +07:00
|
|
|
u64 size)
|
2010-09-30 17:46:12 +07:00
|
|
|
{
|
2013-07-25 03:40:23 +07:00
|
|
|
spin_lock(&dev_priv->mm.object_stat_lock);
|
2010-09-30 17:46:12 +07:00
|
|
|
dev_priv->mm.object_count++;
|
|
|
|
dev_priv->mm.object_memory += size;
|
2013-07-25 03:40:23 +07:00
|
|
|
spin_unlock(&dev_priv->mm.object_stat_lock);
|
2010-09-30 17:46:12 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
static void i915_gem_info_remove_obj(struct drm_i915_private *dev_priv,
|
2016-10-18 19:02:48 +07:00
|
|
|
u64 size)
|
2010-09-30 17:46:12 +07:00
|
|
|
{
|
2013-07-25 03:40:23 +07:00
|
|
|
spin_lock(&dev_priv->mm.object_stat_lock);
|
2010-09-30 17:46:12 +07:00
|
|
|
dev_priv->mm.object_count--;
|
|
|
|
dev_priv->mm.object_memory -= size;
|
2013-07-25 03:40:23 +07:00
|
|
|
spin_unlock(&dev_priv->mm.object_stat_lock);
|
2010-09-30 17:46:12 +07:00
|
|
|
}
|
|
|
|
|
2011-01-26 22:55:56 +07:00
|
|
|
static int
|
2012-11-14 23:14:05 +07:00
|
|
|
i915_gem_wait_for_error(struct i915_gpu_error *error)
|
2010-09-25 16:19:17 +07:00
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
2016-10-28 19:58:32 +07:00
|
|
|
might_sleep();
|
|
|
|
|
2016-04-13 23:35:05 +07:00
|
|
|
if (!i915_reset_in_progress(error))
|
2010-09-25 16:19:17 +07:00
|
|
|
return 0;
|
|
|
|
|
2012-07-05 03:18:41 +07:00
|
|
|
/*
|
|
|
|
* Only wait 10 seconds for the gpu reset to complete to avoid hanging
|
|
|
|
* userspace. If it takes that long something really bad is going on and
|
|
|
|
* we should simply try to bail out and fail as gracefully as possible.
|
|
|
|
*/
|
2012-11-15 23:17:22 +07:00
|
|
|
ret = wait_event_interruptible_timeout(error->reset_queue,
|
2016-04-13 23:35:05 +07:00
|
|
|
!i915_reset_in_progress(error),
|
2016-10-28 19:58:24 +07:00
|
|
|
I915_RESET_TIMEOUT);
|
2012-07-05 03:18:41 +07:00
|
|
|
if (ret == 0) {
|
|
|
|
DRM_ERROR("Timed out waiting for the gpu reset to complete\n");
|
|
|
|
return -EIO;
|
|
|
|
} else if (ret < 0) {
|
2010-09-25 16:19:17 +07:00
|
|
|
return ret;
|
2016-04-13 23:35:05 +07:00
|
|
|
} else {
|
|
|
|
return 0;
|
2012-07-05 03:18:41 +07:00
|
|
|
}
|
2010-09-25 16:19:17 +07:00
|
|
|
}
|
|
|
|
|
2010-11-26 01:00:26 +07:00
|
|
|
int i915_mutex_lock_interruptible(struct drm_device *dev)
|
2010-09-25 17:22:51 +07:00
|
|
|
{
|
2016-07-04 17:34:36 +07:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
2010-09-25 17:22:51 +07:00
|
|
|
int ret;
|
|
|
|
|
2012-11-14 23:14:05 +07:00
|
|
|
ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
|
2010-09-25 17:22:51 +07:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
ret = mutex_lock_interruptible(&dev->struct_mutex);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
2010-09-25 16:19:17 +07:00
|
|
|
|
2008-10-23 11:40:13 +07:00
|
|
|
int
|
|
|
|
i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data,
|
2010-11-09 02:18:58 +07:00
|
|
|
struct drm_file *file)
|
2008-10-23 11:40:13 +07:00
|
|
|
{
|
2016-03-30 20:57:10 +07:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
2016-03-18 15:42:57 +07:00
|
|
|
struct i915_ggtt *ggtt = &dev_priv->ggtt;
|
2016-03-30 20:57:10 +07:00
|
|
|
struct drm_i915_gem_get_aperture *args = data;
|
2015-07-01 17:51:10 +07:00
|
|
|
struct i915_vma *vma;
|
2010-11-24 19:23:44 +07:00
|
|
|
size_t pinned;
|
2008-10-23 11:40:13 +07:00
|
|
|
|
2010-11-24 19:23:44 +07:00
|
|
|
pinned = 0;
|
2010-09-30 17:46:12 +07:00
|
|
|
mutex_lock(&dev->struct_mutex);
|
2016-02-26 18:03:19 +07:00
|
|
|
list_for_each_entry(vma, &ggtt->base.active_list, vm_link)
|
2016-08-04 22:32:30 +07:00
|
|
|
if (i915_vma_is_pinned(vma))
|
2015-07-01 17:51:10 +07:00
|
|
|
pinned += vma->node.size;
|
2016-02-26 18:03:19 +07:00
|
|
|
list_for_each_entry(vma, &ggtt->base.inactive_list, vm_link)
|
2016-08-04 22:32:30 +07:00
|
|
|
if (i915_vma_is_pinned(vma))
|
2015-07-01 17:51:10 +07:00
|
|
|
pinned += vma->node.size;
|
2010-09-30 17:46:12 +07:00
|
|
|
mutex_unlock(&dev->struct_mutex);
|
2008-10-23 11:40:13 +07:00
|
|
|
|
2016-03-30 20:57:10 +07:00
|
|
|
args->aper_size = ggtt->base.total;
|
2011-08-17 02:34:10 +07:00
|
|
|
args->aper_available_size = args->aper_size - pinned;
|
2010-11-24 19:23:44 +07:00
|
|
|
|
2008-10-23 11:40:13 +07:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:36 +07:00
|
|
|
static struct sg_table *
|
2014-11-04 19:51:40 +07:00
|
|
|
i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj)
|
2014-05-21 18:42:56 +07:00
|
|
|
{
|
2015-12-05 11:45:44 +07:00
|
|
|
struct address_space *mapping = obj->base.filp->f_mapping;
|
2014-11-04 19:51:40 +07:00
|
|
|
char *vaddr = obj->phys_handle->vaddr;
|
|
|
|
struct sg_table *st;
|
|
|
|
struct scatterlist *sg;
|
|
|
|
int i;
|
2014-05-21 18:42:56 +07:00
|
|
|
|
2014-11-04 19:51:40 +07:00
|
|
|
if (WARN_ON(i915_gem_object_needs_bit17_swizzle(obj)))
|
2016-10-28 19:58:36 +07:00
|
|
|
return ERR_PTR(-EINVAL);
|
2014-11-04 19:51:40 +07:00
|
|
|
|
|
|
|
for (i = 0; i < obj->base.size / PAGE_SIZE; i++) {
|
|
|
|
struct page *page;
|
|
|
|
char *src;
|
|
|
|
|
|
|
|
page = shmem_read_mapping_page(mapping, i);
|
|
|
|
if (IS_ERR(page))
|
2016-10-28 19:58:36 +07:00
|
|
|
return ERR_CAST(page);
|
2014-11-04 19:51:40 +07:00
|
|
|
|
|
|
|
src = kmap_atomic(page);
|
|
|
|
memcpy(vaddr, src, PAGE_SIZE);
|
|
|
|
drm_clflush_virt_range(vaddr, PAGE_SIZE);
|
|
|
|
kunmap_atomic(src);
|
|
|
|
|
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 19:29:47 +07:00
|
|
|
put_page(page);
|
2014-11-04 19:51:40 +07:00
|
|
|
vaddr += PAGE_SIZE;
|
|
|
|
}
|
|
|
|
|
2016-05-06 21:40:21 +07:00
|
|
|
i915_gem_chipset_flush(to_i915(obj->base.dev));
|
2014-11-04 19:51:40 +07:00
|
|
|
|
|
|
|
st = kmalloc(sizeof(*st), GFP_KERNEL);
|
|
|
|
if (st == NULL)
|
2016-10-28 19:58:36 +07:00
|
|
|
return ERR_PTR(-ENOMEM);
|
2014-11-04 19:51:40 +07:00
|
|
|
|
|
|
|
if (sg_alloc_table(st, 1, GFP_KERNEL)) {
|
|
|
|
kfree(st);
|
2016-10-28 19:58:36 +07:00
|
|
|
return ERR_PTR(-ENOMEM);
|
2014-11-04 19:51:40 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
sg = st->sgl;
|
|
|
|
sg->offset = 0;
|
|
|
|
sg->length = obj->base.size;
|
2014-05-21 18:42:56 +07:00
|
|
|
|
2014-11-04 19:51:40 +07:00
|
|
|
sg_dma_address(sg) = obj->phys_handle->busaddr;
|
|
|
|
sg_dma_len(sg) = obj->base.size;
|
|
|
|
|
2016-10-28 19:58:36 +07:00
|
|
|
return st;
|
2014-11-04 19:51:40 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
2016-11-11 21:58:09 +07:00
|
|
|
__i915_gem_object_release_shmem(struct drm_i915_gem_object *obj,
|
|
|
|
struct sg_table *pages)
|
2014-11-04 19:51:40 +07:00
|
|
|
{
|
2016-10-28 19:58:35 +07:00
|
|
|
GEM_BUG_ON(obj->mm.madv == __I915_MADV_PURGED);
|
2014-05-21 18:42:56 +07:00
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
if (obj->mm.madv == I915_MADV_DONTNEED)
|
|
|
|
obj->mm.dirty = false;
|
2014-11-04 19:51:40 +07:00
|
|
|
|
2016-11-19 04:17:47 +07:00
|
|
|
if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0 &&
|
|
|
|
!cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
|
2016-11-11 21:58:09 +07:00
|
|
|
drm_clflush_sg(pages);
|
2016-10-28 19:58:36 +07:00
|
|
|
|
|
|
|
obj->base.read_domains = I915_GEM_DOMAIN_CPU;
|
|
|
|
obj->base.write_domain = I915_GEM_DOMAIN_CPU;
|
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj,
|
|
|
|
struct sg_table *pages)
|
|
|
|
{
|
2016-11-11 21:58:09 +07:00
|
|
|
__i915_gem_object_release_shmem(obj, pages);
|
2016-10-28 19:58:36 +07:00
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
if (obj->mm.dirty) {
|
2015-12-05 11:45:44 +07:00
|
|
|
struct address_space *mapping = obj->base.filp->f_mapping;
|
2014-11-04 19:51:40 +07:00
|
|
|
char *vaddr = obj->phys_handle->vaddr;
|
2014-05-21 18:42:56 +07:00
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < obj->base.size / PAGE_SIZE; i++) {
|
2014-11-04 19:51:40 +07:00
|
|
|
struct page *page;
|
|
|
|
char *dst;
|
|
|
|
|
|
|
|
page = shmem_read_mapping_page(mapping, i);
|
|
|
|
if (IS_ERR(page))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
dst = kmap_atomic(page);
|
|
|
|
drm_clflush_virt_range(vaddr, PAGE_SIZE);
|
|
|
|
memcpy(dst, vaddr, PAGE_SIZE);
|
|
|
|
kunmap_atomic(dst);
|
|
|
|
|
|
|
|
set_page_dirty(page);
|
2016-10-28 19:58:35 +07:00
|
|
|
if (obj->mm.madv == I915_MADV_WILLNEED)
|
2014-05-21 18:42:56 +07:00
|
|
|
mark_page_accessed(page);
|
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 19:29:47 +07:00
|
|
|
put_page(page);
|
2014-05-21 18:42:56 +07:00
|
|
|
vaddr += PAGE_SIZE;
|
|
|
|
}
|
2016-10-28 19:58:35 +07:00
|
|
|
obj->mm.dirty = false;
|
2014-05-21 18:42:56 +07:00
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:36 +07:00
|
|
|
sg_free_table(pages);
|
|
|
|
kfree(pages);
|
2014-11-04 19:51:40 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
static void
|
|
|
|
i915_gem_object_release_phys(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
drm_pci_free(obj->base.dev, obj->phys_handle);
|
2016-10-28 19:58:35 +07:00
|
|
|
i915_gem_object_unpin_pages(obj);
|
2014-11-04 19:51:40 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
static const struct drm_i915_gem_object_ops i915_gem_phys_ops = {
|
|
|
|
.get_pages = i915_gem_object_get_pages_phys,
|
|
|
|
.put_pages = i915_gem_object_put_pages_phys,
|
|
|
|
.release = i915_gem_object_release_phys,
|
|
|
|
};
|
|
|
|
|
2016-08-15 00:44:40 +07:00
|
|
|
int i915_gem_object_unbind(struct drm_i915_gem_object *obj)
|
2016-08-04 13:52:27 +07:00
|
|
|
{
|
|
|
|
struct i915_vma *vma;
|
|
|
|
LIST_HEAD(still_in_list);
|
2016-08-15 00:44:41 +07:00
|
|
|
int ret;
|
|
|
|
|
|
|
|
lockdep_assert_held(&obj->base.dev->struct_mutex);
|
2016-08-04 13:52:27 +07:00
|
|
|
|
2016-08-15 00:44:41 +07:00
|
|
|
/* Closed vma are removed from the obj->vma_list - but they may
|
|
|
|
* still have an active binding on the object. To remove those we
|
|
|
|
* must wait for all rendering to complete to the object (as unbinding
|
|
|
|
* must anyway), and retire the requests.
|
2016-08-04 13:52:27 +07:00
|
|
|
*/
|
2016-10-28 19:58:27 +07:00
|
|
|
ret = i915_gem_object_wait(obj,
|
|
|
|
I915_WAIT_INTERRUPTIBLE |
|
|
|
|
I915_WAIT_LOCKED |
|
|
|
|
I915_WAIT_ALL,
|
|
|
|
MAX_SCHEDULE_TIMEOUT,
|
|
|
|
NULL);
|
2016-08-15 00:44:41 +07:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
i915_gem_retire_requests(to_i915(obj->base.dev));
|
|
|
|
|
2016-08-04 13:52:27 +07:00
|
|
|
while ((vma = list_first_entry_or_null(&obj->vma_list,
|
|
|
|
struct i915_vma,
|
|
|
|
obj_link))) {
|
|
|
|
list_move_tail(&vma->obj_link, &still_in_list);
|
|
|
|
ret = i915_vma_unbind(vma);
|
|
|
|
if (ret)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
list_splice(&still_in_list, &obj->vma_list);
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
static long
|
|
|
|
i915_gem_object_wait_fence(struct dma_fence *fence,
|
|
|
|
unsigned int flags,
|
|
|
|
long timeout,
|
|
|
|
struct intel_rps_client *rps)
|
2016-08-04 22:32:40 +07:00
|
|
|
{
|
2016-10-28 19:58:27 +07:00
|
|
|
struct drm_i915_gem_request *rq;
|
2016-08-04 22:32:40 +07:00
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
BUILD_BUG_ON(I915_WAIT_INTERRUPTIBLE != 0x1);
|
2016-08-04 22:32:40 +07:00
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
|
|
|
|
return timeout;
|
|
|
|
|
|
|
|
if (!dma_fence_is_i915(fence))
|
|
|
|
return dma_fence_wait_timeout(fence,
|
|
|
|
flags & I915_WAIT_INTERRUPTIBLE,
|
|
|
|
timeout);
|
|
|
|
|
|
|
|
rq = to_request(fence);
|
|
|
|
if (i915_gem_request_completed(rq))
|
|
|
|
goto out;
|
|
|
|
|
|
|
|
/* This client is about to stall waiting for the GPU. In many cases
|
|
|
|
* this is undesirable and limits the throughput of the system, as
|
|
|
|
* many clients cannot continue processing user input/output whilst
|
|
|
|
* blocked. RPS autotuning may take tens of milliseconds to respond
|
|
|
|
* to the GPU load and thus incurs additional latency for the client.
|
|
|
|
* We can circumvent that by promoting the GPU frequency to maximum
|
|
|
|
* before we wait. This makes the GPU throttle up much more quickly
|
|
|
|
* (good for benchmarks and user experience, e.g. window animations),
|
|
|
|
* but at a cost of spending more power processing the workload
|
|
|
|
* (bad for battery). Not all clients even want their results
|
|
|
|
* immediately and for them we should just let the GPU select its own
|
|
|
|
* frequency to maximise efficiency. To prevent a single client from
|
|
|
|
* forcing the clocks too high for the whole system, we only allow
|
|
|
|
* each client to waitboost once in a busy period.
|
|
|
|
*/
|
|
|
|
if (rps) {
|
|
|
|
if (INTEL_GEN(rq->i915) >= 6)
|
|
|
|
gen6_rps_boost(rq->i915, rps, rq->emitted_jiffies);
|
|
|
|
else
|
|
|
|
rps = NULL;
|
2016-08-04 22:32:40 +07:00
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
timeout = i915_wait_request(rq, flags, timeout);
|
|
|
|
|
|
|
|
out:
|
|
|
|
if (flags & I915_WAIT_LOCKED && i915_gem_request_completed(rq))
|
|
|
|
i915_gem_request_retire_upto(rq);
|
|
|
|
|
2016-11-01 17:03:16 +07:00
|
|
|
if (rps && rq->global_seqno == intel_engine_last_submit(rq->engine)) {
|
2016-10-28 19:58:27 +07:00
|
|
|
/* The GPU is now idle and this client has stalled.
|
|
|
|
* Since no other client has submitted a request in the
|
|
|
|
* meantime, assume that this client is the only one
|
|
|
|
* supplying work to the GPU but is unable to keep that
|
|
|
|
* work supplied because it is waiting. Since the GPU is
|
|
|
|
* then never kept fully busy, RPS autoclocking will
|
|
|
|
* keep the clocks relatively low, causing further delays.
|
|
|
|
* Compensate by giving the synchronous client credit for
|
|
|
|
* a waitboost next time.
|
|
|
|
*/
|
|
|
|
spin_lock(&rq->i915->rps.client_lock);
|
|
|
|
list_del_init(&rps->link);
|
|
|
|
spin_unlock(&rq->i915->rps.client_lock);
|
|
|
|
}
|
|
|
|
|
|
|
|
return timeout;
|
|
|
|
}
|
|
|
|
|
|
|
|
static long
|
|
|
|
i915_gem_object_wait_reservation(struct reservation_object *resv,
|
|
|
|
unsigned int flags,
|
|
|
|
long timeout,
|
|
|
|
struct intel_rps_client *rps)
|
|
|
|
{
|
|
|
|
struct dma_fence *excl;
|
|
|
|
|
|
|
|
if (flags & I915_WAIT_ALL) {
|
|
|
|
struct dma_fence **shared;
|
|
|
|
unsigned int count, i;
|
2016-08-04 22:32:40 +07:00
|
|
|
int ret;
|
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
ret = reservation_object_get_fences_rcu(resv,
|
|
|
|
&excl, &count, &shared);
|
2016-08-04 22:32:40 +07:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
for (i = 0; i < count; i++) {
|
|
|
|
timeout = i915_gem_object_wait_fence(shared[i],
|
|
|
|
flags, timeout,
|
|
|
|
rps);
|
|
|
|
if (timeout <= 0)
|
|
|
|
break;
|
2016-08-04 22:32:40 +07:00
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
dma_fence_put(shared[i]);
|
|
|
|
}
|
|
|
|
|
|
|
|
for (; i < count; i++)
|
|
|
|
dma_fence_put(shared[i]);
|
|
|
|
kfree(shared);
|
|
|
|
} else {
|
|
|
|
excl = reservation_object_get_excl_rcu(resv);
|
2016-08-04 22:32:40 +07:00
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
if (excl && timeout > 0)
|
|
|
|
timeout = i915_gem_object_wait_fence(excl, flags, timeout, rps);
|
|
|
|
|
|
|
|
dma_fence_put(excl);
|
|
|
|
|
|
|
|
return timeout;
|
2016-08-04 22:32:40 +07:00
|
|
|
}
|
|
|
|
|
2016-11-15 03:41:05 +07:00
|
|
|
static void __fence_set_priority(struct dma_fence *fence, int prio)
|
|
|
|
{
|
|
|
|
struct drm_i915_gem_request *rq;
|
|
|
|
struct intel_engine_cs *engine;
|
|
|
|
|
|
|
|
if (!dma_fence_is_i915(fence))
|
|
|
|
return;
|
|
|
|
|
|
|
|
rq = to_request(fence);
|
|
|
|
engine = rq->engine;
|
|
|
|
if (!engine->schedule)
|
|
|
|
return;
|
|
|
|
|
|
|
|
engine->schedule(rq, prio);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void fence_set_priority(struct dma_fence *fence, int prio)
|
|
|
|
{
|
|
|
|
/* Recurse once into a fence-array */
|
|
|
|
if (dma_fence_is_array(fence)) {
|
|
|
|
struct dma_fence_array *array = to_dma_fence_array(fence);
|
|
|
|
int i;
|
|
|
|
|
|
|
|
for (i = 0; i < array->num_fences; i++)
|
|
|
|
__fence_set_priority(array->fences[i], prio);
|
|
|
|
} else {
|
|
|
|
__fence_set_priority(fence, prio);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
int
|
|
|
|
i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
|
|
|
|
unsigned int flags,
|
|
|
|
int prio)
|
|
|
|
{
|
|
|
|
struct dma_fence *excl;
|
|
|
|
|
|
|
|
if (flags & I915_WAIT_ALL) {
|
|
|
|
struct dma_fence **shared;
|
|
|
|
unsigned int count, i;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
ret = reservation_object_get_fences_rcu(obj->resv,
|
|
|
|
&excl, &count, &shared);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
for (i = 0; i < count; i++) {
|
|
|
|
fence_set_priority(shared[i], prio);
|
|
|
|
dma_fence_put(shared[i]);
|
|
|
|
}
|
|
|
|
|
|
|
|
kfree(shared);
|
|
|
|
} else {
|
|
|
|
excl = reservation_object_get_excl_rcu(obj->resv);
|
|
|
|
}
|
|
|
|
|
|
|
|
if (excl) {
|
|
|
|
fence_set_priority(excl, prio);
|
|
|
|
dma_fence_put(excl);
|
|
|
|
}
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
/**
|
|
|
|
* Waits for rendering to the object to be completed
|
|
|
|
* @obj: i915 gem object
|
|
|
|
* @flags: how to wait (under a lock, for all rendering or just for writes etc)
|
|
|
|
* @timeout: how long to wait
|
|
|
|
* @rps: client (user process) to charge for any waitboosting
|
2016-08-04 22:32:40 +07:00
|
|
|
*/
|
2016-10-28 19:58:27 +07:00
|
|
|
int
|
|
|
|
i915_gem_object_wait(struct drm_i915_gem_object *obj,
|
|
|
|
unsigned int flags,
|
|
|
|
long timeout,
|
|
|
|
struct intel_rps_client *rps)
|
2016-08-04 22:32:40 +07:00
|
|
|
{
|
2016-10-28 19:58:27 +07:00
|
|
|
might_sleep();
|
|
|
|
#if IS_ENABLED(CONFIG_LOCKDEP)
|
|
|
|
GEM_BUG_ON(debug_locks &&
|
|
|
|
!!lockdep_is_held(&obj->base.dev->struct_mutex) !=
|
|
|
|
!!(flags & I915_WAIT_LOCKED));
|
|
|
|
#endif
|
|
|
|
GEM_BUG_ON(timeout < 0);
|
2016-08-04 22:32:40 +07:00
|
|
|
|
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 19:58:44 +07:00
|
|
|
timeout = i915_gem_object_wait_reservation(obj->resv,
|
|
|
|
flags, timeout,
|
|
|
|
rps);
|
2016-10-28 19:58:27 +07:00
|
|
|
return timeout < 0 ? timeout : 0;
|
2016-08-04 22:32:40 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
static struct intel_rps_client *to_rps_client(struct drm_file *file)
|
|
|
|
{
|
|
|
|
struct drm_i915_file_private *fpriv = file->driver_priv;
|
|
|
|
|
|
|
|
return &fpriv->rps;
|
|
|
|
}
|
|
|
|
|
2014-05-21 18:42:56 +07:00
|
|
|
int
|
|
|
|
i915_gem_object_attach_phys(struct drm_i915_gem_object *obj,
|
|
|
|
int align)
|
|
|
|
{
|
|
|
|
drm_dma_handle_t *phys;
|
2014-11-04 19:51:40 +07:00
|
|
|
int ret;
|
2014-05-21 18:42:56 +07:00
|
|
|
|
|
|
|
if (obj->phys_handle) {
|
|
|
|
if ((unsigned long)obj->phys_handle->vaddr & (align -1))
|
|
|
|
return -EBUSY;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
if (obj->mm.madv != I915_MADV_WILLNEED)
|
2014-05-21 18:42:56 +07:00
|
|
|
return -EFAULT;
|
|
|
|
|
|
|
|
if (obj->base.filp == NULL)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2016-08-04 13:52:28 +07:00
|
|
|
ret = i915_gem_object_unbind(obj);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2016-11-01 19:11:34 +07:00
|
|
|
__i915_gem_object_put_pages(obj, I915_MM_NORMAL);
|
2016-10-28 19:58:36 +07:00
|
|
|
if (obj->mm.pages)
|
|
|
|
return -EBUSY;
|
2014-11-04 19:51:40 +07:00
|
|
|
|
2014-05-21 18:42:56 +07:00
|
|
|
/* create a new object */
|
|
|
|
phys = drm_pci_alloc(obj->base.dev, obj->base.size, align);
|
|
|
|
if (!phys)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
obj->phys_handle = phys;
|
2014-11-04 19:51:40 +07:00
|
|
|
obj->ops = &i915_gem_phys_ops;
|
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
return i915_gem_object_pin_pages(obj);
|
2014-05-21 18:42:56 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
i915_gem_phys_pwrite(struct drm_i915_gem_object *obj,
|
|
|
|
struct drm_i915_gem_pwrite *args,
|
2016-10-28 19:58:36 +07:00
|
|
|
struct drm_file *file)
|
2014-05-21 18:42:56 +07:00
|
|
|
{
|
|
|
|
struct drm_device *dev = obj->base.dev;
|
|
|
|
void *vaddr = obj->phys_handle->vaddr + args->offset;
|
2016-04-26 22:32:27 +07:00
|
|
|
char __user *user_data = u64_to_user_ptr(args->data_ptr);
|
2016-10-28 19:58:27 +07:00
|
|
|
int ret;
|
2014-11-04 19:51:40 +07:00
|
|
|
|
|
|
|
/* We manually control the domain here and pretend that it
|
|
|
|
* remains coherent i.e. in the GTT domain, like shmem_pwrite.
|
|
|
|
*/
|
2016-10-28 19:58:27 +07:00
|
|
|
lockdep_assert_held(&obj->base.dev->struct_mutex);
|
|
|
|
ret = i915_gem_object_wait(obj,
|
|
|
|
I915_WAIT_INTERRUPTIBLE |
|
|
|
|
I915_WAIT_LOCKED |
|
|
|
|
I915_WAIT_ALL,
|
|
|
|
MAX_SCHEDULE_TIMEOUT,
|
2016-10-28 19:58:36 +07:00
|
|
|
to_rps_client(file));
|
2014-11-04 19:51:40 +07:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
2014-05-21 18:42:56 +07:00
|
|
|
|
2015-06-19 01:43:24 +07:00
|
|
|
intel_fb_obj_invalidate(obj, ORIGIN_CPU);
|
2014-05-21 18:42:56 +07:00
|
|
|
if (__copy_from_user_inatomic_nocache(vaddr, user_data, args->size)) {
|
|
|
|
unsigned long unwritten;
|
|
|
|
|
|
|
|
/* The physical object once assigned is fixed for the lifetime
|
|
|
|
* of the obj, so we can safely drop the lock and continue
|
|
|
|
* to access vaddr.
|
|
|
|
*/
|
|
|
|
mutex_unlock(&dev->struct_mutex);
|
|
|
|
unwritten = copy_from_user(vaddr, user_data, args->size);
|
|
|
|
mutex_lock(&dev->struct_mutex);
|
2015-02-14 02:23:45 +07:00
|
|
|
if (unwritten) {
|
|
|
|
ret = -EFAULT;
|
|
|
|
goto out;
|
|
|
|
}
|
2014-05-21 18:42:56 +07:00
|
|
|
}
|
|
|
|
|
2014-11-04 19:51:40 +07:00
|
|
|
drm_clflush_virt_range(vaddr, args->size);
|
2016-05-06 21:40:21 +07:00
|
|
|
i915_gem_chipset_flush(to_i915(dev));
|
2015-02-14 02:23:45 +07:00
|
|
|
|
|
|
|
out:
|
2015-07-08 06:28:51 +07:00
|
|
|
intel_fb_obj_flush(obj, false, ORIGIN_CPU);
|
2015-02-14 02:23:45 +07:00
|
|
|
return ret;
|
2014-05-21 18:42:56 +07:00
|
|
|
}
|
|
|
|
|
2016-12-01 21:16:36 +07:00
|
|
|
void *i915_gem_object_alloc(struct drm_i915_private *dev_priv)
|
2012-11-15 18:32:30 +07:00
|
|
|
{
|
2015-04-07 22:20:57 +07:00
|
|
|
return kmem_cache_zalloc(dev_priv->objects, GFP_KERNEL);
|
2012-11-15 18:32:30 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
void i915_gem_object_free(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
2016-07-04 17:34:36 +07:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
|
2015-04-07 22:20:57 +07:00
|
|
|
kmem_cache_free(dev_priv->objects, obj);
|
2012-11-15 18:32:30 +07:00
|
|
|
}
|
|
|
|
|
2011-02-07 09:16:14 +07:00
|
|
|
static int
|
|
|
|
i915_gem_create(struct drm_file *file,
|
2016-12-01 21:16:37 +07:00
|
|
|
struct drm_i915_private *dev_priv,
|
2011-02-07 09:16:14 +07:00
|
|
|
uint64_t size,
|
|
|
|
uint32_t *handle_p)
|
2008-07-31 02:06:12 +07:00
|
|
|
{
|
2010-11-09 02:18:58 +07:00
|
|
|
struct drm_i915_gem_object *obj;
|
2009-08-23 16:40:55 +07:00
|
|
|
int ret;
|
|
|
|
u32 handle;
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2011-02-07 09:16:14 +07:00
|
|
|
size = roundup(size, PAGE_SIZE);
|
2011-09-14 19:14:28 +07:00
|
|
|
if (size == 0)
|
|
|
|
return -EINVAL;
|
2008-07-31 02:06:12 +07:00
|
|
|
|
|
|
|
/* Allocate the new object */
|
2016-12-01 21:16:37 +07:00
|
|
|
obj = i915_gem_object_create(dev_priv, size);
|
2016-04-25 19:32:13 +07:00
|
|
|
if (IS_ERR(obj))
|
|
|
|
return PTR_ERR(obj);
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2010-11-09 02:18:58 +07:00
|
|
|
ret = drm_gem_handle_create(file, &obj->base, &handle);
|
2010-10-14 19:20:40 +07:00
|
|
|
/* drop reference from allocate - handle holds it now */
|
2016-10-28 19:58:43 +07:00
|
|
|
i915_gem_object_put(obj);
|
2013-07-25 04:25:03 +07:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
2010-10-14 19:20:40 +07:00
|
|
|
|
2011-02-07 09:16:14 +07:00
|
|
|
*handle_p = handle;
|
2008-07-31 02:06:12 +07:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2011-02-07 09:16:14 +07:00
|
|
|
int
|
|
|
|
i915_gem_dumb_create(struct drm_file *file,
|
|
|
|
struct drm_device *dev,
|
|
|
|
struct drm_mode_create_dumb *args)
|
|
|
|
{
|
|
|
|
/* have to work out size/pitch and return them */
|
2013-10-19 04:48:24 +07:00
|
|
|
args->pitch = ALIGN(args->width * DIV_ROUND_UP(args->bpp, 8), 64);
|
2011-02-07 09:16:14 +07:00
|
|
|
args->size = args->pitch * args->height;
|
2016-12-01 21:16:37 +07:00
|
|
|
return i915_gem_create(file, to_i915(dev),
|
2014-12-24 10:11:17 +07:00
|
|
|
args->size, &args->handle);
|
2011-02-07 09:16:14 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Creates a new mm object and returns a handle to it.
|
2016-06-03 20:02:17 +07:00
|
|
|
* @dev: drm device pointer
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file pointer
|
2011-02-07 09:16:14 +07:00
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_create_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file)
|
|
|
|
{
|
2016-12-01 21:16:37 +07:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
2011-02-07 09:16:14 +07:00
|
|
|
struct drm_i915_gem_create *args = data;
|
2012-04-23 21:50:50 +07:00
|
|
|
|
2016-12-01 21:16:37 +07:00
|
|
|
i915_gem_flush_free_objects(dev_priv);
|
2016-10-28 19:58:42 +07:00
|
|
|
|
2016-12-01 21:16:37 +07:00
|
|
|
return i915_gem_create(file, dev_priv,
|
2014-12-24 10:11:17 +07:00
|
|
|
args->size, &args->handle);
|
2011-02-07 09:16:14 +07:00
|
|
|
}
|
|
|
|
|
2011-12-14 19:57:32 +07:00
|
|
|
static inline int
|
|
|
|
__copy_to_user_swizzled(char __user *cpu_vaddr,
|
|
|
|
const char *gpu_vaddr, int gpu_offset,
|
|
|
|
int length)
|
|
|
|
{
|
|
|
|
int ret, cpu_offset = 0;
|
|
|
|
|
|
|
|
while (length > 0) {
|
|
|
|
int cacheline_end = ALIGN(gpu_offset + 1, 64);
|
|
|
|
int this_length = min(cacheline_end - gpu_offset, length);
|
|
|
|
int swizzled_gpu_offset = gpu_offset ^ 64;
|
|
|
|
|
|
|
|
ret = __copy_to_user(cpu_vaddr + cpu_offset,
|
|
|
|
gpu_vaddr + swizzled_gpu_offset,
|
|
|
|
this_length);
|
|
|
|
if (ret)
|
|
|
|
return ret + length;
|
|
|
|
|
|
|
|
cpu_offset += this_length;
|
|
|
|
gpu_offset += this_length;
|
|
|
|
length -= this_length;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
... instead of get_user_pages, because that fails on non page-backed
user addresses like e.g. a gtt mapping of a bo.
To get there essentially copy the vfs read path into pagecache. We
can't call that right away because we have to take care of bit17
swizzling. To not deadlock with our own pagefault handler we need
to completely drop struct_mutex, reducing the atomicty-guarantees
of our userspace abi. Implications for racing with other gem ioctl:
- execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's
already the risk of the pwrite call not being atomic, no degration.
- read/write access to mmaps: already fully racy, no degration.
- set_tiling: Calling set_tiling while reading/writing is already
pretty much undefined, now it just got a bit worse. set_tiling is
only called by libdrm on unused/new bos, so no problem.
- set_domain: When changing to the gtt domain while copying (without any
read/write access, e.g. for synchronization), we might leave unflushed
data in the cpu caches. The clflush_object at the end of pwrite_slow
takes care of this problem.
- truncating of purgeable objects: the shmem_read_mapping_page call could
reinstate backing storage for truncated objects. The check at the end
of pwrite_slow takes care of this.
v2:
- add missing intel_gtt_chipset_flush
- add __ to copy_from_user_swizzled as suggest by Chris Wilson.
v3: Fixup bit17 swizzling, it swizzled the wrong pages.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-12-14 19:57:31 +07:00
|
|
|
static inline int
|
2012-04-17 04:07:47 +07:00
|
|
|
__copy_from_user_swizzled(char *gpu_vaddr, int gpu_offset,
|
|
|
|
const char __user *cpu_vaddr,
|
drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
... instead of get_user_pages, because that fails on non page-backed
user addresses like e.g. a gtt mapping of a bo.
To get there essentially copy the vfs read path into pagecache. We
can't call that right away because we have to take care of bit17
swizzling. To not deadlock with our own pagefault handler we need
to completely drop struct_mutex, reducing the atomicty-guarantees
of our userspace abi. Implications for racing with other gem ioctl:
- execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's
already the risk of the pwrite call not being atomic, no degration.
- read/write access to mmaps: already fully racy, no degration.
- set_tiling: Calling set_tiling while reading/writing is already
pretty much undefined, now it just got a bit worse. set_tiling is
only called by libdrm on unused/new bos, so no problem.
- set_domain: When changing to the gtt domain while copying (without any
read/write access, e.g. for synchronization), we might leave unflushed
data in the cpu caches. The clflush_object at the end of pwrite_slow
takes care of this problem.
- truncating of purgeable objects: the shmem_read_mapping_page call could
reinstate backing storage for truncated objects. The check at the end
of pwrite_slow takes care of this.
v2:
- add missing intel_gtt_chipset_flush
- add __ to copy_from_user_swizzled as suggest by Chris Wilson.
v3: Fixup bit17 swizzling, it swizzled the wrong pages.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-12-14 19:57:31 +07:00
|
|
|
int length)
|
|
|
|
{
|
|
|
|
int ret, cpu_offset = 0;
|
|
|
|
|
|
|
|
while (length > 0) {
|
|
|
|
int cacheline_end = ALIGN(gpu_offset + 1, 64);
|
|
|
|
int this_length = min(cacheline_end - gpu_offset, length);
|
|
|
|
int swizzled_gpu_offset = gpu_offset ^ 64;
|
|
|
|
|
|
|
|
ret = __copy_from_user(gpu_vaddr + swizzled_gpu_offset,
|
|
|
|
cpu_vaddr + cpu_offset,
|
|
|
|
this_length);
|
|
|
|
if (ret)
|
|
|
|
return ret + length;
|
|
|
|
|
|
|
|
cpu_offset += this_length;
|
|
|
|
gpu_offset += this_length;
|
|
|
|
length -= this_length;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2014-02-19 01:15:45 +07:00
|
|
|
/*
|
|
|
|
* Pins the specified object's pages and synchronizes the object with
|
|
|
|
* GPU accesses. Sets needs_clflush to non-zero if the caller should
|
|
|
|
* flush the object from the CPU cache.
|
|
|
|
*/
|
|
|
|
int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
|
2016-08-18 23:16:47 +07:00
|
|
|
unsigned int *needs_clflush)
|
2014-02-19 01:15:45 +07:00
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
lockdep_assert_held(&obj->base.dev->struct_mutex);
|
2014-02-19 01:15:45 +07:00
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
*needs_clflush = 0;
|
2016-08-18 23:16:47 +07:00
|
|
|
if (!i915_gem_object_has_struct_page(obj))
|
|
|
|
return -ENODEV;
|
2014-02-19 01:15:45 +07:00
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
ret = i915_gem_object_wait(obj,
|
|
|
|
I915_WAIT_INTERRUPTIBLE |
|
|
|
|
I915_WAIT_LOCKED,
|
|
|
|
MAX_SCHEDULE_TIMEOUT,
|
|
|
|
NULL);
|
2016-07-20 15:21:15 +07:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
ret = i915_gem_object_pin_pages(obj);
|
2016-08-18 23:16:50 +07:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2016-08-18 23:16:48 +07:00
|
|
|
i915_gem_object_flush_gtt_write_domain(obj);
|
|
|
|
|
2016-08-18 23:16:47 +07:00
|
|
|
/* If we're not in the cpu read domain, set ourself into the gtt
|
|
|
|
* read domain and manually flush cachelines (if required). This
|
|
|
|
* optimizes for the case when the gpu will dirty the data
|
|
|
|
* anyway again before the next pread happens.
|
|
|
|
*/
|
|
|
|
if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU))
|
2014-02-19 01:15:45 +07:00
|
|
|
*needs_clflush = !cpu_cache_is_coherent(obj->base.dev,
|
|
|
|
obj->cache_level);
|
2016-08-18 23:16:47 +07:00
|
|
|
|
|
|
|
if (*needs_clflush && !static_cpu_has(X86_FEATURE_CLFLUSH)) {
|
|
|
|
ret = i915_gem_object_set_to_cpu_domain(obj, false);
|
2016-08-18 23:16:50 +07:00
|
|
|
if (ret)
|
|
|
|
goto err_unpin;
|
|
|
|
|
2016-08-18 23:16:47 +07:00
|
|
|
*needs_clflush = 0;
|
2014-02-19 01:15:45 +07:00
|
|
|
}
|
|
|
|
|
2016-08-18 23:16:50 +07:00
|
|
|
/* return with the pages pinned */
|
2016-08-18 23:16:47 +07:00
|
|
|
return 0;
|
2016-08-18 23:16:50 +07:00
|
|
|
|
|
|
|
err_unpin:
|
|
|
|
i915_gem_object_unpin_pages(obj);
|
|
|
|
return ret;
|
2016-08-18 23:16:47 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
|
|
|
|
unsigned int *needs_clflush)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
lockdep_assert_held(&obj->base.dev->struct_mutex);
|
|
|
|
|
2016-08-18 23:16:47 +07:00
|
|
|
*needs_clflush = 0;
|
|
|
|
if (!i915_gem_object_has_struct_page(obj))
|
|
|
|
return -ENODEV;
|
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
ret = i915_gem_object_wait(obj,
|
|
|
|
I915_WAIT_INTERRUPTIBLE |
|
|
|
|
I915_WAIT_LOCKED |
|
|
|
|
I915_WAIT_ALL,
|
|
|
|
MAX_SCHEDULE_TIMEOUT,
|
|
|
|
NULL);
|
2016-08-18 23:16:47 +07:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
ret = i915_gem_object_pin_pages(obj);
|
2016-08-18 23:16:50 +07:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2016-08-18 23:16:48 +07:00
|
|
|
i915_gem_object_flush_gtt_write_domain(obj);
|
|
|
|
|
2016-08-18 23:16:47 +07:00
|
|
|
/* If we're not in the cpu write domain, set ourself into the
|
|
|
|
* gtt write domain and manually flush cachelines (as required).
|
|
|
|
* This optimizes for the case when the gpu will use the data
|
|
|
|
* right away and we therefore have to clflush anyway.
|
|
|
|
*/
|
|
|
|
if (obj->base.write_domain != I915_GEM_DOMAIN_CPU)
|
|
|
|
*needs_clflush |= cpu_write_needs_clflush(obj) << 1;
|
|
|
|
|
|
|
|
/* Same trick applies to invalidate partially written cachelines read
|
|
|
|
* before writing.
|
|
|
|
*/
|
|
|
|
if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU))
|
|
|
|
*needs_clflush |= !cpu_cache_is_coherent(obj->base.dev,
|
|
|
|
obj->cache_level);
|
|
|
|
|
|
|
|
if (*needs_clflush && !static_cpu_has(X86_FEATURE_CLFLUSH)) {
|
|
|
|
ret = i915_gem_object_set_to_cpu_domain(obj, true);
|
2016-08-18 23:16:50 +07:00
|
|
|
if (ret)
|
|
|
|
goto err_unpin;
|
|
|
|
|
2016-08-18 23:16:47 +07:00
|
|
|
*needs_clflush = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
if ((*needs_clflush & CLFLUSH_AFTER) == 0)
|
|
|
|
obj->cache_dirty = true;
|
|
|
|
|
|
|
|
intel_fb_obj_invalidate(obj, ORIGIN_CPU);
|
2016-10-28 19:58:35 +07:00
|
|
|
obj->mm.dirty = true;
|
2016-08-18 23:16:50 +07:00
|
|
|
/* return with the pages pinned */
|
2016-08-18 23:16:47 +07:00
|
|
|
return 0;
|
2016-08-18 23:16:50 +07:00
|
|
|
|
|
|
|
err_unpin:
|
|
|
|
i915_gem_object_unpin_pages(obj);
|
|
|
|
return ret;
|
2014-02-19 01:15:45 +07:00
|
|
|
}
|
|
|
|
|
2012-03-26 00:47:42 +07:00
|
|
|
static void
|
|
|
|
shmem_clflush_swizzled_range(char *addr, unsigned long length,
|
|
|
|
bool swizzled)
|
|
|
|
{
|
2012-03-26 00:47:43 +07:00
|
|
|
if (unlikely(swizzled)) {
|
2012-03-26 00:47:42 +07:00
|
|
|
unsigned long start = (unsigned long) addr;
|
|
|
|
unsigned long end = (unsigned long) addr + length;
|
|
|
|
|
|
|
|
/* For swizzling simply ensure that we always flush both
|
|
|
|
* channels. Lame, but simple and it works. Swizzled
|
|
|
|
* pwrite/pread is far from a hotpath - current userspace
|
|
|
|
* doesn't use it at all. */
|
|
|
|
start = round_down(start, 128);
|
|
|
|
end = round_up(end, 128);
|
|
|
|
|
|
|
|
drm_clflush_virt_range((void *)start, end - start);
|
|
|
|
} else {
|
|
|
|
drm_clflush_virt_range(addr, length);
|
|
|
|
}
|
|
|
|
|
|
|
|
}
|
|
|
|
|
2012-03-26 00:47:40 +07:00
|
|
|
/* Only difference to the fast-path function is that this can handle bit17
|
|
|
|
* and uses non-atomic copy and kmap functions. */
|
|
|
|
static int
|
2016-10-28 19:58:39 +07:00
|
|
|
shmem_pread_slow(struct page *page, int offset, int length,
|
2012-03-26 00:47:40 +07:00
|
|
|
char __user *user_data,
|
|
|
|
bool page_do_bit17_swizzling, bool needs_clflush)
|
|
|
|
{
|
|
|
|
char *vaddr;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
vaddr = kmap(page);
|
|
|
|
if (needs_clflush)
|
2016-10-28 19:58:39 +07:00
|
|
|
shmem_clflush_swizzled_range(vaddr + offset, length,
|
2012-03-26 00:47:42 +07:00
|
|
|
page_do_bit17_swizzling);
|
2012-03-26 00:47:40 +07:00
|
|
|
|
|
|
|
if (page_do_bit17_swizzling)
|
2016-10-28 19:58:39 +07:00
|
|
|
ret = __copy_to_user_swizzled(user_data, vaddr, offset, length);
|
2012-03-26 00:47:40 +07:00
|
|
|
else
|
2016-10-28 19:58:39 +07:00
|
|
|
ret = __copy_to_user(user_data, vaddr + offset, length);
|
2012-03-26 00:47:40 +07:00
|
|
|
kunmap(page);
|
|
|
|
|
2012-09-05 03:02:56 +07:00
|
|
|
return ret ? - EFAULT : 0;
|
2012-03-26 00:47:40 +07:00
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:39 +07:00
|
|
|
static int
|
|
|
|
shmem_pread(struct page *page, int offset, int length, char __user *user_data,
|
|
|
|
bool page_do_bit17_swizzling, bool needs_clflush)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
ret = -ENODEV;
|
|
|
|
if (!page_do_bit17_swizzling) {
|
|
|
|
char *vaddr = kmap_atomic(page);
|
|
|
|
|
|
|
|
if (needs_clflush)
|
|
|
|
drm_clflush_virt_range(vaddr + offset, length);
|
|
|
|
ret = __copy_to_user_inatomic(user_data, vaddr + offset, length);
|
|
|
|
kunmap_atomic(vaddr);
|
|
|
|
}
|
|
|
|
if (ret == 0)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return shmem_pread_slow(page, offset, length, user_data,
|
|
|
|
page_do_bit17_swizzling, needs_clflush);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
i915_gem_shmem_pread(struct drm_i915_gem_object *obj,
|
|
|
|
struct drm_i915_gem_pread *args)
|
|
|
|
{
|
|
|
|
char __user *user_data;
|
|
|
|
u64 remain;
|
|
|
|
unsigned int obj_do_bit17_swizzling;
|
|
|
|
unsigned int needs_clflush;
|
|
|
|
unsigned int idx, offset;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
obj_do_bit17_swizzling = 0;
|
|
|
|
if (i915_gem_object_needs_bit17_swizzle(obj))
|
|
|
|
obj_do_bit17_swizzling = BIT(17);
|
|
|
|
|
|
|
|
ret = mutex_lock_interruptible(&obj->base.dev->struct_mutex);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
ret = i915_gem_obj_prepare_shmem_read(obj, &needs_clflush);
|
|
|
|
mutex_unlock(&obj->base.dev->struct_mutex);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
remain = args->size;
|
|
|
|
user_data = u64_to_user_ptr(args->data_ptr);
|
|
|
|
offset = offset_in_page(args->offset);
|
|
|
|
for (idx = args->offset >> PAGE_SHIFT; remain; idx++) {
|
|
|
|
struct page *page = i915_gem_object_get_page(obj, idx);
|
|
|
|
int length;
|
|
|
|
|
|
|
|
length = remain;
|
|
|
|
if (offset + length > PAGE_SIZE)
|
|
|
|
length = PAGE_SIZE - offset;
|
|
|
|
|
|
|
|
ret = shmem_pread(page, offset, length, user_data,
|
|
|
|
page_to_phys(page) & obj_do_bit17_swizzling,
|
|
|
|
needs_clflush);
|
|
|
|
if (ret)
|
|
|
|
break;
|
|
|
|
|
|
|
|
remain -= length;
|
|
|
|
user_data += length;
|
|
|
|
offset = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
i915_gem_obj_finish_shmem_access(obj);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline bool
|
|
|
|
gtt_user_read(struct io_mapping *mapping,
|
|
|
|
loff_t base, int offset,
|
|
|
|
char __user *user_data, int length)
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 15:53:03 +07:00
|
|
|
{
|
|
|
|
void *vaddr;
|
2016-10-28 19:58:39 +07:00
|
|
|
unsigned long unwritten;
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 15:53:03 +07:00
|
|
|
|
|
|
|
/* We can use the cpu mem copy function because this is X86. */
|
2016-10-28 19:58:39 +07:00
|
|
|
vaddr = (void __force *)io_mapping_map_atomic_wc(mapping, base);
|
|
|
|
unwritten = __copy_to_user_inatomic(user_data, vaddr + offset, length);
|
|
|
|
io_mapping_unmap_atomic(vaddr);
|
|
|
|
if (unwritten) {
|
|
|
|
vaddr = (void __force *)
|
|
|
|
io_mapping_map_wc(mapping, base, PAGE_SIZE);
|
|
|
|
unwritten = copy_to_user(user_data, vaddr + offset, length);
|
|
|
|
io_mapping_unmap(vaddr);
|
|
|
|
}
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 15:53:03 +07:00
|
|
|
return unwritten;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
2016-10-28 19:58:39 +07:00
|
|
|
i915_gem_gtt_pread(struct drm_i915_gem_object *obj,
|
|
|
|
const struct drm_i915_gem_pread *args)
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 15:53:03 +07:00
|
|
|
{
|
2016-10-28 19:58:39 +07:00
|
|
|
struct drm_i915_private *i915 = to_i915(obj->base.dev);
|
|
|
|
struct i915_ggtt *ggtt = &i915->ggtt;
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 15:53:03 +07:00
|
|
|
struct drm_mm_node node;
|
2016-10-28 19:58:39 +07:00
|
|
|
struct i915_vma *vma;
|
|
|
|
void __user *user_data;
|
|
|
|
u64 remain, offset;
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 15:53:03 +07:00
|
|
|
int ret;
|
|
|
|
|
2016-10-28 19:58:39 +07:00
|
|
|
ret = mutex_lock_interruptible(&i915->drm.struct_mutex);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
intel_runtime_pm_get(i915);
|
|
|
|
vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
|
|
|
|
PIN_MAPPABLE | PIN_NONBLOCK);
|
2016-08-18 23:16:45 +07:00
|
|
|
if (!IS_ERR(vma)) {
|
|
|
|
node.start = i915_ggtt_offset(vma);
|
|
|
|
node.allocated = false;
|
2016-08-18 23:17:00 +07:00
|
|
|
ret = i915_vma_put_fence(vma);
|
2016-08-18 23:16:45 +07:00
|
|
|
if (ret) {
|
|
|
|
i915_vma_unpin(vma);
|
|
|
|
vma = ERR_PTR(ret);
|
|
|
|
}
|
|
|
|
}
|
2016-08-15 16:49:06 +07:00
|
|
|
if (IS_ERR(vma)) {
|
2016-10-28 19:58:39 +07:00
|
|
|
ret = insert_mappable_node(ggtt, &node, PAGE_SIZE);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 15:53:03 +07:00
|
|
|
if (ret)
|
2016-10-28 19:58:39 +07:00
|
|
|
goto out_unlock;
|
|
|
|
GEM_BUG_ON(!node.allocated);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 15:53:03 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
ret = i915_gem_object_set_to_gtt_domain(obj, false);
|
|
|
|
if (ret)
|
|
|
|
goto out_unpin;
|
|
|
|
|
2016-10-28 19:58:39 +07:00
|
|
|
mutex_unlock(&i915->drm.struct_mutex);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 15:53:03 +07:00
|
|
|
|
2016-10-28 19:58:39 +07:00
|
|
|
user_data = u64_to_user_ptr(args->data_ptr);
|
|
|
|
remain = args->size;
|
|
|
|
offset = args->offset;
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 15:53:03 +07:00
|
|
|
|
|
|
|
while (remain > 0) {
|
|
|
|
/* Operation in this page
|
|
|
|
*
|
|
|
|
* page_base = page offset within aperture
|
|
|
|
* page_offset = offset within page
|
|
|
|
* page_length = bytes to copy for this page
|
|
|
|
*/
|
|
|
|
u32 page_base = node.start;
|
|
|
|
unsigned page_offset = offset_in_page(offset);
|
|
|
|
unsigned page_length = PAGE_SIZE - page_offset;
|
|
|
|
page_length = remain < page_length ? remain : page_length;
|
|
|
|
if (node.allocated) {
|
|
|
|
wmb();
|
|
|
|
ggtt->base.insert_page(&ggtt->base,
|
|
|
|
i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
|
2016-10-28 19:58:39 +07:00
|
|
|
node.start, I915_CACHE_NONE, 0);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 15:53:03 +07:00
|
|
|
wmb();
|
|
|
|
} else {
|
|
|
|
page_base += offset & PAGE_MASK;
|
|
|
|
}
|
2016-10-28 19:58:39 +07:00
|
|
|
|
|
|
|
if (gtt_user_read(&ggtt->mappable, page_base, page_offset,
|
|
|
|
user_data, page_length)) {
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 15:53:03 +07:00
|
|
|
ret = -EFAULT;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
remain -= page_length;
|
|
|
|
user_data += page_length;
|
|
|
|
offset += page_length;
|
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:39 +07:00
|
|
|
mutex_lock(&i915->drm.struct_mutex);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 15:53:03 +07:00
|
|
|
out_unpin:
|
|
|
|
if (node.allocated) {
|
|
|
|
wmb();
|
|
|
|
ggtt->base.clear_range(&ggtt->base,
|
2016-10-13 19:02:40 +07:00
|
|
|
node.start, node.size);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 15:53:03 +07:00
|
|
|
remove_mappable_node(&node);
|
|
|
|
} else {
|
2016-08-15 16:49:06 +07:00
|
|
|
i915_vma_unpin(vma);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 15:53:03 +07:00
|
|
|
}
|
2016-10-28 19:58:39 +07:00
|
|
|
out_unlock:
|
|
|
|
intel_runtime_pm_put(i915);
|
|
|
|
mutex_unlock(&i915->drm.struct_mutex);
|
2012-09-05 03:02:56 +07:00
|
|
|
|
2009-03-11 01:44:52 +07:00
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2008-07-31 02:06:12 +07:00
|
|
|
/**
|
|
|
|
* Reads data from the object referenced by handle.
|
2016-06-03 20:02:17 +07:00
|
|
|
* @dev: drm device pointer
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file pointer
|
2008-07-31 02:06:12 +07:00
|
|
|
*
|
|
|
|
* On error, the contents of *data are undefined.
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_pread_ioctl(struct drm_device *dev, void *data,
|
2010-11-09 02:18:58 +07:00
|
|
|
struct drm_file *file)
|
2008-07-31 02:06:12 +07:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_pread *args = data;
|
2010-11-09 02:18:58 +07:00
|
|
|
struct drm_i915_gem_object *obj;
|
2016-10-28 19:58:39 +07:00
|
|
|
int ret;
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2010-11-17 16:10:42 +07:00
|
|
|
if (args->size == 0)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (!access_ok(VERIFY_WRITE,
|
2016-04-26 22:32:27 +07:00
|
|
|
u64_to_user_ptr(args->data_ptr),
|
2010-11-17 16:10:42 +07:00
|
|
|
args->size))
|
|
|
|
return -EFAULT;
|
|
|
|
|
2016-07-20 19:31:51 +07:00
|
|
|
obj = i915_gem_object_lookup(file, args->handle);
|
2016-08-05 16:14:16 +07:00
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2010-09-27 02:21:44 +07:00
|
|
|
/* Bounds check source. */
|
2010-11-09 02:18:58 +07:00
|
|
|
if (args->offset > obj->base.size ||
|
|
|
|
args->size > obj->base.size - args->offset) {
|
2010-09-27 02:50:05 +07:00
|
|
|
ret = -EINVAL;
|
2016-10-28 19:58:39 +07:00
|
|
|
goto out;
|
2010-09-27 02:50:05 +07:00
|
|
|
}
|
|
|
|
|
2011-02-03 18:57:46 +07:00
|
|
|
trace_i915_gem_object_pread(obj, args->offset, args->size);
|
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
ret = i915_gem_object_wait(obj,
|
|
|
|
I915_WAIT_INTERRUPTIBLE,
|
|
|
|
MAX_SCHEDULE_TIMEOUT,
|
|
|
|
to_rps_client(file));
|
2016-08-05 16:14:16 +07:00
|
|
|
if (ret)
|
2016-10-28 19:58:39 +07:00
|
|
|
goto out;
|
2016-08-05 16:14:16 +07:00
|
|
|
|
2016-10-28 19:58:39 +07:00
|
|
|
ret = i915_gem_object_pin_pages(obj);
|
2016-08-05 16:14:16 +07:00
|
|
|
if (ret)
|
2016-10-28 19:58:39 +07:00
|
|
|
goto out;
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2016-10-28 19:58:39 +07:00
|
|
|
ret = i915_gem_shmem_pread(obj, args);
|
2016-10-24 19:42:15 +07:00
|
|
|
if (ret == -EFAULT || ret == -ENODEV)
|
2016-10-28 19:58:39 +07:00
|
|
|
ret = i915_gem_gtt_pread(obj, args);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 15:53:03 +07:00
|
|
|
|
2016-10-28 19:58:39 +07:00
|
|
|
i915_gem_object_unpin_pages(obj);
|
|
|
|
out:
|
2016-10-28 19:58:43 +07:00
|
|
|
i915_gem_object_put(obj);
|
2009-03-11 01:44:52 +07:00
|
|
|
return ret;
|
2008-07-31 02:06:12 +07:00
|
|
|
}
|
|
|
|
|
2008-10-31 09:38:48 +07:00
|
|
|
/* This is the fast write path which cannot handle
|
|
|
|
* page faults in the source data
|
2008-10-21 04:16:43 +07:00
|
|
|
*/
|
2008-10-31 09:38:48 +07:00
|
|
|
|
2016-10-28 19:58:40 +07:00
|
|
|
static inline bool
|
|
|
|
ggtt_write(struct io_mapping *mapping,
|
|
|
|
loff_t base, int offset,
|
|
|
|
char __user *user_data, int length)
|
2008-10-21 04:16:43 +07:00
|
|
|
{
|
2012-04-17 04:07:47 +07:00
|
|
|
void *vaddr;
|
2008-10-31 09:38:48 +07:00
|
|
|
unsigned long unwritten;
|
2008-10-21 04:16:43 +07:00
|
|
|
|
2012-04-17 04:07:47 +07:00
|
|
|
/* We can use the cpu mem copy function because this is X86. */
|
2016-10-28 19:58:40 +07:00
|
|
|
vaddr = (void __force *)io_mapping_map_atomic_wc(mapping, base);
|
|
|
|
unwritten = __copy_from_user_inatomic_nocache(vaddr + offset,
|
2008-10-31 09:38:48 +07:00
|
|
|
user_data, length);
|
2016-10-28 19:58:40 +07:00
|
|
|
io_mapping_unmap_atomic(vaddr);
|
|
|
|
if (unwritten) {
|
|
|
|
vaddr = (void __force *)
|
|
|
|
io_mapping_map_wc(mapping, base, PAGE_SIZE);
|
|
|
|
unwritten = copy_from_user(vaddr + offset, user_data, length);
|
|
|
|
io_mapping_unmap(vaddr);
|
|
|
|
}
|
2016-10-28 19:58:39 +07:00
|
|
|
|
|
|
|
return unwritten;
|
|
|
|
}
|
|
|
|
|
2009-03-09 23:42:23 +07:00
|
|
|
/**
|
|
|
|
* This is the fast pwrite path, where we copy the data directly from the
|
|
|
|
* user into the GTT, uncached.
|
2016-10-28 19:58:40 +07:00
|
|
|
* @obj: i915 GEM object
|
2016-06-03 20:02:17 +07:00
|
|
|
* @args: pwrite arguments structure
|
2009-03-09 23:42:23 +07:00
|
|
|
*/
|
2008-07-31 02:06:12 +07:00
|
|
|
static int
|
2016-10-28 19:58:40 +07:00
|
|
|
i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
|
|
|
|
const struct drm_i915_gem_pwrite *args)
|
2008-07-31 02:06:12 +07:00
|
|
|
{
|
2016-10-28 19:58:40 +07:00
|
|
|
struct drm_i915_private *i915 = to_i915(obj->base.dev);
|
2016-06-10 15:53:01 +07:00
|
|
|
struct i915_ggtt *ggtt = &i915->ggtt;
|
|
|
|
struct drm_mm_node node;
|
2016-10-28 19:58:40 +07:00
|
|
|
struct i915_vma *vma;
|
|
|
|
u64 remain, offset;
|
|
|
|
void __user *user_data;
|
2016-06-10 15:53:01 +07:00
|
|
|
int ret;
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 15:53:03 +07:00
|
|
|
|
2016-10-28 19:58:40 +07:00
|
|
|
ret = mutex_lock_interruptible(&i915->drm.struct_mutex);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
2012-03-26 00:47:35 +07:00
|
|
|
|
2016-10-24 19:42:15 +07:00
|
|
|
intel_runtime_pm_get(i915);
|
2016-08-15 16:49:06 +07:00
|
|
|
vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
|
2016-08-04 22:32:34 +07:00
|
|
|
PIN_MAPPABLE | PIN_NONBLOCK);
|
2016-08-18 23:16:45 +07:00
|
|
|
if (!IS_ERR(vma)) {
|
|
|
|
node.start = i915_ggtt_offset(vma);
|
|
|
|
node.allocated = false;
|
2016-08-18 23:17:00 +07:00
|
|
|
ret = i915_vma_put_fence(vma);
|
2016-08-18 23:16:45 +07:00
|
|
|
if (ret) {
|
|
|
|
i915_vma_unpin(vma);
|
|
|
|
vma = ERR_PTR(ret);
|
|
|
|
}
|
|
|
|
}
|
2016-08-15 16:49:06 +07:00
|
|
|
if (IS_ERR(vma)) {
|
2016-10-28 19:58:39 +07:00
|
|
|
ret = insert_mappable_node(ggtt, &node, PAGE_SIZE);
|
2016-06-10 15:53:01 +07:00
|
|
|
if (ret)
|
2016-10-28 19:58:40 +07:00
|
|
|
goto out_unlock;
|
|
|
|
GEM_BUG_ON(!node.allocated);
|
2016-06-10 15:53:01 +07:00
|
|
|
}
|
2012-03-26 00:47:35 +07:00
|
|
|
|
|
|
|
ret = i915_gem_object_set_to_gtt_domain(obj, true);
|
|
|
|
if (ret)
|
|
|
|
goto out_unpin;
|
|
|
|
|
2016-10-28 19:58:40 +07:00
|
|
|
mutex_unlock(&i915->drm.struct_mutex);
|
|
|
|
|
2016-08-18 23:16:43 +07:00
|
|
|
intel_fb_obj_invalidate(obj, ORIGIN_CPU);
|
2015-02-14 02:23:45 +07:00
|
|
|
|
2016-06-10 15:53:01 +07:00
|
|
|
user_data = u64_to_user_ptr(args->data_ptr);
|
|
|
|
offset = args->offset;
|
|
|
|
remain = args->size;
|
|
|
|
while (remain) {
|
2008-07-31 02:06:12 +07:00
|
|
|
/* Operation in this page
|
|
|
|
*
|
2008-10-31 09:38:48 +07:00
|
|
|
* page_base = page offset within aperture
|
|
|
|
* page_offset = offset within page
|
|
|
|
* page_length = bytes to copy for this page
|
2008-07-31 02:06:12 +07:00
|
|
|
*/
|
2016-06-10 15:53:01 +07:00
|
|
|
u32 page_base = node.start;
|
2016-10-28 19:58:39 +07:00
|
|
|
unsigned int page_offset = offset_in_page(offset);
|
|
|
|
unsigned int page_length = PAGE_SIZE - page_offset;
|
2016-06-10 15:53:01 +07:00
|
|
|
page_length = remain < page_length ? remain : page_length;
|
|
|
|
if (node.allocated) {
|
|
|
|
wmb(); /* flush the write before we modify the GGTT */
|
|
|
|
ggtt->base.insert_page(&ggtt->base,
|
|
|
|
i915_gem_object_get_dma_address(obj, offset >> PAGE_SHIFT),
|
|
|
|
node.start, I915_CACHE_NONE, 0);
|
|
|
|
wmb(); /* flush modifications to the GGTT (insert_page) */
|
|
|
|
} else {
|
|
|
|
page_base += offset & PAGE_MASK;
|
|
|
|
}
|
2008-10-31 09:38:48 +07:00
|
|
|
/* If we get a fault while copying data, then (presumably) our
|
2009-03-09 23:42:23 +07:00
|
|
|
* source page isn't available. Return the error and we'll
|
|
|
|
* retry in the slow path.
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 15:53:03 +07:00
|
|
|
* If the object is non-shmem backed, we retry again with the
|
|
|
|
* path that handles page fault.
|
2008-10-31 09:38:48 +07:00
|
|
|
*/
|
2016-10-28 19:58:40 +07:00
|
|
|
if (ggtt_write(&ggtt->mappable, page_base, page_offset,
|
|
|
|
user_data, page_length)) {
|
|
|
|
ret = -EFAULT;
|
|
|
|
break;
|
2012-03-26 00:47:35 +07:00
|
|
|
}
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2008-10-31 09:38:48 +07:00
|
|
|
remain -= page_length;
|
|
|
|
user_data += page_length;
|
|
|
|
offset += page_length;
|
2008-07-31 02:06:12 +07:00
|
|
|
}
|
2016-08-18 23:16:43 +07:00
|
|
|
intel_fb_obj_flush(obj, false, ORIGIN_CPU);
|
2016-10-28 19:58:40 +07:00
|
|
|
|
|
|
|
mutex_lock(&i915->drm.struct_mutex);
|
2012-03-26 00:47:35 +07:00
|
|
|
out_unpin:
|
2016-06-10 15:53:01 +07:00
|
|
|
if (node.allocated) {
|
|
|
|
wmb();
|
|
|
|
ggtt->base.clear_range(&ggtt->base,
|
2016-10-13 19:02:40 +07:00
|
|
|
node.start, node.size);
|
2016-06-10 15:53:01 +07:00
|
|
|
remove_mappable_node(&node);
|
|
|
|
} else {
|
2016-08-15 16:49:06 +07:00
|
|
|
i915_vma_unpin(vma);
|
2016-06-10 15:53:01 +07:00
|
|
|
}
|
2016-10-28 19:58:40 +07:00
|
|
|
out_unlock:
|
2016-10-24 19:42:15 +07:00
|
|
|
intel_runtime_pm_put(i915);
|
2016-10-28 19:58:40 +07:00
|
|
|
mutex_unlock(&i915->drm.struct_mutex);
|
2009-03-09 23:42:23 +07:00
|
|
|
return ret;
|
2008-07-31 02:06:12 +07:00
|
|
|
}
|
|
|
|
|
2008-10-03 02:24:47 +07:00
|
|
|
static int
|
2016-10-28 19:58:40 +07:00
|
|
|
shmem_pwrite_slow(struct page *page, int offset, int length,
|
2012-03-26 00:47:40 +07:00
|
|
|
char __user *user_data,
|
|
|
|
bool page_do_bit17_swizzling,
|
|
|
|
bool needs_clflush_before,
|
|
|
|
bool needs_clflush_after)
|
2008-07-31 02:06:12 +07:00
|
|
|
{
|
2012-03-26 00:47:40 +07:00
|
|
|
char *vaddr;
|
|
|
|
int ret;
|
2010-10-28 19:45:36 +07:00
|
|
|
|
2012-03-26 00:47:40 +07:00
|
|
|
vaddr = kmap(page);
|
2012-03-26 00:47:43 +07:00
|
|
|
if (unlikely(needs_clflush_before || page_do_bit17_swizzling))
|
2016-10-28 19:58:40 +07:00
|
|
|
shmem_clflush_swizzled_range(vaddr + offset, length,
|
2012-03-26 00:47:42 +07:00
|
|
|
page_do_bit17_swizzling);
|
2012-03-26 00:47:40 +07:00
|
|
|
if (page_do_bit17_swizzling)
|
2016-10-28 19:58:40 +07:00
|
|
|
ret = __copy_from_user_swizzled(vaddr, offset, user_data,
|
|
|
|
length);
|
2012-03-26 00:47:40 +07:00
|
|
|
else
|
2016-10-28 19:58:40 +07:00
|
|
|
ret = __copy_from_user(vaddr + offset, user_data, length);
|
2012-03-26 00:47:40 +07:00
|
|
|
if (needs_clflush_after)
|
2016-10-28 19:58:40 +07:00
|
|
|
shmem_clflush_swizzled_range(vaddr + offset, length,
|
2012-03-26 00:47:42 +07:00
|
|
|
page_do_bit17_swizzling);
|
2012-03-26 00:47:40 +07:00
|
|
|
kunmap(page);
|
2009-03-10 03:42:30 +07:00
|
|
|
|
2012-09-05 03:02:55 +07:00
|
|
|
return ret ? -EFAULT : 0;
|
2009-03-10 03:42:30 +07:00
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:40 +07:00
|
|
|
/* Per-page copy function for the shmem pwrite fastpath.
|
|
|
|
* Flushes invalid cachelines before writing to the target if
|
|
|
|
* needs_clflush_before is set and flushes out any written cachelines after
|
|
|
|
* writing if needs_clflush is set.
|
|
|
|
*/
|
2009-03-10 03:42:30 +07:00
|
|
|
static int
|
2016-10-28 19:58:40 +07:00
|
|
|
shmem_pwrite(struct page *page, int offset, int len, char __user *user_data,
|
|
|
|
bool page_do_bit17_swizzling,
|
|
|
|
bool needs_clflush_before,
|
|
|
|
bool needs_clflush_after)
|
2009-03-10 03:42:30 +07:00
|
|
|
{
|
2016-10-28 19:58:40 +07:00
|
|
|
int ret;
|
|
|
|
|
|
|
|
ret = -ENODEV;
|
|
|
|
if (!page_do_bit17_swizzling) {
|
|
|
|
char *vaddr = kmap_atomic(page);
|
|
|
|
|
|
|
|
if (needs_clflush_before)
|
|
|
|
drm_clflush_virt_range(vaddr + offset, len);
|
|
|
|
ret = __copy_from_user_inatomic(vaddr + offset, user_data, len);
|
|
|
|
if (needs_clflush_after)
|
|
|
|
drm_clflush_virt_range(vaddr + offset, len);
|
|
|
|
|
|
|
|
kunmap_atomic(vaddr);
|
|
|
|
}
|
|
|
|
if (ret == 0)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
return shmem_pwrite_slow(page, offset, len, user_data,
|
|
|
|
page_do_bit17_swizzling,
|
|
|
|
needs_clflush_before,
|
|
|
|
needs_clflush_after);
|
|
|
|
}
|
|
|
|
|
|
|
|
static int
|
|
|
|
i915_gem_shmem_pwrite(struct drm_i915_gem_object *obj,
|
|
|
|
const struct drm_i915_gem_pwrite *args)
|
|
|
|
{
|
|
|
|
struct drm_i915_private *i915 = to_i915(obj->base.dev);
|
|
|
|
void __user *user_data;
|
|
|
|
u64 remain;
|
|
|
|
unsigned int obj_do_bit17_swizzling;
|
|
|
|
unsigned int partial_cacheline_write;
|
2016-08-18 23:16:47 +07:00
|
|
|
unsigned int needs_clflush;
|
2016-10-28 19:58:40 +07:00
|
|
|
unsigned int offset, idx;
|
|
|
|
int ret;
|
2009-03-10 03:42:30 +07:00
|
|
|
|
2016-10-28 19:58:40 +07:00
|
|
|
ret = mutex_lock_interruptible(&i915->drm.struct_mutex);
|
2012-09-05 03:02:55 +07:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2016-10-28 19:58:40 +07:00
|
|
|
ret = i915_gem_obj_prepare_shmem_write(obj, &needs_clflush);
|
|
|
|
mutex_unlock(&i915->drm.struct_mutex);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2016-10-28 19:58:40 +07:00
|
|
|
obj_do_bit17_swizzling = 0;
|
|
|
|
if (i915_gem_object_needs_bit17_swizzle(obj))
|
|
|
|
obj_do_bit17_swizzling = BIT(17);
|
2010-10-28 19:45:36 +07:00
|
|
|
|
2016-10-28 19:58:40 +07:00
|
|
|
/* If we don't overwrite a cacheline completely we need to be
|
|
|
|
* careful to have up-to-date data by first clflushing. Don't
|
|
|
|
* overcomplicate things and flush the entire patch.
|
|
|
|
*/
|
|
|
|
partial_cacheline_write = 0;
|
|
|
|
if (needs_clflush & CLFLUSH_BEFORE)
|
|
|
|
partial_cacheline_write = boot_cpu_data.x86_clflush_size - 1;
|
2012-06-01 21:20:22 +07:00
|
|
|
|
2016-10-28 19:58:40 +07:00
|
|
|
user_data = u64_to_user_ptr(args->data_ptr);
|
|
|
|
remain = args->size;
|
|
|
|
offset = offset_in_page(args->offset);
|
|
|
|
for (idx = args->offset >> PAGE_SHIFT; remain; idx++) {
|
|
|
|
struct page *page = i915_gem_object_get_page(obj, idx);
|
|
|
|
int length;
|
2009-03-10 03:42:30 +07:00
|
|
|
|
2016-10-28 19:58:40 +07:00
|
|
|
length = remain;
|
|
|
|
if (offset + length > PAGE_SIZE)
|
|
|
|
length = PAGE_SIZE - offset;
|
2012-09-05 03:02:55 +07:00
|
|
|
|
2016-10-28 19:58:40 +07:00
|
|
|
ret = shmem_pwrite(page, offset, length, user_data,
|
|
|
|
page_to_phys(page) & obj_do_bit17_swizzling,
|
|
|
|
(offset | length) & partial_cacheline_write,
|
|
|
|
needs_clflush & CLFLUSH_AFTER);
|
2012-09-05 03:02:55 +07:00
|
|
|
if (ret)
|
2016-10-28 19:58:40 +07:00
|
|
|
break;
|
2012-09-05 03:02:55 +07:00
|
|
|
|
2016-10-28 19:58:40 +07:00
|
|
|
remain -= length;
|
|
|
|
user_data += length;
|
|
|
|
offset = 0;
|
drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
... instead of get_user_pages, because that fails on non page-backed
user addresses like e.g. a gtt mapping of a bo.
To get there essentially copy the vfs read path into pagecache. We
can't call that right away because we have to take care of bit17
swizzling. To not deadlock with our own pagefault handler we need
to completely drop struct_mutex, reducing the atomicty-guarantees
of our userspace abi. Implications for racing with other gem ioctl:
- execbuf, pwrite, pread: Due to -EFAULT fallback to slow paths there's
already the risk of the pwrite call not being atomic, no degration.
- read/write access to mmaps: already fully racy, no degration.
- set_tiling: Calling set_tiling while reading/writing is already
pretty much undefined, now it just got a bit worse. set_tiling is
only called by libdrm on unused/new bos, so no problem.
- set_domain: When changing to the gtt domain while copying (without any
read/write access, e.g. for synchronization), we might leave unflushed
data in the cpu caches. The clflush_object at the end of pwrite_slow
takes care of this problem.
- truncating of purgeable objects: the shmem_read_mapping_page call could
reinstate backing storage for truncated objects. The check at the end
of pwrite_slow takes care of this.
v2:
- add missing intel_gtt_chipset_flush
- add __ to copy_from_user_swizzled as suggest by Chris Wilson.
v3: Fixup bit17 swizzling, it swizzled the wrong pages.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2011-12-14 19:57:31 +07:00
|
|
|
}
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2015-07-08 06:28:51 +07:00
|
|
|
intel_fb_obj_flush(obj, false, ORIGIN_CPU);
|
2016-10-28 19:58:40 +07:00
|
|
|
i915_gem_obj_finish_shmem_access(obj);
|
2009-03-10 03:42:30 +07:00
|
|
|
return ret;
|
2008-07-31 02:06:12 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Writes data to the object referenced by handle.
|
2016-06-03 20:02:17 +07:00
|
|
|
* @dev: drm device
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file
|
2008-07-31 02:06:12 +07:00
|
|
|
*
|
|
|
|
* On error, the contents of the buffer that were to be modified are undefined.
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
|
2010-10-14 21:03:58 +07:00
|
|
|
struct drm_file *file)
|
2008-07-31 02:06:12 +07:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_pwrite *args = data;
|
2010-11-09 02:18:58 +07:00
|
|
|
struct drm_i915_gem_object *obj;
|
2010-11-17 16:10:42 +07:00
|
|
|
int ret;
|
|
|
|
|
|
|
|
if (args->size == 0)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
if (!access_ok(VERIFY_READ,
|
2016-04-26 22:32:27 +07:00
|
|
|
u64_to_user_ptr(args->data_ptr),
|
2010-11-17 16:10:42 +07:00
|
|
|
args->size))
|
|
|
|
return -EFAULT;
|
|
|
|
|
2016-07-20 19:31:51 +07:00
|
|
|
obj = i915_gem_object_lookup(file, args->handle);
|
2016-08-05 16:14:16 +07:00
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2010-09-27 02:21:44 +07:00
|
|
|
/* Bounds check destination. */
|
2010-11-09 02:18:58 +07:00
|
|
|
if (args->offset > obj->base.size ||
|
|
|
|
args->size > obj->base.size - args->offset) {
|
2010-09-27 02:50:05 +07:00
|
|
|
ret = -EINVAL;
|
2016-08-05 16:14:16 +07:00
|
|
|
goto err;
|
2010-09-27 02:50:05 +07:00
|
|
|
}
|
|
|
|
|
2011-02-03 18:57:46 +07:00
|
|
|
trace_i915_gem_object_pwrite(obj, args->offset, args->size);
|
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
ret = i915_gem_object_wait(obj,
|
|
|
|
I915_WAIT_INTERRUPTIBLE |
|
|
|
|
I915_WAIT_ALL,
|
|
|
|
MAX_SCHEDULE_TIMEOUT,
|
|
|
|
to_rps_client(file));
|
2016-08-05 16:14:16 +07:00
|
|
|
if (ret)
|
|
|
|
goto err;
|
|
|
|
|
2016-10-28 19:58:40 +07:00
|
|
|
ret = i915_gem_object_pin_pages(obj);
|
2016-08-05 16:14:16 +07:00
|
|
|
if (ret)
|
2016-10-28 19:58:40 +07:00
|
|
|
goto err;
|
2016-08-05 16:14:16 +07:00
|
|
|
|
2012-03-26 00:47:35 +07:00
|
|
|
ret = -EFAULT;
|
2008-07-31 02:06:12 +07:00
|
|
|
/* We can only do the GTT pwrite on untiled buffers, as otherwise
|
|
|
|
* it would end up going through the fenced access, and we'll get
|
|
|
|
* different detiling behavior between reading and writing.
|
|
|
|
* pread/pwrite currently are reading and writing from the CPU
|
|
|
|
* perspective, requiring manual detiling by the client.
|
|
|
|
*/
|
2016-06-20 21:05:52 +07:00
|
|
|
if (!i915_gem_object_has_struct_page(obj) ||
|
2016-10-24 19:42:15 +07:00
|
|
|
cpu_write_needs_clflush(obj))
|
2012-03-26 00:47:35 +07:00
|
|
|
/* Note that the gtt paths might fail with non-page-backed user
|
|
|
|
* pointers (e.g. gtt mappings when moving data between
|
2016-10-24 19:42:15 +07:00
|
|
|
* textures). Fallback to the shmem path in that case.
|
|
|
|
*/
|
2016-10-28 19:58:40 +07:00
|
|
|
ret = i915_gem_gtt_pwrite_fast(obj, args);
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2016-07-17 00:42:36 +07:00
|
|
|
if (ret == -EFAULT || ret == -ENOSPC) {
|
2014-11-04 19:51:40 +07:00
|
|
|
if (obj->phys_handle)
|
|
|
|
ret = i915_gem_phys_pwrite(obj, args, file);
|
drm/i915: Support for pread/pwrite from/to non shmem backed objects
This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface. This will cover objects backed by stolen memory as well
as other non-shmem backed objects.
v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)
v3: Rebased to the latest drm-intel-nightly (Ankit)
v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)
v5: Enabled pread/pwrite for all non-shmem backed objects including
without tiling restrictions (Ankit)
v6: Using pwrite_fast for non-shmem backed objects as well (Chris)
v7: Updated commit message, Renamed i915_gem_gtt_read to i915_gem_gtt_copy,
added pwrite slow path for non-shmem backed objects (Chris/Tvrtko)
v8: Updated v7 commit message, mutex unlock around pwrite slow path for
non-shmem backed objects (Tvrtko)
v9: Corrected check during pread_ioctl, to avoid shmem_pread being
called for non-shmem backed objects (Tvrtko)
v10: Moved the write_domain check to needs_clflush and tiling mode check
to pwrite_fast (Chris)
v11: Use pwrite_fast fallback for all objects (shmem and non-shmem backed),
call fast_user_write regardless of pagefault in previous iteration
v12: Use page-by-page copy for slow user access too (Chris)
v13: Handled EFAULT, Avoid use of WARN_ON, put_fence only if whole obj
pinned (Chris)
v14: Corrected datatypes/initializations (Tvrtko)
Testcase: igt/gem_stolen, igt/gem_pread, igt/gem_pwrite
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1465548783-19712-1-git-send-email-ankitprasad.r.sharma@intel.com
2016-06-10 15:53:03 +07:00
|
|
|
else
|
2016-10-28 19:58:40 +07:00
|
|
|
ret = i915_gem_shmem_pwrite(obj, args);
|
2014-11-04 19:51:40 +07:00
|
|
|
}
|
2011-12-14 19:57:30 +07:00
|
|
|
|
2016-10-28 19:58:40 +07:00
|
|
|
i915_gem_object_unpin_pages(obj);
|
2016-08-05 16:14:16 +07:00
|
|
|
err:
|
2016-10-28 19:58:43 +07:00
|
|
|
i915_gem_object_put(obj);
|
2016-08-05 16:14:16 +07:00
|
|
|
return ret;
|
2008-07-31 02:06:12 +07:00
|
|
|
}
|
|
|
|
|
2016-08-18 23:16:44 +07:00
|
|
|
static inline enum fb_op_origin
|
2016-06-18 00:46:39 +07:00
|
|
|
write_origin(struct drm_i915_gem_object *obj, unsigned domain)
|
|
|
|
{
|
2016-08-18 23:17:04 +07:00
|
|
|
return (domain == I915_GEM_DOMAIN_GTT ?
|
|
|
|
obj->frontbuffer_ggtt_origin : ORIGIN_CPU);
|
2016-06-18 00:46:39 +07:00
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:41 +07:00
|
|
|
static void i915_gem_object_bump_inactive_ggtt(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
struct drm_i915_private *i915;
|
|
|
|
struct list_head *list;
|
|
|
|
struct i915_vma *vma;
|
|
|
|
|
|
|
|
list_for_each_entry(vma, &obj->vma_list, obj_link) {
|
|
|
|
if (!i915_vma_is_ggtt(vma))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
if (i915_vma_is_active(vma))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
if (!drm_mm_node_allocated(&vma->node))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
|
|
|
|
}
|
|
|
|
|
|
|
|
i915 = to_i915(obj->base.dev);
|
|
|
|
list = obj->bind_count ? &i915->mm.bound_list : &i915->mm.unbound_list;
|
2016-11-02 17:16:04 +07:00
|
|
|
list_move_tail(&obj->global_link, list);
|
2016-10-28 19:58:41 +07:00
|
|
|
}
|
|
|
|
|
2008-07-31 02:06:12 +07:00
|
|
|
/**
|
2008-11-11 01:53:25 +07:00
|
|
|
* Called when user space prepares to use an object with the CPU, either
|
|
|
|
* through the mmap ioctl's mapping or a GTT mapping.
|
2016-06-03 20:02:17 +07:00
|
|
|
* @dev: drm device
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file
|
2008-07-31 02:06:12 +07:00
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
|
2010-11-09 02:18:58 +07:00
|
|
|
struct drm_file *file)
|
2008-07-31 02:06:12 +07:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_set_domain *args = data;
|
2010-11-09 02:18:58 +07:00
|
|
|
struct drm_i915_gem_object *obj;
|
2008-11-11 01:53:25 +07:00
|
|
|
uint32_t read_domains = args->read_domains;
|
|
|
|
uint32_t write_domain = args->write_domain;
|
2016-10-28 19:58:41 +07:00
|
|
|
int err;
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2008-11-11 01:53:25 +07:00
|
|
|
/* Only handle setting domains to types used by the CPU. */
|
2016-08-05 16:14:07 +07:00
|
|
|
if ((write_domain | read_domains) & I915_GEM_GPU_DOMAINS)
|
2008-11-11 01:53:25 +07:00
|
|
|
return -EINVAL;
|
|
|
|
|
|
|
|
/* Having something in the write domain implies it's in the read
|
|
|
|
* domain, and only that read domain. Enforce that in the request.
|
|
|
|
*/
|
|
|
|
if (write_domain != 0 && read_domains != write_domain)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2016-07-20 19:31:51 +07:00
|
|
|
obj = i915_gem_object_lookup(file, args->handle);
|
2016-08-05 16:14:07 +07:00
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2012-08-24 15:35:09 +07:00
|
|
|
/* Try to flush the object off the GPU without holding the lock.
|
|
|
|
* We will repeat the flush holding the lock in the normal manner
|
|
|
|
* to catch cases where we are gazumped.
|
|
|
|
*/
|
2016-10-28 19:58:41 +07:00
|
|
|
err = i915_gem_object_wait(obj,
|
2016-10-28 19:58:27 +07:00
|
|
|
I915_WAIT_INTERRUPTIBLE |
|
|
|
|
(write_domain ? I915_WAIT_ALL : 0),
|
|
|
|
MAX_SCHEDULE_TIMEOUT,
|
|
|
|
to_rps_client(file));
|
2016-10-28 19:58:41 +07:00
|
|
|
if (err)
|
2016-10-28 19:58:43 +07:00
|
|
|
goto out;
|
2016-08-05 16:14:07 +07:00
|
|
|
|
2016-10-28 19:58:41 +07:00
|
|
|
/* Flush and acquire obj->pages so that we are coherent through
|
|
|
|
* direct access in memory with previous cached writes through
|
|
|
|
* shmemfs and that our cache domain tracking remains valid.
|
|
|
|
* For example, if the obj->filp was moved to swap without us
|
|
|
|
* being notified and releasing the pages, we would mistakenly
|
|
|
|
* continue to assume that the obj remained out of the CPU cached
|
|
|
|
* domain.
|
|
|
|
*/
|
|
|
|
err = i915_gem_object_pin_pages(obj);
|
|
|
|
if (err)
|
2016-10-28 19:58:43 +07:00
|
|
|
goto out;
|
2016-10-28 19:58:41 +07:00
|
|
|
|
|
|
|
err = i915_mutex_lock_interruptible(dev);
|
|
|
|
if (err)
|
2016-10-28 19:58:43 +07:00
|
|
|
goto out_unpin;
|
2012-08-24 15:35:09 +07:00
|
|
|
|
2015-01-02 17:59:29 +07:00
|
|
|
if (read_domains & I915_GEM_DOMAIN_GTT)
|
2016-10-28 19:58:41 +07:00
|
|
|
err = i915_gem_object_set_to_gtt_domain(obj, write_domain != 0);
|
2015-01-02 17:59:29 +07:00
|
|
|
else
|
2016-10-28 19:58:41 +07:00
|
|
|
err = i915_gem_object_set_to_cpu_domain(obj, write_domain != 0);
|
2008-11-11 01:53:25 +07:00
|
|
|
|
2016-10-28 19:58:41 +07:00
|
|
|
/* And bump the LRU for this access */
|
|
|
|
i915_gem_object_bump_inactive_ggtt(obj);
|
2015-06-27 00:35:16 +07:00
|
|
|
|
2008-07-31 02:06:12 +07:00
|
|
|
mutex_unlock(&dev->struct_mutex);
|
2016-08-05 16:14:07 +07:00
|
|
|
|
2016-10-28 19:58:41 +07:00
|
|
|
if (write_domain != 0)
|
|
|
|
intel_fb_obj_invalidate(obj, write_origin(obj, write_domain));
|
|
|
|
|
2016-10-28 19:58:43 +07:00
|
|
|
out_unpin:
|
2016-10-28 19:58:41 +07:00
|
|
|
i915_gem_object_unpin_pages(obj);
|
2016-10-28 19:58:43 +07:00
|
|
|
out:
|
|
|
|
i915_gem_object_put(obj);
|
2016-10-28 19:58:41 +07:00
|
|
|
return err;
|
2008-07-31 02:06:12 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
|
|
|
* Called when user space has done writes to this buffer
|
2016-06-03 20:02:17 +07:00
|
|
|
* @dev: drm device
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file
|
2008-07-31 02:06:12 +07:00
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data,
|
2010-11-09 02:18:58 +07:00
|
|
|
struct drm_file *file)
|
2008-07-31 02:06:12 +07:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_sw_finish *args = data;
|
2010-11-09 02:18:58 +07:00
|
|
|
struct drm_i915_gem_object *obj;
|
2016-08-05 16:14:19 +07:00
|
|
|
int err = 0;
|
2010-10-17 15:45:41 +07:00
|
|
|
|
2016-07-20 19:31:51 +07:00
|
|
|
obj = i915_gem_object_lookup(file, args->handle);
|
2016-08-05 16:14:19 +07:00
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
2008-07-31 02:06:12 +07:00
|
|
|
|
|
|
|
/* Pinned buffers may be scanout, so flush the cache */
|
2016-08-05 16:14:19 +07:00
|
|
|
if (READ_ONCE(obj->pin_display)) {
|
|
|
|
err = i915_mutex_lock_interruptible(dev);
|
|
|
|
if (!err) {
|
|
|
|
i915_gem_object_flush_cpu_write_domain(obj);
|
|
|
|
mutex_unlock(&dev->struct_mutex);
|
|
|
|
}
|
|
|
|
}
|
2008-11-15 04:35:19 +07:00
|
|
|
|
2016-10-28 19:58:43 +07:00
|
|
|
i915_gem_object_put(obj);
|
2016-08-05 16:14:19 +07:00
|
|
|
return err;
|
2008-07-31 02:06:12 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
/**
|
2016-06-03 20:02:17 +07:00
|
|
|
* i915_gem_mmap_ioctl - Maps the contents of an object, returning the address
|
|
|
|
* it is mapped to.
|
|
|
|
* @dev: drm device
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file
|
2008-07-31 02:06:12 +07:00
|
|
|
*
|
|
|
|
* While the mapping holds a reference on the contents of the object, it doesn't
|
|
|
|
* imply a ref on the object itself.
|
2014-10-16 17:28:18 +07:00
|
|
|
*
|
|
|
|
* IMPORTANT:
|
|
|
|
*
|
|
|
|
* DRM driver writers who look a this function as an example for how to do GEM
|
|
|
|
* mmap support, please don't implement mmap support like here. The modern way
|
|
|
|
* to implement DRM mmap support is with an mmap offset ioctl (like
|
|
|
|
* i915_gem_mmap_gtt) and then using the mmap syscall on the DRM fd directly.
|
|
|
|
* That way debug tooling like valgrind will understand what's going on, hiding
|
|
|
|
* the mmap call in a driver private ioctl will break that. The i915 driver only
|
|
|
|
* does cpu mmaps this way because we didn't know better.
|
2008-07-31 02:06:12 +07:00
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
|
2010-11-09 02:18:58 +07:00
|
|
|
struct drm_file *file)
|
2008-07-31 02:06:12 +07:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_mmap *args = data;
|
2016-07-20 19:31:51 +07:00
|
|
|
struct drm_i915_gem_object *obj;
|
2008-07-31 02:06:12 +07:00
|
|
|
unsigned long addr;
|
|
|
|
|
drm/i915: Support creation of unbound wc user mappings for objects
This patch provides support to create write-combining virtual mappings of
GEM object. It intends to provide the same funtionality of 'mmap_gtt'
interface without the constraints and contention of a limited aperture
space, but requires clients handles the linear to tile conversion on their
own. This is for improving the CPU write operation performance, as with such
mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
to the GTT mmapping, unlike the regular CPU mmapping, it avoids the cache
flush after update from CPU side, when object is passed onto GPU. This
type of mapping is specially useful in case of sub-region update,
i.e. when only a portion of the object is to be updated. Using a CPU mmap
in such cases would normally incur a clflush of the whole object, and
using a GTT mmapping would likely require eviction of an active object or
fence and thus stall. The write-combining CPU mmap avoids both.
To ensure the cache coherency, before using this mapping, the GTT domain
has been reused here. This provides the required cache flush if the object
is in CPU domain or synchronization against the concurrent rendering.
Although the access through an uncached mmap should automatically
invalidate the cache lines, this may not be true for non-temporal write
instructions and also not all pages of the object may be updated at any
given point of time through this mapping. Having a call to get_pages in
set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
Broaden application of set-domain(GTT)', would guarantee the clflush and
so there will be no cachelines holding the data for the object before it
is accessed through this map.
The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
extended with a new flags field (defaulting to 0 for existent users). In
order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.
v2: Fix error handling, invalid flag detection, renaming (ickle)
v3: Rebase to latest drm-intel-nightly codebase
The new mmapping is exercised by igt/gem_mmap_wc,
igt/gem_concurrent_blit and igt/gem_gtt_speed.
Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-02 17:59:30 +07:00
|
|
|
if (args->flags & ~(I915_MMAP_WC))
|
|
|
|
return -EINVAL;
|
|
|
|
|
2016-03-29 22:42:01 +07:00
|
|
|
if (args->flags & I915_MMAP_WC && !boot_cpu_has(X86_FEATURE_PAT))
|
drm/i915: Support creation of unbound wc user mappings for objects
This patch provides support to create write-combining virtual mappings of
GEM object. It intends to provide the same funtionality of 'mmap_gtt'
interface without the constraints and contention of a limited aperture
space, but requires clients handles the linear to tile conversion on their
own. This is for improving the CPU write operation performance, as with such
mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
to the GTT mmapping, unlike the regular CPU mmapping, it avoids the cache
flush after update from CPU side, when object is passed onto GPU. This
type of mapping is specially useful in case of sub-region update,
i.e. when only a portion of the object is to be updated. Using a CPU mmap
in such cases would normally incur a clflush of the whole object, and
using a GTT mmapping would likely require eviction of an active object or
fence and thus stall. The write-combining CPU mmap avoids both.
To ensure the cache coherency, before using this mapping, the GTT domain
has been reused here. This provides the required cache flush if the object
is in CPU domain or synchronization against the concurrent rendering.
Although the access through an uncached mmap should automatically
invalidate the cache lines, this may not be true for non-temporal write
instructions and also not all pages of the object may be updated at any
given point of time through this mapping. Having a call to get_pages in
set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
Broaden application of set-domain(GTT)', would guarantee the clflush and
so there will be no cachelines holding the data for the object before it
is accessed through this map.
The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
extended with a new flags field (defaulting to 0 for existent users). In
order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.
v2: Fix error handling, invalid flag detection, renaming (ickle)
v3: Rebase to latest drm-intel-nightly codebase
The new mmapping is exercised by igt/gem_mmap_wc,
igt/gem_concurrent_blit and igt/gem_gtt_speed.
Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-02 17:59:30 +07:00
|
|
|
return -ENODEV;
|
|
|
|
|
2016-07-20 19:31:51 +07:00
|
|
|
obj = i915_gem_object_lookup(file, args->handle);
|
|
|
|
if (!obj)
|
2010-08-04 20:19:46 +07:00
|
|
|
return -ENOENT;
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2012-05-10 20:25:09 +07:00
|
|
|
/* prime objects have no backing filp to GEM mmap
|
|
|
|
* pages from.
|
|
|
|
*/
|
2016-07-20 19:31:51 +07:00
|
|
|
if (!obj->base.filp) {
|
2016-10-28 19:58:43 +07:00
|
|
|
i915_gem_object_put(obj);
|
2012-05-10 20:25:09 +07:00
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2016-07-20 19:31:51 +07:00
|
|
|
addr = vm_mmap(obj->base.filp, 0, args->size,
|
2008-07-31 02:06:12 +07:00
|
|
|
PROT_READ | PROT_WRITE, MAP_SHARED,
|
|
|
|
args->offset);
|
drm/i915: Support creation of unbound wc user mappings for objects
This patch provides support to create write-combining virtual mappings of
GEM object. It intends to provide the same funtionality of 'mmap_gtt'
interface without the constraints and contention of a limited aperture
space, but requires clients handles the linear to tile conversion on their
own. This is for improving the CPU write operation performance, as with such
mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
to the GTT mmapping, unlike the regular CPU mmapping, it avoids the cache
flush after update from CPU side, when object is passed onto GPU. This
type of mapping is specially useful in case of sub-region update,
i.e. when only a portion of the object is to be updated. Using a CPU mmap
in such cases would normally incur a clflush of the whole object, and
using a GTT mmapping would likely require eviction of an active object or
fence and thus stall. The write-combining CPU mmap avoids both.
To ensure the cache coherency, before using this mapping, the GTT domain
has been reused here. This provides the required cache flush if the object
is in CPU domain or synchronization against the concurrent rendering.
Although the access through an uncached mmap should automatically
invalidate the cache lines, this may not be true for non-temporal write
instructions and also not all pages of the object may be updated at any
given point of time through this mapping. Having a call to get_pages in
set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
Broaden application of set-domain(GTT)', would guarantee the clflush and
so there will be no cachelines holding the data for the object before it
is accessed through this map.
The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
extended with a new flags field (defaulting to 0 for existent users). In
order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.
v2: Fix error handling, invalid flag detection, renaming (ickle)
v3: Rebase to latest drm-intel-nightly codebase
The new mmapping is exercised by igt/gem_mmap_wc,
igt/gem_concurrent_blit and igt/gem_gtt_speed.
Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-02 17:59:30 +07:00
|
|
|
if (args->flags & I915_MMAP_WC) {
|
|
|
|
struct mm_struct *mm = current->mm;
|
|
|
|
struct vm_area_struct *vma;
|
|
|
|
|
2016-05-24 06:26:11 +07:00
|
|
|
if (down_write_killable(&mm->mmap_sem)) {
|
2016-10-28 19:58:43 +07:00
|
|
|
i915_gem_object_put(obj);
|
2016-05-24 06:26:11 +07:00
|
|
|
return -EINTR;
|
|
|
|
}
|
drm/i915: Support creation of unbound wc user mappings for objects
This patch provides support to create write-combining virtual mappings of
GEM object. It intends to provide the same funtionality of 'mmap_gtt'
interface without the constraints and contention of a limited aperture
space, but requires clients handles the linear to tile conversion on their
own. This is for improving the CPU write operation performance, as with such
mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
to the GTT mmapping, unlike the regular CPU mmapping, it avoids the cache
flush after update from CPU side, when object is passed onto GPU. This
type of mapping is specially useful in case of sub-region update,
i.e. when only a portion of the object is to be updated. Using a CPU mmap
in such cases would normally incur a clflush of the whole object, and
using a GTT mmapping would likely require eviction of an active object or
fence and thus stall. The write-combining CPU mmap avoids both.
To ensure the cache coherency, before using this mapping, the GTT domain
has been reused here. This provides the required cache flush if the object
is in CPU domain or synchronization against the concurrent rendering.
Although the access through an uncached mmap should automatically
invalidate the cache lines, this may not be true for non-temporal write
instructions and also not all pages of the object may be updated at any
given point of time through this mapping. Having a call to get_pages in
set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
Broaden application of set-domain(GTT)', would guarantee the clflush and
so there will be no cachelines holding the data for the object before it
is accessed through this map.
The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
extended with a new flags field (defaulting to 0 for existent users). In
order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.
v2: Fix error handling, invalid flag detection, renaming (ickle)
v3: Rebase to latest drm-intel-nightly codebase
The new mmapping is exercised by igt/gem_mmap_wc,
igt/gem_concurrent_blit and igt/gem_gtt_speed.
Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-02 17:59:30 +07:00
|
|
|
vma = find_vma(mm, addr);
|
|
|
|
if (vma)
|
|
|
|
vma->vm_page_prot =
|
|
|
|
pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
|
|
|
|
else
|
|
|
|
addr = -ENOMEM;
|
|
|
|
up_write(&mm->mmap_sem);
|
2016-06-18 00:46:39 +07:00
|
|
|
|
|
|
|
/* This may race, but that's ok, it only gets set */
|
2016-08-18 23:17:04 +07:00
|
|
|
WRITE_ONCE(obj->frontbuffer_ggtt_origin, ORIGIN_CPU);
|
drm/i915: Support creation of unbound wc user mappings for objects
This patch provides support to create write-combining virtual mappings of
GEM object. It intends to provide the same funtionality of 'mmap_gtt'
interface without the constraints and contention of a limited aperture
space, but requires clients handles the linear to tile conversion on their
own. This is for improving the CPU write operation performance, as with such
mapping, writes and reads are almost 50% faster than with mmap_gtt. Similar
to the GTT mmapping, unlike the regular CPU mmapping, it avoids the cache
flush after update from CPU side, when object is passed onto GPU. This
type of mapping is specially useful in case of sub-region update,
i.e. when only a portion of the object is to be updated. Using a CPU mmap
in such cases would normally incur a clflush of the whole object, and
using a GTT mmapping would likely require eviction of an active object or
fence and thus stall. The write-combining CPU mmap avoids both.
To ensure the cache coherency, before using this mapping, the GTT domain
has been reused here. This provides the required cache flush if the object
is in CPU domain or synchronization against the concurrent rendering.
Although the access through an uncached mmap should automatically
invalidate the cache lines, this may not be true for non-temporal write
instructions and also not all pages of the object may be updated at any
given point of time through this mapping. Having a call to get_pages in
set_to_gtt_domain function, as added in the earlier patch 'drm/i915:
Broaden application of set-domain(GTT)', would guarantee the clflush and
so there will be no cachelines holding the data for the object before it
is accessed through this map.
The drm_i915_gem_mmap structure (for the DRM_I915_GEM_MMAP_IOCTL) has been
extended with a new flags field (defaulting to 0 for existent users). In
order for userspace to detect the extended ioctl, a new parameter
I915_PARAM_MMAP_VERSION has been added for versioning the ioctl interface.
v2: Fix error handling, invalid flag detection, renaming (ickle)
v3: Rebase to latest drm-intel-nightly codebase
The new mmapping is exercised by igt/gem_mmap_wc,
igt/gem_concurrent_blit and igt/gem_gtt_speed.
Change-Id: Ie883942f9e689525f72fe9a8d3780c3a9faa769a
Signed-off-by: Akash Goel <akash.goel@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2015-01-02 17:59:30 +07:00
|
|
|
}
|
2016-10-28 19:58:43 +07:00
|
|
|
i915_gem_object_put(obj);
|
2008-07-31 02:06:12 +07:00
|
|
|
if (IS_ERR((void *)addr))
|
|
|
|
return addr;
|
|
|
|
|
|
|
|
args->addr_ptr = (uint64_t) addr;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-08-18 23:17:01 +07:00
|
|
|
static unsigned int tile_row_pages(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
u64 size;
|
|
|
|
|
|
|
|
size = i915_gem_object_get_stride(obj);
|
|
|
|
size *= i915_gem_object_get_tiling(obj) == I915_TILING_Y ? 32 : 8;
|
|
|
|
|
|
|
|
return size >> PAGE_SHIFT;
|
|
|
|
}
|
|
|
|
|
2016-08-26 01:05:19 +07:00
|
|
|
/**
|
|
|
|
* i915_gem_mmap_gtt_version - report the current feature set for GTT mmaps
|
|
|
|
*
|
|
|
|
* A history of the GTT mmap interface:
|
|
|
|
*
|
|
|
|
* 0 - Everything had to fit into the GTT. Both parties of a memcpy had to
|
|
|
|
* aligned and suitable for fencing, and still fit into the available
|
|
|
|
* mappable space left by the pinned display objects. A classic problem
|
|
|
|
* we called the page-fault-of-doom where we would ping-pong between
|
|
|
|
* two objects that could not fit inside the GTT and so the memcpy
|
|
|
|
* would page one object in at the expense of the other between every
|
|
|
|
* single byte.
|
|
|
|
*
|
|
|
|
* 1 - Objects can be any size, and have any compatible fencing (X Y, or none
|
|
|
|
* as set via i915_gem_set_tiling() [DRM_I915_GEM_SET_TILING]). If the
|
|
|
|
* object is too large for the available space (or simply too large
|
|
|
|
* for the mappable aperture!), a view is created instead and faulted
|
|
|
|
* into userspace. (This view is aligned and sized appropriately for
|
|
|
|
* fenced access.)
|
|
|
|
*
|
|
|
|
* Restrictions:
|
|
|
|
*
|
|
|
|
* * snoopable objects cannot be accessed via the GTT. It can cause machine
|
|
|
|
* hangs on some architectures, corruption on others. An attempt to service
|
|
|
|
* a GTT page fault from a snoopable object will generate a SIGBUS.
|
|
|
|
*
|
|
|
|
* * the object must be able to fit into RAM (physical memory, though no
|
|
|
|
* limited to the mappable aperture).
|
|
|
|
*
|
|
|
|
*
|
|
|
|
* Caveats:
|
|
|
|
*
|
|
|
|
* * a new GTT page fault will synchronize rendering from the GPU and flush
|
|
|
|
* all data to system memory. Subsequent access will not be synchronized.
|
|
|
|
*
|
|
|
|
* * all mappings are revoked on runtime device suspend.
|
|
|
|
*
|
|
|
|
* * there are only 8, 16 or 32 fence registers to share between all users
|
|
|
|
* (older machines require fence register for display and blitter access
|
|
|
|
* as well). Contention of the fence registers will cause the previous users
|
|
|
|
* to be unmapped and any new access will generate new page faults.
|
|
|
|
*
|
|
|
|
* * running out of memory while servicing a fault may generate a SIGBUS,
|
|
|
|
* rather than the expected SIGSEGV.
|
|
|
|
*/
|
|
|
|
int i915_gem_mmap_gtt_version(void)
|
|
|
|
{
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
|
2008-11-13 01:03:55 +07:00
|
|
|
/**
|
|
|
|
* i915_gem_fault - fault a page into the GTT
|
2016-08-15 16:49:06 +07:00
|
|
|
* @area: CPU VMA in question
|
2015-09-15 19:58:44 +07:00
|
|
|
* @vmf: fault info
|
2008-11-13 01:03:55 +07:00
|
|
|
*
|
|
|
|
* The fault handler is set up by drm_gem_mmap() when a object is GTT mapped
|
|
|
|
* from userspace. The fault handler takes care of binding the object to
|
|
|
|
* the GTT (if needed), allocating and programming a fence register (again,
|
|
|
|
* only if needed based on whether the old reg is still valid or the object
|
|
|
|
* is tiled) and inserting a new PTE into the faulting process.
|
|
|
|
*
|
|
|
|
* Note that the faulting process may involve evicting existing objects
|
|
|
|
* from the GTT and/or fence registers to make room. So performance may
|
|
|
|
* suffer if the GTT working set is large or there are few fence registers
|
|
|
|
* left.
|
2016-08-26 01:05:19 +07:00
|
|
|
*
|
|
|
|
* The current feature set supported by i915_gem_fault() and thus GTT mmaps
|
|
|
|
* is exposed via I915_PARAM_MMAP_GTT_VERSION (see i915_gem_mmap_gtt_version).
|
2008-11-13 01:03:55 +07:00
|
|
|
*/
|
2016-08-15 16:49:06 +07:00
|
|
|
int i915_gem_fault(struct vm_area_struct *area, struct vm_fault *vmf)
|
2008-11-13 01:03:55 +07:00
|
|
|
{
|
2016-08-18 23:17:01 +07:00
|
|
|
#define MIN_CHUNK_PAGES ((1 << 20) >> PAGE_SHIFT) /* 1 MiB */
|
2016-08-15 16:49:06 +07:00
|
|
|
struct drm_i915_gem_object *obj = to_intel_bo(area->vm_private_data);
|
2010-11-09 02:18:58 +07:00
|
|
|
struct drm_device *dev = obj->base.dev;
|
2016-03-30 20:57:10 +07:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
|
|
|
struct i915_ggtt *ggtt = &dev_priv->ggtt;
|
2016-08-05 16:14:07 +07:00
|
|
|
bool write = !!(vmf->flags & FAULT_FLAG_WRITE);
|
2016-08-15 16:49:06 +07:00
|
|
|
struct i915_vma *vma;
|
2008-11-13 01:03:55 +07:00
|
|
|
pgoff_t page_offset;
|
2016-08-18 23:17:05 +07:00
|
|
|
unsigned int flags;
|
2016-08-05 16:14:07 +07:00
|
|
|
int ret;
|
2013-11-28 03:20:34 +07:00
|
|
|
|
2008-11-13 01:03:55 +07:00
|
|
|
/* We don't use vmf->pgoff since that has the fake offset */
|
2016-08-15 16:49:06 +07:00
|
|
|
page_offset = ((unsigned long)vmf->virtual_address - area->vm_start) >>
|
2008-11-13 01:03:55 +07:00
|
|
|
PAGE_SHIFT;
|
|
|
|
|
2011-02-03 18:57:46 +07:00
|
|
|
trace_i915_gem_object_fault(obj, page_offset, true, write);
|
|
|
|
|
2014-02-08 03:37:06 +07:00
|
|
|
/* Try to flush the object off the GPU first without holding the lock.
|
2016-08-05 16:14:07 +07:00
|
|
|
* Upon acquiring the lock, we will perform our sanity checks and then
|
2014-02-08 03:37:06 +07:00
|
|
|
* repeat the flush holding the lock in the normal manner to catch cases
|
|
|
|
* where we are gazumped.
|
|
|
|
*/
|
2016-10-28 19:58:27 +07:00
|
|
|
ret = i915_gem_object_wait(obj,
|
|
|
|
I915_WAIT_INTERRUPTIBLE,
|
|
|
|
MAX_SCHEDULE_TIMEOUT,
|
|
|
|
NULL);
|
2014-02-08 03:37:06 +07:00
|
|
|
if (ret)
|
2016-08-05 16:14:07 +07:00
|
|
|
goto err;
|
|
|
|
|
2016-10-28 19:58:41 +07:00
|
|
|
ret = i915_gem_object_pin_pages(obj);
|
|
|
|
if (ret)
|
|
|
|
goto err;
|
|
|
|
|
2016-08-05 16:14:07 +07:00
|
|
|
intel_runtime_pm_get(dev_priv);
|
|
|
|
|
|
|
|
ret = i915_mutex_lock_interruptible(dev);
|
|
|
|
if (ret)
|
|
|
|
goto err_rpm;
|
2014-02-08 03:37:06 +07:00
|
|
|
|
2012-12-16 19:43:36 +07:00
|
|
|
/* Access to snoopable pages through the GTT is incoherent. */
|
2016-11-04 21:42:44 +07:00
|
|
|
if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(dev_priv)) {
|
2014-05-28 22:16:41 +07:00
|
|
|
ret = -EFAULT;
|
2016-08-05 16:14:07 +07:00
|
|
|
goto err_unlock;
|
2012-12-16 19:43:36 +07:00
|
|
|
}
|
|
|
|
|
2016-08-18 23:17:05 +07:00
|
|
|
/* If the object is smaller than a couple of partial vma, it is
|
|
|
|
* not worth only creating a single partial vma - we may as well
|
|
|
|
* clear enough space for the full object.
|
|
|
|
*/
|
|
|
|
flags = PIN_MAPPABLE;
|
|
|
|
if (obj->base.size > 2 * MIN_CHUNK_PAGES << PAGE_SHIFT)
|
|
|
|
flags |= PIN_NONBLOCK | PIN_NONFAULT;
|
|
|
|
|
2016-08-18 23:17:02 +07:00
|
|
|
/* Now pin it into the GTT as needed */
|
2016-08-18 23:17:05 +07:00
|
|
|
vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, flags);
|
2016-08-18 23:17:02 +07:00
|
|
|
if (IS_ERR(vma)) {
|
|
|
|
struct i915_ggtt_view view;
|
2016-08-18 23:17:01 +07:00
|
|
|
unsigned int chunk_size;
|
|
|
|
|
2016-08-18 23:17:02 +07:00
|
|
|
/* Use a partial view if it is bigger than available space */
|
2016-08-18 23:17:01 +07:00
|
|
|
chunk_size = MIN_CHUNK_PAGES;
|
|
|
|
if (i915_gem_object_is_tiled(obj))
|
2016-11-07 17:54:43 +07:00
|
|
|
chunk_size = roundup(chunk_size, tile_row_pages(obj));
|
2015-05-08 18:37:39 +07:00
|
|
|
|
2015-05-06 18:36:09 +07:00
|
|
|
memset(&view, 0, sizeof(view));
|
|
|
|
view.type = I915_GGTT_VIEW_PARTIAL;
|
|
|
|
view.params.partial.offset = rounddown(page_offset, chunk_size);
|
|
|
|
view.params.partial.size =
|
2016-08-18 23:17:02 +07:00
|
|
|
min_t(unsigned int, chunk_size,
|
2016-10-11 16:06:56 +07:00
|
|
|
vma_pages(area) - view.params.partial.offset);
|
2015-05-06 18:36:09 +07:00
|
|
|
|
2016-08-18 23:17:03 +07:00
|
|
|
/* If the partial covers the entire object, just create a
|
|
|
|
* normal VMA.
|
|
|
|
*/
|
|
|
|
if (chunk_size >= obj->base.size >> PAGE_SHIFT)
|
|
|
|
view.type = I915_GGTT_VIEW_NORMAL;
|
|
|
|
|
2016-08-18 23:17:04 +07:00
|
|
|
/* Userspace is now writing through an untracked VMA, abandon
|
|
|
|
* all hope that the hardware is able to track future writes.
|
|
|
|
*/
|
|
|
|
obj->frontbuffer_ggtt_origin = ORIGIN_CPU;
|
|
|
|
|
2016-08-18 23:17:02 +07:00
|
|
|
vma = i915_gem_object_ggtt_pin(obj, &view, 0, 0, PIN_MAPPABLE);
|
|
|
|
}
|
2016-08-15 16:49:06 +07:00
|
|
|
if (IS_ERR(vma)) {
|
|
|
|
ret = PTR_ERR(vma);
|
2016-08-05 16:14:07 +07:00
|
|
|
goto err_unlock;
|
2016-08-15 16:49:06 +07:00
|
|
|
}
|
2010-10-28 20:44:08 +07:00
|
|
|
|
2012-11-20 17:45:17 +07:00
|
|
|
ret = i915_gem_object_set_to_gtt_domain(obj, write);
|
|
|
|
if (ret)
|
2016-08-05 16:14:07 +07:00
|
|
|
goto err_unpin;
|
2012-02-16 05:50:22 +07:00
|
|
|
|
2016-08-18 23:17:00 +07:00
|
|
|
ret = i915_vma_get_fence(vma);
|
2010-11-10 23:40:20 +07:00
|
|
|
if (ret)
|
2016-08-05 16:14:07 +07:00
|
|
|
goto err_unpin;
|
2010-08-08 03:45:03 +07:00
|
|
|
|
2016-10-24 19:42:14 +07:00
|
|
|
/* Mark as being mmapped into userspace for later revocation */
|
2016-10-24 19:42:15 +07:00
|
|
|
assert_rpm_wakelock_held(dev_priv);
|
2016-10-24 19:42:14 +07:00
|
|
|
if (list_empty(&obj->userfault_link))
|
|
|
|
list_add(&obj->userfault_link, &dev_priv->mm.userfault_list);
|
|
|
|
|
2014-06-10 18:14:40 +07:00
|
|
|
/* Finally, remap it using the new GTT offset */
|
2016-08-19 22:54:28 +07:00
|
|
|
ret = remap_io_mapping(area,
|
|
|
|
area->vm_start + (vma->ggtt_view.params.partial.offset << PAGE_SHIFT),
|
|
|
|
(ggtt->mappable_base + vma->node.start) >> PAGE_SHIFT,
|
|
|
|
min_t(u64, vma->size, area->vm_end - area->vm_start),
|
|
|
|
&ggtt->mappable);
|
2016-08-18 23:17:02 +07:00
|
|
|
|
2016-08-05 16:14:07 +07:00
|
|
|
err_unpin:
|
2016-08-15 16:49:06 +07:00
|
|
|
__i915_vma_unpin(vma);
|
2016-08-05 16:14:07 +07:00
|
|
|
err_unlock:
|
2008-11-13 01:03:55 +07:00
|
|
|
mutex_unlock(&dev->struct_mutex);
|
2016-08-05 16:14:07 +07:00
|
|
|
err_rpm:
|
|
|
|
intel_runtime_pm_put(dev_priv);
|
2016-10-28 19:58:41 +07:00
|
|
|
i915_gem_object_unpin_pages(obj);
|
2016-08-05 16:14:07 +07:00
|
|
|
err:
|
2008-11-13 01:03:55 +07:00
|
|
|
switch (ret) {
|
2011-02-07 20:09:31 +07:00
|
|
|
case -EIO:
|
2014-09-04 14:36:18 +07:00
|
|
|
/*
|
|
|
|
* We eat errors when the gpu is terminally wedged to avoid
|
|
|
|
* userspace unduly crashing (gl has no provisions for mmaps to
|
|
|
|
* fail). But any other -EIO isn't ours (e.g. swap in failure)
|
|
|
|
* and so needs to be reported.
|
|
|
|
*/
|
|
|
|
if (!i915_terminally_wedged(&dev_priv->gpu_error)) {
|
2013-11-28 03:20:34 +07:00
|
|
|
ret = VM_FAULT_SIGBUS;
|
|
|
|
break;
|
|
|
|
}
|
2010-11-07 16:18:22 +07:00
|
|
|
case -EAGAIN:
|
2013-09-12 22:57:28 +07:00
|
|
|
/*
|
|
|
|
* EAGAIN means the gpu is hung and we'll wait for the error
|
|
|
|
* handler to reset everything when re-faulting in
|
|
|
|
* i915_mutex_lock_interruptible.
|
2011-02-07 20:09:31 +07:00
|
|
|
*/
|
2009-09-23 06:43:56 +07:00
|
|
|
case 0:
|
|
|
|
case -ERESTARTSYS:
|
2011-02-12 03:31:19 +07:00
|
|
|
case -EINTR:
|
2012-10-03 21:15:26 +07:00
|
|
|
case -EBUSY:
|
|
|
|
/*
|
|
|
|
* EBUSY is ok: this just means that another thread
|
|
|
|
* already did the job.
|
|
|
|
*/
|
2013-11-28 03:20:34 +07:00
|
|
|
ret = VM_FAULT_NOPAGE;
|
|
|
|
break;
|
2008-11-13 01:03:55 +07:00
|
|
|
case -ENOMEM:
|
2013-11-28 03:20:34 +07:00
|
|
|
ret = VM_FAULT_OOM;
|
|
|
|
break;
|
2012-10-17 16:17:16 +07:00
|
|
|
case -ENOSPC:
|
2014-01-31 18:34:57 +07:00
|
|
|
case -EFAULT:
|
2013-11-28 03:20:34 +07:00
|
|
|
ret = VM_FAULT_SIGBUS;
|
|
|
|
break;
|
2008-11-13 01:03:55 +07:00
|
|
|
default:
|
2012-10-17 16:17:16 +07:00
|
|
|
WARN_ONCE(ret, "unhandled error in i915_gem_fault: %i\n", ret);
|
2013-11-28 03:20:34 +07:00
|
|
|
ret = VM_FAULT_SIGBUS;
|
|
|
|
break;
|
2008-11-13 01:03:55 +07:00
|
|
|
}
|
2013-11-28 03:20:34 +07:00
|
|
|
return ret;
|
2008-11-13 01:03:55 +07:00
|
|
|
}
|
|
|
|
|
2009-07-10 14:18:50 +07:00
|
|
|
/**
|
|
|
|
* i915_gem_release_mmap - remove physical page mappings
|
|
|
|
* @obj: obj in question
|
|
|
|
*
|
tree-wide: fix assorted typos all over the place
That is "success", "unknown", "through", "performance", "[re|un]mapping"
, "access", "default", "reasonable", "[con]currently", "temperature"
, "channel", "[un]used", "application", "example","hierarchy", "therefore"
, "[over|under]flow", "contiguous", "threshold", "enough" and others.
Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-11-14 22:09:05 +07:00
|
|
|
* Preserve the reservation of the mmapping with the DRM core code, but
|
2009-07-10 14:18:50 +07:00
|
|
|
* relinquish ownership of the pages back to the system.
|
|
|
|
*
|
|
|
|
* It is vital that we remove the page mapping if we have mapped a tiled
|
|
|
|
* object through the GTT and then lose the fence register due to
|
|
|
|
* resource pressure. Similarly if the object has been moved out of the
|
|
|
|
* aperture, than pages mapped into userspace must be revoked. Removing the
|
|
|
|
* mapping will then trigger a page fault on the next user access, allowing
|
|
|
|
* fixup by i915_gem_fault().
|
|
|
|
*/
|
2009-07-11 03:02:26 +07:00
|
|
|
void
|
2010-11-09 02:18:58 +07:00
|
|
|
i915_gem_release_mmap(struct drm_i915_gem_object *obj)
|
2009-07-10 14:18:50 +07:00
|
|
|
{
|
2016-10-24 19:42:14 +07:00
|
|
|
struct drm_i915_private *i915 = to_i915(obj->base.dev);
|
|
|
|
|
2016-04-13 23:35:12 +07:00
|
|
|
/* Serialisation between user GTT access and our code depends upon
|
|
|
|
* revoking the CPU's PTE whilst the mutex is held. The next user
|
|
|
|
* pagefault then has to wait until we release the mutex.
|
2016-10-24 19:42:15 +07:00
|
|
|
*
|
|
|
|
* Note that RPM complicates somewhat by adding an additional
|
|
|
|
* requirement that operations to the GGTT be made holding the RPM
|
|
|
|
* wakeref.
|
2016-04-13 23:35:12 +07:00
|
|
|
*/
|
2016-10-24 19:42:14 +07:00
|
|
|
lockdep_assert_held(&i915->drm.struct_mutex);
|
2016-10-24 19:42:15 +07:00
|
|
|
intel_runtime_pm_get(i915);
|
2016-04-13 23:35:12 +07:00
|
|
|
|
2016-10-24 19:42:16 +07:00
|
|
|
if (list_empty(&obj->userfault_link))
|
2016-10-24 19:42:15 +07:00
|
|
|
goto out;
|
2009-07-10 14:18:50 +07:00
|
|
|
|
2016-10-24 19:42:16 +07:00
|
|
|
list_del_init(&obj->userfault_link);
|
2014-01-03 20:24:19 +07:00
|
|
|
drm_vma_node_unmap(&obj->base.vma_node,
|
|
|
|
obj->base.dev->anon_inode->i_mapping);
|
2016-04-13 23:35:12 +07:00
|
|
|
|
|
|
|
/* Ensure that the CPU's PTE are revoked and there are not outstanding
|
|
|
|
* memory transactions from userspace before we return. The TLB
|
|
|
|
* flushing implied above by changing the PTE above *should* be
|
|
|
|
* sufficient, an extra barrier here just provides us with a bit
|
|
|
|
* of paranoid documentation about our requirement to serialise
|
|
|
|
* memory writes before touching registers / GSM.
|
|
|
|
*/
|
|
|
|
wmb();
|
2016-10-24 19:42:15 +07:00
|
|
|
|
|
|
|
out:
|
|
|
|
intel_runtime_pm_put(i915);
|
2009-07-10 14:18:50 +07:00
|
|
|
}
|
|
|
|
|
2016-10-24 19:42:18 +07:00
|
|
|
void i915_gem_runtime_suspend(struct drm_i915_private *dev_priv)
|
2014-06-16 14:57:44 +07:00
|
|
|
{
|
2016-10-24 19:42:16 +07:00
|
|
|
struct drm_i915_gem_object *obj, *on;
|
2016-10-24 19:42:18 +07:00
|
|
|
int i;
|
2014-06-16 14:57:44 +07:00
|
|
|
|
2016-10-24 19:42:16 +07:00
|
|
|
/*
|
|
|
|
* Only called during RPM suspend. All users of the userfault_list
|
|
|
|
* must be holding an RPM wakeref to ensure that this can not
|
|
|
|
* run concurrently with themselves (and use the struct_mutex for
|
|
|
|
* protection between themselves).
|
|
|
|
*/
|
2016-10-24 19:42:14 +07:00
|
|
|
|
2016-10-24 19:42:16 +07:00
|
|
|
list_for_each_entry_safe(obj, on,
|
|
|
|
&dev_priv->mm.userfault_list, userfault_link) {
|
|
|
|
list_del_init(&obj->userfault_link);
|
2016-10-24 19:42:14 +07:00
|
|
|
drm_vma_node_unmap(&obj->base.vma_node,
|
|
|
|
obj->base.dev->anon_inode->i_mapping);
|
|
|
|
}
|
2016-10-24 19:42:18 +07:00
|
|
|
|
|
|
|
/* The fence will be lost when the device powers down. If any were
|
|
|
|
* in use by hardware (i.e. they are pinned), we should not be powering
|
|
|
|
* down! All other fences will be reacquired by the user upon waking.
|
|
|
|
*/
|
|
|
|
for (i = 0; i < dev_priv->num_fence_regs; i++) {
|
|
|
|
struct drm_i915_fence_reg *reg = &dev_priv->fence_regs[i];
|
|
|
|
|
|
|
|
if (WARN_ON(reg->pin_count))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
if (!reg->vma)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
GEM_BUG_ON(!list_empty(®->vma->obj->userfault_link));
|
|
|
|
reg->dirty = true;
|
|
|
|
}
|
2014-06-16 14:57:44 +07:00
|
|
|
}
|
|
|
|
|
2016-08-04 22:32:27 +07:00
|
|
|
/**
|
|
|
|
* i915_gem_get_ggtt_size - return required global GTT size for an object
|
2016-08-04 22:32:28 +07:00
|
|
|
* @dev_priv: i915 device
|
2016-08-04 22:32:27 +07:00
|
|
|
* @size: object size
|
|
|
|
* @tiling_mode: tiling mode
|
|
|
|
*
|
|
|
|
* Return the required global GTT size for an object, taking into account
|
|
|
|
* potential fence register mapping.
|
|
|
|
*/
|
2016-08-04 22:32:28 +07:00
|
|
|
u64 i915_gem_get_ggtt_size(struct drm_i915_private *dev_priv,
|
|
|
|
u64 size, int tiling_mode)
|
2010-11-09 18:47:32 +07:00
|
|
|
{
|
2016-08-04 22:32:27 +07:00
|
|
|
u64 ggtt_size;
|
2010-11-09 18:47:32 +07:00
|
|
|
|
2016-08-04 22:32:27 +07:00
|
|
|
GEM_BUG_ON(size == 0);
|
|
|
|
|
2016-08-04 22:32:28 +07:00
|
|
|
if (INTEL_GEN(dev_priv) >= 4 ||
|
2011-07-19 03:11:49 +07:00
|
|
|
tiling_mode == I915_TILING_NONE)
|
|
|
|
return size;
|
2010-11-09 18:47:32 +07:00
|
|
|
|
|
|
|
/* Previous chips need a power-of-two fence region when tiling */
|
2016-08-04 22:32:28 +07:00
|
|
|
if (IS_GEN3(dev_priv))
|
2016-08-04 22:32:27 +07:00
|
|
|
ggtt_size = 1024*1024;
|
2010-11-09 18:47:32 +07:00
|
|
|
else
|
2016-08-04 22:32:27 +07:00
|
|
|
ggtt_size = 512*1024;
|
2010-11-09 18:47:32 +07:00
|
|
|
|
2016-08-04 22:32:27 +07:00
|
|
|
while (ggtt_size < size)
|
|
|
|
ggtt_size <<= 1;
|
2010-11-09 18:47:32 +07:00
|
|
|
|
2016-08-04 22:32:27 +07:00
|
|
|
return ggtt_size;
|
2010-11-09 18:47:32 +07:00
|
|
|
}
|
|
|
|
|
2008-11-13 01:03:55 +07:00
|
|
|
/**
|
2016-08-04 22:32:27 +07:00
|
|
|
* i915_gem_get_ggtt_alignment - return required global GTT alignment
|
2016-08-04 22:32:28 +07:00
|
|
|
* @dev_priv: i915 device
|
2016-06-03 20:02:17 +07:00
|
|
|
* @size: object size
|
|
|
|
* @tiling_mode: tiling mode
|
2016-08-04 22:32:27 +07:00
|
|
|
* @fenced: is fenced alignment required or not
|
2008-11-13 01:03:55 +07:00
|
|
|
*
|
2016-08-04 22:32:27 +07:00
|
|
|
* Return the required global GTT alignment for an object, taking into account
|
2010-11-15 04:32:36 +07:00
|
|
|
* potential fence register mapping.
|
2008-11-13 01:03:55 +07:00
|
|
|
*/
|
2016-08-04 22:32:28 +07:00
|
|
|
u64 i915_gem_get_ggtt_alignment(struct drm_i915_private *dev_priv, u64 size,
|
2016-08-04 22:32:27 +07:00
|
|
|
int tiling_mode, bool fenced)
|
2008-11-13 01:03:55 +07:00
|
|
|
{
|
2016-08-04 22:32:27 +07:00
|
|
|
GEM_BUG_ON(size == 0);
|
|
|
|
|
2008-11-13 01:03:55 +07:00
|
|
|
/*
|
|
|
|
* Minimum alignment is 4k (GTT page size), but might be greater
|
|
|
|
* if a fence register is needed for the object.
|
|
|
|
*/
|
2016-08-04 22:32:28 +07:00
|
|
|
if (INTEL_GEN(dev_priv) >= 4 || (!fenced && IS_G33(dev_priv)) ||
|
2011-07-19 03:11:49 +07:00
|
|
|
tiling_mode == I915_TILING_NONE)
|
2008-11-13 01:03:55 +07:00
|
|
|
return 4096;
|
|
|
|
|
2010-09-25 03:15:47 +07:00
|
|
|
/*
|
|
|
|
* Previous chips need to be aligned to the size of the smallest
|
|
|
|
* fence register that can contain the object.
|
|
|
|
*/
|
2016-08-04 22:32:28 +07:00
|
|
|
return i915_gem_get_ggtt_size(dev_priv, size, tiling_mode);
|
2010-09-25 03:15:47 +07:00
|
|
|
}
|
|
|
|
|
2012-08-11 21:41:03 +07:00
|
|
|
static int i915_gem_object_create_mmap_offset(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
2016-07-04 17:34:36 +07:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
|
2016-08-05 16:14:14 +07:00
|
|
|
int err;
|
2012-12-20 21:11:16 +07:00
|
|
|
|
2016-08-05 16:14:14 +07:00
|
|
|
err = drm_gem_create_mmap_offset(&obj->base);
|
|
|
|
if (!err)
|
|
|
|
return 0;
|
2012-08-11 21:41:03 +07:00
|
|
|
|
2016-08-05 16:14:14 +07:00
|
|
|
/* We can idle the GPU locklessly to flush stale objects, but in order
|
|
|
|
* to claim that space for ourselves, we need to take the big
|
|
|
|
* struct_mutex to free the requests+objects and allocate our slot.
|
2012-08-11 21:41:03 +07:00
|
|
|
*/
|
2016-09-09 20:11:49 +07:00
|
|
|
err = i915_gem_wait_for_idle(dev_priv, I915_WAIT_INTERRUPTIBLE);
|
2016-08-05 16:14:14 +07:00
|
|
|
if (err)
|
|
|
|
return err;
|
|
|
|
|
|
|
|
err = i915_mutex_lock_interruptible(&dev_priv->drm);
|
|
|
|
if (!err) {
|
|
|
|
i915_gem_retire_requests(dev_priv);
|
|
|
|
err = drm_gem_create_mmap_offset(&obj->base);
|
|
|
|
mutex_unlock(&dev_priv->drm.struct_mutex);
|
|
|
|
}
|
2012-12-20 21:11:16 +07:00
|
|
|
|
2016-08-05 16:14:14 +07:00
|
|
|
return err;
|
2012-08-11 21:41:03 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
static void i915_gem_object_free_mmap_offset(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
drm_gem_free_mmap_offset(&obj->base);
|
|
|
|
}
|
|
|
|
|
2014-12-24 10:11:17 +07:00
|
|
|
int
|
2011-02-07 09:16:14 +07:00
|
|
|
i915_gem_mmap_gtt(struct drm_file *file,
|
|
|
|
struct drm_device *dev,
|
2014-12-24 10:11:17 +07:00
|
|
|
uint32_t handle,
|
2011-02-07 09:16:14 +07:00
|
|
|
uint64_t *offset)
|
2008-11-13 01:03:55 +07:00
|
|
|
{
|
2010-11-09 02:18:58 +07:00
|
|
|
struct drm_i915_gem_object *obj;
|
2008-11-13 01:03:55 +07:00
|
|
|
int ret;
|
|
|
|
|
2016-07-20 19:31:51 +07:00
|
|
|
obj = i915_gem_object_lookup(file, handle);
|
2016-08-05 16:14:14 +07:00
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
2009-09-23 00:46:17 +07:00
|
|
|
|
2012-08-11 21:41:03 +07:00
|
|
|
ret = i915_gem_object_create_mmap_offset(obj);
|
2016-08-05 16:14:14 +07:00
|
|
|
if (ret == 0)
|
|
|
|
*offset = drm_vma_node_offset_addr(&obj->base.vma_node);
|
2008-11-13 01:03:55 +07:00
|
|
|
|
2016-10-28 19:58:43 +07:00
|
|
|
i915_gem_object_put(obj);
|
2010-10-17 15:45:41 +07:00
|
|
|
return ret;
|
2008-11-13 01:03:55 +07:00
|
|
|
}
|
|
|
|
|
2011-02-07 09:16:14 +07:00
|
|
|
/**
|
|
|
|
* i915_gem_mmap_gtt_ioctl - prepare an object for GTT mmap'ing
|
|
|
|
* @dev: DRM device
|
|
|
|
* @data: GTT mapping ioctl data
|
|
|
|
* @file: GEM object info
|
|
|
|
*
|
|
|
|
* Simply returns the fake offset to userspace so it can mmap it.
|
|
|
|
* The mmap call will end up in drm_gem_mmap(), which will set things
|
|
|
|
* up so we can get faults in the handler above.
|
|
|
|
*
|
|
|
|
* The fault handler will take care of binding the object into the GTT
|
|
|
|
* (since it may have been evicted to make room for something), allocating
|
|
|
|
* a fence register, and mapping the appropriate aperture address into
|
|
|
|
* userspace.
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_mmap_gtt_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file)
|
|
|
|
{
|
|
|
|
struct drm_i915_gem_mmap_gtt *args = data;
|
|
|
|
|
2014-12-24 10:11:17 +07:00
|
|
|
return i915_gem_mmap_gtt(file, dev, args->handle, &args->offset);
|
2011-02-07 09:16:14 +07:00
|
|
|
}
|
|
|
|
|
2012-08-20 15:23:20 +07:00
|
|
|
/* Immediately discard the backing storage */
|
|
|
|
static void
|
|
|
|
i915_gem_object_truncate(struct drm_i915_gem_object *obj)
|
2010-10-28 19:45:36 +07:00
|
|
|
{
|
2012-08-11 21:41:05 +07:00
|
|
|
i915_gem_object_free_mmap_offset(obj);
|
2012-05-10 20:25:09 +07:00
|
|
|
|
2012-08-11 21:41:05 +07:00
|
|
|
if (obj->base.filp == NULL)
|
|
|
|
return;
|
2010-10-28 19:45:36 +07:00
|
|
|
|
2012-08-20 15:23:20 +07:00
|
|
|
/* Our goal here is to return as much of the memory as
|
|
|
|
* is possible back to the system as we are called from OOM.
|
|
|
|
* To do this we must instruct the shmfs to drop all of its
|
|
|
|
* backing pages, *now*.
|
|
|
|
*/
|
2014-03-25 20:23:06 +07:00
|
|
|
shmem_truncate_range(file_inode(obj->base.filp), 0, (loff_t)-1);
|
2016-10-28 19:58:35 +07:00
|
|
|
obj->mm.madv = __I915_MADV_PURGED;
|
2012-08-20 15:23:20 +07:00
|
|
|
}
|
2010-10-28 19:45:36 +07:00
|
|
|
|
2014-03-25 20:23:06 +07:00
|
|
|
/* Try to discard unwanted pages */
|
2016-10-28 19:58:36 +07:00
|
|
|
void __i915_gem_object_invalidate(struct drm_i915_gem_object *obj)
|
2012-08-20 15:23:20 +07:00
|
|
|
{
|
2014-03-25 20:23:06 +07:00
|
|
|
struct address_space *mapping;
|
|
|
|
|
2016-10-28 19:58:37 +07:00
|
|
|
lockdep_assert_held(&obj->mm.lock);
|
|
|
|
GEM_BUG_ON(obj->mm.pages);
|
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
switch (obj->mm.madv) {
|
2014-03-25 20:23:06 +07:00
|
|
|
case I915_MADV_DONTNEED:
|
|
|
|
i915_gem_object_truncate(obj);
|
|
|
|
case __I915_MADV_PURGED:
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (obj->base.filp == NULL)
|
|
|
|
return;
|
|
|
|
|
2015-12-05 11:45:44 +07:00
|
|
|
mapping = obj->base.filp->f_mapping,
|
2014-03-25 20:23:06 +07:00
|
|
|
invalidate_mapping_pages(mapping, 0, (loff_t)-1);
|
2010-10-28 19:45:36 +07:00
|
|
|
}
|
|
|
|
|
2010-09-27 21:51:07 +07:00
|
|
|
static void
|
2016-10-28 19:58:36 +07:00
|
|
|
i915_gem_object_put_pages_gtt(struct drm_i915_gem_object *obj,
|
|
|
|
struct sg_table *pages)
|
2008-07-31 02:06:12 +07:00
|
|
|
{
|
2016-05-20 17:54:06 +07:00
|
|
|
struct sgt_iter sgt_iter;
|
|
|
|
struct page *page;
|
2012-05-10 20:25:09 +07:00
|
|
|
|
2016-11-11 21:58:09 +07:00
|
|
|
__i915_gem_object_release_shmem(obj, pages);
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2016-10-28 19:58:36 +07:00
|
|
|
i915_gem_gtt_finish_pages(obj, pages);
|
2015-07-09 16:59:05 +07:00
|
|
|
|
2011-09-13 02:30:02 +07:00
|
|
|
if (i915_gem_object_needs_bit17_swizzle(obj))
|
2016-10-28 19:58:36 +07:00
|
|
|
i915_gem_object_save_bit_17_swizzle(obj, pages);
|
2009-03-13 06:56:27 +07:00
|
|
|
|
2016-10-28 19:58:36 +07:00
|
|
|
for_each_sgt_page(page, sgt_iter, pages) {
|
2016-10-28 19:58:35 +07:00
|
|
|
if (obj->mm.dirty)
|
2012-06-01 21:20:22 +07:00
|
|
|
set_page_dirty(page);
|
2009-09-14 22:50:29 +07:00
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
if (obj->mm.madv == I915_MADV_WILLNEED)
|
2012-06-01 21:20:22 +07:00
|
|
|
mark_page_accessed(page);
|
2009-09-14 22:50:29 +07:00
|
|
|
|
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 19:29:47 +07:00
|
|
|
put_page(page);
|
2009-09-14 22:50:29 +07:00
|
|
|
}
|
2016-10-28 19:58:35 +07:00
|
|
|
obj->mm.dirty = false;
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2016-10-28 19:58:36 +07:00
|
|
|
sg_free_table(pages);
|
|
|
|
kfree(pages);
|
2012-06-07 21:38:42 +07:00
|
|
|
}
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 16:40:46 +07:00
|
|
|
|
2016-10-28 19:58:33 +07:00
|
|
|
static void __i915_gem_object_reset_page_iter(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
struct radix_tree_iter iter;
|
|
|
|
void **slot;
|
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
radix_tree_for_each_slot(slot, &obj->mm.get_page.radix, &iter, 0)
|
|
|
|
radix_tree_delete(&obj->mm.get_page.radix, iter.index);
|
2016-10-28 19:58:33 +07:00
|
|
|
}
|
|
|
|
|
2016-11-01 19:11:34 +07:00
|
|
|
void __i915_gem_object_put_pages(struct drm_i915_gem_object *obj,
|
|
|
|
enum i915_mm_subclass subclass)
|
2012-06-07 21:38:42 +07:00
|
|
|
{
|
2016-10-28 19:58:36 +07:00
|
|
|
struct sg_table *pages;
|
2012-06-07 21:38:42 +07:00
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
if (i915_gem_object_has_pinned_pages(obj))
|
2016-10-28 19:58:36 +07:00
|
|
|
return;
|
2012-09-05 03:02:54 +07:00
|
|
|
|
2016-08-04 13:52:26 +07:00
|
|
|
GEM_BUG_ON(obj->bind_count);
|
2016-10-28 19:58:37 +07:00
|
|
|
if (!READ_ONCE(obj->mm.pages))
|
|
|
|
return;
|
|
|
|
|
|
|
|
/* May be called by shrinker from within get_pages() (on another bo) */
|
2016-11-01 19:11:34 +07:00
|
|
|
mutex_lock_nested(&obj->mm.lock, subclass);
|
2016-10-28 19:58:37 +07:00
|
|
|
if (unlikely(atomic_read(&obj->mm.pages_pin_count)))
|
|
|
|
goto unlock;
|
2013-08-01 07:00:04 +07:00
|
|
|
|
2012-12-03 18:49:00 +07:00
|
|
|
/* ->put_pages might need to allocate memory for the bit17 swizzle
|
|
|
|
* array, hence protect them from being reaped by removing them from gtt
|
|
|
|
* lists early. */
|
2016-10-28 19:58:36 +07:00
|
|
|
pages = fetch_and_zero(&obj->mm.pages);
|
|
|
|
GEM_BUG_ON(!pages);
|
2012-12-03 18:49:00 +07:00
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
if (obj->mm.mapping) {
|
2016-08-18 23:16:42 +07:00
|
|
|
void *ptr;
|
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
ptr = ptr_mask_bits(obj->mm.mapping);
|
2016-08-18 23:16:42 +07:00
|
|
|
if (is_vmalloc_addr(ptr))
|
|
|
|
vunmap(ptr);
|
2016-04-08 18:11:14 +07:00
|
|
|
else
|
2016-08-18 23:16:42 +07:00
|
|
|
kunmap(kmap_to_page(ptr));
|
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
obj->mm.mapping = NULL;
|
2016-04-08 18:11:11 +07:00
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:33 +07:00
|
|
|
__i915_gem_object_reset_page_iter(obj);
|
|
|
|
|
2016-10-28 19:58:36 +07:00
|
|
|
obj->ops->put_pages(obj, pages);
|
2016-10-28 19:58:37 +07:00
|
|
|
unlock:
|
|
|
|
mutex_unlock(&obj->mm.lock);
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 16:40:46 +07:00
|
|
|
}
|
|
|
|
|
2016-10-18 19:02:50 +07:00
|
|
|
static unsigned int swiotlb_max_size(void)
|
2016-10-11 15:20:21 +07:00
|
|
|
{
|
|
|
|
#if IS_ENABLED(CONFIG_SWIOTLB)
|
|
|
|
return rounddown(swiotlb_nr_tbl() << IO_TLB_SHIFT, PAGE_SIZE);
|
|
|
|
#else
|
|
|
|
return 0;
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2016-11-09 22:13:43 +07:00
|
|
|
static void i915_sg_trim(struct sg_table *orig_st)
|
|
|
|
{
|
|
|
|
struct sg_table new_st;
|
|
|
|
struct scatterlist *sg, *new_sg;
|
|
|
|
unsigned int i;
|
|
|
|
|
|
|
|
if (orig_st->nents == orig_st->orig_nents)
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (sg_alloc_table(&new_st, orig_st->nents, GFP_KERNEL))
|
|
|
|
return;
|
|
|
|
|
|
|
|
new_sg = new_st.sgl;
|
|
|
|
for_each_sg(orig_st->sgl, sg, orig_st->nents, i) {
|
|
|
|
sg_set_page(new_sg, sg_page(sg), sg->length, 0);
|
|
|
|
/* called before being DMA mapped, no need to copy sg->dma_* */
|
|
|
|
new_sg = sg_next(new_sg);
|
|
|
|
}
|
|
|
|
|
|
|
|
sg_free_table(orig_st);
|
|
|
|
|
|
|
|
*orig_st = new_st;
|
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:36 +07:00
|
|
|
static struct sg_table *
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 16:40:46 +07:00
|
|
|
i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
|
2010-10-28 19:45:36 +07:00
|
|
|
{
|
2016-07-04 17:34:36 +07:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
|
2010-10-28 19:45:36 +07:00
|
|
|
int page_count, i;
|
|
|
|
struct address_space *mapping;
|
2012-06-01 21:20:22 +07:00
|
|
|
struct sg_table *st;
|
|
|
|
struct scatterlist *sg;
|
2016-05-20 17:54:06 +07:00
|
|
|
struct sgt_iter sgt_iter;
|
2010-10-28 19:45:36 +07:00
|
|
|
struct page *page;
|
2013-02-19 00:28:03 +07:00
|
|
|
unsigned long last_pfn = 0; /* suppress gcc warning */
|
2016-10-18 19:02:50 +07:00
|
|
|
unsigned int max_segment;
|
2015-07-09 16:59:05 +07:00
|
|
|
int ret;
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 16:40:46 +07:00
|
|
|
gfp_t gfp;
|
2010-10-28 19:45:36 +07:00
|
|
|
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 16:40:46 +07:00
|
|
|
/* Assert that the object is not currently in any GPU domain. As it
|
|
|
|
* wasn't in the GTT, there shouldn't be any way it could have been in
|
|
|
|
* a GPU cache
|
|
|
|
*/
|
2016-10-28 19:58:36 +07:00
|
|
|
GEM_BUG_ON(obj->base.read_domains & I915_GEM_GPU_DOMAINS);
|
|
|
|
GEM_BUG_ON(obj->base.write_domain & I915_GEM_GPU_DOMAINS);
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 16:40:46 +07:00
|
|
|
|
2016-10-11 15:20:21 +07:00
|
|
|
max_segment = swiotlb_max_size();
|
|
|
|
if (!max_segment)
|
2016-10-18 19:02:50 +07:00
|
|
|
max_segment = rounddown(UINT_MAX, PAGE_SIZE);
|
2016-10-11 15:20:21 +07:00
|
|
|
|
2012-06-01 21:20:22 +07:00
|
|
|
st = kmalloc(sizeof(*st), GFP_KERNEL);
|
|
|
|
if (st == NULL)
|
2016-10-28 19:58:36 +07:00
|
|
|
return ERR_PTR(-ENOMEM);
|
2012-06-01 21:20:22 +07:00
|
|
|
|
2010-11-09 02:18:58 +07:00
|
|
|
page_count = obj->base.size / PAGE_SIZE;
|
2012-06-01 21:20:22 +07:00
|
|
|
if (sg_alloc_table(st, page_count, GFP_KERNEL)) {
|
|
|
|
kfree(st);
|
2016-10-28 19:58:36 +07:00
|
|
|
return ERR_PTR(-ENOMEM);
|
2012-06-01 21:20:22 +07:00
|
|
|
}
|
2010-10-28 19:45:36 +07:00
|
|
|
|
2012-06-01 21:20:22 +07:00
|
|
|
/* Get the list of pages out of our struct file. They'll be pinned
|
|
|
|
* at this point until we release them.
|
|
|
|
*
|
|
|
|
* Fail silently without starting the shrinker
|
|
|
|
*/
|
2015-12-05 11:45:44 +07:00
|
|
|
mapping = obj->base.filp->f_mapping;
|
2015-11-07 07:28:49 +07:00
|
|
|
gfp = mapping_gfp_constraint(mapping, ~(__GFP_IO | __GFP_RECLAIM));
|
2015-11-07 07:28:21 +07:00
|
|
|
gfp |= __GFP_NORETRY | __GFP_NOWARN;
|
2013-02-19 00:28:03 +07:00
|
|
|
sg = st->sgl;
|
|
|
|
st->nents = 0;
|
|
|
|
for (i = 0; i < page_count; i++) {
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 16:40:46 +07:00
|
|
|
page = shmem_read_mapping_page_gfp(mapping, i, gfp);
|
|
|
|
if (IS_ERR(page)) {
|
2014-09-09 17:16:08 +07:00
|
|
|
i915_gem_shrink(dev_priv,
|
|
|
|
page_count,
|
|
|
|
I915_SHRINK_BOUND |
|
|
|
|
I915_SHRINK_UNBOUND |
|
|
|
|
I915_SHRINK_PURGEABLE);
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 16:40:46 +07:00
|
|
|
page = shmem_read_mapping_page_gfp(mapping, i, gfp);
|
|
|
|
}
|
|
|
|
if (IS_ERR(page)) {
|
|
|
|
/* We've tried hard to allocate the memory by reaping
|
|
|
|
* our own buffer, now let the real VM do its job and
|
|
|
|
* go down in flames if truly OOM.
|
|
|
|
*/
|
2014-05-25 19:34:10 +07:00
|
|
|
page = shmem_read_mapping_page(mapping, i);
|
2015-07-09 16:59:05 +07:00
|
|
|
if (IS_ERR(page)) {
|
|
|
|
ret = PTR_ERR(page);
|
2016-11-14 18:29:30 +07:00
|
|
|
goto err_sg;
|
2015-07-09 16:59:05 +07:00
|
|
|
}
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 16:40:46 +07:00
|
|
|
}
|
2016-10-11 15:20:21 +07:00
|
|
|
if (!i ||
|
|
|
|
sg->length >= max_segment ||
|
|
|
|
page_to_pfn(page) != last_pfn + 1) {
|
2013-02-19 00:28:03 +07:00
|
|
|
if (i)
|
|
|
|
sg = sg_next(sg);
|
|
|
|
st->nents++;
|
|
|
|
sg_set_page(sg, page, PAGE_SIZE, 0);
|
|
|
|
} else {
|
|
|
|
sg->length += PAGE_SIZE;
|
|
|
|
}
|
|
|
|
last_pfn = page_to_pfn(page);
|
2013-10-08 03:15:45 +07:00
|
|
|
|
|
|
|
/* Check that the i965g/gm workaround works. */
|
|
|
|
WARN_ON((gfp & __GFP_DMA32) && (last_pfn >= 0x00100000UL));
|
2010-10-28 19:45:36 +07:00
|
|
|
}
|
2016-10-11 15:20:21 +07:00
|
|
|
if (sg) /* loop terminated early; short sg table */
|
2013-06-24 22:47:48 +07:00
|
|
|
sg_mark_end(sg);
|
2012-10-19 21:51:06 +07:00
|
|
|
|
2016-11-09 22:13:43 +07:00
|
|
|
/* Trim unused sg entries to avoid wasting memory. */
|
|
|
|
i915_sg_trim(st);
|
|
|
|
|
2016-10-28 19:58:36 +07:00
|
|
|
ret = i915_gem_gtt_prepare_pages(obj, st);
|
2015-07-09 16:59:05 +07:00
|
|
|
if (ret)
|
|
|
|
goto err_pages;
|
|
|
|
|
2011-09-13 02:30:02 +07:00
|
|
|
if (i915_gem_object_needs_bit17_swizzle(obj))
|
2016-10-28 19:58:36 +07:00
|
|
|
i915_gem_object_do_bit_17_swizzle(obj, st);
|
2010-10-28 19:45:36 +07:00
|
|
|
|
2016-10-28 19:58:36 +07:00
|
|
|
return st;
|
2010-10-28 19:45:36 +07:00
|
|
|
|
2016-11-14 18:29:30 +07:00
|
|
|
err_sg:
|
2013-02-19 00:28:03 +07:00
|
|
|
sg_mark_end(sg);
|
2016-11-14 18:29:30 +07:00
|
|
|
err_pages:
|
2016-05-20 17:54:06 +07:00
|
|
|
for_each_sgt_page(page, sgt_iter, st)
|
|
|
|
put_page(page);
|
2012-06-01 21:20:22 +07:00
|
|
|
sg_free_table(st);
|
|
|
|
kfree(st);
|
2014-03-25 20:23:03 +07:00
|
|
|
|
|
|
|
/* shmemfs first checks if there is enough memory to allocate the page
|
|
|
|
* and reports ENOSPC should there be insufficient, along with the usual
|
|
|
|
* ENOMEM for a genuine allocation failure.
|
|
|
|
*
|
|
|
|
* We use ENOSPC in our driver to mean that we have run out of aperture
|
|
|
|
* space and so want to translate the error from shmemfs back to our
|
|
|
|
* usual understanding of ENOMEM.
|
|
|
|
*/
|
2015-07-09 16:59:05 +07:00
|
|
|
if (ret == -ENOSPC)
|
|
|
|
ret = -ENOMEM;
|
|
|
|
|
2016-10-28 19:58:36 +07:00
|
|
|
return ERR_PTR(ret);
|
|
|
|
}
|
|
|
|
|
|
|
|
void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
|
|
|
|
struct sg_table *pages)
|
|
|
|
{
|
2016-10-28 19:58:37 +07:00
|
|
|
lockdep_assert_held(&obj->mm.lock);
|
2016-10-28 19:58:36 +07:00
|
|
|
|
|
|
|
obj->mm.get_page.sg_pos = pages->sgl;
|
|
|
|
obj->mm.get_page.sg_idx = 0;
|
|
|
|
|
|
|
|
obj->mm.pages = pages;
|
2016-11-04 17:30:01 +07:00
|
|
|
|
|
|
|
if (i915_gem_object_is_tiled(obj) &&
|
|
|
|
to_i915(obj->base.dev)->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
|
|
|
|
GEM_BUG_ON(obj->mm.quirked);
|
|
|
|
__i915_gem_object_pin_pages(obj);
|
|
|
|
obj->mm.quirked = true;
|
|
|
|
}
|
2016-10-28 19:58:36 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
static int ____i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
struct sg_table *pages;
|
|
|
|
|
2016-11-04 17:30:01 +07:00
|
|
|
GEM_BUG_ON(i915_gem_object_has_pinned_pages(obj));
|
|
|
|
|
2016-10-28 19:58:36 +07:00
|
|
|
if (unlikely(obj->mm.madv != I915_MADV_WILLNEED)) {
|
|
|
|
DRM_DEBUG("Attempting to obtain a purgeable object\n");
|
|
|
|
return -EFAULT;
|
|
|
|
}
|
|
|
|
|
|
|
|
pages = obj->ops->get_pages(obj);
|
|
|
|
if (unlikely(IS_ERR(pages)))
|
|
|
|
return PTR_ERR(pages);
|
|
|
|
|
|
|
|
__i915_gem_object_set_pages(obj, pages);
|
|
|
|
return 0;
|
2008-07-31 02:06:12 +07:00
|
|
|
}
|
|
|
|
|
2012-06-07 21:38:42 +07:00
|
|
|
/* Ensure that the associated pages are gathered from the backing storage
|
2016-10-28 19:58:37 +07:00
|
|
|
* and pinned into our object. i915_gem_object_pin_pages() may be called
|
2012-06-07 21:38:42 +07:00
|
|
|
* multiple times before they are released by a single call to
|
2016-10-28 19:58:37 +07:00
|
|
|
* i915_gem_object_unpin_pages() - once the pages are no longer referenced
|
2012-06-07 21:38:42 +07:00
|
|
|
* either as a result of memory pressure (reaping pages under the shrinker)
|
|
|
|
* or as the object is itself released.
|
|
|
|
*/
|
2016-10-28 19:58:35 +07:00
|
|
|
int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj)
|
2012-06-07 21:38:42 +07:00
|
|
|
{
|
2016-10-28 19:58:36 +07:00
|
|
|
int err;
|
2012-06-07 21:38:42 +07:00
|
|
|
|
2016-10-28 19:58:37 +07:00
|
|
|
err = mutex_lock_interruptible(&obj->mm.lock);
|
|
|
|
if (err)
|
|
|
|
return err;
|
2016-10-28 19:58:32 +07:00
|
|
|
|
2016-11-04 17:30:01 +07:00
|
|
|
if (unlikely(!obj->mm.pages)) {
|
|
|
|
err = ____i915_gem_object_get_pages(obj);
|
|
|
|
if (err)
|
|
|
|
goto unlock;
|
2012-06-07 21:38:42 +07:00
|
|
|
|
2016-11-04 17:30:01 +07:00
|
|
|
smp_mb__before_atomic();
|
|
|
|
}
|
|
|
|
atomic_inc(&obj->mm.pages_pin_count);
|
2015-04-07 22:20:25 +07:00
|
|
|
|
2016-10-28 19:58:37 +07:00
|
|
|
unlock:
|
|
|
|
mutex_unlock(&obj->mm.lock);
|
2016-10-28 19:58:36 +07:00
|
|
|
return err;
|
2008-07-31 02:06:12 +07:00
|
|
|
}
|
|
|
|
|
2016-05-20 17:54:04 +07:00
|
|
|
/* The 'mapping' part of i915_gem_object_pin_map() below */
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 18:39:58 +07:00
|
|
|
static void *i915_gem_object_map(const struct drm_i915_gem_object *obj,
|
|
|
|
enum i915_map_type type)
|
2016-05-20 17:54:04 +07:00
|
|
|
{
|
|
|
|
unsigned long n_pages = obj->base.size >> PAGE_SHIFT;
|
2016-10-28 19:58:35 +07:00
|
|
|
struct sg_table *sgt = obj->mm.pages;
|
2016-05-20 17:54:06 +07:00
|
|
|
struct sgt_iter sgt_iter;
|
|
|
|
struct page *page;
|
2016-05-20 17:54:05 +07:00
|
|
|
struct page *stack_pages[32];
|
|
|
|
struct page **pages = stack_pages;
|
2016-05-20 17:54:04 +07:00
|
|
|
unsigned long i = 0;
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 18:39:58 +07:00
|
|
|
pgprot_t pgprot;
|
2016-05-20 17:54:04 +07:00
|
|
|
void *addr;
|
|
|
|
|
|
|
|
/* A single page can always be kmapped */
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 18:39:58 +07:00
|
|
|
if (n_pages == 1 && type == I915_MAP_WB)
|
2016-05-20 17:54:04 +07:00
|
|
|
return kmap(sg_page(sgt->sgl));
|
|
|
|
|
2016-05-20 17:54:05 +07:00
|
|
|
if (n_pages > ARRAY_SIZE(stack_pages)) {
|
|
|
|
/* Too big for stack -- allocate temporary array instead */
|
|
|
|
pages = drm_malloc_gfp(n_pages, sizeof(*pages), GFP_TEMPORARY);
|
|
|
|
if (!pages)
|
|
|
|
return NULL;
|
|
|
|
}
|
2016-05-20 17:54:04 +07:00
|
|
|
|
2016-05-20 17:54:06 +07:00
|
|
|
for_each_sgt_page(page, sgt_iter, sgt)
|
|
|
|
pages[i++] = page;
|
2016-05-20 17:54:04 +07:00
|
|
|
|
|
|
|
/* Check that we have the expected number of pages */
|
|
|
|
GEM_BUG_ON(i != n_pages);
|
|
|
|
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 18:39:58 +07:00
|
|
|
switch (type) {
|
|
|
|
case I915_MAP_WB:
|
|
|
|
pgprot = PAGE_KERNEL;
|
|
|
|
break;
|
|
|
|
case I915_MAP_WC:
|
|
|
|
pgprot = pgprot_writecombine(PAGE_KERNEL_IO);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
addr = vmap(pages, n_pages, 0, pgprot);
|
2016-05-20 17:54:04 +07:00
|
|
|
|
2016-05-20 17:54:05 +07:00
|
|
|
if (pages != stack_pages)
|
|
|
|
drm_free_large(pages);
|
2016-05-20 17:54:04 +07:00
|
|
|
|
|
|
|
return addr;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* get, pin, and map the pages of the object into kernel space */
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 18:39:58 +07:00
|
|
|
void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
|
|
|
|
enum i915_map_type type)
|
2016-04-08 18:11:11 +07:00
|
|
|
{
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 18:39:58 +07:00
|
|
|
enum i915_map_type has_type;
|
|
|
|
bool pinned;
|
|
|
|
void *ptr;
|
2016-04-08 18:11:11 +07:00
|
|
|
int ret;
|
|
|
|
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 18:39:58 +07:00
|
|
|
GEM_BUG_ON(!i915_gem_object_has_struct_page(obj));
|
2016-04-08 18:11:11 +07:00
|
|
|
|
2016-10-28 19:58:37 +07:00
|
|
|
ret = mutex_lock_interruptible(&obj->mm.lock);
|
2016-04-08 18:11:11 +07:00
|
|
|
if (ret)
|
|
|
|
return ERR_PTR(ret);
|
|
|
|
|
2016-10-28 19:58:37 +07:00
|
|
|
pinned = true;
|
|
|
|
if (!atomic_inc_not_zero(&obj->mm.pages_pin_count)) {
|
2016-11-04 17:30:01 +07:00
|
|
|
if (unlikely(!obj->mm.pages)) {
|
|
|
|
ret = ____i915_gem_object_get_pages(obj);
|
|
|
|
if (ret)
|
|
|
|
goto err_unlock;
|
2016-10-28 19:58:37 +07:00
|
|
|
|
2016-11-04 17:30:01 +07:00
|
|
|
smp_mb__before_atomic();
|
|
|
|
}
|
|
|
|
atomic_inc(&obj->mm.pages_pin_count);
|
2016-10-28 19:58:37 +07:00
|
|
|
pinned = false;
|
|
|
|
}
|
|
|
|
GEM_BUG_ON(!obj->mm.pages);
|
2016-04-08 18:11:11 +07:00
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
ptr = ptr_unpack_bits(obj->mm.mapping, has_type);
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 18:39:58 +07:00
|
|
|
if (ptr && has_type != type) {
|
|
|
|
if (pinned) {
|
|
|
|
ret = -EBUSY;
|
2016-10-28 19:58:37 +07:00
|
|
|
goto err_unpin;
|
2016-04-08 18:11:11 +07:00
|
|
|
}
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 18:39:58 +07:00
|
|
|
|
|
|
|
if (is_vmalloc_addr(ptr))
|
|
|
|
vunmap(ptr);
|
|
|
|
else
|
|
|
|
kunmap(kmap_to_page(ptr));
|
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
ptr = obj->mm.mapping = NULL;
|
2016-04-08 18:11:11 +07:00
|
|
|
}
|
|
|
|
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 18:39:58 +07:00
|
|
|
if (!ptr) {
|
|
|
|
ptr = i915_gem_object_map(obj, type);
|
|
|
|
if (!ptr) {
|
|
|
|
ret = -ENOMEM;
|
2016-10-28 19:58:37 +07:00
|
|
|
goto err_unpin;
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 18:39:58 +07:00
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
obj->mm.mapping = ptr_pack_bits(ptr, type);
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 18:39:58 +07:00
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:37 +07:00
|
|
|
out_unlock:
|
|
|
|
mutex_unlock(&obj->mm.lock);
|
drm/i915: Support for creating write combined type vmaps
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
2016-08-12 18:39:58 +07:00
|
|
|
return ptr;
|
|
|
|
|
2016-10-28 19:58:37 +07:00
|
|
|
err_unpin:
|
|
|
|
atomic_dec(&obj->mm.pages_pin_count);
|
|
|
|
err_unlock:
|
|
|
|
ptr = ERR_PTR(ret);
|
|
|
|
goto out_unlock;
|
2016-04-08 18:11:11 +07:00
|
|
|
}
|
|
|
|
|
2016-07-04 14:08:37 +07:00
|
|
|
static bool i915_context_is_banned(const struct i915_gem_context *ctx)
|
2013-08-30 20:19:28 +07:00
|
|
|
{
|
2016-11-16 22:20:34 +07:00
|
|
|
if (ctx->banned)
|
2013-08-30 20:19:28 +07:00
|
|
|
return true;
|
|
|
|
|
2016-11-16 22:20:34 +07:00
|
|
|
if (!ctx->bannable)
|
2016-11-16 22:20:31 +07:00
|
|
|
return false;
|
|
|
|
|
2016-11-16 22:20:34 +07:00
|
|
|
if (ctx->ban_score >= CONTEXT_SCORE_BAN_THRESHOLD) {
|
2016-11-16 22:20:31 +07:00
|
|
|
DRM_DEBUG("context hanging too often, banning!\n");
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2013-08-30 20:19:28 +07:00
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2016-11-16 22:20:31 +07:00
|
|
|
static void i915_gem_context_mark_guilty(struct i915_gem_context *ctx)
|
2013-06-12 19:13:20 +07:00
|
|
|
{
|
2016-11-16 22:20:34 +07:00
|
|
|
ctx->ban_score += CONTEXT_SCORE_GUILTY;
|
2016-11-16 22:20:31 +07:00
|
|
|
|
2016-11-16 22:20:34 +07:00
|
|
|
ctx->banned = i915_context_is_banned(ctx);
|
|
|
|
ctx->guilty_count++;
|
2016-11-18 20:10:47 +07:00
|
|
|
|
|
|
|
DRM_DEBUG_DRIVER("context %s marked guilty (score %d) banned? %s\n",
|
2016-11-16 22:20:34 +07:00
|
|
|
ctx->name, ctx->ban_score,
|
|
|
|
yesno(ctx->banned));
|
2016-11-18 20:10:47 +07:00
|
|
|
|
2016-11-22 21:41:18 +07:00
|
|
|
if (!ctx->banned || IS_ERR_OR_NULL(ctx->file_priv))
|
2016-11-18 20:10:47 +07:00
|
|
|
return;
|
|
|
|
|
2016-11-22 21:41:18 +07:00
|
|
|
ctx->file_priv->context_bans++;
|
|
|
|
DRM_DEBUG_DRIVER("client %s has had %d context banned\n",
|
|
|
|
ctx->name, ctx->file_priv->context_bans);
|
2016-11-16 22:20:31 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
static void i915_gem_context_mark_innocent(struct i915_gem_context *ctx)
|
|
|
|
{
|
2016-11-16 22:20:34 +07:00
|
|
|
ctx->active_count++;
|
2013-06-12 19:13:20 +07:00
|
|
|
}
|
|
|
|
|
2014-02-25 22:11:23 +07:00
|
|
|
struct drm_i915_gem_request *
|
2016-03-16 18:00:37 +07:00
|
|
|
i915_gem_find_active_request(struct intel_engine_cs *engine)
|
2010-09-19 18:21:28 +07:00
|
|
|
{
|
2013-12-04 18:37:09 +07:00
|
|
|
struct drm_i915_gem_request *request;
|
|
|
|
|
2016-07-01 23:23:16 +07:00
|
|
|
/* We are called by the error capture and reset at a random
|
|
|
|
* point in time. In particular, note that neither is crucially
|
|
|
|
* ordered with an interrupt. After a hang, the GPU is dead and we
|
|
|
|
* assume that no more writes can happen (we waited long enough for
|
|
|
|
* all writes that were in transaction to be flushed) - adding an
|
|
|
|
* extra delay for a recent interrupt is pointless. Hence, we do
|
|
|
|
* not need an engine->irq_seqno_barrier() before the seqno reads.
|
|
|
|
*/
|
2016-10-28 19:58:46 +07:00
|
|
|
list_for_each_entry(request, &engine->timeline->requests, link) {
|
2016-10-28 19:58:58 +07:00
|
|
|
if (__i915_gem_request_completed(request))
|
2013-12-04 18:37:09 +07:00
|
|
|
continue;
|
2013-06-12 19:13:20 +07:00
|
|
|
|
2014-01-31 00:04:43 +07:00
|
|
|
return request;
|
2013-12-04 18:37:09 +07:00
|
|
|
}
|
2014-01-31 00:04:43 +07:00
|
|
|
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2016-09-09 20:11:53 +07:00
|
|
|
static void reset_request(struct drm_i915_gem_request *request)
|
|
|
|
{
|
|
|
|
void *vaddr = request->ring->vaddr;
|
|
|
|
u32 head;
|
|
|
|
|
|
|
|
/* As this request likely depends on state from the lost
|
|
|
|
* context, clear out all the user operations leaving the
|
|
|
|
* breadcrumb at the end (so we get the fence notifications).
|
|
|
|
*/
|
|
|
|
head = request->head;
|
|
|
|
if (request->postfix < head) {
|
|
|
|
memset(vaddr + head, 0, request->ring->size - head);
|
|
|
|
head = 0;
|
|
|
|
}
|
|
|
|
memset(vaddr + head, 0, request->postfix - head);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void i915_gem_reset_engine(struct intel_engine_cs *engine)
|
2014-01-31 00:04:43 +07:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_request *request;
|
2016-09-09 20:11:53 +07:00
|
|
|
struct i915_gem_context *incomplete_ctx;
|
2016-10-28 19:58:58 +07:00
|
|
|
struct intel_timeline *timeline;
|
2014-01-31 00:04:43 +07:00
|
|
|
bool ring_hung;
|
|
|
|
|
2016-09-09 20:11:53 +07:00
|
|
|
if (engine->irq_seqno_barrier)
|
|
|
|
engine->irq_seqno_barrier(engine);
|
|
|
|
|
2016-03-16 18:00:37 +07:00
|
|
|
request = i915_gem_find_active_request(engine);
|
2016-09-09 20:11:53 +07:00
|
|
|
if (!request)
|
2014-01-31 00:04:43 +07:00
|
|
|
return;
|
|
|
|
|
2016-11-18 20:09:04 +07:00
|
|
|
ring_hung = engine->hangcheck.stalled;
|
|
|
|
if (engine->hangcheck.seqno != intel_engine_get_seqno(engine)) {
|
|
|
|
DRM_DEBUG_DRIVER("%s pardoned, was guilty? %s\n",
|
|
|
|
engine->name,
|
|
|
|
yesno(ring_hung));
|
2016-10-05 03:11:29 +07:00
|
|
|
ring_hung = false;
|
2016-11-18 20:09:04 +07:00
|
|
|
}
|
2016-10-05 03:11:29 +07:00
|
|
|
|
2016-11-16 22:20:31 +07:00
|
|
|
if (ring_hung)
|
|
|
|
i915_gem_context_mark_guilty(request->ctx);
|
|
|
|
else
|
|
|
|
i915_gem_context_mark_innocent(request->ctx);
|
|
|
|
|
2016-09-09 20:11:53 +07:00
|
|
|
if (!ring_hung)
|
|
|
|
return;
|
|
|
|
|
|
|
|
DRM_DEBUG_DRIVER("resetting %s to restart from tail of request 0x%x\n",
|
2016-10-28 19:58:49 +07:00
|
|
|
engine->name, request->global_seqno);
|
2016-09-09 20:11:53 +07:00
|
|
|
|
|
|
|
/* Setup the CS to resume from the breadcrumb of the hung request */
|
|
|
|
engine->reset_hw(engine, request);
|
|
|
|
|
|
|
|
/* Users of the default context do not rely on logical state
|
|
|
|
* preserved between batches. They have to emit full state on
|
|
|
|
* every batch and so it is safe to execute queued requests following
|
|
|
|
* the hang.
|
|
|
|
*
|
|
|
|
* Other contexts preserve state, now corrupt. We want to skip all
|
|
|
|
* queued requests that reference the corrupt context.
|
|
|
|
*/
|
|
|
|
incomplete_ctx = request->ctx;
|
|
|
|
if (i915_gem_context_is_default(incomplete_ctx))
|
|
|
|
return;
|
|
|
|
|
2016-10-28 19:58:46 +07:00
|
|
|
list_for_each_entry_continue(request, &engine->timeline->requests, link)
|
2016-09-09 20:11:53 +07:00
|
|
|
if (request->ctx == incomplete_ctx)
|
|
|
|
reset_request(request);
|
2016-10-28 19:58:58 +07:00
|
|
|
|
|
|
|
timeline = i915_gem_context_lookup_timeline(incomplete_ctx, engine);
|
|
|
|
list_for_each_entry(request, &timeline->requests, link)
|
|
|
|
reset_request(request);
|
2013-12-04 18:37:09 +07:00
|
|
|
}
|
2013-06-12 19:13:20 +07:00
|
|
|
|
2016-09-09 20:11:53 +07:00
|
|
|
void i915_gem_reset(struct drm_i915_private *dev_priv)
|
2013-12-04 18:37:09 +07:00
|
|
|
{
|
2016-09-09 20:11:53 +07:00
|
|
|
struct intel_engine_cs *engine;
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 00:14:48 +07:00
|
|
|
enum intel_engine_id id;
|
2015-09-03 19:01:40 +07:00
|
|
|
|
2016-10-28 19:58:32 +07:00
|
|
|
lockdep_assert_held(&dev_priv->drm.struct_mutex);
|
|
|
|
|
2016-09-09 20:11:53 +07:00
|
|
|
i915_gem_retire_requests(dev_priv);
|
|
|
|
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 00:14:48 +07:00
|
|
|
for_each_engine(engine, dev_priv, id)
|
2016-09-09 20:11:53 +07:00
|
|
|
i915_gem_reset_engine(engine);
|
|
|
|
|
2016-11-16 15:55:33 +07:00
|
|
|
i915_gem_restore_fences(dev_priv);
|
2016-09-21 20:51:06 +07:00
|
|
|
|
|
|
|
if (dev_priv->gt.awake) {
|
|
|
|
intel_sanitize_gt_powersave(dev_priv);
|
|
|
|
intel_enable_gt_powersave(dev_priv);
|
|
|
|
if (INTEL_GEN(dev_priv) >= 6)
|
|
|
|
gen6_rps_busy(dev_priv);
|
|
|
|
}
|
2016-09-09 20:11:53 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
static void nop_submit_request(struct drm_i915_gem_request *request)
|
|
|
|
{
|
2016-11-22 21:41:20 +07:00
|
|
|
i915_gem_request_submit(request);
|
|
|
|
intel_engine_init_global_seqno(request->engine, request->global_seqno);
|
2016-09-09 20:11:53 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
static void i915_gem_cleanup_engine(struct intel_engine_cs *engine)
|
|
|
|
{
|
2016-11-22 21:41:21 +07:00
|
|
|
/* We need to be sure that no thread is running the old callback as
|
|
|
|
* we install the nop handler (otherwise we would submit a request
|
|
|
|
* to hardware that will never complete). In order to prevent this
|
|
|
|
* race, we wait until the machine is idle before making the swap
|
|
|
|
* (using stop_machine()).
|
|
|
|
*/
|
2016-09-09 20:11:53 +07:00
|
|
|
engine->submit_request = nop_submit_request;
|
2016-09-09 20:11:46 +07:00
|
|
|
|
2016-07-20 15:21:10 +07:00
|
|
|
/* Mark all pending requests as complete so that any concurrent
|
|
|
|
* (lockless) lookup doesn't try and wait upon the request as we
|
|
|
|
* reset it.
|
|
|
|
*/
|
2016-10-28 19:58:46 +07:00
|
|
|
intel_engine_init_global_seqno(engine,
|
2016-11-01 17:03:16 +07:00
|
|
|
intel_engine_last_submit(engine));
|
2016-07-20 15:21:10 +07:00
|
|
|
|
drm/i915/bdw: Pin the context backing objects to GGTT on-demand
Up until now, we have pinned every logical ring context backing object
during creation, and left it pinned until destruction. This made my life
easier, but it's a harmful thing to do, because we cause fragmentation
of the GGTT (and, eventually, we would run out of space).
This patch makes the pinning on-demand: the backing objects of the two
contexts that are written to the ELSP are pinned right before submission
and unpinned once the hardware is done with them. The only context that
is still pinned regardless is the global default one, so that the HWS can
still be accessed in the same way (ring->status_page).
v2: In the early version of this patch, we were pinning the context as
we put it into the ELSP: on the one hand, this is very efficient because
only a maximum two contexts are pinned at any given time, but on the other
hand, we cannot really pin in interrupt time :(
v3: Use a mutex rather than atomic_t to protect pin count to avoid races.
Do not unpin default context in free_request.
v4: Break out pin and unpin into functions. Fix style problems reported
by checkpatch
v5: Remove unpin_lock as all pinning and unpinning is done with the struct
mutex already locked. Add WARN_ONs to make sure this is the case in future.
Issue: VIZ-4277
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Akash Goel <akash.goels@gmail.com>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-13 17:28:10 +07:00
|
|
|
/*
|
|
|
|
* Clear the execlists queue up before freeing the requests, as those
|
|
|
|
* are the ones that keep the context and ringbuffer backing objects
|
|
|
|
* pinned in place.
|
|
|
|
*/
|
|
|
|
|
2015-10-19 22:32:32 +07:00
|
|
|
if (i915.enable_execlists) {
|
2016-11-15 03:41:00 +07:00
|
|
|
unsigned long flags;
|
|
|
|
|
|
|
|
spin_lock_irqsave(&engine->timeline->lock, flags);
|
|
|
|
|
2016-09-09 20:11:46 +07:00
|
|
|
i915_gem_request_put(engine->execlist_port[0].request);
|
|
|
|
i915_gem_request_put(engine->execlist_port[1].request);
|
|
|
|
memset(engine->execlist_port, 0, sizeof(engine->execlist_port));
|
2016-11-15 03:41:03 +07:00
|
|
|
engine->execlist_queue = RB_ROOT;
|
|
|
|
engine->execlist_first = NULL;
|
2016-11-15 03:41:00 +07:00
|
|
|
|
|
|
|
spin_unlock_irqrestore(&engine->timeline->lock, flags);
|
drm/i915/bdw: Pin the context backing objects to GGTT on-demand
Up until now, we have pinned every logical ring context backing object
during creation, and left it pinned until destruction. This made my life
easier, but it's a harmful thing to do, because we cause fragmentation
of the GGTT (and, eventually, we would run out of space).
This patch makes the pinning on-demand: the backing objects of the two
contexts that are written to the ELSP are pinned right before submission
and unpinned once the hardware is done with them. The only context that
is still pinned regardless is the global default one, so that the HWS can
still be accessed in the same way (ring->status_page).
v2: In the early version of this patch, we were pinning the context as
we put it into the ELSP: on the one hand, this is very efficient because
only a maximum two contexts are pinned at any given time, but on the other
hand, we cannot really pin in interrupt time :(
v3: Use a mutex rather than atomic_t to protect pin count to avoid races.
Do not unpin default context in free_request.
v4: Break out pin and unpin into functions. Fix style problems reported
by checkpatch
v5: Remove unpin_lock as all pinning and unpinning is done with the struct
mutex already locked. Add WARN_ONs to make sure this is the case in future.
Issue: VIZ-4277
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Akash Goel <akash.goels@gmail.com>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-11-13 17:28:10 +07:00
|
|
|
}
|
2008-07-31 02:06:12 +07:00
|
|
|
}
|
|
|
|
|
2016-11-22 21:41:21 +07:00
|
|
|
static int __i915_gem_set_wedged_BKL(void *data)
|
2008-07-31 02:06:12 +07:00
|
|
|
{
|
2016-11-22 21:41:21 +07:00
|
|
|
struct drm_i915_private *i915 = data;
|
2016-03-16 18:00:36 +07:00
|
|
|
struct intel_engine_cs *engine;
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 00:14:48 +07:00
|
|
|
enum intel_engine_id id;
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2016-11-22 21:41:21 +07:00
|
|
|
for_each_engine(engine, i915, id)
|
|
|
|
i915_gem_cleanup_engine(engine);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
void i915_gem_set_wedged(struct drm_i915_private *dev_priv)
|
|
|
|
{
|
2016-09-09 20:11:53 +07:00
|
|
|
lockdep_assert_held(&dev_priv->drm.struct_mutex);
|
|
|
|
set_bit(I915_WEDGED, &dev_priv->gpu_error.flags);
|
2013-12-04 18:37:09 +07:00
|
|
|
|
2016-11-22 21:41:21 +07:00
|
|
|
stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL);
|
2010-09-22 16:31:52 +07:00
|
|
|
|
2016-11-22 21:41:21 +07:00
|
|
|
i915_gem_context_lost(dev_priv);
|
2016-09-09 20:11:53 +07:00
|
|
|
i915_gem_retire_requests(dev_priv);
|
2016-11-22 21:41:21 +07:00
|
|
|
|
|
|
|
mod_delayed_work(dev_priv->wq, &dev_priv->gt.idle_work, 0);
|
2008-07-31 02:06:12 +07:00
|
|
|
}
|
|
|
|
|
2010-08-21 05:25:16 +07:00
|
|
|
static void
|
2008-07-31 02:06:12 +07:00
|
|
|
i915_gem_retire_work_handler(struct work_struct *work)
|
|
|
|
{
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-25 23:34:56 +07:00
|
|
|
struct drm_i915_private *dev_priv =
|
2016-07-04 14:08:31 +07:00
|
|
|
container_of(work, typeof(*dev_priv), gt.retire_work.work);
|
2016-07-05 16:40:23 +07:00
|
|
|
struct drm_device *dev = &dev_priv->drm;
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2010-09-29 18:26:37 +07:00
|
|
|
/* Come back later if the device is busy... */
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-25 23:34:56 +07:00
|
|
|
if (mutex_trylock(&dev->struct_mutex)) {
|
2016-07-04 14:08:31 +07:00
|
|
|
i915_gem_retire_requests(dev_priv);
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-25 23:34:56 +07:00
|
|
|
mutex_unlock(&dev->struct_mutex);
|
2008-07-31 02:06:12 +07:00
|
|
|
}
|
2016-07-04 14:08:31 +07:00
|
|
|
|
|
|
|
/* Keep the retire handler running until we are finally idle.
|
|
|
|
* We do not need to do this test under locking as in the worst-case
|
|
|
|
* we queue the retire worker once too often.
|
|
|
|
*/
|
2016-07-09 16:12:06 +07:00
|
|
|
if (READ_ONCE(dev_priv->gt.awake)) {
|
|
|
|
i915_queue_hangcheck(dev_priv);
|
2016-07-04 14:08:31 +07:00
|
|
|
queue_delayed_work(dev_priv->wq,
|
|
|
|
&dev_priv->gt.retire_work,
|
2012-10-05 23:02:57 +07:00
|
|
|
round_jiffies_up_relative(HZ));
|
2016-07-09 16:12:06 +07:00
|
|
|
}
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-25 23:34:56 +07:00
|
|
|
}
|
2011-01-10 04:05:44 +07:00
|
|
|
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-25 23:34:56 +07:00
|
|
|
static void
|
|
|
|
i915_gem_idle_work_handler(struct work_struct *work)
|
|
|
|
{
|
|
|
|
struct drm_i915_private *dev_priv =
|
2016-07-04 14:08:31 +07:00
|
|
|
container_of(work, typeof(*dev_priv), gt.idle_work.work);
|
2016-07-05 16:40:23 +07:00
|
|
|
struct drm_device *dev = &dev_priv->drm;
|
2016-03-24 18:20:38 +07:00
|
|
|
struct intel_engine_cs *engine;
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 00:14:48 +07:00
|
|
|
enum intel_engine_id id;
|
2016-07-04 14:08:31 +07:00
|
|
|
bool rearm_hangcheck;
|
|
|
|
|
|
|
|
if (!READ_ONCE(dev_priv->gt.awake))
|
|
|
|
return;
|
|
|
|
|
2016-11-07 16:20:04 +07:00
|
|
|
/*
|
|
|
|
* Wait for last execlists context complete, but bail out in case a
|
|
|
|
* new request is submitted.
|
|
|
|
*/
|
|
|
|
wait_for(READ_ONCE(dev_priv->gt.active_requests) ||
|
|
|
|
intel_execlists_idle(dev_priv), 10);
|
|
|
|
|
2016-10-28 19:58:56 +07:00
|
|
|
if (READ_ONCE(dev_priv->gt.active_requests))
|
2016-07-04 14:08:31 +07:00
|
|
|
return;
|
|
|
|
|
|
|
|
rearm_hangcheck =
|
|
|
|
cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
|
|
|
|
|
|
|
|
if (!mutex_trylock(&dev->struct_mutex)) {
|
|
|
|
/* Currently busy, come back later */
|
|
|
|
mod_delayed_work(dev_priv->wq,
|
|
|
|
&dev_priv->gt.idle_work,
|
|
|
|
msecs_to_jiffies(50));
|
|
|
|
goto out_rearm;
|
|
|
|
}
|
|
|
|
|
2016-11-07 16:20:03 +07:00
|
|
|
/*
|
|
|
|
* New request retired after this work handler started, extend active
|
|
|
|
* period until next instance of the work.
|
|
|
|
*/
|
|
|
|
if (work_pending(work))
|
|
|
|
goto out_unlock;
|
|
|
|
|
2016-10-28 19:58:56 +07:00
|
|
|
if (dev_priv->gt.active_requests)
|
2016-07-04 14:08:31 +07:00
|
|
|
goto out_unlock;
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-25 23:34:56 +07:00
|
|
|
|
2016-11-07 16:20:04 +07:00
|
|
|
if (wait_for(intel_execlists_idle(dev_priv), 10))
|
|
|
|
DRM_ERROR("Timeout waiting for engines to idle\n");
|
|
|
|
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 00:14:48 +07:00
|
|
|
for_each_engine(engine, dev_priv, id)
|
2016-07-04 14:08:31 +07:00
|
|
|
i915_gem_batch_pool_fini(&engine->batch_pool);
|
2015-04-07 22:20:37 +07:00
|
|
|
|
2016-07-04 14:08:31 +07:00
|
|
|
GEM_BUG_ON(!dev_priv->gt.awake);
|
|
|
|
dev_priv->gt.awake = false;
|
|
|
|
rearm_hangcheck = false;
|
2015-12-09 15:29:36 +07:00
|
|
|
|
2016-07-04 14:08:31 +07:00
|
|
|
if (INTEL_GEN(dev_priv) >= 6)
|
|
|
|
gen6_rps_idle(dev_priv);
|
|
|
|
intel_runtime_pm_put(dev_priv);
|
|
|
|
out_unlock:
|
|
|
|
mutex_unlock(&dev->struct_mutex);
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-25 23:34:56 +07:00
|
|
|
|
2016-07-04 14:08:31 +07:00
|
|
|
out_rearm:
|
|
|
|
if (rearm_hangcheck) {
|
|
|
|
GEM_BUG_ON(!dev_priv->gt.awake);
|
|
|
|
i915_queue_hangcheck(dev_priv);
|
2015-04-07 22:20:37 +07:00
|
|
|
}
|
2008-07-31 02:06:12 +07:00
|
|
|
}
|
|
|
|
|
2016-08-04 13:52:45 +07:00
|
|
|
void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file)
|
|
|
|
{
|
|
|
|
struct drm_i915_gem_object *obj = to_intel_bo(gem);
|
|
|
|
struct drm_i915_file_private *fpriv = file->driver_priv;
|
|
|
|
struct i915_vma *vma, *vn;
|
|
|
|
|
|
|
|
mutex_lock(&obj->base.dev->struct_mutex);
|
|
|
|
list_for_each_entry_safe(vma, vn, &obj->vma_list, obj_link)
|
|
|
|
if (vma->vm->file == fpriv)
|
|
|
|
i915_vma_close(vma);
|
2016-10-28 19:58:29 +07:00
|
|
|
|
|
|
|
if (i915_gem_object_is_active(obj) &&
|
|
|
|
!i915_gem_object_has_active_reference(obj)) {
|
|
|
|
i915_gem_object_set_active_reference(obj);
|
|
|
|
i915_gem_object_get(obj);
|
|
|
|
}
|
2016-08-04 13:52:45 +07:00
|
|
|
mutex_unlock(&obj->base.dev->struct_mutex);
|
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
static unsigned long to_wait_timeout(s64 timeout_ns)
|
|
|
|
{
|
|
|
|
if (timeout_ns < 0)
|
|
|
|
return MAX_SCHEDULE_TIMEOUT;
|
|
|
|
|
|
|
|
if (timeout_ns == 0)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return nsecs_to_jiffies_timeout(timeout_ns);
|
|
|
|
}
|
|
|
|
|
2012-05-25 05:03:10 +07:00
|
|
|
/**
|
|
|
|
* i915_gem_wait_ioctl - implements DRM_IOCTL_I915_GEM_WAIT
|
2016-06-03 20:02:17 +07:00
|
|
|
* @dev: drm device pointer
|
|
|
|
* @data: ioctl data blob
|
|
|
|
* @file: drm file pointer
|
2012-05-25 05:03:10 +07:00
|
|
|
*
|
|
|
|
* Returns 0 if successful, else an error is returned with the remaining time in
|
|
|
|
* the timeout parameter.
|
|
|
|
* -ETIME: object is still busy after timeout
|
|
|
|
* -ERESTARTSYS: signal interrupted the wait
|
|
|
|
* -ENONENT: object doesn't exist
|
|
|
|
* Also possible, but rare:
|
|
|
|
* -EAGAIN: GPU wedged
|
|
|
|
* -ENOMEM: damn
|
|
|
|
* -ENODEV: Internal IRQ fail
|
|
|
|
* -E?: The add request failed
|
|
|
|
*
|
|
|
|
* The wait ioctl with a timeout of 0 reimplements the busy ioctl. With any
|
|
|
|
* non-zero timeout parameter the wait ioctl will wait for the given number of
|
|
|
|
* nanoseconds on an object becoming unbusy. Since the wait itself does so
|
|
|
|
* without holding struct_mutex the object may become re-busied before this
|
|
|
|
* function completes. A similar but shorter * race condition exists in the busy
|
|
|
|
* ioctl
|
|
|
|
*/
|
|
|
|
int
|
|
|
|
i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
|
|
|
|
{
|
|
|
|
struct drm_i915_gem_wait *args = data;
|
|
|
|
struct drm_i915_gem_object *obj;
|
2016-10-28 19:58:27 +07:00
|
|
|
ktime_t start;
|
|
|
|
long ret;
|
2012-05-25 05:03:10 +07:00
|
|
|
|
2014-09-29 20:31:26 +07:00
|
|
|
if (args->flags != 0)
|
|
|
|
return -EINVAL;
|
|
|
|
|
2016-07-20 19:31:51 +07:00
|
|
|
obj = i915_gem_object_lookup(file, args->bo_handle);
|
2016-08-05 16:14:17 +07:00
|
|
|
if (!obj)
|
2012-05-25 05:03:10 +07:00
|
|
|
return -ENOENT;
|
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
start = ktime_get();
|
|
|
|
|
|
|
|
ret = i915_gem_object_wait(obj,
|
|
|
|
I915_WAIT_INTERRUPTIBLE | I915_WAIT_ALL,
|
|
|
|
to_wait_timeout(args->timeout_ns),
|
|
|
|
to_rps_client(file));
|
|
|
|
|
|
|
|
if (args->timeout_ns > 0) {
|
|
|
|
args->timeout_ns -= ktime_to_ns(ktime_sub(ktime_get(), start));
|
|
|
|
if (args->timeout_ns < 0)
|
|
|
|
args->timeout_ns = 0;
|
2015-04-27 19:41:17 +07:00
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:43 +07:00
|
|
|
i915_gem_object_put(obj);
|
2014-11-25 01:49:28 +07:00
|
|
|
return ret;
|
2012-05-25 05:03:10 +07:00
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:46 +07:00
|
|
|
static int wait_for_timeline(struct i915_gem_timeline *tl, unsigned int flags)
|
2010-02-19 17:52:00 +07:00
|
|
|
{
|
2016-10-28 19:58:46 +07:00
|
|
|
int ret, i;
|
2010-02-19 17:52:00 +07:00
|
|
|
|
2016-10-28 19:58:46 +07:00
|
|
|
for (i = 0; i < ARRAY_SIZE(tl->engine); i++) {
|
|
|
|
ret = i915_gem_active_wait(&tl->engine[i].last_request, flags);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
2016-06-24 20:55:52 +07:00
|
|
|
|
2016-10-28 19:58:46 +07:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
int i915_gem_wait_for_idle(struct drm_i915_private *i915, unsigned int flags)
|
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
2016-11-11 21:58:08 +07:00
|
|
|
if (flags & I915_WAIT_LOCKED) {
|
|
|
|
struct i915_gem_timeline *tl;
|
|
|
|
|
|
|
|
lockdep_assert_held(&i915->drm.struct_mutex);
|
|
|
|
|
|
|
|
list_for_each_entry(tl, &i915->gt.timelines, link) {
|
|
|
|
ret = wait_for_timeline(tl, flags);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
ret = wait_for_timeline(&i915->gt.global_timeline, flags);
|
2010-12-04 18:30:53 +07:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
2010-02-19 17:52:00 +07:00
|
|
|
|
2010-02-12 04:29:04 +07:00
|
|
|
return 0;
|
2010-02-19 17:52:00 +07:00
|
|
|
}
|
|
|
|
|
2016-11-06 19:59:59 +07:00
|
|
|
void i915_gem_clflush_object(struct drm_i915_gem_object *obj,
|
|
|
|
bool force)
|
2008-07-31 02:06:12 +07:00
|
|
|
{
|
|
|
|
/* If we don't have a page list set up, then we're not pinned
|
|
|
|
* to GPU, and we can ignore the cache flush because it'll happen
|
|
|
|
* again at bind time.
|
|
|
|
*/
|
2016-10-28 19:58:35 +07:00
|
|
|
if (!obj->mm.pages)
|
2016-11-06 19:59:59 +07:00
|
|
|
return;
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2013-02-14 02:56:05 +07:00
|
|
|
/*
|
|
|
|
* Stolen memory is always coherent with the GPU as it is explicitly
|
|
|
|
* marked as wc by the system, or the system is cache-coherent.
|
|
|
|
*/
|
2014-11-04 19:51:40 +07:00
|
|
|
if (obj->stolen || obj->phys_handle)
|
2016-11-06 19:59:59 +07:00
|
|
|
return;
|
2013-02-14 02:56:05 +07:00
|
|
|
|
2011-03-30 06:59:52 +07:00
|
|
|
/* If the GPU is snooping the contents of the CPU cache,
|
|
|
|
* we do not need to manually clear the CPU cache lines. However,
|
|
|
|
* the caches are only snooped when the render cache is
|
|
|
|
* flushed/invalidated. As we always have to emit invalidations
|
|
|
|
* and flushes when moving into and out of the RENDER domain, correct
|
|
|
|
* snooping behaviour occurs naturally as the result of our domain
|
|
|
|
* tracking.
|
|
|
|
*/
|
2015-01-13 20:32:52 +07:00
|
|
|
if (!force && cpu_cache_is_coherent(obj->base.dev, obj->cache_level)) {
|
|
|
|
obj->cache_dirty = true;
|
2016-11-06 19:59:59 +07:00
|
|
|
return;
|
2015-01-13 20:32:52 +07:00
|
|
|
}
|
2011-03-30 06:59:52 +07:00
|
|
|
|
2009-08-25 17:15:50 +07:00
|
|
|
trace_i915_gem_object_clflush(obj);
|
2016-10-28 19:58:35 +07:00
|
|
|
drm_clflush_sg(obj->mm.pages);
|
2015-01-13 20:32:52 +07:00
|
|
|
obj->cache_dirty = false;
|
2008-11-15 04:35:19 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
/** Flushes the GTT write domain for the object if it's dirty. */
|
|
|
|
static void
|
2010-11-09 02:18:58 +07:00
|
|
|
i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj)
|
2008-11-15 04:35:19 +07:00
|
|
|
{
|
drm/i915: Wait for writes through the GTT to land before reading back
If we quickly switch from writing through the GTT to a read of the
physical page directly with the CPU (e.g. performing relocations through
the GTT and then running the command parser), we can observe that the
writes are not visible to the CPU. It is not a coherency problem, as
extensive investigations with clflush have demonstrated, but a mere
timing issue - we have to wait for the GTT to complete it's write before
we start our read from the CPU.
The issue can be illustrated in userspace with:
gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
for (i = 0; i < OBJECT_SIZE / 64; i++) {
int x = 16*i + (i%16);
gtt[x] = i;
clflush(&cpu[x], sizeof(cpu[x]));
assert(cpu[x] == i);
}
Experimenting with that shows that this behaviour is indeed limited to
recent Atom-class hardware.
Testcase: igt/gem_exec_flush/basic-batch-default-cmd #byt
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-10-chris@chris-wilson.co.uk
2016-08-18 23:16:49 +07:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
|
2009-08-25 17:15:50 +07:00
|
|
|
|
2010-11-09 02:18:58 +07:00
|
|
|
if (obj->base.write_domain != I915_GEM_DOMAIN_GTT)
|
2008-11-15 04:35:19 +07:00
|
|
|
return;
|
|
|
|
|
2011-01-05 01:42:07 +07:00
|
|
|
/* No actual flushing is required for the GTT write domain. Writes
|
drm/i915: Wait for writes through the GTT to land before reading back
If we quickly switch from writing through the GTT to a read of the
physical page directly with the CPU (e.g. performing relocations through
the GTT and then running the command parser), we can observe that the
writes are not visible to the CPU. It is not a coherency problem, as
extensive investigations with clflush have demonstrated, but a mere
timing issue - we have to wait for the GTT to complete it's write before
we start our read from the CPU.
The issue can be illustrated in userspace with:
gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
for (i = 0; i < OBJECT_SIZE / 64; i++) {
int x = 16*i + (i%16);
gtt[x] = i;
clflush(&cpu[x], sizeof(cpu[x]));
assert(cpu[x] == i);
}
Experimenting with that shows that this behaviour is indeed limited to
recent Atom-class hardware.
Testcase: igt/gem_exec_flush/basic-batch-default-cmd #byt
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-10-chris@chris-wilson.co.uk
2016-08-18 23:16:49 +07:00
|
|
|
* to it "immediately" go to main memory as far as we know, so there's
|
2008-11-15 04:35:19 +07:00
|
|
|
* no chipset flush. It also doesn't land in render cache.
|
2011-01-05 01:42:07 +07:00
|
|
|
*
|
|
|
|
* However, we do have to enforce the order so that all writes through
|
|
|
|
* the GTT land before any writes to the device, such as updates to
|
|
|
|
* the GATT itself.
|
drm/i915: Wait for writes through the GTT to land before reading back
If we quickly switch from writing through the GTT to a read of the
physical page directly with the CPU (e.g. performing relocations through
the GTT and then running the command parser), we can observe that the
writes are not visible to the CPU. It is not a coherency problem, as
extensive investigations with clflush have demonstrated, but a mere
timing issue - we have to wait for the GTT to complete it's write before
we start our read from the CPU.
The issue can be illustrated in userspace with:
gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
for (i = 0; i < OBJECT_SIZE / 64; i++) {
int x = 16*i + (i%16);
gtt[x] = i;
clflush(&cpu[x], sizeof(cpu[x]));
assert(cpu[x] == i);
}
Experimenting with that shows that this behaviour is indeed limited to
recent Atom-class hardware.
Testcase: igt/gem_exec_flush/basic-batch-default-cmd #byt
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-10-chris@chris-wilson.co.uk
2016-08-18 23:16:49 +07:00
|
|
|
*
|
|
|
|
* We also have to wait a bit for the writes to land from the GTT.
|
|
|
|
* An uncached read (i.e. mmio) seems to be ideal for the round-trip
|
|
|
|
* timing. This issue has only been observed when switching quickly
|
|
|
|
* between GTT writes and CPU reads from inside the kernel on recent hw,
|
|
|
|
* and it appears to only affect discrete GTT blocks (i.e. on LLC
|
|
|
|
* system agents we cannot reproduce this behaviour).
|
2008-11-15 04:35:19 +07:00
|
|
|
*/
|
2011-01-05 01:42:07 +07:00
|
|
|
wmb();
|
drm/i915: Wait for writes through the GTT to land before reading back
If we quickly switch from writing through the GTT to a read of the
physical page directly with the CPU (e.g. performing relocations through
the GTT and then running the command parser), we can observe that the
writes are not visible to the CPU. It is not a coherency problem, as
extensive investigations with clflush have demonstrated, but a mere
timing issue - we have to wait for the GTT to complete it's write before
we start our read from the CPU.
The issue can be illustrated in userspace with:
gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
for (i = 0; i < OBJECT_SIZE / 64; i++) {
int x = 16*i + (i%16);
gtt[x] = i;
clflush(&cpu[x], sizeof(cpu[x]));
assert(cpu[x] == i);
}
Experimenting with that shows that this behaviour is indeed limited to
recent Atom-class hardware.
Testcase: igt/gem_exec_flush/basic-batch-default-cmd #byt
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-10-chris@chris-wilson.co.uk
2016-08-18 23:16:49 +07:00
|
|
|
if (INTEL_GEN(dev_priv) >= 6 && !HAS_LLC(dev_priv))
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 00:14:48 +07:00
|
|
|
POSTING_READ(RING_ACTHD(dev_priv->engine[RCS]->mmio_base));
|
2011-01-05 01:42:07 +07:00
|
|
|
|
2016-08-18 23:16:44 +07:00
|
|
|
intel_fb_obj_flush(obj, false, write_origin(obj, I915_GEM_DOMAIN_GTT));
|
drm/i915: Track frontbuffer invalidation/flushing
So these are the guts of the new beast. This tracks when a frontbuffer
gets invalidated (due to frontbuffer rendering) and hence should be
constantly scaned out, and when it's flushed again and can be
compressed/one-shot-upload.
Rules for flushing are simple: The frontbuffer needs one more full
upload starting from the next vblank. Which means that the flushing
can _only_ be called once the frontbuffer update has been latched.
But this poses a problem for pageflips: We can't just delay the
flushing until the pageflip is latched, since that would pose the risk
that we override frontbuffer rendering that has been scheduled
in-between the pageflip ioctl and the actual latching.
To handle this track asynchronous invalidations (and also pageflip)
state per-ring and delay any in-between flushing until the rendering
has completed. And also cancel any delayed flushing if we get a new
invalidation request (whether delayed or not).
Also call intel_mark_fb_busy in both cases in all cases to make sure
that we keep the screen at the highest refresh rate both on flips,
synchronous plane updates and for frontbuffer rendering.
v2: Lots of improvements
Suggestions from Chris:
- Move invalidate/flush in flush_*_domain and set_to_*_domain.
- Drop the flush in busy_ioctl since it's redundant. Was a leftover
from an earlier concept to track flips/delayed flushes.
- Don't forget about the initial modeset enable/final disable.
Suggested by Chris.
Track flips accurately, too. Since flips complete independently of
rendering we need to track pending flips in a separate mask. Again if
an invalidate happens we need to cancel the evenutal flush to avoid
races.
v3:
Provide correct header declarations for flip functions. Currently not
needed outside of intel_display.c, but part of the proper interface.
v4: Add proper domain management to fbcon so that the fbcon buffer is
also tracked correctly.
v5: Fixup locking around the fbcon set_to_gtt_domain call.
v6: More comments from Chris:
- Split out fbcon changes.
- Drop superflous checks for potential scanout before calling intel_fb
functions - we can micro-optimize this later.
- s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem
object. We already have precedence for fb_obj in the pin_and_fence
functions.
v7: Clarify the semantics of the flip flush handling by renaming
things a bit:
- Don't go through a gem object but take the relevant frontbuffer bits
directly. These functions center on the plane, the actual object is
irrelevant - even a flip to the same object as already active should
cause a flush.
- Add a new intel_frontbuffer_flip for synchronous plane updates. It
currently just calls intel_frontbuffer_flush since the implemenation
differs.
This way we achieve a clear split between one-shot update events on
one side and frontbuffer rendering with potentially a very long delay
between the invalidate and flush.
Chris and I also had some discussions about mark_busy and whether it
is appropriate to call from flush. But mark busy is a state which
should be derived from the 3 events (invalidate, flush, flip) we now
have by the users, like psr does by tracking relevant information in
psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for
frontbuffer) needs to have similar logic. With that the overall
mark_busy in the core could be removed.
v8: Only when retiring gpu buffers only flush frontbuffer bits we
actually invalidated in a batch. Just for safety since before any
additional usage/invalidate we should always retire current rendering.
Suggested by Chris Wilson.
v9: Actually use intel_frontbuffer_flip in all appropriate places.
Spotted by Chris.
v10: Address more comments from Chris:
- Don't call _flip in set_base when the crtc is inactive, avoids redunancy
in the modeset case with the initial enabling of all planes.
- Add comments explaining that the initial/final plane enable/disable
still has work left to do before it's fully generic.
v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris.
v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-19 21:01:59 +07:00
|
|
|
|
2016-08-18 23:16:51 +07:00
|
|
|
obj->base.write_domain = 0;
|
2009-08-25 17:15:50 +07:00
|
|
|
trace_i915_gem_object_change_domain(obj,
|
2010-11-09 02:18:58 +07:00
|
|
|
obj->base.read_domains,
|
2016-08-18 23:16:51 +07:00
|
|
|
I915_GEM_DOMAIN_GTT);
|
2008-11-15 04:35:19 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
/** Flushes the CPU write domain for the object if it's dirty. */
|
|
|
|
static void
|
2015-01-21 20:53:48 +07:00
|
|
|
i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj)
|
2008-11-15 04:35:19 +07:00
|
|
|
{
|
2010-11-09 02:18:58 +07:00
|
|
|
if (obj->base.write_domain != I915_GEM_DOMAIN_CPU)
|
2008-11-15 04:35:19 +07:00
|
|
|
return;
|
|
|
|
|
2016-11-06 19:59:59 +07:00
|
|
|
i915_gem_clflush_object(obj, obj->pin_display);
|
2015-07-08 06:28:51 +07:00
|
|
|
intel_fb_obj_flush(obj, false, ORIGIN_CPU);
|
drm/i915: Track frontbuffer invalidation/flushing
So these are the guts of the new beast. This tracks when a frontbuffer
gets invalidated (due to frontbuffer rendering) and hence should be
constantly scaned out, and when it's flushed again and can be
compressed/one-shot-upload.
Rules for flushing are simple: The frontbuffer needs one more full
upload starting from the next vblank. Which means that the flushing
can _only_ be called once the frontbuffer update has been latched.
But this poses a problem for pageflips: We can't just delay the
flushing until the pageflip is latched, since that would pose the risk
that we override frontbuffer rendering that has been scheduled
in-between the pageflip ioctl and the actual latching.
To handle this track asynchronous invalidations (and also pageflip)
state per-ring and delay any in-between flushing until the rendering
has completed. And also cancel any delayed flushing if we get a new
invalidation request (whether delayed or not).
Also call intel_mark_fb_busy in both cases in all cases to make sure
that we keep the screen at the highest refresh rate both on flips,
synchronous plane updates and for frontbuffer rendering.
v2: Lots of improvements
Suggestions from Chris:
- Move invalidate/flush in flush_*_domain and set_to_*_domain.
- Drop the flush in busy_ioctl since it's redundant. Was a leftover
from an earlier concept to track flips/delayed flushes.
- Don't forget about the initial modeset enable/final disable.
Suggested by Chris.
Track flips accurately, too. Since flips complete independently of
rendering we need to track pending flips in a separate mask. Again if
an invalidate happens we need to cancel the evenutal flush to avoid
races.
v3:
Provide correct header declarations for flip functions. Currently not
needed outside of intel_display.c, but part of the proper interface.
v4: Add proper domain management to fbcon so that the fbcon buffer is
also tracked correctly.
v5: Fixup locking around the fbcon set_to_gtt_domain call.
v6: More comments from Chris:
- Split out fbcon changes.
- Drop superflous checks for potential scanout before calling intel_fb
functions - we can micro-optimize this later.
- s/intel_fb_/intel_fb_obj_/ to make it clear that this deals in gem
object. We already have precedence for fb_obj in the pin_and_fence
functions.
v7: Clarify the semantics of the flip flush handling by renaming
things a bit:
- Don't go through a gem object but take the relevant frontbuffer bits
directly. These functions center on the plane, the actual object is
irrelevant - even a flip to the same object as already active should
cause a flush.
- Add a new intel_frontbuffer_flip for synchronous plane updates. It
currently just calls intel_frontbuffer_flush since the implemenation
differs.
This way we achieve a clear split between one-shot update events on
one side and frontbuffer rendering with potentially a very long delay
between the invalidate and flush.
Chris and I also had some discussions about mark_busy and whether it
is appropriate to call from flush. But mark busy is a state which
should be derived from the 3 events (invalidate, flush, flip) we now
have by the users, like psr does by tracking relevant information in
psr.busy_frontbuffer_bits. DRRS (the only real use of mark_busy for
frontbuffer) needs to have similar logic. With that the overall
mark_busy in the core could be removed.
v8: Only when retiring gpu buffers only flush frontbuffer bits we
actually invalidated in a batch. Just for safety since before any
additional usage/invalidate we should always retire current rendering.
Suggested by Chris Wilson.
v9: Actually use intel_frontbuffer_flip in all appropriate places.
Spotted by Chris.
v10: Address more comments from Chris:
- Don't call _flip in set_base when the crtc is inactive, avoids redunancy
in the modeset case with the initial enabling of all planes.
- Add comments explaining that the initial/final plane enable/disable
still has work left to do before it's fully generic.
v11: Only invalidate for gtt/cpu access when writing. Spotted by Chris.
v12: s/_flush/_flip/ in intel_overlay.c per Chris' comment.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-06-19 21:01:59 +07:00
|
|
|
|
2016-08-18 23:16:51 +07:00
|
|
|
obj->base.write_domain = 0;
|
2009-08-25 17:15:50 +07:00
|
|
|
trace_i915_gem_object_change_domain(obj,
|
2010-11-09 02:18:58 +07:00
|
|
|
obj->base.read_domains,
|
2016-08-18 23:16:51 +07:00
|
|
|
I915_GEM_DOMAIN_CPU);
|
2008-11-15 04:35:19 +07:00
|
|
|
}
|
|
|
|
|
2008-11-11 01:53:25 +07:00
|
|
|
/**
|
|
|
|
* Moves a single object to the GTT read, and possibly write domain.
|
2016-06-03 20:02:17 +07:00
|
|
|
* @obj: object to act on
|
|
|
|
* @write: ask for write access or read only
|
2008-11-11 01:53:25 +07:00
|
|
|
*
|
|
|
|
* This function returns when the move is complete, including waiting on
|
|
|
|
* flushes to occur.
|
|
|
|
*/
|
DRM: i915: add mode setting support
This commit adds i915 driver support for the DRM mode setting APIs.
Currently, VGA, LVDS, SDVO DVI & VGA, TV and DVO LVDS outputs are
supported. HDMI, DisplayPort and additional SDVO output support will
follow.
Support for the mode setting code is controlled by the new 'modeset'
module option. A new config option, CONFIG_DRM_I915_KMS controls the
default behavior, and whether a PCI ID list is built into the module for
use by user level module utilities.
Note that if mode setting is enabled, user level drivers that access
display registers directly or that don't use the kernel graphics memory
manager will likely corrupt kernel graphics memory, disrupt output
configuration (possibly leading to hangs and/or blank displays), and
prevent panic/oops messages from appearing. So use caution when
enabling this code; be sure your user level code supports the new
interfaces.
A new SysRq key, 'g', provides emergency support for switching back to
the kernel's framebuffer console; which is useful for testing.
Co-authors: Dave Airlie <airlied@linux.ie>, Hong Liu <hong.liu@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2008-11-08 05:24:08 +07:00
|
|
|
int
|
2010-11-23 22:26:33 +07:00
|
|
|
i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
|
2008-11-11 01:53:25 +07:00
|
|
|
{
|
2009-08-25 17:15:50 +07:00
|
|
|
uint32_t old_write_domain, old_read_domains;
|
2008-11-15 04:35:19 +07:00
|
|
|
int ret;
|
2008-11-11 01:53:25 +07:00
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
lockdep_assert_held(&obj->base.dev->struct_mutex);
|
2016-10-28 19:58:32 +07:00
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
ret = i915_gem_object_wait(obj,
|
|
|
|
I915_WAIT_INTERRUPTIBLE |
|
|
|
|
I915_WAIT_LOCKED |
|
|
|
|
(write ? I915_WAIT_ALL : 0),
|
|
|
|
MAX_SCHEDULE_TIMEOUT,
|
|
|
|
NULL);
|
2011-01-08 00:09:48 +07:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2016-07-20 15:21:15 +07:00
|
|
|
if (obj->base.write_domain == I915_GEM_DOMAIN_GTT)
|
|
|
|
return 0;
|
|
|
|
|
2015-01-02 17:59:29 +07:00
|
|
|
/* Flush and acquire obj->pages so that we are coherent through
|
|
|
|
* direct access in memory with previous cached writes through
|
|
|
|
* shmemfs and that our cache domain tracking remains valid.
|
|
|
|
* For example, if the obj->filp was moved to swap without us
|
|
|
|
* being notified and releasing the pages, we would mistakenly
|
|
|
|
* continue to assume that the obj remained out of the CPU cached
|
|
|
|
* domain.
|
|
|
|
*/
|
2016-10-28 19:58:35 +07:00
|
|
|
ret = i915_gem_object_pin_pages(obj);
|
2015-01-02 17:59:29 +07:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2015-01-21 20:53:48 +07:00
|
|
|
i915_gem_object_flush_cpu_write_domain(obj);
|
2009-08-25 17:15:50 +07:00
|
|
|
|
2012-10-10 01:24:37 +07:00
|
|
|
/* Serialise direct access to this object with the barriers for
|
|
|
|
* coherent writes from the GPU, by effectively invalidating the
|
|
|
|
* GTT domain upon first access.
|
|
|
|
*/
|
|
|
|
if ((obj->base.read_domains & I915_GEM_DOMAIN_GTT) == 0)
|
|
|
|
mb();
|
|
|
|
|
2010-11-09 02:18:58 +07:00
|
|
|
old_write_domain = obj->base.write_domain;
|
|
|
|
old_read_domains = obj->base.read_domains;
|
2009-08-25 17:15:50 +07:00
|
|
|
|
2008-11-15 04:35:19 +07:00
|
|
|
/* It should now be out of any other write domains, and we can update
|
|
|
|
* the domain values for our changes.
|
|
|
|
*/
|
2016-10-28 19:58:41 +07:00
|
|
|
GEM_BUG_ON((obj->base.write_domain & ~I915_GEM_DOMAIN_GTT) != 0);
|
2010-11-09 02:18:58 +07:00
|
|
|
obj->base.read_domains |= I915_GEM_DOMAIN_GTT;
|
2008-11-15 04:35:19 +07:00
|
|
|
if (write) {
|
2010-11-09 02:18:58 +07:00
|
|
|
obj->base.read_domains = I915_GEM_DOMAIN_GTT;
|
|
|
|
obj->base.write_domain = I915_GEM_DOMAIN_GTT;
|
2016-10-28 19:58:35 +07:00
|
|
|
obj->mm.dirty = true;
|
2008-11-11 01:53:25 +07:00
|
|
|
}
|
|
|
|
|
2009-08-25 17:15:50 +07:00
|
|
|
trace_i915_gem_object_change_domain(obj,
|
|
|
|
old_read_domains,
|
|
|
|
old_write_domain);
|
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
i915_gem_object_unpin_pages(obj);
|
2008-11-15 04:35:19 +07:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2015-10-09 20:11:27 +07:00
|
|
|
/**
|
|
|
|
* Changes the cache-level of an object across all VMA.
|
2016-06-03 20:02:17 +07:00
|
|
|
* @obj: object to act on
|
|
|
|
* @cache_level: new cache level to set for the object
|
2015-10-09 20:11:27 +07:00
|
|
|
*
|
|
|
|
* After this function returns, the object will be in the new cache-level
|
|
|
|
* across all GTT and the contents of the backing storage will be coherent,
|
|
|
|
* with respect to the new cache-level. In order to keep the backing storage
|
|
|
|
* coherent for all users, we only allow a single cache level to be set
|
|
|
|
* globally on the object and prevent it from being changed whilst the
|
|
|
|
* hardware is reading from the object. That is if the object is currently
|
|
|
|
* on the scanout it will be set to uncached (or equivalent display
|
|
|
|
* cache coherency) and all non-MOCS GPU access will also be uncached so
|
|
|
|
* that all direct access to the scanout remains coherent.
|
|
|
|
*/
|
2011-04-04 15:44:39 +07:00
|
|
|
int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
|
|
|
|
enum i915_cache_level cache_level)
|
|
|
|
{
|
2016-08-04 13:52:27 +07:00
|
|
|
struct i915_vma *vma;
|
2016-11-19 04:17:46 +07:00
|
|
|
int ret;
|
2011-04-04 15:44:39 +07:00
|
|
|
|
2016-10-28 19:58:32 +07:00
|
|
|
lockdep_assert_held(&obj->base.dev->struct_mutex);
|
|
|
|
|
2011-04-04 15:44:39 +07:00
|
|
|
if (obj->cache_level == cache_level)
|
2016-11-19 04:17:46 +07:00
|
|
|
return 0;
|
2011-04-04 15:44:39 +07:00
|
|
|
|
2015-10-09 20:11:27 +07:00
|
|
|
/* Inspect the list of currently bound VMA and unbind any that would
|
|
|
|
* be invalid given the new cache-level. This is principally to
|
|
|
|
* catch the issue of the CS prefetch crossing page boundaries and
|
|
|
|
* reading an invalid PTE on older architectures.
|
|
|
|
*/
|
2016-08-04 13:52:27 +07:00
|
|
|
restart:
|
|
|
|
list_for_each_entry(vma, &obj->vma_list, obj_link) {
|
2015-10-09 20:11:27 +07:00
|
|
|
if (!drm_mm_node_allocated(&vma->node))
|
|
|
|
continue;
|
|
|
|
|
2016-08-04 22:32:30 +07:00
|
|
|
if (i915_vma_is_pinned(vma)) {
|
2015-10-09 20:11:27 +07:00
|
|
|
DRM_DEBUG("can not change the cache level of pinned objects\n");
|
|
|
|
return -EBUSY;
|
|
|
|
}
|
|
|
|
|
2016-08-04 13:52:27 +07:00
|
|
|
if (i915_gem_valid_gtt_space(vma, cache_level))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
ret = i915_vma_unbind(vma);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
|
|
|
/* As unbinding may affect other elements in the
|
|
|
|
* obj->vma_list (due to side-effects from retiring
|
|
|
|
* an active vma), play safe and restart the iterator.
|
|
|
|
*/
|
|
|
|
goto restart;
|
2012-07-26 17:49:32 +07:00
|
|
|
}
|
|
|
|
|
2015-10-09 20:11:27 +07:00
|
|
|
/* We can reuse the existing drm_mm nodes but need to change the
|
|
|
|
* cache-level on the PTE. We could simply unbind them all and
|
|
|
|
* rebind with the correct cache-level on next use. However since
|
|
|
|
* we already have a valid slot, dma mapping, pages etc, we may as
|
|
|
|
* rewrite the PTE in the belief that doing so tramples upon less
|
|
|
|
* state and so involves less work.
|
|
|
|
*/
|
2016-08-04 13:52:26 +07:00
|
|
|
if (obj->bind_count) {
|
2015-10-09 20:11:27 +07:00
|
|
|
/* Before we change the PTE, the GPU must not be accessing it.
|
|
|
|
* If we wait upon the object, we know that all the bound
|
|
|
|
* VMA are no longer active.
|
|
|
|
*/
|
2016-10-28 19:58:27 +07:00
|
|
|
ret = i915_gem_object_wait(obj,
|
|
|
|
I915_WAIT_INTERRUPTIBLE |
|
|
|
|
I915_WAIT_LOCKED |
|
|
|
|
I915_WAIT_ALL,
|
|
|
|
MAX_SCHEDULE_TIMEOUT,
|
|
|
|
NULL);
|
2011-04-04 15:44:39 +07:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2016-11-04 21:42:44 +07:00
|
|
|
if (!HAS_LLC(to_i915(obj->base.dev)) &&
|
|
|
|
cache_level != I915_CACHE_NONE) {
|
2015-10-09 20:11:27 +07:00
|
|
|
/* Access to snoopable pages through the GTT is
|
|
|
|
* incoherent and on some machines causes a hard
|
|
|
|
* lockup. Relinquish the CPU mmaping to force
|
|
|
|
* userspace to refault in the pages and we can
|
|
|
|
* then double check if the GTT mapping is still
|
|
|
|
* valid for that pointer access.
|
|
|
|
*/
|
|
|
|
i915_gem_release_mmap(obj);
|
|
|
|
|
|
|
|
/* As we no longer need a fence for GTT access,
|
|
|
|
* we can relinquish it now (and so prevent having
|
|
|
|
* to steal a fence from someone else on the next
|
|
|
|
* fence request). Note GPU activity would have
|
|
|
|
* dropped the fence as all snoopable access is
|
|
|
|
* supposed to be linear.
|
|
|
|
*/
|
2016-08-18 23:17:00 +07:00
|
|
|
list_for_each_entry(vma, &obj->vma_list, obj_link) {
|
|
|
|
ret = i915_vma_put_fence(vma);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
2015-10-09 20:11:27 +07:00
|
|
|
} else {
|
|
|
|
/* We either have incoherent backing store and
|
|
|
|
* so no GTT access or the architecture is fully
|
|
|
|
* coherent. In such cases, existing GTT mmaps
|
|
|
|
* ignore the cache bit in the PTE and we can
|
|
|
|
* rewrite it without confusing the GPU or having
|
|
|
|
* to force userspace to fault back in its mmaps.
|
|
|
|
*/
|
2011-04-04 15:44:39 +07:00
|
|
|
}
|
|
|
|
|
2016-02-26 18:03:19 +07:00
|
|
|
list_for_each_entry(vma, &obj->vma_list, obj_link) {
|
2015-10-09 20:11:27 +07:00
|
|
|
if (!drm_mm_node_allocated(&vma->node))
|
|
|
|
continue;
|
|
|
|
|
|
|
|
ret = i915_vma_bind(vma, cache_level, PIN_UPDATE);
|
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
}
|
2011-04-04 15:44:39 +07:00
|
|
|
}
|
|
|
|
|
2016-11-19 04:17:46 +07:00
|
|
|
if (obj->base.write_domain == I915_GEM_DOMAIN_CPU &&
|
|
|
|
cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
|
|
|
|
obj->cache_dirty = true;
|
|
|
|
|
2016-02-26 18:03:19 +07:00
|
|
|
list_for_each_entry(vma, &obj->vma_list, obj_link)
|
2013-08-09 18:26:45 +07:00
|
|
|
vma->node.color = cache_level;
|
|
|
|
obj->cache_level = cache_level;
|
|
|
|
|
2011-04-04 15:44:39 +07:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2012-09-22 07:01:20 +07:00
|
|
|
int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file)
|
2012-07-10 16:27:08 +07:00
|
|
|
{
|
2012-09-22 07:01:20 +07:00
|
|
|
struct drm_i915_gem_caching *args = data;
|
2012-07-10 16:27:08 +07:00
|
|
|
struct drm_i915_gem_object *obj;
|
2016-10-28 19:58:42 +07:00
|
|
|
int err = 0;
|
2012-07-10 16:27:08 +07:00
|
|
|
|
2016-10-28 19:58:42 +07:00
|
|
|
rcu_read_lock();
|
|
|
|
obj = i915_gem_object_lookup_rcu(file, args->handle);
|
|
|
|
if (!obj) {
|
|
|
|
err = -ENOENT;
|
|
|
|
goto out;
|
|
|
|
}
|
2012-07-10 16:27:08 +07:00
|
|
|
|
2013-08-08 20:41:10 +07:00
|
|
|
switch (obj->cache_level) {
|
|
|
|
case I915_CACHE_LLC:
|
|
|
|
case I915_CACHE_L3_LLC:
|
|
|
|
args->caching = I915_CACHING_CACHED;
|
|
|
|
break;
|
|
|
|
|
2013-08-08 20:41:11 +07:00
|
|
|
case I915_CACHE_WT:
|
|
|
|
args->caching = I915_CACHING_DISPLAY;
|
|
|
|
break;
|
|
|
|
|
2013-08-08 20:41:10 +07:00
|
|
|
default:
|
|
|
|
args->caching = I915_CACHING_NONE;
|
|
|
|
break;
|
|
|
|
}
|
2016-10-28 19:58:42 +07:00
|
|
|
out:
|
|
|
|
rcu_read_unlock();
|
|
|
|
return err;
|
2012-07-10 16:27:08 +07:00
|
|
|
}
|
|
|
|
|
2012-09-22 07:01:20 +07:00
|
|
|
int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file)
|
2012-07-10 16:27:08 +07:00
|
|
|
{
|
2016-10-24 19:42:15 +07:00
|
|
|
struct drm_i915_private *i915 = to_i915(dev);
|
2012-09-22 07:01:20 +07:00
|
|
|
struct drm_i915_gem_caching *args = data;
|
2012-07-10 16:27:08 +07:00
|
|
|
struct drm_i915_gem_object *obj;
|
|
|
|
enum i915_cache_level level;
|
|
|
|
int ret;
|
|
|
|
|
2012-09-22 07:01:20 +07:00
|
|
|
switch (args->caching) {
|
|
|
|
case I915_CACHING_NONE:
|
2012-07-10 16:27:08 +07:00
|
|
|
level = I915_CACHE_NONE;
|
|
|
|
break;
|
2012-09-22 07:01:20 +07:00
|
|
|
case I915_CACHING_CACHED:
|
2015-08-14 22:43:30 +07:00
|
|
|
/*
|
|
|
|
* Due to a HW issue on BXT A stepping, GPU stores via a
|
|
|
|
* snooped mapping may leave stale data in a corresponding CPU
|
|
|
|
* cacheline, whereas normally such cachelines would get
|
|
|
|
* invalidated.
|
|
|
|
*/
|
2016-10-24 19:42:15 +07:00
|
|
|
if (!HAS_LLC(i915) && !HAS_SNOOP(i915))
|
2015-08-14 22:43:30 +07:00
|
|
|
return -ENODEV;
|
|
|
|
|
2012-07-10 16:27:08 +07:00
|
|
|
level = I915_CACHE_LLC;
|
|
|
|
break;
|
2013-08-08 20:41:11 +07:00
|
|
|
case I915_CACHING_DISPLAY:
|
2016-10-24 19:42:15 +07:00
|
|
|
level = HAS_WT(i915) ? I915_CACHE_WT : I915_CACHE_NONE;
|
2013-08-08 20:41:11 +07:00
|
|
|
break;
|
2012-07-10 16:27:08 +07:00
|
|
|
default:
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2012-09-27 06:15:20 +07:00
|
|
|
ret = i915_mutex_lock_interruptible(dev);
|
|
|
|
if (ret)
|
2016-10-24 19:42:15 +07:00
|
|
|
return ret;
|
2012-09-27 06:15:20 +07:00
|
|
|
|
2016-07-20 19:31:51 +07:00
|
|
|
obj = i915_gem_object_lookup(file, args->handle);
|
|
|
|
if (!obj) {
|
2012-07-10 16:27:08 +07:00
|
|
|
ret = -ENOENT;
|
|
|
|
goto unlock;
|
|
|
|
}
|
|
|
|
|
|
|
|
ret = i915_gem_object_set_cache_level(obj, level);
|
2016-07-20 19:31:53 +07:00
|
|
|
i915_gem_object_put(obj);
|
2012-07-10 16:27:08 +07:00
|
|
|
unlock:
|
|
|
|
mutex_unlock(&dev->struct_mutex);
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2009-11-25 12:09:39 +07:00
|
|
|
/*
|
2011-04-14 15:41:17 +07:00
|
|
|
* Prepare buffer for display plane (scanout, cursors, etc).
|
|
|
|
* Can be called from an uninterruptible phase (modesetting) and allows
|
|
|
|
* any flushes to be pipelined (for pageflips).
|
2009-11-25 12:09:39 +07:00
|
|
|
*/
|
2016-08-15 16:49:06 +07:00
|
|
|
struct i915_vma *
|
2011-04-14 15:41:17 +07:00
|
|
|
i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
|
|
|
|
u32 alignment,
|
2015-03-23 18:10:33 +07:00
|
|
|
const struct i915_ggtt_view *view)
|
2009-11-25 12:09:39 +07:00
|
|
|
{
|
2016-08-15 16:49:06 +07:00
|
|
|
struct i915_vma *vma;
|
2011-04-14 15:41:17 +07:00
|
|
|
u32 old_read_domains, old_write_domain;
|
2009-11-25 12:09:39 +07:00
|
|
|
int ret;
|
|
|
|
|
2016-10-28 19:58:32 +07:00
|
|
|
lockdep_assert_held(&obj->base.dev->struct_mutex);
|
|
|
|
|
2013-08-09 18:25:09 +07:00
|
|
|
/* Mark the pin_display early so that we account for the
|
|
|
|
* display coherency whilst setting up the cache domains.
|
|
|
|
*/
|
2015-04-13 17:50:09 +07:00
|
|
|
obj->pin_display++;
|
2013-08-09 18:25:09 +07:00
|
|
|
|
2011-03-30 06:59:54 +07:00
|
|
|
/* The display engine is not coherent with the LLC cache on gen6. As
|
|
|
|
* a result, we make sure that the pinning that is about to occur is
|
|
|
|
* done with uncached PTEs. This is lowest common denominator for all
|
|
|
|
* chipsets.
|
|
|
|
*
|
|
|
|
* However for gen6+, we could do better by using the GFDT bit instead
|
|
|
|
* of uncaching, which would allow us to flush all the LLC-cached data
|
|
|
|
* with that bit in the PTE to main memory with just one PIPE_CONTROL.
|
|
|
|
*/
|
2013-08-08 20:41:10 +07:00
|
|
|
ret = i915_gem_object_set_cache_level(obj,
|
2016-10-13 17:03:00 +07:00
|
|
|
HAS_WT(to_i915(obj->base.dev)) ?
|
|
|
|
I915_CACHE_WT : I915_CACHE_NONE);
|
2016-08-15 16:49:06 +07:00
|
|
|
if (ret) {
|
|
|
|
vma = ERR_PTR(ret);
|
2013-08-09 18:25:09 +07:00
|
|
|
goto err_unpin_display;
|
2016-08-15 16:49:06 +07:00
|
|
|
}
|
2011-03-30 06:59:54 +07:00
|
|
|
|
2011-04-14 15:41:17 +07:00
|
|
|
/* As the user may map the buffer once pinned in the display plane
|
|
|
|
* (e.g. libkms for the bootup splash), we have to ensure that we
|
2016-08-18 23:17:06 +07:00
|
|
|
* always use map_and_fenceable for all scanout buffers. However,
|
|
|
|
* it may simply be too big to fit into mappable, in which case
|
|
|
|
* put it anyway and hope that userspace can cope (but always first
|
|
|
|
* try to preserve the existing ABI).
|
2011-04-14 15:41:17 +07:00
|
|
|
*/
|
2016-08-18 23:17:06 +07:00
|
|
|
vma = ERR_PTR(-ENOSPC);
|
|
|
|
if (view->type == I915_GGTT_VIEW_NORMAL)
|
|
|
|
vma = i915_gem_object_ggtt_pin(obj, view, 0, alignment,
|
|
|
|
PIN_MAPPABLE | PIN_NONBLOCK);
|
2016-11-07 18:01:28 +07:00
|
|
|
if (IS_ERR(vma)) {
|
|
|
|
struct drm_i915_private *i915 = to_i915(obj->base.dev);
|
|
|
|
unsigned int flags;
|
|
|
|
|
|
|
|
/* Valleyview is definitely limited to scanning out the first
|
|
|
|
* 512MiB. Lets presume this behaviour was inherited from the
|
|
|
|
* g4x display engine and that all earlier gen are similarly
|
|
|
|
* limited. Testing suggests that it is a little more
|
|
|
|
* complicated than this. For example, Cherryview appears quite
|
|
|
|
* happy to scanout from anywhere within its global aperture.
|
|
|
|
*/
|
|
|
|
flags = 0;
|
|
|
|
if (HAS_GMCH_DISPLAY(i915))
|
|
|
|
flags = PIN_MAPPABLE;
|
|
|
|
vma = i915_gem_object_ggtt_pin(obj, view, 0, alignment, flags);
|
|
|
|
}
|
2016-08-15 16:49:06 +07:00
|
|
|
if (IS_ERR(vma))
|
2013-08-09 18:25:09 +07:00
|
|
|
goto err_unpin_display;
|
2011-04-14 15:41:17 +07:00
|
|
|
|
2016-08-18 23:17:07 +07:00
|
|
|
vma->display_alignment = max_t(u64, vma->display_alignment, alignment);
|
|
|
|
|
2016-11-19 04:17:46 +07:00
|
|
|
/* Treat this as an end-of-frame, like intel_user_framebuffer_dirty() */
|
|
|
|
if (obj->cache_dirty) {
|
|
|
|
i915_gem_clflush_object(obj, true);
|
|
|
|
intel_fb_obj_flush(obj, false, ORIGIN_DIRTYFB);
|
|
|
|
}
|
2010-05-27 19:18:14 +07:00
|
|
|
|
2011-04-14 15:41:17 +07:00
|
|
|
old_write_domain = obj->base.write_domain;
|
2010-11-09 02:18:58 +07:00
|
|
|
old_read_domains = obj->base.read_domains;
|
2011-04-14 15:41:17 +07:00
|
|
|
|
|
|
|
/* It should now be out of any other write domains, and we can update
|
|
|
|
* the domain values for our changes.
|
|
|
|
*/
|
2012-07-20 18:41:00 +07:00
|
|
|
obj->base.write_domain = 0;
|
2010-11-09 02:18:58 +07:00
|
|
|
obj->base.read_domains |= I915_GEM_DOMAIN_GTT;
|
2009-11-25 12:09:39 +07:00
|
|
|
|
|
|
|
trace_i915_gem_object_change_domain(obj,
|
|
|
|
old_read_domains,
|
2011-04-14 15:41:17 +07:00
|
|
|
old_write_domain);
|
2009-11-25 12:09:39 +07:00
|
|
|
|
2016-08-15 16:49:06 +07:00
|
|
|
return vma;
|
2013-08-09 18:25:09 +07:00
|
|
|
|
|
|
|
err_unpin_display:
|
2015-04-13 17:50:09 +07:00
|
|
|
obj->pin_display--;
|
2016-08-15 16:49:06 +07:00
|
|
|
return vma;
|
2013-08-09 18:25:09 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
void
|
2016-08-15 16:49:06 +07:00
|
|
|
i915_gem_object_unpin_from_display_plane(struct i915_vma *vma)
|
2013-08-09 18:25:09 +07:00
|
|
|
{
|
2016-11-29 16:50:08 +07:00
|
|
|
lockdep_assert_held(&vma->vm->i915->drm.struct_mutex);
|
2016-10-28 19:58:32 +07:00
|
|
|
|
2016-08-15 16:49:06 +07:00
|
|
|
if (WARN_ON(vma->obj->pin_display == 0))
|
2015-04-13 17:50:09 +07:00
|
|
|
return;
|
|
|
|
|
2016-08-18 23:17:07 +07:00
|
|
|
if (--vma->obj->pin_display == 0)
|
|
|
|
vma->display_alignment = 0;
|
2015-03-23 18:10:33 +07:00
|
|
|
|
2016-08-18 23:17:08 +07:00
|
|
|
/* Bump the LRU to try and avoid premature eviction whilst flipping */
|
|
|
|
if (!i915_vma_is_active(vma))
|
|
|
|
list_move_tail(&vma->vm_link, &vma->vm->inactive_list);
|
|
|
|
|
2016-08-15 16:49:06 +07:00
|
|
|
i915_vma_unpin(vma);
|
2009-11-25 12:09:39 +07:00
|
|
|
}
|
|
|
|
|
2008-11-15 04:35:19 +07:00
|
|
|
/**
|
|
|
|
* Moves a single object to the CPU read, and possibly write domain.
|
2016-06-03 20:02:17 +07:00
|
|
|
* @obj: object to act on
|
|
|
|
* @write: requesting write or read-only access
|
2008-11-15 04:35:19 +07:00
|
|
|
*
|
|
|
|
* This function returns when the move is complete, including waiting on
|
|
|
|
* flushes to occur.
|
|
|
|
*/
|
2012-03-26 15:10:27 +07:00
|
|
|
int
|
2010-11-12 20:42:53 +07:00
|
|
|
i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
|
2008-11-15 04:35:19 +07:00
|
|
|
{
|
2009-08-25 17:15:50 +07:00
|
|
|
uint32_t old_write_domain, old_read_domains;
|
2008-11-15 04:35:19 +07:00
|
|
|
int ret;
|
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
lockdep_assert_held(&obj->base.dev->struct_mutex);
|
2016-10-28 19:58:32 +07:00
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
ret = i915_gem_object_wait(obj,
|
|
|
|
I915_WAIT_INTERRUPTIBLE |
|
|
|
|
I915_WAIT_LOCKED |
|
|
|
|
(write ? I915_WAIT_ALL : 0),
|
|
|
|
MAX_SCHEDULE_TIMEOUT,
|
|
|
|
NULL);
|
2011-01-08 00:09:48 +07:00
|
|
|
if (ret)
|
|
|
|
return ret;
|
|
|
|
|
2016-07-20 15:21:15 +07:00
|
|
|
if (obj->base.write_domain == I915_GEM_DOMAIN_CPU)
|
|
|
|
return 0;
|
|
|
|
|
2008-11-15 04:35:19 +07:00
|
|
|
i915_gem_object_flush_gtt_write_domain(obj);
|
2008-11-11 01:53:25 +07:00
|
|
|
|
2010-11-09 02:18:58 +07:00
|
|
|
old_write_domain = obj->base.write_domain;
|
|
|
|
old_read_domains = obj->base.read_domains;
|
2009-08-25 17:15:50 +07:00
|
|
|
|
2008-11-15 04:35:19 +07:00
|
|
|
/* Flush the CPU cache if it's still invalid. */
|
2010-11-09 02:18:58 +07:00
|
|
|
if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
|
2013-08-09 18:26:45 +07:00
|
|
|
i915_gem_clflush_object(obj, false);
|
2008-11-11 01:53:25 +07:00
|
|
|
|
2010-11-09 02:18:58 +07:00
|
|
|
obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
|
2008-11-11 01:53:25 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
/* It should now be out of any other write domains, and we can update
|
|
|
|
* the domain values for our changes.
|
|
|
|
*/
|
2016-10-28 19:58:41 +07:00
|
|
|
GEM_BUG_ON((obj->base.write_domain & ~I915_GEM_DOMAIN_CPU) != 0);
|
2008-11-15 04:35:19 +07:00
|
|
|
|
|
|
|
/* If we're writing through the CPU, then the GPU read domains will
|
|
|
|
* need to be invalidated at next use.
|
|
|
|
*/
|
|
|
|
if (write) {
|
2010-11-09 02:18:58 +07:00
|
|
|
obj->base.read_domains = I915_GEM_DOMAIN_CPU;
|
|
|
|
obj->base.write_domain = I915_GEM_DOMAIN_CPU;
|
2008-11-15 04:35:19 +07:00
|
|
|
}
|
2008-11-11 01:53:25 +07:00
|
|
|
|
2009-08-25 17:15:50 +07:00
|
|
|
trace_i915_gem_object_change_domain(obj,
|
|
|
|
old_read_domains,
|
|
|
|
old_write_domain);
|
|
|
|
|
2008-11-11 01:53:25 +07:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2008-07-31 02:06:12 +07:00
|
|
|
/* Throttle our rendering by waiting until the ring has completed our requests
|
|
|
|
* emitted over 20 msec ago.
|
|
|
|
*
|
2009-06-03 14:27:35 +07:00
|
|
|
* Note that if we were to use the current jiffies each time around the loop,
|
|
|
|
* we wouldn't escape the function with any frames outstanding if the time to
|
|
|
|
* render a frame was over 20ms.
|
|
|
|
*
|
2008-07-31 02:06:12 +07:00
|
|
|
* This should get us reasonable parallelism between CPU and GPU but also
|
|
|
|
* relatively low latency when blocking on a particular request to finish.
|
|
|
|
*/
|
2009-03-13 01:23:52 +07:00
|
|
|
static int
|
2010-09-24 22:02:42 +07:00
|
|
|
i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
|
2009-03-13 01:23:52 +07:00
|
|
|
{
|
2016-07-04 17:34:36 +07:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
2010-09-24 22:02:42 +07:00
|
|
|
struct drm_i915_file_private *file_priv = file->driver_priv;
|
2015-05-22 03:01:48 +07:00
|
|
|
unsigned long recent_enough = jiffies - DRM_I915_THROTTLE_JIFFIES;
|
2014-11-25 01:49:27 +07:00
|
|
|
struct drm_i915_gem_request *request, *target = NULL;
|
2016-10-28 19:58:27 +07:00
|
|
|
long ret;
|
2010-01-31 17:40:48 +07:00
|
|
|
|
drm/i915: Prevent leaking of -EIO from i915_wait_request()
Reporting -EIO from i915_wait_request() has proven very troublematic
over the years, with numerous hard-to-reproduce bugs cropping up in the
corner case of where a reset occurs and the code wasn't expecting such
an error.
If the we reset the GPU or have detected a hang and wish to reset the
GPU, the request is forcibly complete and the wait broken. Currently, we
report either -EAGAIN or -EIO in order for the caller to retreat and
restart the wait (if appropriate) after dropping and then reacquiring
the struct_mutex (essential to allow the GPU reset to proceed). However,
if we take the view that the request is complete (no further work will
be done on it by the GPU because it is dead and soon to be reset), then
we can proceed with the task at hand and then drop the struct_mutex
allowing the reset to occur. This transfers the burden of checking
whether it is safe to proceed to the caller, which in all but one
instance it is safe - completely eliminating the source of all spurious
-EIO.
Of note, we only have two API entry points where we expect that
userspace can observe an EIO. First is when submitting an execbuf, if
the GPU is terminally wedged, then the operation cannot succeed and an
-EIO is reported. Secondly, existing userspace uses the throttle ioctl
to detect an already wedged GPU before starting using HW acceleration
(or to confirm that the GPU is wedged after an error condition). So if
the GPU is wedged when the user calls throttle, also report -EIO.
v2: Split more carefully the change to i915_wait_request() and assorted
ABI from the reset handling.
v3: Add a couple of WARN_ON(EIO) to the interruptible modesetting code
so that we don't start to leak EIO there in future (and break our hang
resistant modesetting).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-9-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-1-git-send-email-chris@chris-wilson.co.uk
2016-04-13 23:35:08 +07:00
|
|
|
/* ABI: return -EIO if already wedged */
|
|
|
|
if (i915_terminally_wedged(&dev_priv->gpu_error))
|
|
|
|
return -EIO;
|
2011-01-26 22:39:14 +07:00
|
|
|
|
2010-09-26 17:03:27 +07:00
|
|
|
spin_lock(&file_priv->mm.lock);
|
2010-09-24 22:02:42 +07:00
|
|
|
list_for_each_entry(request, &file_priv->mm.request_list, client_list) {
|
2009-06-03 14:27:35 +07:00
|
|
|
if (time_after_eq(request->emitted_jiffies, recent_enough))
|
|
|
|
break;
|
2009-03-13 01:23:52 +07:00
|
|
|
|
2015-05-29 23:44:12 +07:00
|
|
|
/*
|
|
|
|
* Note that the request might not have been submitted yet.
|
|
|
|
* In which case emitted_jiffies will be zero.
|
|
|
|
*/
|
|
|
|
if (!request->emitted_jiffies)
|
|
|
|
continue;
|
|
|
|
|
2014-11-25 01:49:27 +07:00
|
|
|
target = request;
|
2009-06-03 14:27:35 +07:00
|
|
|
}
|
2014-11-25 01:49:28 +07:00
|
|
|
if (target)
|
2016-07-20 19:31:49 +07:00
|
|
|
i915_gem_request_get(target);
|
2010-09-26 17:03:27 +07:00
|
|
|
spin_unlock(&file_priv->mm.lock);
|
2009-03-13 01:23:52 +07:00
|
|
|
|
2014-11-25 01:49:27 +07:00
|
|
|
if (target == NULL)
|
2010-09-24 22:02:42 +07:00
|
|
|
return 0;
|
2009-04-07 03:55:41 +07:00
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
ret = i915_wait_request(target,
|
|
|
|
I915_WAIT_INTERRUPTIBLE,
|
|
|
|
MAX_SCHEDULE_TIMEOUT);
|
2016-07-20 19:31:49 +07:00
|
|
|
i915_gem_request_put(target);
|
2014-11-25 01:49:28 +07:00
|
|
|
|
2016-10-28 19:58:27 +07:00
|
|
|
return ret < 0 ? ret : 0;
|
2009-03-13 01:23:52 +07:00
|
|
|
}
|
|
|
|
|
2016-08-15 16:49:06 +07:00
|
|
|
struct i915_vma *
|
2015-03-16 19:11:13 +07:00
|
|
|
i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
|
|
|
|
const struct i915_ggtt_view *view,
|
2016-08-04 22:32:23 +07:00
|
|
|
u64 size,
|
2016-08-04 22:32:22 +07:00
|
|
|
u64 alignment,
|
|
|
|
u64 flags)
|
2015-03-16 19:11:13 +07:00
|
|
|
{
|
2016-10-13 15:55:04 +07:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
|
|
|
|
struct i915_address_space *vm = &dev_priv->ggtt.base;
|
2016-08-04 22:32:31 +07:00
|
|
|
struct i915_vma *vma;
|
|
|
|
int ret;
|
2016-03-30 20:57:10 +07:00
|
|
|
|
2016-10-28 19:58:32 +07:00
|
|
|
lockdep_assert_held(&obj->base.dev->struct_mutex);
|
|
|
|
|
2016-08-15 16:49:06 +07:00
|
|
|
vma = i915_gem_obj_lookup_or_create_vma(obj, vm, view);
|
2016-08-04 22:32:31 +07:00
|
|
|
if (IS_ERR(vma))
|
2016-08-15 16:49:06 +07:00
|
|
|
return vma;
|
2016-08-04 22:32:31 +07:00
|
|
|
|
|
|
|
if (i915_vma_misplaced(vma, size, alignment, flags)) {
|
|
|
|
if (flags & PIN_NONBLOCK &&
|
|
|
|
(i915_vma_is_pinned(vma) || i915_vma_is_active(vma)))
|
2016-08-15 16:49:06 +07:00
|
|
|
return ERR_PTR(-ENOSPC);
|
2016-08-04 22:32:31 +07:00
|
|
|
|
2016-10-13 15:55:04 +07:00
|
|
|
if (flags & PIN_MAPPABLE) {
|
|
|
|
u32 fence_size;
|
|
|
|
|
|
|
|
fence_size = i915_gem_get_ggtt_size(dev_priv, vma->size,
|
|
|
|
i915_gem_object_get_tiling(obj));
|
|
|
|
/* If the required space is larger than the available
|
|
|
|
* aperture, we will not able to find a slot for the
|
|
|
|
* object and unbinding the object now will be in
|
|
|
|
* vain. Worse, doing so may cause us to ping-pong
|
|
|
|
* the object in and out of the Global GTT and
|
|
|
|
* waste a lot of cycles under the mutex.
|
|
|
|
*/
|
|
|
|
if (fence_size > dev_priv->ggtt.mappable_end)
|
|
|
|
return ERR_PTR(-E2BIG);
|
|
|
|
|
|
|
|
/* If NONBLOCK is set the caller is optimistically
|
|
|
|
* trying to cache the full object within the mappable
|
|
|
|
* aperture, and *must* have a fallback in place for
|
|
|
|
* situations where we cannot bind the object. We
|
|
|
|
* can be a little more lax here and use the fallback
|
|
|
|
* more often to avoid costly migrations of ourselves
|
|
|
|
* and other objects within the aperture.
|
|
|
|
*
|
|
|
|
* Half-the-aperture is used as a simple heuristic.
|
|
|
|
* More interesting would to do search for a free
|
|
|
|
* block prior to making the commitment to unbind.
|
|
|
|
* That caters for the self-harm case, and with a
|
|
|
|
* little more heuristics (e.g. NOFAULT, NOEVICT)
|
|
|
|
* we could try to minimise harm to others.
|
|
|
|
*/
|
|
|
|
if (flags & PIN_NONBLOCK &&
|
|
|
|
fence_size > dev_priv->ggtt.mappable_end / 2)
|
|
|
|
return ERR_PTR(-ENOSPC);
|
|
|
|
}
|
|
|
|
|
2016-08-04 22:32:31 +07:00
|
|
|
WARN(i915_vma_is_pinned(vma),
|
|
|
|
"bo is already pinned in ggtt with incorrect alignment:"
|
2016-08-18 23:16:55 +07:00
|
|
|
" offset=%08x, req.alignment=%llx,"
|
|
|
|
" req.map_and_fenceable=%d, vma->map_and_fenceable=%d\n",
|
|
|
|
i915_ggtt_offset(vma), alignment,
|
2016-08-04 22:32:31 +07:00
|
|
|
!!(flags & PIN_MAPPABLE),
|
2016-08-18 23:16:55 +07:00
|
|
|
i915_vma_is_map_and_fenceable(vma));
|
2016-08-04 22:32:31 +07:00
|
|
|
ret = i915_vma_unbind(vma);
|
|
|
|
if (ret)
|
2016-08-15 16:49:06 +07:00
|
|
|
return ERR_PTR(ret);
|
2016-08-04 22:32:31 +07:00
|
|
|
}
|
|
|
|
|
2016-08-15 16:49:06 +07:00
|
|
|
ret = i915_vma_pin(vma, size, alignment, flags | PIN_GLOBAL);
|
|
|
|
if (ret)
|
|
|
|
return ERR_PTR(ret);
|
2015-03-16 19:11:13 +07:00
|
|
|
|
2016-08-15 16:49:06 +07:00
|
|
|
return vma;
|
2008-07-31 02:06:12 +07:00
|
|
|
}
|
|
|
|
|
2016-08-09 15:23:33 +07:00
|
|
|
static __always_inline unsigned int __busy_read_flag(unsigned int id)
|
2016-08-05 16:14:18 +07:00
|
|
|
{
|
|
|
|
/* Note that we could alias engines in the execbuf API, but
|
|
|
|
* that would be very unwise as it prevents userspace from
|
|
|
|
* fine control over engine selection. Ahem.
|
|
|
|
*
|
|
|
|
* This should be something like EXEC_MAX_ENGINE instead of
|
|
|
|
* I915_NUM_ENGINES.
|
|
|
|
*/
|
|
|
|
BUILD_BUG_ON(I915_NUM_ENGINES > 16);
|
|
|
|
return 0x10000 << id;
|
|
|
|
}
|
|
|
|
|
|
|
|
static __always_inline unsigned int __busy_write_id(unsigned int id)
|
|
|
|
{
|
2016-08-10 00:08:25 +07:00
|
|
|
/* The uABI guarantees an active writer is also amongst the read
|
|
|
|
* engines. This would be true if we accessed the activity tracking
|
|
|
|
* under the lock, but as we perform the lookup of the object and
|
|
|
|
* its activity locklessly we can not guarantee that the last_write
|
|
|
|
* being active implies that we have set the same engine flag from
|
|
|
|
* last_read - hence we always set both read and write busy for
|
|
|
|
* last_write.
|
|
|
|
*/
|
|
|
|
return id | __busy_read_flag(id);
|
2016-08-05 16:14:18 +07:00
|
|
|
}
|
|
|
|
|
2016-08-09 15:23:33 +07:00
|
|
|
static __always_inline unsigned int
|
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 19:58:44 +07:00
|
|
|
__busy_set_if_active(const struct dma_fence *fence,
|
2016-08-05 16:14:18 +07:00
|
|
|
unsigned int (*flag)(unsigned int id))
|
|
|
|
{
|
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 19:58:44 +07:00
|
|
|
struct drm_i915_gem_request *rq;
|
2016-08-05 16:14:18 +07:00
|
|
|
|
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 19:58:44 +07:00
|
|
|
/* We have to check the current hw status of the fence as the uABI
|
|
|
|
* guarantees forward progress. We could rely on the idle worker
|
|
|
|
* to eventually flush us, but to minimise latency just ask the
|
|
|
|
* hardware.
|
2016-08-16 15:50:40 +07:00
|
|
|
*
|
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 19:58:44 +07:00
|
|
|
* Note we only report on the status of native fences.
|
2016-08-16 15:50:40 +07:00
|
|
|
*/
|
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 19:58:44 +07:00
|
|
|
if (!dma_fence_is_i915(fence))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
/* opencode to_request() in order to avoid const warnings */
|
|
|
|
rq = container_of(fence, struct drm_i915_gem_request, fence);
|
|
|
|
if (i915_gem_request_completed(rq))
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return flag(rq->engine->exec_id);
|
2016-08-05 16:14:18 +07:00
|
|
|
}
|
|
|
|
|
2016-08-09 15:23:33 +07:00
|
|
|
static __always_inline unsigned int
|
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 19:58:44 +07:00
|
|
|
busy_check_reader(const struct dma_fence *fence)
|
2016-08-05 16:14:18 +07:00
|
|
|
{
|
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 19:58:44 +07:00
|
|
|
return __busy_set_if_active(fence, __busy_read_flag);
|
2016-08-05 16:14:18 +07:00
|
|
|
}
|
|
|
|
|
2016-08-09 15:23:33 +07:00
|
|
|
static __always_inline unsigned int
|
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 19:58:44 +07:00
|
|
|
busy_check_writer(const struct dma_fence *fence)
|
2016-08-05 16:14:18 +07:00
|
|
|
{
|
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 19:58:44 +07:00
|
|
|
if (!fence)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
return __busy_set_if_active(fence, __busy_write_id);
|
2016-08-05 16:14:18 +07:00
|
|
|
}
|
|
|
|
|
2008-07-31 02:06:12 +07:00
|
|
|
int
|
|
|
|
i915_gem_busy_ioctl(struct drm_device *dev, void *data,
|
2010-11-09 02:18:58 +07:00
|
|
|
struct drm_file *file)
|
2008-07-31 02:06:12 +07:00
|
|
|
{
|
|
|
|
struct drm_i915_gem_busy *args = data;
|
2010-11-09 02:18:58 +07:00
|
|
|
struct drm_i915_gem_object *obj;
|
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 19:58:44 +07:00
|
|
|
struct reservation_object_list *list;
|
|
|
|
unsigned int seq;
|
2016-10-28 19:58:42 +07:00
|
|
|
int err;
|
2008-07-31 02:06:12 +07:00
|
|
|
|
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 19:58:44 +07:00
|
|
|
err = -ENOENT;
|
2016-10-28 19:58:42 +07:00
|
|
|
rcu_read_lock();
|
|
|
|
obj = i915_gem_object_lookup_rcu(file, args->handle);
|
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 19:58:44 +07:00
|
|
|
if (!obj)
|
2016-10-28 19:58:42 +07:00
|
|
|
goto out;
|
2010-05-21 08:08:57 +07:00
|
|
|
|
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 19:58:44 +07:00
|
|
|
/* A discrepancy here is that we do not report the status of
|
|
|
|
* non-i915 fences, i.e. even though we may report the object as idle,
|
|
|
|
* a call to set-domain may still stall waiting for foreign rendering.
|
|
|
|
* This also means that wait-ioctl may report an object as busy,
|
|
|
|
* where busy-ioctl considers it idle.
|
|
|
|
*
|
|
|
|
* We trade the ability to warn of foreign fences to report on which
|
|
|
|
* i915 engines are active for the object.
|
|
|
|
*
|
|
|
|
* Alternatively, we can trade that extra information on read/write
|
|
|
|
* activity with
|
|
|
|
* args->busy =
|
|
|
|
* !reservation_object_test_signaled_rcu(obj->resv, true);
|
|
|
|
* to report the overall busyness. This is what the wait-ioctl does.
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
retry:
|
|
|
|
seq = raw_read_seqcount(&obj->resv->seq);
|
2016-01-15 23:51:46 +07:00
|
|
|
|
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 19:58:44 +07:00
|
|
|
/* Translate the exclusive fence to the READ *and* WRITE engine */
|
|
|
|
args->busy = busy_check_writer(rcu_dereference(obj->resv->fence_excl));
|
2016-08-05 16:14:18 +07:00
|
|
|
|
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 19:58:44 +07:00
|
|
|
/* Translate shared fences to READ set of engines */
|
|
|
|
list = rcu_dereference(obj->resv->fence);
|
|
|
|
if (list) {
|
|
|
|
unsigned int shared_count = list->shared_count, i;
|
2016-08-05 16:14:18 +07:00
|
|
|
|
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 19:58:44 +07:00
|
|
|
for (i = 0; i < shared_count; ++i) {
|
|
|
|
struct dma_fence *fence =
|
|
|
|
rcu_dereference(list->shared[i]);
|
|
|
|
|
|
|
|
args->busy |= busy_check_reader(fence);
|
|
|
|
}
|
2016-01-15 23:51:46 +07:00
|
|
|
}
|
2008-07-31 02:06:12 +07:00
|
|
|
|
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 19:58:44 +07:00
|
|
|
if (args->busy && read_seqcount_retry(&obj->resv->seq, seq))
|
|
|
|
goto retry;
|
|
|
|
|
|
|
|
err = 0;
|
2016-10-28 19:58:42 +07:00
|
|
|
out:
|
|
|
|
rcu_read_unlock();
|
|
|
|
return err;
|
2008-07-31 02:06:12 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
int
|
|
|
|
i915_gem_throttle_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv)
|
|
|
|
{
|
2011-08-17 02:34:10 +07:00
|
|
|
return i915_gem_ring_throttle(dev, file_priv);
|
2008-07-31 02:06:12 +07:00
|
|
|
}
|
|
|
|
|
2009-09-14 22:50:29 +07:00
|
|
|
int
|
|
|
|
i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
|
|
|
|
struct drm_file *file_priv)
|
|
|
|
{
|
2016-07-04 17:34:36 +07:00
|
|
|
struct drm_i915_private *dev_priv = to_i915(dev);
|
2009-09-14 22:50:29 +07:00
|
|
|
struct drm_i915_gem_madvise *args = data;
|
2010-11-09 02:18:58 +07:00
|
|
|
struct drm_i915_gem_object *obj;
|
2016-10-28 19:58:37 +07:00
|
|
|
int err;
|
2009-09-14 22:50:29 +07:00
|
|
|
|
|
|
|
switch (args->madv) {
|
|
|
|
case I915_MADV_DONTNEED:
|
|
|
|
case I915_MADV_WILLNEED:
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
return -EINVAL;
|
|
|
|
}
|
|
|
|
|
2016-07-20 19:31:51 +07:00
|
|
|
obj = i915_gem_object_lookup(file_priv, args->handle);
|
2016-10-28 19:58:37 +07:00
|
|
|
if (!obj)
|
|
|
|
return -ENOENT;
|
|
|
|
|
|
|
|
err = mutex_lock_interruptible(&obj->mm.lock);
|
|
|
|
if (err)
|
|
|
|
goto out;
|
2009-09-14 22:50:29 +07:00
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
if (obj->mm.pages &&
|
2016-08-05 16:14:23 +07:00
|
|
|
i915_gem_object_is_tiled(obj) &&
|
2014-11-20 15:26:30 +07:00
|
|
|
dev_priv->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
|
2016-11-01 17:03:17 +07:00
|
|
|
if (obj->mm.madv == I915_MADV_WILLNEED) {
|
|
|
|
GEM_BUG_ON(!obj->mm.quirked);
|
2016-10-28 19:58:35 +07:00
|
|
|
__i915_gem_object_unpin_pages(obj);
|
2016-11-01 17:03:17 +07:00
|
|
|
obj->mm.quirked = false;
|
|
|
|
}
|
|
|
|
if (args->madv == I915_MADV_WILLNEED) {
|
2016-11-04 17:30:01 +07:00
|
|
|
GEM_BUG_ON(obj->mm.quirked);
|
2016-10-28 19:58:35 +07:00
|
|
|
__i915_gem_object_pin_pages(obj);
|
2016-11-01 17:03:17 +07:00
|
|
|
obj->mm.quirked = true;
|
|
|
|
}
|
2014-11-20 15:26:30 +07:00
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
if (obj->mm.madv != __I915_MADV_PURGED)
|
|
|
|
obj->mm.madv = args->madv;
|
2009-09-14 22:50:29 +07:00
|
|
|
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 16:40:46 +07:00
|
|
|
/* if the object is no longer attached, discard its backing storage */
|
2016-10-28 19:58:35 +07:00
|
|
|
if (obj->mm.madv == I915_MADV_DONTNEED && !obj->mm.pages)
|
2009-09-21 05:13:10 +07:00
|
|
|
i915_gem_object_truncate(obj);
|
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
args->retained = obj->mm.madv != __I915_MADV_PURGED;
|
2016-10-28 19:58:37 +07:00
|
|
|
mutex_unlock(&obj->mm.lock);
|
2009-09-22 20:24:13 +07:00
|
|
|
|
2016-10-28 19:58:37 +07:00
|
|
|
out:
|
2016-07-20 19:31:53 +07:00
|
|
|
i915_gem_object_put(obj);
|
2016-10-28 19:58:37 +07:00
|
|
|
return err;
|
2009-09-14 22:50:29 +07:00
|
|
|
}
|
|
|
|
|
2016-11-17 02:07:04 +07:00
|
|
|
static void
|
|
|
|
frontbuffer_retire(struct i915_gem_active *active,
|
|
|
|
struct drm_i915_gem_request *request)
|
|
|
|
{
|
|
|
|
struct drm_i915_gem_object *obj =
|
|
|
|
container_of(active, typeof(*obj), frontbuffer_write);
|
|
|
|
|
|
|
|
intel_fb_obj_flush(obj, true, ORIGIN_CS);
|
|
|
|
}
|
|
|
|
|
2012-06-07 21:38:42 +07:00
|
|
|
void i915_gem_object_init(struct drm_i915_gem_object *obj,
|
|
|
|
const struct drm_i915_gem_object_ops *ops)
|
2012-08-11 21:41:06 +07:00
|
|
|
{
|
2016-10-28 19:58:37 +07:00
|
|
|
mutex_init(&obj->mm.lock);
|
|
|
|
|
2016-11-02 17:16:04 +07:00
|
|
|
INIT_LIST_HEAD(&obj->global_link);
|
2016-10-24 19:42:14 +07:00
|
|
|
INIT_LIST_HEAD(&obj->userfault_link);
|
2013-08-14 16:38:33 +07:00
|
|
|
INIT_LIST_HEAD(&obj->obj_exec_link);
|
2013-07-18 02:19:03 +07:00
|
|
|
INIT_LIST_HEAD(&obj->vma_list);
|
2015-04-07 22:20:38 +07:00
|
|
|
INIT_LIST_HEAD(&obj->batch_pool_link);
|
2012-08-11 21:41:06 +07:00
|
|
|
|
2012-06-07 21:38:42 +07:00
|
|
|
obj->ops = ops;
|
|
|
|
|
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 19:58:44 +07:00
|
|
|
reservation_object_init(&obj->__builtin_resv);
|
|
|
|
obj->resv = &obj->__builtin_resv;
|
|
|
|
|
2016-08-18 23:17:04 +07:00
|
|
|
obj->frontbuffer_ggtt_origin = ORIGIN_GTT;
|
2016-11-17 02:07:04 +07:00
|
|
|
init_request_active(&obj->frontbuffer_write, frontbuffer_retire);
|
2016-10-28 19:58:35 +07:00
|
|
|
|
|
|
|
obj->mm.madv = I915_MADV_WILLNEED;
|
|
|
|
INIT_RADIX_TREE(&obj->mm.get_page.radix, GFP_KERNEL | __GFP_NOWARN);
|
|
|
|
mutex_init(&obj->mm.get_page.lock);
|
2012-08-11 21:41:06 +07:00
|
|
|
|
2016-07-04 17:34:37 +07:00
|
|
|
i915_gem_info_add_obj(to_i915(obj->base.dev), obj->base.size);
|
2012-08-11 21:41:06 +07:00
|
|
|
}
|
|
|
|
|
2012-06-07 21:38:42 +07:00
|
|
|
static const struct drm_i915_gem_object_ops i915_gem_object_ops = {
|
2016-11-01 21:44:10 +07:00
|
|
|
.flags = I915_GEM_OBJECT_HAS_STRUCT_PAGE |
|
|
|
|
I915_GEM_OBJECT_IS_SHRINKABLE,
|
2012-06-07 21:38:42 +07:00
|
|
|
.get_pages = i915_gem_object_get_pages_gtt,
|
|
|
|
.put_pages = i915_gem_object_put_pages_gtt,
|
|
|
|
};
|
|
|
|
|
2016-10-18 19:02:49 +07:00
|
|
|
/* Note we don't consider signbits :| */
|
|
|
|
#define overflows_type(x, T) \
|
|
|
|
(sizeof(x) > sizeof(T) && (x) >> (sizeof(T) * BITS_PER_BYTE))
|
|
|
|
|
|
|
|
struct drm_i915_gem_object *
|
2016-12-01 21:16:37 +07:00
|
|
|
i915_gem_object_create(struct drm_i915_private *dev_priv, u64 size)
|
2010-04-10 02:05:06 +07:00
|
|
|
{
|
2010-04-10 02:05:07 +07:00
|
|
|
struct drm_i915_gem_object *obj;
|
2011-06-28 06:18:18 +07:00
|
|
|
struct address_space *mapping;
|
2012-11-30 04:18:51 +07:00
|
|
|
gfp_t mask;
|
2016-04-25 19:32:13 +07:00
|
|
|
int ret;
|
2010-04-10 02:05:06 +07:00
|
|
|
|
2016-10-18 19:02:49 +07:00
|
|
|
/* There is a prevalence of the assumption that we fit the object's
|
|
|
|
* page count inside a 32bit _signed_ variable. Let's document this and
|
|
|
|
* catch if we ever need to fix it. In the meantime, if you do spot
|
|
|
|
* such a local variable, please consider fixing!
|
|
|
|
*/
|
|
|
|
if (WARN_ON(size >> PAGE_SHIFT > INT_MAX))
|
|
|
|
return ERR_PTR(-E2BIG);
|
|
|
|
|
|
|
|
if (overflows_type(size, obj->base.size))
|
|
|
|
return ERR_PTR(-E2BIG);
|
|
|
|
|
2016-12-01 21:16:36 +07:00
|
|
|
obj = i915_gem_object_alloc(dev_priv);
|
2010-04-10 02:05:07 +07:00
|
|
|
if (obj == NULL)
|
2016-04-25 19:32:13 +07:00
|
|
|
return ERR_PTR(-ENOMEM);
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2016-12-01 21:16:37 +07:00
|
|
|
ret = drm_gem_object_init(&dev_priv->drm, &obj->base, size);
|
2016-04-25 19:32:13 +07:00
|
|
|
if (ret)
|
|
|
|
goto fail;
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2012-05-25 02:48:12 +07:00
|
|
|
mask = GFP_HIGHUSER | __GFP_RECLAIMABLE;
|
2016-12-07 17:13:04 +07:00
|
|
|
if (IS_I965GM(dev_priv) || IS_I965G(dev_priv)) {
|
2012-05-25 02:48:12 +07:00
|
|
|
/* 965gm cannot relocate objects above 4GiB. */
|
|
|
|
mask &= ~__GFP_HIGHMEM;
|
|
|
|
mask |= __GFP_DMA32;
|
|
|
|
}
|
|
|
|
|
2015-12-05 11:45:44 +07:00
|
|
|
mapping = obj->base.filp->f_mapping;
|
2012-05-25 02:48:12 +07:00
|
|
|
mapping_set_gfp_mask(mapping, mask);
|
2011-06-28 06:18:18 +07:00
|
|
|
|
2012-06-07 21:38:42 +07:00
|
|
|
i915_gem_object_init(obj, &i915_gem_object_ops);
|
2010-09-30 17:46:12 +07:00
|
|
|
|
2010-04-10 02:05:07 +07:00
|
|
|
obj->base.write_domain = I915_GEM_DOMAIN_CPU;
|
|
|
|
obj->base.read_domains = I915_GEM_DOMAIN_CPU;
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2016-11-04 21:42:44 +07:00
|
|
|
if (HAS_LLC(dev_priv)) {
|
2012-01-17 23:43:53 +07:00
|
|
|
/* On some devices, we can have the GPU use the LLC (the CPU
|
2011-03-30 06:59:55 +07:00
|
|
|
* cache) for about a 10% performance improvement
|
|
|
|
* compared to uncached. Graphics requests other than
|
|
|
|
* display scanout are coherent with the CPU in
|
|
|
|
* accessing this cache. This means in this mode we
|
|
|
|
* don't need to clflush on the CPU side, and on the
|
|
|
|
* GPU side we only need to flush internal caches to
|
|
|
|
* get data visible to the CPU.
|
|
|
|
*
|
|
|
|
* However, we maintain the display planes as UC, and so
|
|
|
|
* need to rebind when first used as such.
|
|
|
|
*/
|
|
|
|
obj->cache_level = I915_CACHE_LLC;
|
|
|
|
} else
|
|
|
|
obj->cache_level = I915_CACHE_NONE;
|
|
|
|
|
2013-07-25 04:25:03 +07:00
|
|
|
trace_i915_gem_object_create(obj);
|
|
|
|
|
2010-11-09 02:18:58 +07:00
|
|
|
return obj;
|
2016-04-25 19:32:13 +07:00
|
|
|
|
|
|
|
fail:
|
|
|
|
i915_gem_object_free(obj);
|
|
|
|
return ERR_PTR(ret);
|
2010-04-10 02:05:07 +07:00
|
|
|
}
|
|
|
|
|
2014-05-22 15:16:52 +07:00
|
|
|
static bool discard_backing_storage(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
/* If we are the last user of the backing storage (be it shmemfs
|
|
|
|
* pages or stolen etc), we know that the pages are going to be
|
|
|
|
* immediately released. In this case, we can then skip copying
|
|
|
|
* back the contents from the GPU.
|
|
|
|
*/
|
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
if (obj->mm.madv != I915_MADV_WILLNEED)
|
2014-05-22 15:16:52 +07:00
|
|
|
return false;
|
|
|
|
|
|
|
|
if (obj->base.filp == NULL)
|
|
|
|
return true;
|
|
|
|
|
|
|
|
/* At first glance, this looks racy, but then again so would be
|
|
|
|
* userspace racing mmap against close. However, the first external
|
|
|
|
* reference to the filp can only be obtained through the
|
|
|
|
* i915_gem_mmap_ioctl() which safeguards us against the user
|
|
|
|
* acquiring such a reference whilst we are in the middle of
|
|
|
|
* freeing the object.
|
|
|
|
*/
|
|
|
|
return atomic_long_read(&obj->base.filp->f_count) == 1;
|
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:42 +07:00
|
|
|
static void __i915_gem_free_objects(struct drm_i915_private *i915,
|
|
|
|
struct llist_node *freed)
|
2008-07-31 02:06:12 +07:00
|
|
|
{
|
2016-10-28 19:58:42 +07:00
|
|
|
struct drm_i915_gem_object *obj, *on;
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2016-10-28 19:58:42 +07:00
|
|
|
mutex_lock(&i915->drm.struct_mutex);
|
|
|
|
intel_runtime_pm_get(i915);
|
|
|
|
llist_for_each_entry(obj, freed, freed) {
|
|
|
|
struct i915_vma *vma, *vn;
|
|
|
|
|
|
|
|
trace_i915_gem_object_destroy(obj);
|
|
|
|
|
|
|
|
GEM_BUG_ON(i915_gem_object_is_active(obj));
|
|
|
|
list_for_each_entry_safe(vma, vn,
|
|
|
|
&obj->vma_list, obj_link) {
|
|
|
|
GEM_BUG_ON(!i915_vma_is_ggtt(vma));
|
|
|
|
GEM_BUG_ON(i915_vma_is_active(vma));
|
|
|
|
vma->flags &= ~I915_VMA_PIN_MASK;
|
|
|
|
i915_vma_close(vma);
|
|
|
|
}
|
2016-11-01 18:54:00 +07:00
|
|
|
GEM_BUG_ON(!list_empty(&obj->vma_list));
|
|
|
|
GEM_BUG_ON(!RB_EMPTY_ROOT(&obj->vma_tree));
|
2016-10-28 19:58:42 +07:00
|
|
|
|
2016-11-02 17:16:04 +07:00
|
|
|
list_del(&obj->global_link);
|
2016-10-28 19:58:42 +07:00
|
|
|
}
|
|
|
|
intel_runtime_pm_put(i915);
|
|
|
|
mutex_unlock(&i915->drm.struct_mutex);
|
|
|
|
|
|
|
|
llist_for_each_entry_safe(obj, on, freed, freed) {
|
|
|
|
GEM_BUG_ON(obj->bind_count);
|
|
|
|
GEM_BUG_ON(atomic_read(&obj->frontbuffer_bits));
|
|
|
|
|
|
|
|
if (obj->ops->release)
|
|
|
|
obj->ops->release(obj);
|
2013-11-28 03:20:34 +07:00
|
|
|
|
2016-10-28 19:58:42 +07:00
|
|
|
if (WARN_ON(i915_gem_object_has_pinned_pages(obj)))
|
|
|
|
atomic_set(&obj->mm.pages_pin_count, 0);
|
2016-11-01 19:11:34 +07:00
|
|
|
__i915_gem_object_put_pages(obj, I915_MM_NORMAL);
|
2016-10-28 19:58:42 +07:00
|
|
|
GEM_BUG_ON(obj->mm.pages);
|
|
|
|
|
|
|
|
if (obj->base.import_attach)
|
|
|
|
drm_prime_gem_destroy(&obj->base, NULL);
|
|
|
|
|
drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)
v2: Reimplement busy-ioctl (by walking the reservation object), postpone
the ABI change for another day. Similarly use the reservation object to
find the last_write request (if active and from i915) for choosing
display CS flips.
Caveats:
* busy-ioctl: busy-ioctl only reports on the native fences, it will not
warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is
being rendered to by external fences. It also will not report the same
busy state as wait-ioctl (or polling on the dma-buf) in the same
circumstances. On the plus side, it does retain reporting of which
*i915* engines are engaged with this object.
* non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering, see
"drm/i915: Restore nonblocking awaits for modesetting"
* dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 66s), mainly due to atomic operations
(maintaining the fence refcounts).
* loss of object-level retirement callbacks, emulated by VMA retirement
tracking.
* minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk
2016-10-28 19:58:44 +07:00
|
|
|
reservation_object_fini(&obj->__builtin_resv);
|
2016-10-28 19:58:42 +07:00
|
|
|
drm_gem_object_release(&obj->base);
|
|
|
|
i915_gem_info_remove_obj(i915, obj->base.size);
|
|
|
|
|
|
|
|
kfree(obj->bit_17);
|
|
|
|
i915_gem_object_free(obj);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
static void i915_gem_flush_free_objects(struct drm_i915_private *i915)
|
|
|
|
{
|
|
|
|
struct llist_node *freed;
|
|
|
|
|
|
|
|
freed = llist_del_all(&i915->mm.free_list);
|
|
|
|
if (unlikely(freed))
|
|
|
|
__i915_gem_free_objects(i915, freed);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void __i915_gem_free_work(struct work_struct *work)
|
|
|
|
{
|
|
|
|
struct drm_i915_private *i915 =
|
|
|
|
container_of(work, struct drm_i915_private, mm.free_work);
|
|
|
|
struct llist_node *freed;
|
2011-03-20 18:20:19 +07:00
|
|
|
|
2016-08-04 13:52:45 +07:00
|
|
|
/* All file-owned VMA should have been released by this point through
|
|
|
|
* i915_gem_close_object(), or earlier by i915_gem_context_close().
|
|
|
|
* However, the object may also be bound into the global GTT (e.g.
|
|
|
|
* older GPUs without per-process support, or for direct access through
|
|
|
|
* the GTT either for the user or for scanout). Those VMA still need to
|
|
|
|
* unbound now.
|
|
|
|
*/
|
2012-04-24 21:47:31 +07:00
|
|
|
|
2016-10-28 19:58:42 +07:00
|
|
|
while ((freed = llist_del_all(&i915->mm.free_list)))
|
|
|
|
__i915_gem_free_objects(i915, freed);
|
|
|
|
}
|
2014-06-19 04:28:09 +07:00
|
|
|
|
2016-10-28 19:58:42 +07:00
|
|
|
static void __i915_gem_free_object_rcu(struct rcu_head *head)
|
|
|
|
{
|
|
|
|
struct drm_i915_gem_object *obj =
|
|
|
|
container_of(head, typeof(*obj), rcu);
|
|
|
|
struct drm_i915_private *i915 = to_i915(obj->base.dev);
|
|
|
|
|
|
|
|
/* We can't simply use call_rcu() from i915_gem_free_object()
|
|
|
|
* as we need to block whilst unbinding, and the call_rcu
|
|
|
|
* task may be called from softirq context. So we take a
|
|
|
|
* detour through a worker.
|
|
|
|
*/
|
|
|
|
if (llist_add(&obj->freed, &i915->mm.free_list))
|
|
|
|
schedule_work(&i915->mm.free_work);
|
|
|
|
}
|
2014-11-20 15:26:30 +07:00
|
|
|
|
2016-10-28 19:58:42 +07:00
|
|
|
void i915_gem_free_object(struct drm_gem_object *gem_obj)
|
|
|
|
{
|
|
|
|
struct drm_i915_gem_object *obj = to_intel_bo(gem_obj);
|
2016-10-28 19:58:35 +07:00
|
|
|
|
2016-11-01 17:03:17 +07:00
|
|
|
if (obj->mm.quirked)
|
|
|
|
__i915_gem_object_unpin_pages(obj);
|
|
|
|
|
2014-05-22 15:16:52 +07:00
|
|
|
if (discard_backing_storage(obj))
|
2016-10-28 19:58:35 +07:00
|
|
|
obj->mm.madv = I915_MADV_DONTNEED;
|
2008-11-13 01:03:55 +07:00
|
|
|
|
2016-10-28 19:58:42 +07:00
|
|
|
/* Before we free the object, make sure any pure RCU-only
|
|
|
|
* read-side critical sections are complete, e.g.
|
|
|
|
* i915_gem_busy_ioctl(). For the corresponding synchronized
|
|
|
|
* lookup see i915_gem_object_lookup_rcu().
|
|
|
|
*/
|
|
|
|
call_rcu(&obj->rcu, __i915_gem_free_object_rcu);
|
2008-07-31 02:06:12 +07:00
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:29 +07:00
|
|
|
void __i915_gem_object_release_unless_active(struct drm_i915_gem_object *obj)
|
|
|
|
{
|
|
|
|
lockdep_assert_held(&obj->base.dev->struct_mutex);
|
|
|
|
|
|
|
|
GEM_BUG_ON(i915_gem_object_has_active_reference(obj));
|
|
|
|
if (i915_gem_object_is_active(obj))
|
|
|
|
i915_gem_object_set_active_reference(obj);
|
|
|
|
else
|
|
|
|
i915_gem_object_put(obj);
|
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:47 +07:00
|
|
|
static void assert_kernel_context_is_current(struct drm_i915_private *dev_priv)
|
|
|
|
{
|
|
|
|
struct intel_engine_cs *engine;
|
|
|
|
enum intel_engine_id id;
|
|
|
|
|
|
|
|
for_each_engine(engine, dev_priv, id)
|
|
|
|
GEM_BUG_ON(engine->last_context != dev_priv->kernel_context);
|
|
|
|
}
|
|
|
|
|
2016-12-01 21:16:38 +07:00
|
|
|
int i915_gem_suspend(struct drm_i915_private *dev_priv)
|
2010-01-07 17:39:13 +07:00
|
|
|
{
|
2016-12-01 21:16:38 +07:00
|
|
|
struct drm_device *dev = &dev_priv->drm;
|
2016-08-05 16:14:11 +07:00
|
|
|
int ret;
|
2008-11-14 06:00:55 +07:00
|
|
|
|
2016-07-22 03:16:19 +07:00
|
|
|
intel_suspend_gt_powersave(dev_priv);
|
|
|
|
|
2013-10-16 17:50:01 +07:00
|
|
|
mutex_lock(&dev->struct_mutex);
|
2016-07-15 20:56:20 +07:00
|
|
|
|
|
|
|
/* We have to flush all the executing contexts to main memory so
|
|
|
|
* that they can saved in the hibernation image. To ensure the last
|
|
|
|
* context image is coherent, we have to switch away from it. That
|
|
|
|
* leaves the dev_priv->kernel_context still active when
|
|
|
|
* we actually suspend, and its image in memory may not match the GPU
|
|
|
|
* state. Fortunately, the kernel_context is disposable and we do
|
|
|
|
* not rely on its state.
|
|
|
|
*/
|
|
|
|
ret = i915_gem_switch_to_kernel_context(dev_priv);
|
|
|
|
if (ret)
|
|
|
|
goto err;
|
|
|
|
|
2016-09-09 20:11:50 +07:00
|
|
|
ret = i915_gem_wait_for_idle(dev_priv,
|
|
|
|
I915_WAIT_INTERRUPTIBLE |
|
|
|
|
I915_WAIT_LOCKED);
|
2013-09-14 05:57:04 +07:00
|
|
|
if (ret)
|
2013-10-16 17:50:01 +07:00
|
|
|
goto err;
|
2013-09-14 05:57:04 +07:00
|
|
|
|
2016-05-06 21:40:21 +07:00
|
|
|
i915_gem_retire_requests(dev_priv);
|
2016-10-28 19:58:56 +07:00
|
|
|
GEM_BUG_ON(dev_priv->gt.active_requests);
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2016-10-28 19:58:47 +07:00
|
|
|
assert_kernel_context_is_current(dev_priv);
|
2016-04-28 15:56:41 +07:00
|
|
|
i915_gem_context_lost(dev_priv);
|
2013-10-16 17:50:01 +07:00
|
|
|
mutex_unlock(&dev->struct_mutex);
|
|
|
|
|
2015-01-26 23:03:03 +07:00
|
|
|
cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
|
2016-07-04 14:08:31 +07:00
|
|
|
cancel_delayed_work_sync(&dev_priv->gt.retire_work);
|
|
|
|
flush_delayed_work(&dev_priv->gt.idle_work);
|
2016-10-28 19:58:42 +07:00
|
|
|
flush_work(&dev_priv->mm.free_work);
|
2010-01-07 17:39:13 +07:00
|
|
|
|
2014-11-25 18:56:33 +07:00
|
|
|
/* Assert that we sucessfully flushed all the work and
|
|
|
|
* reset the GPU back to its idle, low power state.
|
|
|
|
*/
|
2016-07-04 14:08:31 +07:00
|
|
|
WARN_ON(dev_priv->gt.awake);
|
2016-11-07 16:20:05 +07:00
|
|
|
WARN_ON(!intel_execlists_idle(dev_priv));
|
2014-11-25 18:56:33 +07:00
|
|
|
|
2016-10-12 21:46:37 +07:00
|
|
|
/*
|
|
|
|
* Neither the BIOS, ourselves or any other kernel
|
|
|
|
* expects the system to be in execlists mode on startup,
|
|
|
|
* so we need to reset the GPU back to legacy mode. And the only
|
|
|
|
* known way to disable logical contexts is through a GPU reset.
|
|
|
|
*
|
|
|
|
* So in order to leave the system in a known default configuration,
|
|
|
|
* always reset the GPU upon unload and suspend. Afterwards we then
|
|
|
|
* clean up the GEM state tracking, flushing off the requests and
|
|
|
|
* leaving the system in a known idle state.
|
|
|
|
*
|
|
|
|
* Note that is of the upmost importance that the GPU is idle and
|
|
|
|
* all stray writes are flushed *before* we dismantle the backing
|
|
|
|
* storage for the pinned objects.
|
|
|
|
*
|
|
|
|
* However, since we are uncertain that resetting the GPU on older
|
|
|
|
* machines is a good idea, we don't - just in case it leaves the
|
|
|
|
* machine in an unusable condition.
|
|
|
|
*/
|
2016-11-04 21:42:44 +07:00
|
|
|
if (HAS_HW_CONTEXTS(dev_priv)) {
|
2016-10-12 21:46:37 +07:00
|
|
|
int reset = intel_gpu_reset(dev_priv, ALL_ENGINES);
|
|
|
|
WARN_ON(reset && reset != -ENODEV);
|
|
|
|
}
|
|
|
|
|
2008-07-31 02:06:12 +07:00
|
|
|
return 0;
|
2013-10-16 17:50:01 +07:00
|
|
|
|
|
|
|
err:
|
|
|
|
mutex_unlock(&dev->struct_mutex);
|
|
|
|
return ret;
|
2008-07-31 02:06:12 +07:00
|
|
|
}
|
|
|
|
|
2016-12-01 21:16:38 +07:00
|
|
|
void i915_gem_resume(struct drm_i915_private *dev_priv)
|
2016-07-15 20:56:20 +07:00
|
|
|
{
|
2016-12-01 21:16:38 +07:00
|
|
|
struct drm_device *dev = &dev_priv->drm;
|
2016-07-15 20:56:20 +07:00
|
|
|
|
2016-11-07 16:20:05 +07:00
|
|
|
WARN_ON(dev_priv->gt.awake);
|
|
|
|
|
2016-07-15 20:56:20 +07:00
|
|
|
mutex_lock(&dev->struct_mutex);
|
2016-11-16 15:55:34 +07:00
|
|
|
i915_gem_restore_gtt_mappings(dev_priv);
|
2016-07-15 20:56:20 +07:00
|
|
|
|
|
|
|
/* As we didn't flush the kernel context before suspend, we cannot
|
|
|
|
* guarantee that the context image is complete. So let's just reset
|
|
|
|
* it and start again.
|
|
|
|
*/
|
2016-09-09 20:11:53 +07:00
|
|
|
dev_priv->gt.resume(dev_priv);
|
2016-07-15 20:56:20 +07:00
|
|
|
|
|
|
|
mutex_unlock(&dev->struct_mutex);
|
|
|
|
}
|
|
|
|
|
2016-11-16 15:55:31 +07:00
|
|
|
void i915_gem_init_swizzling(struct drm_i915_private *dev_priv)
|
2012-02-02 15:58:12 +07:00
|
|
|
{
|
2016-11-16 15:55:31 +07:00
|
|
|
if (INTEL_GEN(dev_priv) < 5 ||
|
2012-02-02 15:58:12 +07:00
|
|
|
dev_priv->mm.bit_6_swizzle_x == I915_BIT_6_SWIZZLE_NONE)
|
|
|
|
return;
|
|
|
|
|
|
|
|
I915_WRITE(DISP_ARB_CTL, I915_READ(DISP_ARB_CTL) |
|
|
|
|
DISP_TILE_SURFACE_SWIZZLING);
|
|
|
|
|
2016-10-13 17:03:10 +07:00
|
|
|
if (IS_GEN5(dev_priv))
|
2012-01-31 22:47:55 +07:00
|
|
|
return;
|
|
|
|
|
2012-02-02 15:58:12 +07:00
|
|
|
I915_WRITE(TILECTL, I915_READ(TILECTL) | TILECTL_SWZCTL);
|
2016-10-13 17:03:10 +07:00
|
|
|
if (IS_GEN6(dev_priv))
|
2012-04-24 19:04:12 +07:00
|
|
|
I915_WRITE(ARB_MODE, _MASKED_BIT_ENABLE(ARB_MODE_SWIZZLE_SNB));
|
2016-10-13 17:03:10 +07:00
|
|
|
else if (IS_GEN7(dev_priv))
|
2012-04-24 19:04:12 +07:00
|
|
|
I915_WRITE(ARB_MODE, _MASKED_BIT_ENABLE(ARB_MODE_SWIZZLE_IVB));
|
2016-10-13 17:03:10 +07:00
|
|
|
else if (IS_GEN8(dev_priv))
|
2013-11-03 11:07:04 +07:00
|
|
|
I915_WRITE(GAMTARBMODE, _MASKED_BIT_ENABLE(ARB_MODE_SWIZZLE_BDW));
|
2012-12-19 01:31:23 +07:00
|
|
|
else
|
|
|
|
BUG();
|
2012-02-02 15:58:12 +07:00
|
|
|
}
|
2012-02-10 02:53:27 +07:00
|
|
|
|
2016-10-13 17:02:58 +07:00
|
|
|
static void init_unused_ring(struct drm_i915_private *dev_priv, u32 base)
|
2014-08-15 05:21:55 +07:00
|
|
|
{
|
|
|
|
I915_WRITE(RING_CTL(base), 0);
|
|
|
|
I915_WRITE(RING_HEAD(base), 0);
|
|
|
|
I915_WRITE(RING_TAIL(base), 0);
|
|
|
|
I915_WRITE(RING_START(base), 0);
|
|
|
|
}
|
|
|
|
|
2016-10-13 17:02:58 +07:00
|
|
|
static void init_unused_rings(struct drm_i915_private *dev_priv)
|
2014-08-15 05:21:55 +07:00
|
|
|
{
|
2016-10-13 17:02:58 +07:00
|
|
|
if (IS_I830(dev_priv)) {
|
|
|
|
init_unused_ring(dev_priv, PRB1_BASE);
|
|
|
|
init_unused_ring(dev_priv, SRB0_BASE);
|
|
|
|
init_unused_ring(dev_priv, SRB1_BASE);
|
|
|
|
init_unused_ring(dev_priv, SRB2_BASE);
|
|
|
|
init_unused_ring(dev_priv, SRB3_BASE);
|
|
|
|
} else if (IS_GEN2(dev_priv)) {
|
|
|
|
init_unused_ring(dev_priv, SRB0_BASE);
|
|
|
|
init_unused_ring(dev_priv, SRB1_BASE);
|
|
|
|
} else if (IS_GEN3(dev_priv)) {
|
|
|
|
init_unused_ring(dev_priv, PRB1_BASE);
|
|
|
|
init_unused_ring(dev_priv, PRB2_BASE);
|
2014-08-15 05:21:55 +07:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2013-02-09 02:49:24 +07:00
|
|
|
int
|
2016-12-01 21:16:38 +07:00
|
|
|
i915_gem_init_hw(struct drm_i915_private *dev_priv)
|
2013-02-09 02:49:24 +07:00
|
|
|
{
|
2016-03-16 18:00:36 +07:00
|
|
|
struct intel_engine_cs *engine;
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 00:14:48 +07:00
|
|
|
enum intel_engine_id id;
|
2016-04-28 15:56:44 +07:00
|
|
|
int ret;
|
2013-02-09 02:49:24 +07:00
|
|
|
|
2016-10-25 19:16:02 +07:00
|
|
|
dev_priv->gt.last_init_time = ktime_get();
|
|
|
|
|
2015-02-13 21:35:59 +07:00
|
|
|
/* Double layer security blanket, see i915_gem_init() */
|
|
|
|
intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
|
|
|
|
|
2016-11-04 21:42:44 +07:00
|
|
|
if (HAS_EDRAM(dev_priv) && INTEL_GEN(dev_priv) < 9)
|
2013-07-05 01:02:04 +07:00
|
|
|
I915_WRITE(HSW_IDICR, I915_READ(HSW_IDICR) | IDIHASHMSK(0xf));
|
2013-02-09 02:49:24 +07:00
|
|
|
|
2016-10-13 17:03:01 +07:00
|
|
|
if (IS_HASWELL(dev_priv))
|
2016-10-13 17:02:58 +07:00
|
|
|
I915_WRITE(MI_PREDICATE_RESULT_2, IS_HSW_GT3(dev_priv) ?
|
2013-11-29 19:56:12 +07:00
|
|
|
LOWER_SLICE_ENABLED : LOWER_SLICE_DISABLED);
|
2013-08-29 02:45:46 +07:00
|
|
|
|
2016-10-13 17:02:53 +07:00
|
|
|
if (HAS_PCH_NOP(dev_priv)) {
|
2016-10-14 16:13:06 +07:00
|
|
|
if (IS_IVYBRIDGE(dev_priv)) {
|
2014-01-23 05:39:30 +07:00
|
|
|
u32 temp = I915_READ(GEN7_MSG_CTL);
|
|
|
|
temp &= ~(WAIT_FOR_PCH_FLR_ACK | WAIT_FOR_PCH_RESET_ACK);
|
|
|
|
I915_WRITE(GEN7_MSG_CTL, temp);
|
2016-11-16 15:55:31 +07:00
|
|
|
} else if (INTEL_GEN(dev_priv) >= 7) {
|
2014-01-23 05:39:30 +07:00
|
|
|
u32 temp = I915_READ(HSW_NDE_RSTWRN_OPT);
|
|
|
|
temp &= ~RESET_PCH_HANDSHAKE_ENABLE;
|
|
|
|
I915_WRITE(HSW_NDE_RSTWRN_OPT, temp);
|
|
|
|
}
|
2013-04-06 03:12:43 +07:00
|
|
|
}
|
|
|
|
|
2016-11-16 15:55:31 +07:00
|
|
|
i915_gem_init_swizzling(dev_priv);
|
2013-02-09 02:49:24 +07:00
|
|
|
|
2014-11-20 15:45:19 +07:00
|
|
|
/*
|
|
|
|
* At least 830 can leave some of the unused rings
|
|
|
|
* "active" (ie. head != tail) after resume which
|
|
|
|
* will prevent c3 entry. Makes sure all unused rings
|
|
|
|
* are totally idle.
|
|
|
|
*/
|
2016-10-13 17:02:58 +07:00
|
|
|
init_unused_rings(dev_priv);
|
2014-11-20 15:45:19 +07:00
|
|
|
|
2016-01-20 02:02:54 +07:00
|
|
|
BUG_ON(!dev_priv->kernel_context);
|
2015-05-29 23:43:37 +07:00
|
|
|
|
2016-11-16 15:55:31 +07:00
|
|
|
ret = i915_ppgtt_init_hw(dev_priv);
|
2015-06-18 19:11:20 +07:00
|
|
|
if (ret) {
|
|
|
|
DRM_ERROR("PPGTT enable HW failed %d\n", ret);
|
|
|
|
goto out;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Need to do basic initialisation of all rings first: */
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 00:14:48 +07:00
|
|
|
for_each_engine(engine, dev_priv, id) {
|
2016-03-16 18:00:36 +07:00
|
|
|
ret = engine->init_hw(engine);
|
2014-11-20 06:33:07 +07:00
|
|
|
if (ret)
|
2015-02-13 21:35:59 +07:00
|
|
|
goto out;
|
2014-11-20 06:33:07 +07:00
|
|
|
}
|
2013-01-22 19:12:17 +07:00
|
|
|
|
2016-12-01 21:16:38 +07:00
|
|
|
intel_mocs_init_l3cc_table(dev_priv);
|
2016-04-13 21:03:25 +07:00
|
|
|
|
2015-08-12 21:43:36 +07:00
|
|
|
/* We can't enable contexts until all firmware is loaded */
|
2016-12-01 21:16:38 +07:00
|
|
|
ret = intel_guc_setup(dev_priv);
|
2016-06-07 15:14:49 +07:00
|
|
|
if (ret)
|
|
|
|
goto out;
|
2015-08-12 21:43:36 +07:00
|
|
|
|
2015-02-13 21:35:59 +07:00
|
|
|
out:
|
|
|
|
intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
|
drm/i915: Split context enabling from init
We **need** to do this for exactly 1 reason, because we want to embed a
PPGTT into the context, but we don't want to special case the default
context.
To achieve that, we must be able to initialize contexts after the GTT is
setup (so we can allocate and pin the default context's BO), but before
the PPGTT and rings are initialized. This is because, currently, context
initialization requires ring usage. We don't have rings until after the
GTT is setup. If we split the enabling part of context initialization,
the part requiring the ringbuffer, we can untangle this, and then later
embed the PPGTT
Incidentally this allows us to also adhere to the original design of
context init/fini in future patches: they were only ever meant to be
called at driver load and unload.
v2: Move hw_contexts_disabled test in i915_gem_context_enable() (Chris)
v3: BUG_ON after checking for disabled contexts. Or else it blows up pre
gen6 (Ben)
v4: Forward port
Modified enable for each ring, since that patch is earlier in the series
Dropped ring arg from create_default_context so it can be used by others
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-07 05:11:04 +07:00
|
|
|
return ret;
|
2010-05-21 08:08:55 +07:00
|
|
|
}
|
|
|
|
|
2016-07-20 19:31:57 +07:00
|
|
|
bool intel_sanitize_semaphores(struct drm_i915_private *dev_priv, int value)
|
|
|
|
{
|
|
|
|
if (INTEL_INFO(dev_priv)->gen < 6)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
/* TODO: make semaphores and Execlists play nicely together */
|
|
|
|
if (i915.enable_execlists)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
if (value >= 0)
|
|
|
|
return value;
|
|
|
|
|
|
|
|
#ifdef CONFIG_INTEL_IOMMU
|
|
|
|
/* Enable semaphores on SNB when IO remapping is off */
|
|
|
|
if (INTEL_INFO(dev_priv)->gen == 6 && intel_iommu_gfx_mapped)
|
|
|
|
return false;
|
|
|
|
#endif
|
|
|
|
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2016-12-01 21:16:38 +07:00
|
|
|
int i915_gem_init(struct drm_i915_private *dev_priv)
|
2012-04-24 21:47:41 +07:00
|
|
|
{
|
|
|
|
int ret;
|
|
|
|
|
2016-12-01 21:16:38 +07:00
|
|
|
mutex_lock(&dev_priv->drm.struct_mutex);
|
2013-03-09 01:45:53 +07:00
|
|
|
|
2014-07-24 23:04:21 +07:00
|
|
|
if (!i915.enable_execlists) {
|
2016-09-09 20:11:53 +07:00
|
|
|
dev_priv->gt.resume = intel_legacy_submission_resume;
|
2016-08-03 04:50:21 +07:00
|
|
|
dev_priv->gt.cleanup_engine = intel_engine_cleanup;
|
2014-07-24 23:04:22 +07:00
|
|
|
} else {
|
2016-09-09 20:11:53 +07:00
|
|
|
dev_priv->gt.resume = intel_lr_context_resume;
|
2016-03-16 18:00:40 +07:00
|
|
|
dev_priv->gt.cleanup_engine = intel_logical_ring_cleanup;
|
2014-07-24 23:04:21 +07:00
|
|
|
}
|
|
|
|
|
2015-02-13 21:35:59 +07:00
|
|
|
/* This is just a security blanket to placate dragons.
|
|
|
|
* On some systems, we very sporadically observe that the first TLBs
|
|
|
|
* used by the CS may be stale, despite us poking the TLB reset. If
|
|
|
|
* we hold the forcewake during initialisation these problems
|
|
|
|
* just magically go away.
|
|
|
|
*/
|
|
|
|
intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
|
|
|
|
|
2016-05-19 22:17:16 +07:00
|
|
|
i915_gem_init_userptr(dev_priv);
|
2016-08-04 13:52:23 +07:00
|
|
|
|
|
|
|
ret = i915_gem_init_ggtt(dev_priv);
|
|
|
|
if (ret)
|
|
|
|
goto out_unlock;
|
2013-03-09 01:45:53 +07:00
|
|
|
|
2016-12-01 21:16:38 +07:00
|
|
|
ret = i915_gem_context_init(dev_priv);
|
2014-12-05 19:17:42 +07:00
|
|
|
if (ret)
|
|
|
|
goto out_unlock;
|
drm/i915: Split context enabling from init
We **need** to do this for exactly 1 reason, because we want to embed a
PPGTT into the context, but we don't want to special case the default
context.
To achieve that, we must be able to initialize contexts after the GTT is
setup (so we can allocate and pin the default context's BO), but before
the PPGTT and rings are initialized. This is because, currently, context
initialization requires ring usage. We don't have rings until after the
GTT is setup. If we split the enabling part of context initialization,
the part requiring the ringbuffer, we can untangle this, and then later
embed the PPGTT
Incidentally this allows us to also adhere to the original design of
context init/fini in future patches: they were only ever meant to be
called at driver load and unload.
v2: Move hw_contexts_disabled test in i915_gem_context_enable() (Chris)
v3: BUG_ON after checking for disabled contexts. Or else it blows up pre
gen6 (Ben)
v4: Forward port
Modified enable for each ring, since that patch is earlier in the series
Dropped ring arg from create_default_context so it can be used by others
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-07 05:11:04 +07:00
|
|
|
|
2016-12-01 21:16:38 +07:00
|
|
|
ret = intel_engines_init(dev_priv);
|
2014-11-20 06:33:07 +07:00
|
|
|
if (ret)
|
2014-12-05 19:17:42 +07:00
|
|
|
goto out_unlock;
|
drm/i915: Split context enabling from init
We **need** to do this for exactly 1 reason, because we want to embed a
PPGTT into the context, but we don't want to special case the default
context.
To achieve that, we must be able to initialize contexts after the GTT is
setup (so we can allocate and pin the default context's BO), but before
the PPGTT and rings are initialized. This is because, currently, context
initialization requires ring usage. We don't have rings until after the
GTT is setup. If we split the enabling part of context initialization,
the part requiring the ringbuffer, we can untangle this, and then later
embed the PPGTT
Incidentally this allows us to also adhere to the original design of
context init/fini in future patches: they were only ever meant to be
called at driver load and unload.
v2: Move hw_contexts_disabled test in i915_gem_context_enable() (Chris)
v3: BUG_ON after checking for disabled contexts. Or else it blows up pre
gen6 (Ben)
v4: Forward port
Modified enable for each ring, since that patch is earlier in the series
Dropped ring arg from create_default_context so it can be used by others
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-12-07 05:11:04 +07:00
|
|
|
|
2016-12-01 21:16:38 +07:00
|
|
|
ret = i915_gem_init_hw(dev_priv);
|
2014-04-09 15:19:42 +07:00
|
|
|
if (ret == -EIO) {
|
2016-07-27 15:07:29 +07:00
|
|
|
/* Allow engine initialisation to fail by marking the GPU as
|
2014-04-09 15:19:42 +07:00
|
|
|
* wedged. But we only want to do this where the GPU is angry,
|
|
|
|
* for all other failure, such as an allocation failure, bail.
|
|
|
|
*/
|
|
|
|
DRM_ERROR("Failed to initialize GPU, declaring it wedged\n");
|
2016-09-09 20:11:53 +07:00
|
|
|
i915_gem_set_wedged(dev_priv);
|
2014-04-09 15:19:42 +07:00
|
|
|
ret = 0;
|
2012-04-24 21:47:41 +07:00
|
|
|
}
|
2014-12-05 19:17:42 +07:00
|
|
|
|
|
|
|
out_unlock:
|
2015-02-13 21:35:59 +07:00
|
|
|
intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
|
2016-12-01 21:16:38 +07:00
|
|
|
mutex_unlock(&dev_priv->drm.struct_mutex);
|
2012-04-24 21:47:41 +07:00
|
|
|
|
2014-04-09 15:19:42 +07:00
|
|
|
return ret;
|
2012-04-24 21:47:41 +07:00
|
|
|
}
|
|
|
|
|
2010-05-21 08:08:55 +07:00
|
|
|
void
|
2016-12-01 21:16:39 +07:00
|
|
|
i915_gem_cleanup_engines(struct drm_i915_private *dev_priv)
|
2010-05-21 08:08:55 +07:00
|
|
|
{
|
2016-03-16 18:00:36 +07:00
|
|
|
struct intel_engine_cs *engine;
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 00:14:48 +07:00
|
|
|
enum intel_engine_id id;
|
2010-05-21 08:08:55 +07:00
|
|
|
|
drm/i915: Allocate intel_engine_cs structure only for the enabled engines
With the possibility of addition of many more number of rings in future,
the drm_i915_private structure could bloat as an array, of type
intel_engine_cs, is embedded inside it.
struct intel_engine_cs engine[I915_NUM_ENGINES];
Though this is still fine as generally there is only a single instance of
drm_i915_private structure used, but not all of the possible rings would be
enabled or active on most of the platforms. Some memory can be saved by
allocating intel_engine_cs structure only for the enabled/active engines.
Currently the engine/ring ID is kept static and dev_priv->engine[] is simply
indexed using the enums defined in intel_engine_id.
To save memory and continue using the static engine/ring IDs, 'engine' is
defined as an array of pointers.
struct intel_engine_cs *engine[I915_NUM_ENGINES];
dev_priv->engine[engine_ID] will be NULL for disabled engine instances.
There is a text size reduction of 928 bytes, from 1028200 to 1027272, for
i915.o file (but for i915.ko file text size remain same as 1193131 bytes).
v2:
- Remove the engine iterator field added in drm_i915_private structure,
instead pass a local iterator variable to the for_each_engine**
macros. (Chris)
- Do away with intel_engine_initialized() and instead directly use the
NULL pointer check on engine pointer. (Chris)
v3:
- Remove for_each_engine_id() macro, as the updated macro for_each_engine()
can be used in place of it. (Chris)
- Protect the access to Render engine Fault register with a NULL check, as
engine specific init is done later in Driver load sequence.
v4:
- Use !!dev_priv->engine[VCS] style for the engine check in getparam. (Chris)
- Kill the superfluous init_engine_lists().
v5:
- Cleanup the intel_engines_init() & intel_engines_setup(), with respect to
allocation of intel_engine_cs structure. (Chris)
v6:
- Rebase.
v7:
- Optimize the for_each_engine_masked() macro. (Chris)
- Change the type of 'iter' local variable to enum intel_engine_id. (Chris)
- Rebase.
v8: Rebase.
v9: Rebase.
v10:
- For index calculation use engine ID instead of pointer based arithmetic in
intel_engine_sync_index() as engine pointers are not contiguous now (Chris)
- For appropriateness, rename local enum variable 'iter' to 'id'. (Joonas)
- Use for_each_engine macro for cleanup in intel_engines_init() and remove
check for NULL engine pointer in cleanup() routines. (Joonas)
v11: Rebase.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1476378888-7372-1-git-send-email-akash.goel@intel.com
2016-10-14 00:14:48 +07:00
|
|
|
for_each_engine(engine, dev_priv, id)
|
2016-03-16 18:00:40 +07:00
|
|
|
dev_priv->gt.cleanup_engine(engine);
|
2010-05-21 08:08:55 +07:00
|
|
|
}
|
|
|
|
|
2016-03-16 19:54:03 +07:00
|
|
|
void
|
|
|
|
i915_gem_load_init_fences(struct drm_i915_private *dev_priv)
|
|
|
|
{
|
2016-08-18 23:17:00 +07:00
|
|
|
int i;
|
2016-03-16 19:54:03 +07:00
|
|
|
|
|
|
|
if (INTEL_INFO(dev_priv)->gen >= 7 && !IS_VALLEYVIEW(dev_priv) &&
|
|
|
|
!IS_CHERRYVIEW(dev_priv))
|
|
|
|
dev_priv->num_fence_regs = 32;
|
|
|
|
else if (INTEL_INFO(dev_priv)->gen >= 4 || IS_I945G(dev_priv) ||
|
|
|
|
IS_I945GM(dev_priv) || IS_G33(dev_priv))
|
|
|
|
dev_priv->num_fence_regs = 16;
|
|
|
|
else
|
|
|
|
dev_priv->num_fence_regs = 8;
|
|
|
|
|
2016-05-06 21:40:21 +07:00
|
|
|
if (intel_vgpu_active(dev_priv))
|
2016-03-16 19:54:03 +07:00
|
|
|
dev_priv->num_fence_regs =
|
|
|
|
I915_READ(vgtif_reg(avail_rs.fence_num));
|
|
|
|
|
|
|
|
/* Initialize fence registers to zero */
|
2016-08-18 23:17:00 +07:00
|
|
|
for (i = 0; i < dev_priv->num_fence_regs; i++) {
|
|
|
|
struct drm_i915_fence_reg *fence = &dev_priv->fence_regs[i];
|
|
|
|
|
|
|
|
fence->i915 = dev_priv;
|
|
|
|
fence->id = i;
|
|
|
|
list_add_tail(&fence->link, &dev_priv->mm.fence_list);
|
|
|
|
}
|
2016-11-16 15:55:33 +07:00
|
|
|
i915_gem_restore_fences(dev_priv);
|
2016-03-16 19:54:03 +07:00
|
|
|
|
2016-11-16 15:55:33 +07:00
|
|
|
i915_gem_detect_bit_6_swizzle(dev_priv);
|
2016-03-16 19:54:03 +07:00
|
|
|
}
|
|
|
|
|
2016-10-28 19:58:46 +07:00
|
|
|
int
|
2016-12-01 21:16:39 +07:00
|
|
|
i915_gem_load_init(struct drm_i915_private *dev_priv)
|
2008-07-31 02:06:12 +07:00
|
|
|
{
|
2016-11-02 22:14:59 +07:00
|
|
|
int err = -ENOMEM;
|
2012-11-15 18:32:30 +07:00
|
|
|
|
2016-11-02 22:14:59 +07:00
|
|
|
dev_priv->objects = KMEM_CACHE(drm_i915_gem_object, SLAB_HWCACHE_ALIGN);
|
|
|
|
if (!dev_priv->objects)
|
2016-10-28 19:58:46 +07:00
|
|
|
goto err_out;
|
|
|
|
|
2016-11-02 22:14:59 +07:00
|
|
|
dev_priv->vmas = KMEM_CACHE(i915_vma, SLAB_HWCACHE_ALIGN);
|
|
|
|
if (!dev_priv->vmas)
|
2016-10-28 19:58:46 +07:00
|
|
|
goto err_objects;
|
|
|
|
|
2016-11-02 22:14:59 +07:00
|
|
|
dev_priv->requests = KMEM_CACHE(drm_i915_gem_request,
|
|
|
|
SLAB_HWCACHE_ALIGN |
|
|
|
|
SLAB_RECLAIM_ACCOUNT |
|
|
|
|
SLAB_DESTROY_BY_RCU);
|
|
|
|
if (!dev_priv->requests)
|
2016-10-28 19:58:46 +07:00
|
|
|
goto err_vmas;
|
|
|
|
|
2016-11-15 03:41:02 +07:00
|
|
|
dev_priv->dependencies = KMEM_CACHE(i915_dependency,
|
|
|
|
SLAB_HWCACHE_ALIGN |
|
|
|
|
SLAB_RECLAIM_ACCOUNT);
|
|
|
|
if (!dev_priv->dependencies)
|
|
|
|
goto err_requests;
|
|
|
|
|
2016-10-28 19:58:46 +07:00
|
|
|
mutex_lock(&dev_priv->drm.struct_mutex);
|
|
|
|
INIT_LIST_HEAD(&dev_priv->gt.timelines);
|
2016-11-15 03:40:57 +07:00
|
|
|
err = i915_gem_timeline_init__global(dev_priv);
|
2016-10-28 19:58:46 +07:00
|
|
|
mutex_unlock(&dev_priv->drm.struct_mutex);
|
|
|
|
if (err)
|
2016-11-15 03:41:02 +07:00
|
|
|
goto err_dependencies;
|
2008-07-31 02:06:12 +07:00
|
|
|
|
2013-09-18 11:12:45 +07:00
|
|
|
INIT_LIST_HEAD(&dev_priv->context_list);
|
2016-10-28 19:58:42 +07:00
|
|
|
INIT_WORK(&dev_priv->mm.free_work, __i915_gem_free_work);
|
|
|
|
init_llist_head(&dev_priv->mm.free_list);
|
drm/i915: Track unbound pages
When dealing with a working set larger than the GATT, or even the
mappable aperture when touching through the GTT, we end up with evicting
objects only to rebind them at a new offset again later. Moving an
object into and out of the GTT requires clflushing the pages, thus
causing a double-clflush penalty for rebinding.
To avoid having to clflush on rebinding, we can track the pages as they
are evicted from the GTT and only relinquish those pages on memory
pressure.
As usual, if it were not for the handling of out-of-memory condition and
having to manually shrink our own bo caches, it would be a net reduction
of code. Alas.
Note: The patch also contains a few changes to the last-hope
evict_everything logic in i916_gem_execbuffer.c - we no longer try to
only evict the purgeable stuff in a first try (since that's superflous
and only helps in OOM corner-cases, not fragmented-gtt trashing
situations).
Also, the extraction of the get_pages retry loop from bind_to_gtt (and
other callsites) to get_pages should imo have been a separate patch.
v2: Ditch the newly added put_pages (for unbound objects only) in
i915_gem_reset. A quick irc discussion hasn't revealed any important
reason for this, so if we need this, I'd like to have a git blame'able
explanation for it.
v3: Undo the s/drm_malloc_ab/kmalloc/ in get_pages that Chris noticed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Split out code movements and rant a bit in the commit message
with a few Notes. Done v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-20 16:40:46 +07:00
|
|
|
INIT_LIST_HEAD(&dev_priv->mm.unbound_list);
|
|
|
|
INIT_LIST_HEAD(&dev_priv->mm.bound_list);
|
2009-08-30 02:49:51 +07:00
|
|
|
INIT_LIST_HEAD(&dev_priv->mm.fence_list);
|
2016-10-24 19:42:14 +07:00
|
|
|
INIT_LIST_HEAD(&dev_priv->mm.userfault_list);
|
2016-07-04 14:08:31 +07:00
|
|
|
INIT_DELAYED_WORK(&dev_priv->gt.retire_work,
|
2008-07-31 02:06:12 +07:00
|
|
|
i915_gem_retire_work_handler);
|
2016-07-04 14:08:31 +07:00
|
|
|
INIT_DELAYED_WORK(&dev_priv->gt.idle_work,
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-25 23:34:56 +07:00
|
|
|
i915_gem_idle_work_handler);
|
2016-07-01 23:23:14 +07:00
|
|
|
init_waitqueue_head(&dev_priv->gpu_error.wait_queue);
|
2012-11-15 23:17:22 +07:00
|
|
|
init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
|
2009-09-14 22:50:28 +07:00
|
|
|
|
2010-12-19 18:42:05 +07:00
|
|
|
dev_priv->relative_constants_mode = I915_EXEC_CONSTANTS_REL_GENERAL;
|
|
|
|
|
2009-11-18 23:25:18 +07:00
|
|
|
init_waitqueue_head(&dev_priv->pending_flip_queue);
|
2010-10-28 18:51:39 +07:00
|
|
|
|
2011-02-21 21:43:56 +07:00
|
|
|
dev_priv->mm.interruptible = true;
|
|
|
|
|
2016-09-01 18:58:21 +07:00
|
|
|
atomic_set(&dev_priv->mm.bsd_engine_dispatch_index, 0);
|
|
|
|
|
2016-08-04 22:32:36 +07:00
|
|
|
spin_lock_init(&dev_priv->fb_tracking.lock);
|
2016-10-28 19:58:46 +07:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
|
2016-11-15 03:41:02 +07:00
|
|
|
err_dependencies:
|
|
|
|
kmem_cache_destroy(dev_priv->dependencies);
|
2016-10-28 19:58:46 +07:00
|
|
|
err_requests:
|
|
|
|
kmem_cache_destroy(dev_priv->requests);
|
|
|
|
err_vmas:
|
|
|
|
kmem_cache_destroy(dev_priv->vmas);
|
|
|
|
err_objects:
|
|
|
|
kmem_cache_destroy(dev_priv->objects);
|
|
|
|
err_out:
|
|
|
|
return err;
|
2008-07-31 02:06:12 +07:00
|
|
|
}
|
2008-12-30 17:31:46 +07:00
|
|
|
|
2016-12-01 21:16:39 +07:00
|
|
|
void i915_gem_load_cleanup(struct drm_i915_private *dev_priv)
|
2016-01-19 20:26:29 +07:00
|
|
|
{
|
2016-11-01 15:48:41 +07:00
|
|
|
WARN_ON(!llist_empty(&dev_priv->mm.free_list));
|
|
|
|
|
2016-11-18 04:04:11 +07:00
|
|
|
mutex_lock(&dev_priv->drm.struct_mutex);
|
|
|
|
i915_gem_timeline_fini(&dev_priv->gt.global_timeline);
|
|
|
|
WARN_ON(!list_empty(&dev_priv->gt.timelines));
|
|
|
|
mutex_unlock(&dev_priv->drm.struct_mutex);
|
|
|
|
|
2016-11-15 03:41:02 +07:00
|
|
|
kmem_cache_destroy(dev_priv->dependencies);
|
2016-01-19 20:26:29 +07:00
|
|
|
kmem_cache_destroy(dev_priv->requests);
|
|
|
|
kmem_cache_destroy(dev_priv->vmas);
|
|
|
|
kmem_cache_destroy(dev_priv->objects);
|
drm/i915: Enable lockless lookup of request tracking via RCU
If we enable RCU for the requests (providing a grace period where we can
inspect a "dead" request before it is freed), we can allow callers to
carefully perform lockless lookup of an active request.
However, by enabling deferred freeing of requests, we can potentially
hog a lot of memory when dealing with tens of thousands of requests per
second - with a quick insertion of a synchronize_rcu() inside our
shrinker callback, that issue disappears.
v2: Currently, it is our responsibility to handle reclaim i.e. to avoid
hogging memory with the delayed slab frees. At the moment, we wait for a
grace period in the shrinker, and block for all RCU callbacks on oom.
Suggested alternatives focus on flushing our RCU callback when we have a
certain number of outstanding request frees, and blocking on that flush
after a second high watermark. (So rather than wait for the system to
run out of memory, we stop issuing requests - both are nondeterministic.)
Paul E. McKenney wrote:
Another approach is synchronize_rcu() after some largish number of
requests. The advantage of this approach is that it throttles the
production of callbacks at the source. The corresponding disadvantage
is that it slows things up.
Another approach is to use call_rcu(), but if the previous call_rcu()
is still in flight, block waiting for it. Yet another approach is
the get_state_synchronize_rcu() / cond_synchronize_rcu() pair. The
idea is to do something like this:
cond_synchronize_rcu(cookie);
cookie = get_state_synchronize_rcu();
You would of course do an initial get_state_synchronize_rcu() to
get things going. This would not block unless there was less than
one grace period's worth of time between invocations. But this
assumes a busy system, where there is almost always a grace period
in flight. But you can make that happen as follows:
cond_synchronize_rcu(cookie);
cookie = get_state_synchronize_rcu();
call_rcu(&my_rcu_head, noop_function);
Note that you need additional code to make sure that the old callback
has completed before doing a new one. Setting and clearing a flag
with appropriate memory ordering control suffices (e.g,. smp_load_acquire()
and smp_store_release()).
v3: More comments on compiler and processor order of operations within
the RCU lookup and discover we can use rcu_access_pointer() here instead.
v4: Wrap i915_gem_active_get_rcu() to take the rcu_read_lock itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-25-git-send-email-chris@chris-wilson.co.uk
2016-08-04 22:32:41 +07:00
|
|
|
|
|
|
|
/* And ensure that our DESTROY_BY_RCU slabs are truly destroyed */
|
|
|
|
rcu_barrier();
|
2016-01-19 20:26:29 +07:00
|
|
|
}
|
|
|
|
|
2016-09-21 20:51:07 +07:00
|
|
|
int i915_gem_freeze(struct drm_i915_private *dev_priv)
|
|
|
|
{
|
|
|
|
intel_runtime_pm_get(dev_priv);
|
|
|
|
|
|
|
|
mutex_lock(&dev_priv->drm.struct_mutex);
|
|
|
|
i915_gem_shrink_all(dev_priv);
|
|
|
|
mutex_unlock(&dev_priv->drm.struct_mutex);
|
|
|
|
|
|
|
|
intel_runtime_pm_put(dev_priv);
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2016-05-14 13:26:33 +07:00
|
|
|
int i915_gem_freeze_late(struct drm_i915_private *dev_priv)
|
|
|
|
{
|
|
|
|
struct drm_i915_gem_object *obj;
|
2016-09-10 02:02:18 +07:00
|
|
|
struct list_head *phases[] = {
|
|
|
|
&dev_priv->mm.unbound_list,
|
|
|
|
&dev_priv->mm.bound_list,
|
|
|
|
NULL
|
|
|
|
}, **p;
|
2016-05-14 13:26:33 +07:00
|
|
|
|
|
|
|
/* Called just before we write the hibernation image.
|
|
|
|
*
|
|
|
|
* We need to update the domain tracking to reflect that the CPU
|
|
|
|
* will be accessing all the pages to create and restore from the
|
|
|
|
* hibernation, and so upon restoration those pages will be in the
|
|
|
|
* CPU domain.
|
|
|
|
*
|
|
|
|
* To make sure the hibernation image contains the latest state,
|
|
|
|
* we update that state just before writing out the image.
|
2016-09-10 02:02:18 +07:00
|
|
|
*
|
|
|
|
* To try and reduce the hibernation image, we manually shrink
|
|
|
|
* the objects as well.
|
2016-05-14 13:26:33 +07:00
|
|
|
*/
|
|
|
|
|
2016-09-21 20:51:07 +07:00
|
|
|
mutex_lock(&dev_priv->drm.struct_mutex);
|
|
|
|
i915_gem_shrink(dev_priv, -1UL, I915_SHRINK_UNBOUND);
|
2016-05-14 13:26:33 +07:00
|
|
|
|
2016-09-10 02:02:18 +07:00
|
|
|
for (p = phases; *p; p++) {
|
2016-11-02 17:16:04 +07:00
|
|
|
list_for_each_entry(obj, *p, global_link) {
|
2016-09-10 02:02:18 +07:00
|
|
|
obj->base.read_domains = I915_GEM_DOMAIN_CPU;
|
|
|
|
obj->base.write_domain = I915_GEM_DOMAIN_CPU;
|
|
|
|
}
|
2016-05-14 13:26:33 +07:00
|
|
|
}
|
2016-09-21 20:51:07 +07:00
|
|
|
mutex_unlock(&dev_priv->drm.struct_mutex);
|
2016-05-14 13:26:33 +07:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2010-09-24 22:02:42 +07:00
|
|
|
void i915_gem_release(struct drm_device *dev, struct drm_file *file)
|
2009-06-03 14:27:35 +07:00
|
|
|
{
|
2010-09-24 22:02:42 +07:00
|
|
|
struct drm_i915_file_private *file_priv = file->driver_priv;
|
2016-07-26 18:01:52 +07:00
|
|
|
struct drm_i915_gem_request *request;
|
2009-06-03 14:27:35 +07:00
|
|
|
|
|
|
|
/* Clean up our request list when the client is going away, so that
|
|
|
|
* later retire_requests won't dereference our soon-to-be-gone
|
|
|
|
* file_priv.
|
|
|
|
*/
|
2010-09-26 17:03:27 +07:00
|
|
|
spin_lock(&file_priv->mm.lock);
|
2016-07-26 18:01:52 +07:00
|
|
|
list_for_each_entry(request, &file_priv->mm.request_list, client_list)
|
2010-09-24 22:02:42 +07:00
|
|
|
request->file_priv = NULL;
|
2010-09-26 17:03:27 +07:00
|
|
|
spin_unlock(&file_priv->mm.lock);
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-25 23:34:56 +07:00
|
|
|
|
2015-04-27 19:41:22 +07:00
|
|
|
if (!list_empty(&file_priv->rps.link)) {
|
2015-05-22 03:01:47 +07:00
|
|
|
spin_lock(&to_i915(dev)->rps.client_lock);
|
2015-04-27 19:41:22 +07:00
|
|
|
list_del(&file_priv->rps.link);
|
2015-05-22 03:01:47 +07:00
|
|
|
spin_unlock(&to_i915(dev)->rps.client_lock);
|
2015-04-07 22:20:32 +07:00
|
|
|
}
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-25 23:34:56 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
int i915_gem_open(struct drm_device *dev, struct drm_file *file)
|
|
|
|
{
|
|
|
|
struct drm_i915_file_private *file_priv;
|
2013-12-07 05:10:58 +07:00
|
|
|
int ret;
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-25 23:34:56 +07:00
|
|
|
|
2016-11-09 17:45:07 +07:00
|
|
|
DRM_DEBUG("\n");
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-25 23:34:56 +07:00
|
|
|
|
|
|
|
file_priv = kzalloc(sizeof(*file_priv), GFP_KERNEL);
|
|
|
|
if (!file_priv)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
|
|
|
file->driver_priv = file_priv;
|
2016-07-04 17:34:37 +07:00
|
|
|
file_priv->dev_priv = to_i915(dev);
|
2014-02-25 22:11:24 +07:00
|
|
|
file_priv->file = file;
|
2015-04-27 19:41:22 +07:00
|
|
|
INIT_LIST_HEAD(&file_priv->rps.link);
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-25 23:34:56 +07:00
|
|
|
|
|
|
|
spin_lock_init(&file_priv->mm.lock);
|
|
|
|
INIT_LIST_HEAD(&file_priv->mm.request_list);
|
|
|
|
|
2016-07-27 15:07:27 +07:00
|
|
|
file_priv->bsd_engine = -1;
|
2016-01-15 22:12:50 +07:00
|
|
|
|
2013-12-07 05:10:58 +07:00
|
|
|
ret = i915_gem_context_open(dev, file);
|
|
|
|
if (ret)
|
|
|
|
kfree(file_priv);
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-25 23:34:56 +07:00
|
|
|
|
2013-12-07 05:10:58 +07:00
|
|
|
return ret;
|
drm/i915: Boost RPS frequency for CPU stalls
If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.
This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.
In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)
Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from permanently boosting the RPS state, we ratelimit the
frequency of the wait-boosts each client recieves.
Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.
v2: Limit each client to receiving a single boost for each active period.
Tested by QA to only marginally increase power, and to demonstrably
increase throughput in games. No latency measurements yet.
v3: Cater for front-buffer rendering with manual throttling.
v4: Tidy up.
v5: Sadly the compositor needs frequent boosts as it may never idle, but
due to its picking mechanism (using ReadPixels) may require frequent
waits. Those waits, along with the waits for the vrefresh swap, conspire
to keep the GPU at low frequencies despite the interactive latency. To
overcome this we ditch the one-boost-per-active-period and just ratelimit
the number of wait-boosts each client can receive.
Reported-and-tested-by: Paul Neumann <paul104x@yahoo.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=68716
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
[danvet: No extern for function prototypes in headers.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-25 23:34:56 +07:00
|
|
|
}
|
|
|
|
|
2014-09-19 23:27:27 +07:00
|
|
|
/**
|
|
|
|
* i915_gem_track_fb - update frontbuffer tracking
|
2015-09-15 19:58:44 +07:00
|
|
|
* @old: current GEM buffer for the frontbuffer slots
|
|
|
|
* @new: new GEM buffer for the frontbuffer slots
|
|
|
|
* @frontbuffer_bits: bitmask of frontbuffer slots
|
2014-09-19 23:27:27 +07:00
|
|
|
*
|
|
|
|
* This updates the frontbuffer tracking bits @frontbuffer_bits by clearing them
|
|
|
|
* from @old and setting them in @new. Both @old and @new can be NULL.
|
|
|
|
*/
|
2014-06-19 04:28:09 +07:00
|
|
|
void i915_gem_track_fb(struct drm_i915_gem_object *old,
|
|
|
|
struct drm_i915_gem_object *new,
|
|
|
|
unsigned frontbuffer_bits)
|
|
|
|
{
|
2016-08-04 22:32:37 +07:00
|
|
|
/* Control of individual bits within the mask are guarded by
|
|
|
|
* the owning plane->mutex, i.e. we can never see concurrent
|
|
|
|
* manipulation of individual bits. But since the bitfield as a whole
|
|
|
|
* is updated using RMW, we need to use atomics in order to update
|
|
|
|
* the bits.
|
|
|
|
*/
|
|
|
|
BUILD_BUG_ON(INTEL_FRONTBUFFER_BITS_PER_PIPE * I915_MAX_PIPES >
|
|
|
|
sizeof(atomic_t) * BITS_PER_BYTE);
|
|
|
|
|
2014-06-19 04:28:09 +07:00
|
|
|
if (old) {
|
2016-08-04 22:32:37 +07:00
|
|
|
WARN_ON(!(atomic_read(&old->frontbuffer_bits) & frontbuffer_bits));
|
|
|
|
atomic_andnot(frontbuffer_bits, &old->frontbuffer_bits);
|
2014-06-19 04:28:09 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
if (new) {
|
2016-08-04 22:32:37 +07:00
|
|
|
WARN_ON(atomic_read(&new->frontbuffer_bits) & frontbuffer_bits);
|
|
|
|
atomic_or(frontbuffer_bits, &new->frontbuffer_bits);
|
2014-06-19 04:28:09 +07:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-07-10 01:29:02 +07:00
|
|
|
/* Allocate a new GEM object and fill it with the supplied data */
|
|
|
|
struct drm_i915_gem_object *
|
2016-12-01 21:16:37 +07:00
|
|
|
i915_gem_object_create_from_data(struct drm_i915_private *dev_priv,
|
2015-07-10 01:29:02 +07:00
|
|
|
const void *data, size_t size)
|
|
|
|
{
|
|
|
|
struct drm_i915_gem_object *obj;
|
|
|
|
struct sg_table *sg;
|
|
|
|
size_t bytes;
|
|
|
|
int ret;
|
|
|
|
|
2016-12-01 21:16:37 +07:00
|
|
|
obj = i915_gem_object_create(dev_priv, round_up(size, PAGE_SIZE));
|
2016-04-25 19:32:13 +07:00
|
|
|
if (IS_ERR(obj))
|
2015-07-10 01:29:02 +07:00
|
|
|
return obj;
|
|
|
|
|
|
|
|
ret = i915_gem_object_set_to_cpu_domain(obj, true);
|
|
|
|
if (ret)
|
|
|
|
goto fail;
|
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
ret = i915_gem_object_pin_pages(obj);
|
2015-07-10 01:29:02 +07:00
|
|
|
if (ret)
|
|
|
|
goto fail;
|
|
|
|
|
2016-10-28 19:58:35 +07:00
|
|
|
sg = obj->mm.pages;
|
2015-07-10 01:29:02 +07:00
|
|
|
bytes = sg_copy_from_buffer(sg->sgl, sg->nents, (void *)data, size);
|
2016-10-28 19:58:35 +07:00
|
|
|
obj->mm.dirty = true; /* Backing store is now out of date */
|
2015-07-10 01:29:02 +07:00
|
|
|
i915_gem_object_unpin_pages(obj);
|
|
|
|
|
|
|
|
if (WARN_ON(bytes != size)) {
|
|
|
|
DRM_ERROR("Incomplete copy, wrote %zu of %zu", bytes, size);
|
|
|
|
ret = -EFAULT;
|
|
|
|
goto fail;
|
|
|
|
}
|
|
|
|
|
|
|
|
return obj;
|
|
|
|
|
|
|
|
fail:
|
2016-07-20 19:31:53 +07:00
|
|
|
i915_gem_object_put(obj);
|
2015-07-10 01:29:02 +07:00
|
|
|
return ERR_PTR(ret);
|
|
|
|
}
|
2016-10-28 19:58:33 +07:00
|
|
|
|
|
|
|
struct scatterlist *
|
|
|
|
i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
|
|
|
|
unsigned int n,
|
|
|
|
unsigned int *offset)
|
|
|
|
{
|
2016-10-28 19:58:35 +07:00
|
|
|
struct i915_gem_object_page_iter *iter = &obj->mm.get_page;
|
2016-10-28 19:58:33 +07:00
|
|
|
struct scatterlist *sg;
|
|
|
|
unsigned int idx, count;
|
|
|
|
|
|
|
|
might_sleep();
|
|
|
|
GEM_BUG_ON(n >= obj->base.size >> PAGE_SHIFT);
|
2016-10-28 19:58:35 +07:00
|
|
|
GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
|
2016-10-28 19:58:33 +07:00
|
|
|
|
|
|
|
/* As we iterate forward through the sg, we record each entry in a
|
|
|
|
* radixtree for quick repeated (backwards) lookups. If we have seen
|
|
|
|
* this index previously, we will have an entry for it.
|
|
|
|
*
|
|
|
|
* Initial lookup is O(N), but this is amortized to O(1) for
|
|
|
|
* sequential page access (where each new request is consecutive
|
|
|
|
* to the previous one). Repeated lookups are O(lg(obj->base.size)),
|
|
|
|
* i.e. O(1) with a large constant!
|
|
|
|
*/
|
|
|
|
if (n < READ_ONCE(iter->sg_idx))
|
|
|
|
goto lookup;
|
|
|
|
|
|
|
|
mutex_lock(&iter->lock);
|
|
|
|
|
|
|
|
/* We prefer to reuse the last sg so that repeated lookup of this
|
|
|
|
* (or the subsequent) sg are fast - comparing against the last
|
|
|
|
* sg is faster than going through the radixtree.
|
|
|
|
*/
|
|
|
|
|
|
|
|
sg = iter->sg_pos;
|
|
|
|
idx = iter->sg_idx;
|
|
|
|
count = __sg_page_count(sg);
|
|
|
|
|
|
|
|
while (idx + count <= n) {
|
|
|
|
unsigned long exception, i;
|
|
|
|
int ret;
|
|
|
|
|
|
|
|
/* If we cannot allocate and insert this entry, or the
|
|
|
|
* individual pages from this range, cancel updating the
|
|
|
|
* sg_idx so that on this lookup we are forced to linearly
|
|
|
|
* scan onwards, but on future lookups we will try the
|
|
|
|
* insertion again (in which case we need to be careful of
|
|
|
|
* the error return reporting that we have already inserted
|
|
|
|
* this index).
|
|
|
|
*/
|
|
|
|
ret = radix_tree_insert(&iter->radix, idx, sg);
|
|
|
|
if (ret && ret != -EEXIST)
|
|
|
|
goto scan;
|
|
|
|
|
|
|
|
exception =
|
|
|
|
RADIX_TREE_EXCEPTIONAL_ENTRY |
|
|
|
|
idx << RADIX_TREE_EXCEPTIONAL_SHIFT;
|
|
|
|
for (i = 1; i < count; i++) {
|
|
|
|
ret = radix_tree_insert(&iter->radix, idx + i,
|
|
|
|
(void *)exception);
|
|
|
|
if (ret && ret != -EEXIST)
|
|
|
|
goto scan;
|
|
|
|
}
|
|
|
|
|
|
|
|
idx += count;
|
|
|
|
sg = ____sg_next(sg);
|
|
|
|
count = __sg_page_count(sg);
|
|
|
|
}
|
|
|
|
|
|
|
|
scan:
|
|
|
|
iter->sg_pos = sg;
|
|
|
|
iter->sg_idx = idx;
|
|
|
|
|
|
|
|
mutex_unlock(&iter->lock);
|
|
|
|
|
|
|
|
if (unlikely(n < idx)) /* insertion completed by another thread */
|
|
|
|
goto lookup;
|
|
|
|
|
|
|
|
/* In case we failed to insert the entry into the radixtree, we need
|
|
|
|
* to look beyond the current sg.
|
|
|
|
*/
|
|
|
|
while (idx + count <= n) {
|
|
|
|
idx += count;
|
|
|
|
sg = ____sg_next(sg);
|
|
|
|
count = __sg_page_count(sg);
|
|
|
|
}
|
|
|
|
|
|
|
|
*offset = n - idx;
|
|
|
|
return sg;
|
|
|
|
|
|
|
|
lookup:
|
|
|
|
rcu_read_lock();
|
|
|
|
|
|
|
|
sg = radix_tree_lookup(&iter->radix, n);
|
|
|
|
GEM_BUG_ON(!sg);
|
|
|
|
|
|
|
|
/* If this index is in the middle of multi-page sg entry,
|
|
|
|
* the radixtree will contain an exceptional entry that points
|
|
|
|
* to the start of that range. We will return the pointer to
|
|
|
|
* the base page and the offset of this page within the
|
|
|
|
* sg entry's range.
|
|
|
|
*/
|
|
|
|
*offset = 0;
|
|
|
|
if (unlikely(radix_tree_exception(sg))) {
|
|
|
|
unsigned long base =
|
|
|
|
(unsigned long)sg >> RADIX_TREE_EXCEPTIONAL_SHIFT;
|
|
|
|
|
|
|
|
sg = radix_tree_lookup(&iter->radix, base);
|
|
|
|
GEM_BUG_ON(!sg);
|
|
|
|
|
|
|
|
*offset = n - base;
|
|
|
|
}
|
|
|
|
|
|
|
|
rcu_read_unlock();
|
|
|
|
|
|
|
|
return sg;
|
|
|
|
}
|
|
|
|
|
|
|
|
struct page *
|
|
|
|
i915_gem_object_get_page(struct drm_i915_gem_object *obj, unsigned int n)
|
|
|
|
{
|
|
|
|
struct scatterlist *sg;
|
|
|
|
unsigned int offset;
|
|
|
|
|
|
|
|
GEM_BUG_ON(!i915_gem_object_has_struct_page(obj));
|
|
|
|
|
|
|
|
sg = i915_gem_object_get_sg(obj, n, &offset);
|
|
|
|
return nth_page(sg_page(sg), offset);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Like i915_gem_object_get_page(), but mark the returned page dirty */
|
|
|
|
struct page *
|
|
|
|
i915_gem_object_get_dirty_page(struct drm_i915_gem_object *obj,
|
|
|
|
unsigned int n)
|
|
|
|
{
|
|
|
|
struct page *page;
|
|
|
|
|
|
|
|
page = i915_gem_object_get_page(obj, n);
|
2016-10-28 19:58:35 +07:00
|
|
|
if (!obj->mm.dirty)
|
2016-10-28 19:58:33 +07:00
|
|
|
set_page_dirty(page);
|
|
|
|
|
|
|
|
return page;
|
|
|
|
}
|
|
|
|
|
|
|
|
dma_addr_t
|
|
|
|
i915_gem_object_get_dma_address(struct drm_i915_gem_object *obj,
|
|
|
|
unsigned long n)
|
|
|
|
{
|
|
|
|
struct scatterlist *sg;
|
|
|
|
unsigned int offset;
|
|
|
|
|
|
|
|
sg = i915_gem_object_get_sg(obj, n, &offset);
|
|
|
|
return sg_dma_address(sg) + (offset << PAGE_SHIFT);
|
|
|
|
}
|