On some steppings of the SNB CPU, the first command parsed by the BLT
command streamer must be MI_BATCHBUFFER_START, otherwise the GPU
may hang.
Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
[ickle: rebased for -next]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This restores the default cache behavior for AGP_USER_MEMORY to
uncached, and leaves the default for AGP_USER_CACHED_MEMORY as LLC only.
I've seen different cache behavior on one Sandybridge desktop CPU vs.
another mobile CPU. Until we figure out how to detect the real cache
configuration, restore the original behavior for now.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This was broken by commit 97ef1bdd0b.
Let's set the correct bit for LLC+MLC and LLC only.
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In case of an opregion signature mismatch in intel_opregion_setup(),
iounmap the correct address.
Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
Cc: stable@kernel.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
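A minimal sketch of the error path in question (the physical address,
size and signature length are illustrative; the point is simply that the
pointer passed to iounmap() must be the one returned by ioremap()):

    base = ioremap(asls_phys, OPREGION_SIZE);
    if (!base)
            return -ENOMEM;

    if (memcmp(base, OPREGION_SIGNATURE, 16)) {
            /* unmap what we mapped, not a pointer derived from it */
            iounmap(base);
            return -ENOTSUPP;
    }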
... and trying to set the bit is ineffectual.
Fixes the regression from e380f60 which detected that we were trying to
do undefined operations on the I830_GMCH_CTRL.
Reported-by: Alexey Fisher <bug-track@fisher-privat.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Take two passes to evict everything whilst searching for sufficient free
space to bind the batchbuffer. After searching for sufficient free space
using LRU eviction, evict everything that is purgeable and try again.
Only then if there is insufficient free space (or the GTT is too badly
fragmented) evict everything from the aperture and try one last time.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
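Roughly, the new search order looks like the sketch below; the helper
names are illustrative, not the driver's:

    /* pass 1: LRU eviction until a hole of the right size/alignment opens */
    ret = evict_something(dev, size, alignment);
    if (ret != -ENOSPC)
            return ret;

    /* pass 2: drop everything userspace has marked purgeable, retry */
    evict_purgeable(dev);
    ret = evict_something(dev, size, alignment);
    if (ret != -ENOSPC)
            return ret;

    /* last resort: empty the whole aperture and try once more */
    return evict_everything(dev);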
Accessing the uninitialised obj->pages instead of the local page led to
an oops.
Reported-by: Xavier Chantry <chantry.xavier@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
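In other words, the copy loop has to use the page it just looked up
rather than obj->pages[], which is not populated on this path. A sketch
(variable names assumed):

    page = read_cache_page_gfp(mapping, index, GFP_KERNEL);
    if (IS_ERR(page))
            return PTR_ERR(page);

    vaddr = kmap(page);         /* the page we just looked up */
    /* not kmap(obj->pages[index]): obj->pages is uninitialised here */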
My Sandybridge only reports 0 for the ring buffer registers, causing it
to hang as soon as we exhaust the available ring. As a workaround, take
advantage of our huge ring buffers and use the auto-reporting mechanism
to update the status page with the HEAD location every 64 KiB.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
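The idea, roughly: program the ring to write its HEAD pointer back into
the status page every 64 KiB consumed, and poll that copy instead of the
MMIO register when waiting for space. The register bits and status-page
slot below are assumptions for illustration only:

    /* when initialising the ring, ask for 64 KiB auto-reports */
    write_ring_ctl(ring, ring_nr_pages(ring) | RING_REPORT_64K | RING_VALID);

    /* when waiting for ring space, trust the status-page copy of HEAD */
    head = ring->status_page[AUTO_REPORT_SLOT] & HEAD_ADDR;
    space = ring_space_available(head, ring->tail, ring->size);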
After switching the MMIO registers to use pci_iomap, remember to dispose
of the mapping with pci_iounmap (for symmetry).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
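For reference, the pairing is simply:

    regs = pci_iomap(pdev, mmio_bar, 0);
    if (regs == NULL)
            return -EIO;

    /* ... driver lifetime ... */

    pci_iounmap(pdev, regs);            /* teardown mirrors the setup */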
So long as we adhere to the fence register rules for alignment and no
overlaps (including with unfenced accesses to linear memory) and account
for the tiled access in our size allocation, we do not have to allocate
the full fenced region for the object. This allows us to fight the bloat
tiling imposed on pre-i965 chipsets and frees up RAM for real use. [Inside
the GTT we still suffer the additional alignment constraints, so it doesn't
magically allow us to render larger scenes without stalls -- we need the
expanded GTT and fence pipelining to overcome those...]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
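To make the saving concrete: pre-i965 fences cover a power-of-two sized
(and aligned) region, so binding the full fenced region rounded every
tiled object up, e.g. a 1.25 MiB buffer consumed 2 MiB of aperture. A
sketch of that rounding (the minimum region size differs by chipset and
is assumed here):

    static unsigned long pre965_fence_size(unsigned long obj_size)
    {
            unsigned long size = 1024 * 1024;       /* assumed minimum */

            while (size < obj_size)
                    size <<= 1;                     /* next power of two */
            return size;
    }

    /* pre965_fence_size(1280 * 1024) == 2 MiB; with this change only the
     * object's own (tile-row rounded) size needs to be bound. */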
obj->pin_count is unsigned, so BUG_ON(obj->pin_count < 0) can never
trigger.
Also spotted by Dan Carpenter.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
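A one-line illustration of why the check is dead code:

    /* obj->pin_count is unsigned, so this comparison is always false
     * and the compiler may warn about it or optimise it away */
    BUG_ON(obj->pin_count < 0);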
The error code is only expected during the actual pruning and not during
the first measurement (nr_to_scan == 0) pass.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
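A sketch of the distinction (the shrinker callback signature has changed
between kernel versions; this is illustrative only):

    static int i915_gem_shrink(int nr_to_scan, gfp_t gfp_mask)
    {
            if (nr_to_scan == 0)
                    /* measurement pass: report a count, never an errno */
                    return count_inactive_objects();

            /* only the real pruning below may bail out with an error */
            return prune_inactive_objects(nr_to_scan);
    }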
It is possible for the active list to contain only read-only buffers, so
that ring->gpu_write_list remains empty. This leads to an inconsistency
between i915_gpu_is_active() and i915_gpu_idle(), causing an infinite
spin in the shrinker and a failure of the assertion that
i915_gpu_idle() does indeed flush all buffers from the active lists.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In order to force a page-fault on a GTT mapping after we start using it
from the GPU and so enforce correct CPU/GPU synchronisation, we need to
invalidate the mapping.
Pointed out by Owain G. Ainsworth.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
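Invalidating the mapping amounts to zapping any user PTEs that point at
the object's GTT mmap range, so the next CPU touch takes a fresh fault
through the driver. A sketch; the address_space pointer and offset
helper are assumptions:

    unmap_mapping_range(dev->dev_mapping,
                        (loff_t)gtt_mmap_offset(obj),   /* assumed helper */
                        obj->size, 1);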
By using read_cache_page() for individual pages during pwrite/pread we
can eliminate an unnecessary large allocation (and immediate free) of
obj->pages. Also this eliminates any potential nesting of get/put pages,
simplifying the code and preparing the path for greater things.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
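The per-page lookup then looks roughly like this (error handling
trimmed; the GFP flags and the surrounding user copy are illustrative):

    struct page *page;
    char *vaddr;

    page = read_cache_page_gfp(mapping, index, GFP_HIGHUSER);
    if (IS_ERR(page))
            return PTR_ERR(page);

    vaddr = kmap(page);
    /* copy_to_user()/copy_from_user() against vaddr + page_offset here */
    kunmap(page);
    page_cache_release(page);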
Since we rarely use the mmap_offset and it is easily computable from the
obj->map_list.hash, remove it.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
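The offset reported to userspace can always be recomputed on demand,
along the lines of (field names as in the message; the shift is an
assumption):

    static u64 i915_gem_mmap_offset(struct drm_gem_object *obj)
    {
            return (u64)obj->map_list.hash.key << PAGE_SHIFT;
    }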
Eliminate the racy device unload by embedding a shrinker into each
device. Smaller, simpler code.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
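i.e. the shrinker lives in dev_priv and is registered and unregistered
together with the device, instead of a single global shrinker racing
against unload. A sketch (struct shrinker field names vary by kernel
version):

    /* at load time */
    dev_priv->mm.inactive_shrinker.shrink = i915_gem_inactive_shrink;
    dev_priv->mm.inactive_shrinker.seeks = DEFAULT_SEEKS;
    register_shrinker(&dev_priv->mm.inactive_shrinker);

    /* at unload time */
    unregister_shrinker(&dev_priv->mm.inactive_shrinker);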
Play safe and use the common routines which take care of the cacheability
of the memory when setting up the iomapping for the PCI registers.
Whilst they should be cacheable for the current generations, actually
honouring what the device requires is a better long term strategy.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This holds error state from the main graphics arbiter mainly involving
the DMA engine and address translation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This is the same value as before, but it just makes the code slightly
more readable to use the local variable than converting the aperture
size into bytes every time.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
More precisely: For those that _need_ to be mappable. Also add two
BUG_ONs in fault and pin to check the consistency of the mappable
flag.
Changes in v2:
- Add tracking of gtt mappable space (to notice mappable/unmappable
balancing issues).
- Improve the mappable working set tracking by tracking fault and pin
separately.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
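The consistency checks amount to something like this in both the fault
handler and the pin path (flag name assumed):

    BUG_ON(obj->gtt_space && !obj->mappable);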
This way we can make some more educated guesses as to why exactly
we can't use 2G apertures to their full potential ;)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
On VT-d supporting platforms the GGTT is allocated in a stolen memory
section separate from the graphics stolen memory. The GMCH register contains
a bitfield specifying the size of that region. Docs suggest that this
region can only be used for GGTT and PPGTT. Hence ensure that the
PPGTT is disabled and use the complete area for the GGTT.
Unfortunately the graphics core on G33/Pineview can't cope with really
large GTTs and the BIOS usually enables the maximum of 512MB. So
don't bother with maximizing the GTT on these platforms.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
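Conceptually the setup reads the size field out of the GMCH control word
and hands the whole region to the GGTT; the register offset, decode
helper and setup calls below are placeholders, not values from the spec:

    u16 gmch_ctrl;
    unsigned int ggtt_size;

    pci_read_config_word(bridge_pdev, GMCH_CTRL, &gmch_ctrl);   /* offset assumed */
    ggtt_size = decode_gtt_stolen_size(gmch_ctrl);              /* bitfield -> bytes */

    disable_ppgtt();            /* the region may only hold GGTT and PPGTT entries */
    setup_ggtt(ggtt_size);      /* so give all of it to the GGTT */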
... and switch to a more classical store-reg-on-suspend, restore-on-resume
way of doing things. Obviously this is just preparation for the future;
the code is not there at all yet.
This is needed because the next patch adjusts this register and everything
in it (not just the pagetable address) needs to be restored on resume.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
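The pattern is the classic one (the register name here is a stand-in):

    /* suspend */
    dev_priv->saved_gtt_ctl = readl(gtt_ctl_reg);

    /* resume: write the whole word back, not just the pagetable address */
    writel(dev_priv->saved_gtt_ctl, gtt_ctl_reg);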
At least the part that's currently enabled by the BIOS.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In i915_gem_object_pin, obviously unbind only if mappable is true.
This is the last part to enable gtt_mappable_end != gtt_size, which
the next patch will do.
v2: Fences on g33/pineview only work in the mappable part of the
gtt.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
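So an already-bound object is only kicked out of the GTT when the caller
needs it mappable and its current placement is not (sketch; flag names
assumed):

    if (obj->gtt_space && mappable && !obj->mappable) {
            ret = i915_gem_object_unbind(obj);
            if (ret)
                    return ret;
    }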
Like before, add a mappable parameter (also to gem_object_pin) and
set it depending upon the context. Only bos that are brought into
the gtt due to an execbuffer call can be put into the unmappable
part of the gtt; everything else (especially pinned objects) needs
to be put into the mappable part of the gtt.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
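The call sites then split roughly like this (signature assumed; true
means the object must end up in the mappable range):

    /* execbuffer: GPU-only access, an unmappable placement is acceptable */
    ret = i915_gem_object_pin(obj, alignment, false);

    /* everything else (scanout, ringbuffers, cursors, ...) must be mappable */
    ret = i915_gem_object_pin(obj, alignment, true);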
Add a mappable parameter to i915_gem_evict_something to distinguish
the two cases (non-restricted vs. mappable gtt allocations). No
functional changes because the mappable limit is set to the end of
the gtt currently.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
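Inside the evictor the flag just bounds the range being searched (field
names assumed):

    unsigned long end = mappable ? dev_priv->mm.gtt_mappable_end
                                 : dev_priv->mm.gtt_total;

    /* only consider victims and holes below 'end' */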
Currently, we believe the GPU is idle if just the RENDER ring is idle.
This is obviously wrong if we are only using the BLT or BSD rings,
and so masks genuine hangs.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
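i.e. the idle test has to cover every ring, along the lines of (helper
and field names assumed):

    static bool gpu_is_idle(struct drm_i915_private *dev_priv)
    {
            /* idle only when the render, BSD and BLT rings are all quiet */
            return ring_is_idle(&dev_priv->render_ring) &&
                   ring_is_idle(&dev_priv->bsd_ring) &&
                   ring_is_idle(&dev_priv->blt_ring);
    }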