Commit Graph

123 Commits

Author SHA1 Message Date
Brad Volkin
44e895a8a2 drm/i915: Use hash tables for the command parser
For clients that submit large batch buffers the command parser has
a substantial impact on performance. On my HSW ULT system performance
drops as much as ~20% on some tests. Most of the time is spent in the
command lookup code. Converting that from the current naive search to
a hash table lookup reduces the performance drop to ~10%.

The choice of value for I915_CMD_HASH_ORDER allows all commands
currently used in the parser tables to hash to their own bucket (except
for one collision on the render ring). The tradeoff is that it wastes
memory. Because the opcodes for the commands in the tables are not
particularly well distributed, reducing the order still leaves many
buckets empty. The increased collisions don't seem to have a huge
impact on the performance gain, but for now anyhow, the parser trades
memory for performance.

NB: Ville noticed that the error paths through the ring init code
will leak memory. I've not addressed that here. We can do a follow
up pass to handle all of the leaks.

v2: improved comment describing selection of hash key mask (Damien)
replace a BUG_ON() with an error return (Tvrtko, Ville)
commit message improvements

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-12 19:15:51 +02:00
Ben Widawsky
9bcb144c83 drm/i915: Support 64b execbuf
Previously, our code only had a 32b offset value for where the
batchbuffer starts. With full PPGTT, and 64b canonical GPU address
space, that is an insufficient value. The code to expand is pretty
straight forward, and only one platform needs to do anything with the
extra bits.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 16:01:58 +02:00
Ben Widawsky
024a43e12c drm/i915: Move ring_begin to signal()
Add_request has always contained both the semaphore mailbox updates as
well as the breadcrumb writes. Since the semaphore signal is the one
which actually knows about the number of dwords it needs to emit to the
ring, we move the ring_begin to that function. This allows us to remove
the hideously shared #define

On a related not, gen8 will use a different number of dwords for
semaphores, but not for add request.

v2: Make number of dwords an explicit part of signalling (via function
argument). (Chris)

v3: very slight comment change

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 10:56:53 +02:00
Ben Widawsky
78325f2d27 drm/i915: Virtualize the ringbuffer signal func
This abstraction again is in preparation for gen8. Gen8 will bring new
semantics for doing this operation.

While here, make the writes of MI_NOOPs explicit for non-existent rings.
This should have been implicit before.

NOTE: This is going to be removed in a few patches.

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 10:56:53 +02:00
Ben Widawsky
ebc348b2ad drm/i915: Move semaphore specific ring members to struct
This will be helpful in abstracting some of the code in preparation for
gen8 semaphores.

v2: Move mbox stuff to a separate struct

v3: Rebased over VCS2 work

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> (v1)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 10:56:52 +02:00
Zhao Yakui
845f74a701 drm/i915:Initialize the second BSD ring on BDW GT3 machine
Based on the hardware spec, the BDW GT3 machine has two independent
BSD ring that can be used to dispatch the video commands.
So just initialize it.

V3->V4: Follow Imre's comment to do some minor updates. For example:
more comments are added to describe the semaphore between ring.

Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
[danvet: Fix up checkpatch error.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:46 +02:00
Zhao Yakui
b1a93306ed drm/i915: Update the restrict check to filter out wrong Ring ID passed by user-space
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:44 +02:00
Chris Wilson
e3efda49e7 drm/i915: Preserve ring buffers objects across resume
Tearing down the ring buffers across resume is overkill, risks
unnecessary failure and increases fragmentation.

After failure, since the device is still active we may end up trying to
write into the dangling iomapping and trigger an oops.

v2: stop_ringbuffers() was meant to call stop(ring) not
cleanup(ring) during resume!

Reported-by: Jae-hyeon Park <jhyeon@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=72351
References: https://bugs.freedesktop.org/show_bug.cgi?id=76554
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
[danvet: s/ring->obj == NULL/!intel_ring_initialized(ring)/ as
suggested by Oscar.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-05-05 09:08:37 +02:00
Chris Wilson
9991ae787a drm/i915: Move all ring resets before setting the HWS page
In commit a51435a313
Author: Naresh Kumar Kachhi <naresh.kumar.kachhi@intel.com>
Date:   Wed Mar 12 16:39:40 2014 +0530

    drm/i915: disable rings before HW status page setup

we reordered stopping the rings to do so before we set the HWS register.
However, there is an extra workaround for g45 to reset the rings twice,
and for consistency we should apply that workaround before setting the
HWS to be sure that the rings are truly stopped.

Cc: Naresh Kumar Kachhi <naresh.kumar.kachhi@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-04-03 17:16:45 +02:00
Ben Widawsky
057f6a8ad7 drm/i915: Invariably invalidate before ctx switch
We have been setting the bit which was originally BIOS dependent since:
commit f05bb0c7b6
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Jan 20 16:33:32 2013 +0000

    drm/i915: GFX_MODE Flush TLB Invalidate Mode must be '1' for scanline waits

Therefore, we do not need to try to figure it out dynamically and we can
just always invalidate the TLBs.

It's a partial revert of:
commit 12b0286f49
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Mon Jun 4 14:42:50 2012 -0700

    drm/i915: possibly invalidate TLB before context switch

The original commit attempted to only invalidate when necessary
(very much a relic from the old days). Now, we can just always invalidate.

I guess the old TODO still exists. Since we seem to have abandoned ILK
contexts however, there isn't much point in even remembering.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-04-03 11:41:39 +02:00
Chris Wilson
508774452d drm/i915: Broadwell expands ACTHD to 64bit
As Broadwell has an increased virtual address size, it requires more
than 32 bits to store offsets into its address space. This includes the
debug registers to track the current HEAD of the individual rings, which
may be anywhere within the per-process address spaces. In order to find
the full location, we need to read the high bits from a second register.
We then also need to expand our storage to keep track of the larger
address.

v2: Carefully read the two registers to catch wraparound between
    the reads.
v3: Use a WARN_ON rather than loop indefinitely on an unstable
    register read.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Timo Aaltonen <tjaalton@ubuntu.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Drop spurious hunk which conflicted.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-28 18:33:14 +01:00
Naresh Kumar Kachhi
e9fea5747d drm/i915: wait for rings to become idle once disabled
make sure we wait for rings to become idle once they are
disabled. In case of timeout print an error message

Signed-off-by: Naresh Kumar Kachhi <naresh.kumar.kachhi@intel.com>
[danvet: Frob patch as suggested by Chris.]
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-12 16:04:19 +01:00
Daniel Vetter
e8e6e6012d Linux 3.14-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTHSaRAAoJEHm+PkMAQRiG7G8IAJHElwFDNSQE7Y9MmbicrAMG
 kfjhBtBpTaVrJKQXegCNUwDaLLyC4oLIxDheW84oPXbrEGDLqPtBov/hrcFkHVr4
 lh/ZYk02nYtcfpN0JnL/Yj2oKHVmBWs0vFlM7StSFsJCj10DoCVQQdmAJ8XODTPo
 CXMapk+UikTX1TlIO8+B5toyl3R1OqPmW211UV1vQVLKy66hu+MKVN/V+/EyopL0
 1jO81EDpaRaeIJh1/okcyUoIq9pqLkAWNpeQ7uyXZ+Sfivt9RXwLYKmAB3lP20Hc
 ZMIIoHSCyYRFjxLlQvt02bA9nY4wTY7YN5kZ2kk65y7TFfhcGsCw1Sc69iyCoKs=
 =CJcA
 -----END PGP SIGNATURE-----

Merge tag 'v3.14-rc6' into drm-intel-next-queued

Linux 3.14-rc6

I need the hdmi/dvi-dual link fixes in 3.14 to avoid ugly conflicts
when merging Ville's new hdmi cloning support into my -next tree

Conflicts:
	drivers/gpu/drm/i915/Makefile
	drivers/gpu/drm/i915/intel_dp.c

Makefile cleanup conflicts with an acpi build fix, intel_dp.c is
trivial.

Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-10 21:43:46 +01:00
Brad Volkin
351e3db2b3 drm/i915: Implement command buffer parsing logic
The command parser scans batch buffers submitted via execbuffer ioctls before
the driver submits them to hardware. At a high level, it looks for several
things:

1) Commands which are explicitly defined as privileged or which should only be
   used by the kernel driver. The parser generally rejects such commands, with
   the provision that it may allow some from the drm master process.
2) Commands which access registers. To support correct/enhanced userspace
   functionality, particularly certain OpenGL extensions, the parser provides a
   whitelist of registers which userspace may safely access (for both normal and
   drm master processes).
3) Commands which access privileged memory (i.e. GGTT, HWS page, etc). The
   parser always rejects such commands.

See the overview comment in the source for more details.

This patch only implements the logic. Subsequent patches will build the tables
that drive the parser.

v2: Don't set the secure bit if the parser succeeds
Fail harder during init
Makefile cleanup
Kerneldoc cleanup
Clarify module param description
Convert ints to bools in a few places
Move client/subclient defs to i915_reg.h
Remove the bits_count field

OTC-Tracker: AXIA-4631
Change-Id: I50b98c71c6655893291c78a2d1b8954577b37a30
Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
[danvet: Appease checkpatch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-03-07 22:37:00 +01:00
Ville Syrjälä
753b1ad4a2 drm/i915: Add intel_ring_cachline_align()
intel_ring_cachline_align() emits MI_NOOPs until the ring tail is
aligned to a cacheline boundary.

Cc: Bjoern C <lkml@call-home.ch>
Cc: Alexandru DAMIAN <alexandru.damian@intel.com>
Cc: Enrico Tagliavini <enrico.tagliavini@gmail.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org (prereq for the next patch)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-11 23:00:19 +01:00
Mika Kuoppala
b6b0fac04d drm/i915: Use hangcheck score to find guilty context
With full ppgtt using acthd is not enough to find guilty
batch buffer. We get multiple false positives as acthd is
per vm.

Instead of scanning which vm was running on a ring,
to find corressponding context, use a different, simpler,
strategy of finding batches that caused gpu hang:

If hangcheck has declared ring to be hung, find first non complete
request on that ring and claim it was guilty.

v2: Rebase

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73652
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2014-02-04 11:57:24 +01:00
Chris Wilson
092467327c drm/i915: Write RING_TAIL once per-request
Ignoring the legacy DRI1 code, and a couple of special cases (to be
discussed later), all access to the ring is mediated through requests.
The first write to a ring will grab a seqno and mark the ring as having
an outstanding_lazy_request. Either through explicitly adding a request
after an execbuffer or through an implicit wait (either by the CPU or by
a semaphore), that sequence of writes will be terminated with a request.
So we can ellide all the intervening writes to the tail register and
send the entire command stream to the GPU at once. This will reduce the
number of *serialising* writes to the tail register by a factor or 3-5
times (depending upon architecture and number of workarounds, context
switches, etc involved). This becomes even more noticeable when the
register write is overloaded with a number of debugging tools. The
astute reader will wonder if it is then possible to overflow the ring
with a single command. It is not. When we start a command sequence to
the ring, we check for available space and issue a wait in case we have
not. The ring wait will in this case be forced to flush the outstanding
register write and then poll the ACTHD for sufficient space to continue.

The exception to the rule where everything is inside a request are a few
initialisation cases where we may want to write GPU commands via the CS
before userspace wakes up and page flips.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-10 15:35:58 +02:00
Mika Kuoppala
da66146425 drm/i915: include hangcheck action and score in error_state
Score and action reveals what all the rings were doing
and why hang was declared. Add idle state so that
we can distinguish between waiting and idle ring.

v2: - add idle as a hangcheck action
    - consensed hangcheck status to single line (Chris)
    - mark active explicitly when we are making progress (Chris)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-06 17:56:17 +02:00
Chris Wilson
3c0e234c84 drm/i915; Preallocate the lazy request
It is possible for us to be forced to perform an allocation for the lazy
request whilst running the shrinker. This allocation may fail, leaving
us unable to reclaim any memory leading to premature OOM. A neat
solution to the problem is to preallocate the request at the same time
as acquiring the seqno for the ring transaction. This means that we can
report ENOMEM prior to touching the rings.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-05 12:03:53 +02:00
Chris Wilson
1823521d2b drm/i915: Rename ring->outstanding_lazy_request
Prior to preallocating an request for lazy emission, rename the existing
field to make way (and differentiate the seqno from the request struct).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-05 12:03:12 +02:00
Chris Wilson
0d1aacac36 drm/i915: Embed the ring->private within the struct intel_ring_buffer
We now have more devices using ring->private than not, and they all want
the same structure. Worse, I would like to use a scratch page from
outside of intel_ringbuffer.c and so for convenience would like to reuse
ring->private. Embed the object into the struct intel_ringbuffer so that
we can keep the code clean.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-09-03 19:17:55 +02:00
Damien Lespiau
e3ce7633ba drm/i915: Remove I915_READ_{NOPID, SYNC_0, SYNC_1})()
The code directly uses the registers and ring->mmio_base.

Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-23 14:52:24 +02:00
Jani Nikula
f2f4d82faf drm/i915: give more distinctive names to ring hangcheck action enums
The short lowercase names are bound to collide. The default warnings
don't even warn about shadowing.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-08-22 13:31:37 +02:00
Daniel Vetter
c7113cc35f drm/i915: unify ring irq refcounts (again)
With the simplified locking there's no reason any more to keep the
refcounts seperate.

v2: Readd the lost comment that ring->irq_refcount is protected by
dev_priv->irq_lock.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-11 14:36:49 +02:00
Daniel Vetter
59cdb63d52 drm/i915: kill dev_priv->rps.lock
Now that the rps interrupt locking isn't clearly separated (at elast
conceptually) from all the other interrupt locking having a different
lock stopped making sense: It protects much more than just the rps
workqueue it started out with. But with the addition of VECS the
separation started to blurr and resulted in some more complex locking
for the ring interrupt refcount.

With this we can (again) unifiy the ringbuffer irq refcounts without
causing a massive confusion, but that's for the next patch.

v2: Explain better why the rps.lock once made sense and why no longer,
requested by Ben.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-07-11 14:36:43 +02:00
Mika Kuoppala
ad8beaeada drm/i915: store ring hangcheck action
For guilty batchbuffer analysis later on when rings are reset,
store what state the ring was on when hang was declared.
This helps to weed out the waiting rings from the active ones.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-13 17:42:17 +02:00
Chris Wilson
6274f2126a drm/i915: Don't count semaphore waits towards a stuck ring
If we detect a ring is in a valid wait for another, just let it be.
Eventually it will either begin to progress again, or the entire system
will come grinding to a halt and then hangcheck will fire as soon as the
deadlock is detected.

This error was foretold by Ben in
commit 05407ff889
Author: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Date:   Thu May 30 09:04:29 2013 +0300

    drm/i915: detect hang using per ring hangcheck_score

"If ring B is waiting on ring A via semaphore, and ring A is making
progress, albeit slowly - the hangcheck will fire. The check will
determine that A is moving, however ring B will appear hung because
the ACTHD doesn't move. I honestly can't say if that's actually a
realistic problem to hit it probably implies the timeout value is too
low."

v2: Make sure we don't even incur the KICK cost whilst waiting.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65394
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-11 11:49:28 +02:00
Chris Wilson
c65355bbef drm/i915: Track when we dirty the scanout with render commands
This is required for tracking render damage for use with FBC and will be
used in subsequent patches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-07 17:56:45 +02:00
Mika Kuoppala
05407ff889 drm/i915: detect hang using per ring hangcheck_score
Keep track of ring seqno progress and if there are no
progress detected, declare hang. Use actual head (acthd)
to distinguish between ring stuck and batchbuffer looping
situation. Stuck ring will be kicked to trigger progress.

This commit adds a hard limit for batchbuffer completion time.
If batchbuffer completion time is more than 4.5 seconds,
the gpu will be declared hung.

Review comment from Ben which nicely clarifies the semantic change:

"Maybe I'm just stating the functional changes of the patch, but in case
they were unintended here is what I see as potential issues:

1. "If ring B is waiting on ring A via semaphore, and ring A is making
   progress, albeit slowly - the hangcheck will fire. The check will
   determine that A is moving, however ring B will appear hung because
   the ACTHD doesn't move. I honestly can't say if that's actually a
   realistic problem to hit it probably implies the timeout value is too
   low.

2. "There's also another corner case on the kick. If the seqno = 2
   (though not stuck), and on the 3rd hangcheck, the ring is stuck, and
   we try to kick it... we don't actually try to find out if the kick
   helped"

v2: use atchd to detect stuck ring from loop (Ben Widawsky)

v3: Use acthd to check when ring needs kicking.
Declare hang on third time in order to give time for
kick_ring to take effect.

v4: Update commit msg

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
[danvet: Paste in Ben's review comment.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-06-03 10:58:21 +02:00
Ben Widawsky
a19d2933cb drm/i915: vebox interrupt get/put
v2: Use the correct lock to protect PM interrupt regs, this was
accidentally lost from earlier (Haihao)
Fix return types (Ben)

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-31 20:54:19 +02:00
Ben Widawsky
aeb0659338 drm/i915: Convert irq_refounct to struct
It's overkill on older gens, but it's useful for newer gens.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-31 20:54:18 +02:00
Ben Widawsky
9a8a2213a7 drm/i915: Vebox ringbuffer init
v2: Add set_seqno which didn't exist before rebase (Haihao)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Xiang, Haihao <haihao.xiang@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-31 20:54:12 +02:00
Ben Widawsky
4a3dd19d94 drm/i915: Introduce VECS: the 4th ring
The video enhancement command streamer is a new ring on HSW which does
what it sounds like it does. This patch provides the most minimal
inception of the ring.

In order to support a new ring, we need to bump the number. The patch
may look trivial to the untrained eye, but bumping the number of rings
is a bit scary. As such the patch is not terribly useful by itself, but
a pretty nice place to find issues during a bisection.

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-31 20:54:09 +02:00
Ben Widawsky
ad776f8b09 drm/i915: Semaphore MBOX update generalization
This replaces the existing MBOX update code with a more generalized
calculation for emitting mbox updates. We also create a sentinel for
doing the updates so we can more abstractly deal with the rings.

When doing MBOX updates the code must be aware of the /other/ rings.
Until now the platforms which supported semaphores had a fixed number of
rings and so it made sense for the code to be very specialized
(hardcoded).

The patch does contain a functional change, but should have no
behavioral changes.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-31 20:54:08 +02:00
Ben Widawsky
5586181fce drm/i915: Comments for semaphore clarification
Semaphores are tied very closely to the rings in the GPU. Trivial patch
adds comments to the existing code so that when we add new rings we can
include comments there as well. It also helps distinguish the ring to
semaphore mailbox interactions by using the ringname in the semaphore
data structures.

This patch should have no functional impact.

v2: The English parts (as opposed to register names) of the comments
were reversed. (Damien)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-31 20:54:08 +02:00
Mika Kuoppala
92cab73451 drm/i915: track ring progression using seqnos
Instead of relying in acthd, track ring seqno progression
to detect if ring has hung.

v2: put hangcheck stuff inside struct (Chris Wilson)

v3: initialize hangcheck.seqno (Ben Widawsky)

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-31 20:53:54 +02:00
Chris Wilson
112522f678 drm/i915: put context upon switching
In order to be notified of when the context and all of its associated
objects is idle (for if the context maps to a ppgtt) we need a callback
from the retire handler. We can arrange this by using the kref_get/put
of the context for request tracking and by inserting a request to
demarque the switch away from the old context.

[Ben: fixed minor error to patch compile, AND s/last_context/from/]
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2013-05-06 11:20:48 +02:00
Dave Airlie
b5cc6c0387 Merge tag 'drm-intel-next-2012-12-21' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Daniel writes:
- seqno wrap fixes and debug infrastructure from Mika Kuoppala and Chris
  Wilson
- some leftover kill-agp on gen6+ patches from Ben
- hotplug improvements from Damien
- clear fb when allocated from stolen, avoids dirt on the fbcon (Chris)
- Stolen mem support from Chris Wilson, one of the many steps to get to
  real fastboot support.
- Some DDI code cleanups from Paulo.
- Some refactorings around lvds and dp code.
- some random little bits&pieces

* tag 'drm-intel-next-2012-12-21' of git://people.freedesktop.org/~danvet/drm-intel: (93 commits)
  drm/i915: Return the real error code from intel_set_mode()
  drm/i915: Make GSM void
  drm/i915: Move GSM mapping into dev_priv
  drm/i915: Move even more gtt code to i915_gem_gtt
  drm/i915: Make next_seqno debugs entry to use i915_gem_set_seqno
  drm/i915: Introduce i915_gem_set_seqno()
  drm/i915: Always clear semaphore mboxes on seqno wrap
  drm/i915: Initialize hardware semaphore state on ring init
  drm/i915: Introduce ring set_seqno
  drm/i915: Missed conversion to gtt_pte_t
  drm/i915: Bug on unsupported swizzled platforms
  drm/i915: BUG() if fences are used on unsupported platform
  drm/i915: fixup overlay stolen memory leak
  drm/i915: clean up PIPECONF bpc #defines
  drm/i915: add intel_dp_set_signal_levels
  drm/i915: remove leftover display.update_wm assignment
  drm/i915: check for the PCH when setting pch_transcoder
  drm/i915: Clear the stolen fb before enabling
  drm/i915: Access to snooped system memory through the GTT is incoherent
  drm/i915: Remove stale comment about intel_dp_detect()
  ...

Conflicts:
	drivers/gpu/drm/i915/intel_display.c
2013-01-17 20:34:08 +10:00
Mika Kuoppala
f7e98ad4d4 drm/i915: Initialize hardware semaphore state on ring init
Hardware status page needs to have proper seqno set
as our initial seqno can be arbitrary. If initial seqno is close
to wrap boundary on init and i915_seqno_passed() (31bit space)
refers to hw status page which contains zero, errorneous result
will be returned.

v2: clear mboxes and set hws page directly instead of going
through rings. Suggested by Chris Wilson.

v3: hws needs to be updated for all gens. Noticed by Chris
Wilson.

References: https://bugs.freedesktop.org/show_bug.cgi?id=58230
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-19 11:17:01 +01:00
Mika Kuoppala
b70ec5bf43 drm/i915: Introduce ring set_seqno
In preparation for setting per ring initial seqno values
add ring::set_seqno().

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-19 11:16:18 +01:00
Daniel Vetter
b45305fce5 drm/i915: Implement workaround for broken CS tlb on i830/845
Now that Chris Wilson demonstrated that the key for stability on early
gen 2 is to simple _never_ exchange the physical backing storage of
batch buffers I've tried a stab at a kernel solution. Doesn't look too
nefarious imho, now that I don't try to be too clever for my own good
any more.

v2: After discussing the various techniques, we've decided to always blit
batches on the suspect devices, but allow userspace to opt out of the
kernel workaround assume full responsibility for providing coherent
batches. The principal reason is that avoiding the blit does improve
performance in a few key microbenchmarks and also in cairo-trace
replays.

Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet:
- Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring
  wrap w/a. Suggested by Chris Wilson.
- Also add the ACTHD check from Chris Wilson for the error state
  dumping, so that we still catch batches when userspace opts out of
  the w/a.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-17 17:27:02 +01:00
Mika Kuoppala
498d2ac15c drm/i915: Add intel_ring_handle_seqno wrap
If there are pre-wrap values in semaphore-mbox registers after wrap,
syncing against some after-wrap request will complete immediately.
Fix this by emitting ring commands to set mbox registers to zero
when the wrap happens.

v2: Use __intel_ring_begin to emit ring commands, from
Chris Wilson.

Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add a small comment to handle_seqno_wrap.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-06 13:14:34 +01:00
Ville Syrjälä
633cf8f505 drm/i915: Don't allow ring tail to reach the same cacheline as head
From BSpec:
"If the Ring Buffer Head Pointer and the Tail Pointer are on the same
cacheline, the Head Pointer must not be greater than the Tail
Pointer."

The easiest way to enforce this is to reduce the reported ring space.

References:
Gen2 BSpec "1. Programming Environment" / 1.4.4.6 "Ring Buffer Use"
Gen3 BSpec "vol1c Memory Interface Functions" / 2.3.4.5 "Ring Buffer Use"
Gen4+ BSpec "vol1c Memory Interface and Command Stream" / 5.3.4.5 "Ring Buffer Use"

v2: Include the exact BSpec references in the description

v3: s/64/I915_RING_FREE_SPACE, and add the BSpec information to the code

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-12-03 18:31:20 +01:00
Chris Wilson
3e9605018a drm/i915: Rearrange code to only have a single method for waiting upon the ring
Replace the wait for the ring to be clear with the more common wait for
the ring to be idle. The principle advantage is one less exported
intel_ring_wait function, and the removal of a hardcoded value.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-29 11:43:53 +01:00
Chris Wilson
9d7730914f drm/i915: Preallocate next seqno before touching the ring
Based on the work by Mika Kuoppala, we realised that we need to handle
seqno wraparound prior to committing our changes to the ring. The most
obvious point then is to grab the seqno inside intel_ring_begin(), and
then to reuse that seqno for all ring operations until the next request.
As intel_ring_begin() can fail, the callers must already be prepared to
handle such failure and so we can safely add further checks.

This patch looks like it should be split up into the interface
changes and the tweaks to move seqno wrapping from the execbuffer into
the core seqno increment. However, I found no easy way to break it into
incremental steps without introducing further broken behaviour.

v2: Mika found a silly mistake and a subtle error in the existing code;
inside i915_gem_retire_requests() we were resetting the sync_seqno of
the target ring based on the seqno from this ring - which are only
related by the order of their allocation, not retirement. Hence we were
applying the optimisation that the rings were synchronised too early,
fortunately the only real casualty there is the handling of seqno
wrapping.

v3: Do not forget to reset the sync_seqno upon module reinitialisation,
ala resume.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=863861
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> [v2]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-29 11:43:52 +01:00
Jesse Barnes
9a28977181 drm/i915: TLB invalidation with MI_FLUSH_DW requires a post-sync op v3
So store into the scratch space of the HWS to make sure the invalidate
occurs.

v2: use GTT address space for store, clean up #defines (Chris)
v3: use correct #define in blt ring flush (Chris)

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Antti Koskipää <antti.koskipaa@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
References: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1063252
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-11-11 23:51:36 +01:00
Chris Wilson
d7d4eeddb8 drm/i915: Allow DRM_ROOT_ONLY|DRM_MASTER to submit privileged batchbuffers
With the introduction of per-process GTT space, the hardware designers
thought it wise to also limit the ability to write to MMIO space to only
a "secure" batch buffer. The ability to rewrite registers is the only
way to program the hardware to perform certain operations like scanline
waits (required for tear-free windowed updates). So we either have a
choice of adding an interface to perform those synchronized updates
inside the kernel, or we permit certain processes the ability to write
to the "safe" registers from within its command stream. This patch
exposes the ability to submit a SECURE batch buffer to
DRM_ROOT_ONLY|DRM_MASTER processes.

v2: Haswell split up bit8 into a ppgtt bit (still bit8) and a security
bit (bit 13, accidentally not set). Also add a comment explaining why
secure batches need a global gtt binding.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> (v1)
[danvet: added hsw fixup.]
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-10-17 21:06:59 +02:00
Chris Wilson
b2eadbc85b drm/i915: Lazily apply the SNB+ seqno w/a
Avoid the forcewake overhead when simply retiring requests, as often the
last seen seqno is good enough to satisfy the retirment process and will
be promptly re-run in any case. Only ensure that we force the coherent
seqno read when we are explicitly waiting upon a completion event to be
sure that none go missing, and also for when we are reporting seqno
values in case of error or debugging.

This greatly reduces the load for userspace using the busy-ioctl to
track active buffers, for instance halving the CPU used by X in pushing
the pixels from a software render (flash). The effect will be even more
magnified with userptr and so providing a zero-copy upload path in that
instance, or in similar instances where X is simply compositing DRI
buffers.

v2: Reverse the polarity of the tachyon stream. Daniel suggested that
'force' was too generic for the parameter name and that 'lazy_coherency'
better encapsulated the semantics of it being an optimization and its
purpose. Also notice that gen6_get_seqno() is only used by gen6/7
chipsets and so the test for IS_GEN6 || IS_GEN7 is redundant in that
function.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-08-10 11:11:32 +02:00
Chris Wilson
a7b9761d0a drm/i915: Split i915_gem_flush_ring() into seperate invalidate/flush funcs
By moving the function to intel_ringbuffer and currying the appropriate
parameter, hopefully we make the callsites easier to read and
understand.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:55 +02:00
Chris Wilson
69c2fc8913 drm/i915: Remove the per-ring write list
This is now handled by a global flag to ensure we emit a flush before
the next serialisation point (if we failed to queue one previously).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2012-07-25 18:23:53 +02:00