Commit Graph

941 Commits

Author SHA1 Message Date
Chris Wilson
3d5564e910 drm/i915: Only apply one barrier after a breadcrumb interrupt is posted
If we flag the seqno as potentially stale upon receiving an interrupt,
we can use that information to reduce the frequency that we apply the
heavyweight coherent seqno read (i.e. if we wake up a chain of waiters).

v2: Use cmpxchg to replace READ_ONCE/WRITE_ONCE for more explicit
control of the ordering wrt to interrupt generation and interrupt
checking in the bottom-half.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-14-git-send-email-chris@chris-wilson.co.uk
2016-07-01 21:00:54 +01:00
Chris Wilson
f8973c217f drm/i915: Add a delay between interrupt and inspecting the final seqno (ilk)
On Ironlake, there is no command nor register to ensure that the write
from a MI_STORE command is completed (and coherent on the CPU) before the
command parser continues. This means that the ordering between the seqno
write and the subsequent user interrupt is undefined (like gen6+). So to
ensure that the seqno write is completed after the final user interrupt
we need to delay the read sufficiently to allow the write to complete.
This delay is undefined by the bspec, and empirically requires 75us even
though a register read combined with a clflush is less than 500ns. Hence,
the delay is due to an on-chip buffer rather than the latency of the write
to memory.

Note that the render ring controls this by filling the PIPE_CONTROL fifo
with stalling commands that force the earliest pipe-control with the
seqno to be completed before the command parser continues. Given that we
need a barrier operation for BSD, we may as well forgo the extra
per-batch latency by using a common per-interrupt barrier.

Studying the impact of adding the usleep shows that in both sequences of
and individual synchronous no-op batches is negligible for the media
engine (where the write now is unordered with the interrupt). Converting
the render engine over from the current glutton of pie-controls over to
the per-interrupt delays speeds up both the sequential and individual
synchronous no-ops by 20% and 60%, respectively. This speed up holds
even when looking at the throughput of small copies (4KiB->4MiB), both
serial and synchronous, by about 20%. This is because despite adding a
significant delay to the interrupt, in all likelihood we will see the
seqno write without having to apply the barrier (only in the rare corner
cases where the write is delayed on the last required is the delay
necessary).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94307
Testcase: igt/gem_sync #ilk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-12-git-send-email-chris@chris-wilson.co.uk
2016-07-01 21:00:53 +01:00
Chris Wilson
1b7744e7ba drm/i915: Use HWS for seqno tracking everywhere
By using the same address for storing the HWS on every platform, we can
remove the platform specific vfuncs and reduce the get-seqno routine to
a single read of a cached memory location.

v2: Fix semaphore_passed() to look at the signaling engine (not the
waiter's)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-8-git-send-email-chris@chris-wilson.co.uk
2016-07-01 20:58:48 +01:00
Chris Wilson
688e6c7258 drm/i915: Slaughter the thundering i915_wait_request herd
One particularly stressful scenario consists of many independent tasks
all competing for GPU time and waiting upon the results (e.g. realtime
transcoding of many, many streams). One bottleneck in particular is that
each client waits on its own results, but every client is woken up after
every batchbuffer - hence the thunder of hooves as then every client must
do its heavyweight dance to read a coherent seqno to see if it is the
lucky one.

Ideally, we only want one client to wake up after the interrupt and
check its request for completion. Since the requests must retire in
order, we can select the first client on the oldest request to be woken.
Once that client has completed his wait, we can then wake up the
next client and so on. However, all clients then incur latency as every
process in the chain may be delayed for scheduling - this may also then
cause some priority inversion. To reduce the latency, when a client
is added or removed from the list, we scan the tree for completed
seqno and wake up all the completed waiters in parallel.

Using igt/benchmarks/gem_latency, we can demonstrate this effect. The
benchmark measures the number of GPU cycles between completion of a
batch and the client waking up from a call to wait-ioctl. With many
concurrent waiters, with each on a different request, we observe that
the wakeup latency before the patch scales nearly linearly with the
number of waiters (before external factors kick in making the scaling much
worse). After applying the patch, we can see that only the single waiter
for the request is being woken up, providing a constant wakeup latency
for every operation. However, the situation is not quite as rosy for
many waiters on the same request, though to the best of my knowledge this
is much less likely in practice. Here, we can observe that the
concurrent waiters incur extra latency from being woken up by the
solitary bottom-half, rather than directly by the interrupt. This
appears to be scheduler induced (having discounted adverse effects from
having a rbtree walk/erase in the wakeup path), each additional
wake_up_process() costs approximately 1us on big core. Another effect of
performing the secondary wakeups from the first bottom-half is the
incurred delay this imposes on high priority threads - rather than
immediately returning to userspace and leaving the interrupt handler to
wake the others.

To offset the delay incurred with additional waiters on a request, we
could use a hybrid scheme that did a quick read in the interrupt handler
and dequeued all the completed waiters (incurring the overhead in the
interrupt handler, not the best plan either as we then incur GPU
submission latency) but we would still have to wake up the bottom-half
every time to do the heavyweight slow read. Or we could only kick the
waiters on the seqno with the same priority as the current task (i.e. in
the realtime waiter scenario, only it is woken up immediately by the
interrupt and simply queues the next waiter before returning to userspace,
minimising its delay at the expense of the chain, and also reducing
contention on its scheduler runqueue). This is effective at avoid long
pauses in the interrupt handler and at avoiding the extra latency in
realtime/high-priority waiters.

v2: Convert from a kworker per engine into a dedicated kthread for the
bottom-half.
v3: Rename request members and tweak comments.
v4: Use a per-engine spinlock in the breadcrumbs bottom-half.
v5: Fix race in locklessly checking waiter status and kicking the task on
adding a new waiter.
v6: Fix deciding when to force the timer to hide missing interrupts.
v7: Move the bottom-half from the kthread to the first client process.
v8: Reword a few comments
v9: Break the busy loop when the interrupt is unmasked or has fired.
v10: Comments, unnecessary churn, better debugging from Tvrtko
v11: Wake all completed waiters on removing the current bottom-half to
reduce the latency of waking up a herd of clients all waiting on the
same request.
v12: Rearrange missed-interrupt fault injection so that it works with
igt/drv_missed_irq_hang
v13: Rename intel_breadcrumb and friends to intel_wait in preparation
for signal handling.
v14: RCU commentary, assert_spin_locked
v15: Hide BUG_ON behind the compiler; report on gem_latency findings.
v16: Sort seqno-groups by priority so that first-waiter has the highest
task priority (and so avoid priority inversion).
v17: Add waiters to post-mortem GPU hang state.
v18: Return early for a completed wait after acquiring the spinlock.
Avoids adding ourselves to the tree if the is already complete, and
skips the awkward question of why we don't do completion wakeups for
waits earlier than or equal to ourselves.
v19: Prepare for init_breadcrumbs to fail. Later patches may want to
allocate during init, so be prepared to propagate back the error code.

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_latency
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
Cc: "Gong, Zhipeng" <zhipeng.gong@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: "Goel, Akash" <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> #v18
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-6-git-send-email-chris@chris-wilson.co.uk
2016-07-01 20:58:43 +01:00
Chris Wilson
1f15b76f1e drm/i915: Separate GPU hang waitqueue from advance
Currently __i915_wait_request uses a per-engine wait_queue_t for the dual
purpose of waking after the GPU advances or for waking after an error.
In the future, we may add even more wake sources and require greater
separation, but for now we can conceptually simplify wakeups by separating
the two sources. In particular, this allows us to use different wait-queues
(e.g. one on the engine advancement, a global one for errors and one on
each requests) without any hassle.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-5-git-send-email-chris@chris-wilson.co.uk
2016-07-01 20:42:31 +01:00
Chris Wilson
26a02b8fc3 drm/i915: Make queueing the hangcheck work inline
Since the function is a small wrapper around schedule_delayed_work(),
move it inline to remove the function call overhead for the principle
caller.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-4-git-send-email-chris@chris-wilson.co.uk
2016-07-01 20:42:30 +01:00
Chris Wilson
7774002586 drm/i915: Remove the dedicated hangcheck workqueue
The queue only ever contains at most one item and has no special flags.
It is just a very simple wrapper around the system-wq - a complication
with no benefits.

v2: Use the system_long_wq as we may wish to capture the error state
after detecting the hang - which may take a bit of time.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-3-git-send-email-chris@chris-wilson.co.uk
2016-07-01 20:42:29 +01:00
Chris Wilson
05535726d3 drm/i915: Delay queuing hangcheck to wait-request
We can forgo queuing the hangcheck from the start of every request to
until we wait upon a request. This reduces the overhead of every
request, but may increase the latency of detecting a hang. However, if
nothing every waits upon a hang, did it ever hang? It also improves the
robustness of the wait-request by ensuring that the hangchecker is
indeed running before we sleep indefinitely (and thereby ensuring that
we never actually sleep forever waiting for a dead GPU).

As pointed out by Tvrtko, it is possible for a GPU hang to go unnoticed
for as long as nobody is waiting for the GPU. Though this rare, during
that time we may be consuming more power than if we had promptly
recovered, and in the most extreme case we may exhaust all memory before
forcing the hangcheck. Something to be wary off in future.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1467390209-3576-2-git-send-email-chris@chris-wilson.co.uk
2016-07-01 20:42:28 +01:00
Tvrtko Ursulin
14bb2c1179 drm/i915: Fix a buch of kerneldoc warnings
Just a bunch of stale kerneldocs generating warnings when
building the docs. Mostly function parameters so not very
useful but still.

v2: Tidy.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1464958937-23344-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-06-06 13:04:26 +01:00
Sagar Arun Kamble
1800ad255c drm/i915: Update GEN6_PMINTRMSK setup with GuC enabled
On Loading, GuC sets PM interrupts routing (bit 31) and clears ARAT
expired interrupt (bit 9). Host turbo also updates this register
in RPS flows. This patch ensures bit 31 and bit 9 setup by GuC persists.
ARAT timer interrupt is needed in GuC for various features. It also
facilitates halting GuC and hence achieving RC6. PM interrupt routing
will not impact RPS interrupt reception by host as GuC will redirect
them.
This patch fixes igt test pm_rc6_residency that was failing with guc
load/submission enabled. Tested with SKL GuC v6.1 and BXT GuC v5.1 and v8.7.

v2: i915_irq/i915_pm decoupling from intel_guc. (ChrisW)

v3: restructuring the mask update and rebase w.r.t Ville's patch. (ChrisW)

v4: Updating the pm_intr_keep during direct_interrupts_to_guc. (Sagar)

Cc: Chris Harris <chris.harris@intel.com>
Cc: Zhe Wang <zhe1.wang@intel.com>
Cc: Deepak S <deepak.s@intel.com>
Cc: Satyanantha, Rama Gopal M <rama.gopal.m.satyanantha@intel.com>
Cc: Akash Goel <akash.goel@intel.com>
Testcase: igt/pm_rc6_residency
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Tested-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1464683307-19475-1-git-send-email-sagar.a.kamble@intel.com
2016-05-31 16:13:47 -07:00
Daniel Vetter
5a21b6650a drm/i915: Revert async unpin and nonblocking atomic commit
This reverts the following patches:

d55dbd06bb drm/i915: Allow nonblocking update of pageflips.
15c86bdb76 drm/i915: Check for unpin correctness.
95c2ccdc82 Reapply "drm/i915: Avoid stalling on pending flips for legacy cursor updates"
a6747b7304 drm/i915: Make unpin async.
03f476e1fc drm/i915: Prepare connectors for nonblocking checks.
2099deffef drm/i915: Pass atomic states to fbc update functions.
ee7171af72 drm/i915: Remove reset_counter from intel_crtc.
2ee004f7c5 drm/i915: Remove queue_flip pointer.
b8d2afae55 drm/i915: Remove use_mmio_flip kernel parameter.
8dd634d922 drm/i915: Remove cs based page flip support.
143f73b3bf drm/i915: Rework intel_crtc_page_flip to be almost atomic, v3.
84fc494b64 drm/i915: Add the exclusive fence to plane_state.
6885843ae1 drm/i915: Convert flip_work to a list.
aa420ddd8e drm/i915: Allow mmio updates on all platforms, v2.
afee4d8707 Revert "drm/i915: Avoid stalling on pending flips for legacy cursor updates"

"drm/i915: Allow nonblocking update of pageflips" should have been
split up, misses a proper commit message and seems to cause issues in
the legacy page_flip path as demonstrated by kms_flip.

"drm/i915: Make unpin async" doesn't handle the unthrottled cursor
updates correctly, leading to an apparent pin count leak. This is
caught by the WARN_ON in i915_gem_object_do_pin which screams if we
have more than DRM_I915_GEM_OBJECT_MAX_PIN_COUNT pins.

Unfortuantely we can't just revert these two because this patch series
came with a built-in bisect breakage in the form of temporarily
removing the unthrottled cursor update hack for legacy cursor ioctl.
Therefore there's no other option than to revert the entire pile :(

There's one tiny conflict in intel_drv.h due to other patches, nothing
serious.

Normally I'd wait a bit longer with doing a maintainer revert, but
since the minimal set of patches we need to revert (due to the bisect
breakage) is so big, time is running out fast. And very soon
(especially after a few attempts at fixing issues) it'll be really
hard to revert things cleanly.

Lessons learned:
- Not a good idea to rush the review (done by someone fairly new to
  the area) and not make sure domain experts had a chance to read it.

- Patches should be properly split up. I only looked at the two
  patches that should be reverted in detail, but both look like the
  mix up different things in one patch.

- Patches really should have proper commit messages. Especially when
  doing more than one thing, and especially when touching critical and
  tricky core code.

- Building a patch series and r-b stamping it when it has a built-in
  bisect breakage is not a good idea.

- I also think we need to stop building up technical debt by
  postponing atomic igt testcases even longer. I think it's clear that
  there's enough corner cases in this beast that we really need to
  have the testcases _before_ the next step lands.

Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
2016-05-25 09:33:04 +02:00
Ville Syrjälä
11825b0dba drm/i915: Enable GSE interrupt on BDW+
We've never actually enabled or unmasked the GSE interrupt on BDW+,
even though the interrupt handler was always prepared for it.
Let's enable it and see what happens.

Credit to Mark Kettenis who fixed this in the OpenBSD fork of the
driver. He reports that it fixed the "ACPI _BCM/_BCQ-based
brightness mechanism on a MacBookPro12,1 and a 3rd gen Lenovo X1
Carbon" for them.

Mark says:
"FWIW, this *is* needed if you want ACPI-based backlight control to
 work.  On Linux you probably don't notice, since "hardware" backlight
 control is preferred over "firmware" or "platform" backlight control.

 It would help me if this did land in the Linux tree though, as it will
 make future imports of the i915 driver into OpenBSD easier."

So even though we don't really need this, let's put it in to help Mark
with future porting efforts. Should be harmless to have it enabled in
any case.

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
References: http://lists.freedesktop.org/archives/intel-gfx/2015-December/081799.html
Reported-by: Mark Kettenis <mark.kettenis@xs4all.nl>
Cc: Mark Kettenis <mark.kettenis@xs4all.nl>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463649283-28698-1-git-send-email-ville.syrjala@linux.intel.com
Acked-by: Jani Nikula <jani.nikula@intel.com>
2016-05-23 17:53:52 +03:00
Maarten Lankhorst
8dd634d922 drm/i915: Remove cs based page flip support.
With mmio flips now available on all platforms it's time to remove
support for cs flips.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463490484-19540-13-git-send-email-maarten.lankhorst@linux.intel.com
Reviewed-by: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
2016-05-19 14:38:15 +02:00
Maarten Lankhorst
51cbaf010f drm/i915: Unify unpin_work and mmio_work into flip_work, v2.
Rename intel_unpin_work to intel_flip_work and use it for mmio flips
and unpinning. Use flip_queued_req to hold the wait request in the
mmio case, and the vblank counter from intel_crtc_get_vblank_counter.

MMIO flips get their own path through intel_finish_page_flip_mmio,
handled on vblank. CS page flips go through *_cs.

Changes since v1:
- Clean up destinction between MMIO and CS flips.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463490484-19540-7-git-send-email-maarten.lankhorst@linux.intel.com
Reviewed-by: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
2016-05-19 14:37:17 +02:00
Maarten Lankhorst
5251f04e0c drm/i915: Remove intel_prepare_page_flip, v3.
Instead of calling prepare_flip right before calling finish_page_flip
do everything from prepare_page_flip in finish_page_flip.

Putting prepare and finish page_flip in a single step removes the need
for INTEL_FLIP_COMPLETE, so it can be removed. This simplifies the code
slightly.

Changes since v1:
- Invert if case to simplify code.
- Add missing barrier.
- Reword commit message.
Changes since v2:
- intel_page_flip_plane is removed.
- work->pending is turned into a bool.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463490484-19540-5-git-send-email-maarten.lankhorst@linux.intel.com
Reviewed-by: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
2016-05-19 14:36:59 +02:00
Maarten Lankhorst
ef58319d3f drm/i915: Remove intel_finish_page_flip_plane.
This function is duplicated with intel_finish_page_flip,
and is only ever used from planes that could use the
other function anyway.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1463490484-19540-4-git-send-email-maarten.lankhorst@linux.intel.com
Reviewed-by: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
2016-05-19 14:36:48 +02:00
Tvrtko Ursulin
7e22dbbbae drm/i915: Replace "INTEL_INFO->gen == x" checks with IS_GENx
This way optimization from a previous patch works even better.

v2: Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
2016-05-11 12:27:27 +01:00
Chris Wilson
dc97997a21 drm/i915: Use drm_i915_private as the native pointer for intel_uncore.c
Pass drm_i915_private to the uncore init/fini routines and their
subservients as it is their native type.

   text    data     bss     dec     hex filename
6309978 3578778  696320 10585076         a183f4 vmlinux
6309530 3578778  696320 10584628         a18234 vmlinux

a modest 400 bytes of saving, but 60 lines of code deleted!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462885804-26750-1-git-send-email-chris@chris-wilson.co.uk
2016-05-10 17:20:20 +01:00
Chris Wilson
c033666a94 drm/i915: Store a i915 backpointer from engine, and use it
text	   data	    bss	    dec	    hex	filename
6309351	3578714	 696320	10584385	 a18141	vmlinux
6308391	3578714	 696320	10583425	 a17d81	vmlinux

Almost 1KiB of code reduction.

v2: More s/INTEL_INFO()->gen/INTEL_GEN()/ and IS_GENx() conversions

   text	   data	    bss	    dec	    hex	filename
6304579	3578778	 696320	10579677	 a16edd	vmlinux
6303427	3578778	 696320	10578525	 a16a5d	vmlinux

Now over 1KiB!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1462545621-30125-3-git-send-email-chris@chris-wilson.co.uk
2016-05-09 13:41:24 +01:00
Tvrtko Ursulin
91d14251bb drm/i915: Small display interrupt handlers tidy
I have noticed some of our interrupt handlers use both dev and
dev_priv while they could get away with only dev_priv in the
huge majority of cases.

Tidying that up had a cascading effect on changing functions
prototypes, so relatively big churn factor, but I think it is
for the better.

For example even where changes cascade out of i915_irq.c, for
functions prefixed with intel_, genX_ or <plat>_, it makes more
sense to take dev_priv directly anyway.

This allows us to eliminate local variables and intermixed usage
of dev and dev_priv where only one is good enough.

End result is shrinkage of both source and the resulting binary.

i915.ko:

 - .text         000b0899
 + .text         000b0619

Or if we look at the Gen8 display irq chain:

 -00000000000006ad t gen8_irq_handler
 +0000000000000663 t gen8_irq_handler
   -0000000000000028 T intel_opregion_asle_intr
   +0000000000000024 T intel_opregion_asle_intr
   -000000000000008c t ilk_hpd_irq_handler
   +000000000000007f t ilk_hpd_irq_handler
   -0000000000000116 T intel_check_page_flip
   +0000000000000112 T intel_check_page_flip
   -000000000000011a T intel_prepare_page_flip
   +0000000000000119 T intel_prepare_page_flip
   -0000000000000014 T intel_finish_page_flip_plane
   +0000000000000013 T intel_finish_page_flip_plane
   -0000000000000053 t hsw_pipe_crc_irq_handler
   +000000000000004c t hsw_pipe_crc_irq_handler
   -000000000000022e t cpt_irq_handler
   +0000000000000213 t cpt_irq_handler

So small shrinkage but it is all fast paths so doesn't harm.

Situation is similar in other interrupt handlers as well.

v2: Tidy intel_queue_rps_boost_for_request as well. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-05-09 13:38:16 +01:00
Tvrtko Ursulin
98735739cf drm/i915/gen8+: Do not enable DPF interrupt since the handler does not exist
Looks like DPF was not implemented for gen8+ but the IER and IMR
are still enabled on initialization.

Since there is no code to handle this interrupt, gate the irq
enablement behind HAS_L3_DPF in case the feature gets enabled
in the future.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
2016-04-20 09:59:16 +01:00
Ville Syrjälä
e30e251acc drm/i915: Split gen8_gt_irq_handler into ack+handle
As we did on VLV, split the gt irq handling to ack and handler phases on
CHV. Leave the BDW+ codepath mostly intact for now.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-13-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-04-14 14:46:00 +03:00
Ville Syrjälä
261e40b8b7 drm/i915: Eliminate passing dev+dev_priv to {snb,ilk}_gt_irq_handler()
It looks silly to pass both dev and dev_priv to the snb/ilk gt irq
handlers. Just pass dev_priv.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-12-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-04-14 14:45:49 +03:00
Ville Syrjälä
528948745f drm/i915: Move gt/pm irq handling out from irq disabled section on VLV
No need to actually handle the GT/PM interrupt while we have interrupt
sources disabled. Move the actual processing to happen after we've
restored VLV_IER and master interrupt enable.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-11-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-04-14 14:45:45 +03:00
Ville Syrjälä
2ecb8ca4a0 drm/i915: Split VLV/CVH PIPESTAT handling into ack+handler
Minimize the amount of stuff we do with interrupt sources disabled by
splitting the PIPESTAT irq handling into ack+handler phases.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-10-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-04-14 14:45:42 +03:00
Ville Syrjälä
1ae3c34c09 drm/i915: Split PORT_HOTPLUG_STAT ack out from i9xx_hpd_irq_handler()
Split the VLV/CHV hoplug irq handling to ack and handler phases. This
way we can move the actual irq handling outside the section where
we have disabled the interrupt sources.

For now, we leave things as is for pre-VLV GMCH platforms, but
eventually they could get the same treatment.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-9-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-04-14 14:45:37 +03:00
Ville Syrjälä
6e814800a2 drm/i915: Move variables to narrower scope in VLV/CHV irq handlers
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-8-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-04-14 14:45:33 +03:00
Ville Syrjälä
1e1cace942 drm/i915: Eliminate loop from VLV irq handler
Now that we've dealt with the races in clearing IIR bits via VLV_IER
and the master interrupt enable, we can go ahead aliminate the loop
from the VLV interrupt handler.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-7-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-04-14 14:45:27 +03:00
Ville Syrjälä
a5e485a95c drm/i915: Clear VLV_IER around irq processing
On VLV/CHV the master interrupt enable bit only affects GT/PM
interrupts. Display interrupts are not affected by the master
irq control.

Also it seems that the CPU interrupt will only be generated when
the combined result of all GT/PM/display interrupts has a 0->1
edge. We already use the master interrupt enable bit to make sure
GT/PM interrupt can generate such an edge if we don't end up clearing
all IIR bits. We must do the same for display interrupts, and for
that we can simply clear out VLV_IER, and restore after we've acked
all the interrupts we are about to process.

So with both master interrupt enable and VLV_IER cleared out, we will
guarantee that there will be a 0->1 edge if any IIR bits are still set
at the end, and thus another CPU interrupt will be generated.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 579de73b04 ("drm/i915: Exit cherryview_irq_handler() after one pass")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-6-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-04-14 14:45:22 +03:00
Ville Syrjälä
4a0a0202b0 drm/i915: Clear VLV_MASTER_IER around irq processing
Like on CHV, let's clear out the master irq enable bit when we ack
GT/PM interrupts. This will allow GT/PM interrupts to re-raise the
CPU interrupt if we fail to clear all the bits from the IIR(s).

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-5-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-04-14 14:45:19 +03:00
Ville Syrjälä
7ce4d1f273 drm/i915: Clear VLV_IIR after PIPESTAT
On VLV/CHV VLV_IIR is not double double buffered, and it doesn't detect
edges from PIPESTAT & co. like it does on gen4. Instead it just
directly latches the level from PIPESTAT & co. That means we must clear
VLV_IIR after PIPESTAT & co. or else we'll get a spurious bit in VLV_IIR
every single time.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-4-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-04-14 14:45:15 +03:00
Ville Syrjälä
34c7b8a7b8 drm/i915: Set up VLV_MASTER_IER consistently
We're lacking VLV_MASTER_IER setup from valleyview_irq_preinstall(), so
add it there. Also cargo cult in some POSTING_READ()s to match the other
platforms.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-3-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-04-14 14:45:11 +03:00
Ville Syrjälä
e5328c43d4 drm/i915: Use GEN8_MASTER_IRQ_CONTROL consistently
Use GEN8_MASTER_IRQ_CONTROL instead of DE_MASTER_IRQ_CONTROL or
MASTER_INTERRUPT_ENABLE with the GEN8_MASTER_IRQ register. They're
all bit 31 so there's no actual bug here, but let's be consistent
which name we use for the bit.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460571598-24452-2-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-04-14 14:45:05 +03:00
Chris Wilson
d98c52cf4f drm/i915: Tighten reset_counter for reset status
In the reset_counter, we use two bits to track a GPU hang and reset. The
low bit is a "reset-in-progress" flag that we set to signal when we need
to break waiters in order for the recovery task to grab the mutex. As
soon as the recovery task has the mutex, we can clear that flag (which
we do by incrementing the reset_counter thereby incrementing the gobal
reset epoch). By clearing that flag when the recovery task holds the
struct_mutex, we can forgo a second flag that simply tells GEM to ignore
the "reset-in-progress" flag.

The second flag we store in the reset_counter is whether the
reset failed and we consider the GPU terminally wedged. Whilst this flag
is set, all access to the GPU (at least through GEM rather than direct mmio
access) is verboten.

PS: Fun is in store, as in the future we want to move from a global
reset epoch to a per-engine reset engine with request recovery.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-6-git-send-email-chris@chris-wilson.co.uk
2016-04-14 10:45:40 +01:00
Chris Wilson
c19ae989b0 drm/i915: Hide the atomic_read(reset_counter) behind a helper
This is principally a little bit of syntatic sugar to hide the
atomic_read()s throughout the code to retrieve the current reset_counter.
It also provides the other utility functions to check the reset state on the
already read reset_counter, so that (in later patches) we can read it once
and do multiple tests rather than risk the value changing between tests.

v2: Be more strict on converting existing i915_reset_in_progress() over to
the more verbose i915_reset_in_progress_or_wedged().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-4-git-send-email-chris@chris-wilson.co.uk
2016-04-14 10:45:40 +01:00
Ville Syrjälä
71b8b41d5b drm/i915: Move DPINVGTT setup to vlv_display_irq_reset()
DPINVGTT lives inside the disp2d power well so we can't frob it unless
we know the power well is active. Let's this stuff into
vlv_display_irq_reset() which is only called at the right times so that
we don't get unclaimed register access errors.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94164
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460382992-28728-10-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
2016-04-12 19:09:21 +03:00
Ville Syrjälä
6b7eafc1b4 drm/i915: Warn if irq_mask isn't ~0 during vlv/cvh display irq postinstall
We expect vlv_display_irq_reset() to have been called prior to
vlv_display_irq_postinstall() so let's WARN if that isn't the case.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460382992-28728-8-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
2016-04-12 19:08:51 +03:00
Ville Syrjälä
9ab981f22b drm/i915: Use GEN5_IRQ_INIT() in vlv_display_irq_postinstall()
Replace the hand rolled IMR/IER setup in vlv_display_irq_postinstall()
with GEN5_IRQ_INIT(). Also rename the iir_mask to enable_mask to avoid
consusion since we no longer deal with IIR here.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460382992-28728-7-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
2016-04-12 19:08:32 +03:00
Ville Syrjälä
d6c6980358 drm/i915: Clear display interrupt before enabling when turning on the power well
For a bit of extra paranoia make sure the display irqs are all cleared
before we enabled them when turning on the power well. This should
really be the case already since the power well was off which resets
everything.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460382992-28728-6-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
2016-04-12 19:07:38 +03:00
Ville Syrjälä
8bb613068a drm/i915: Move vlv/chv display irq code to a more logical place
Reshuffle the code a bit to move the vlv/chv display irq functions away
from the main irq hooks, next to the other sub (de,gt,etc.) hooks.

v2: Rebased due to changes in vlv_display_irq_reset()

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com> (v1)
Link: http://patchwork.freedesktop.org/patch/msgid/1460476604-2035-1-git-send-email-ville.syrjala@linux.intel.com
2016-04-12 19:07:24 +03:00
Ville Syrjälä
9918271efc drm/i915: Skip display irq setup if display irqs aren't flagged as enabled
During runtime PM we'll be reinitializing interrupt support from the
ground up. However since the display power well will be off at that
time, well end up with a ton of unclaimed register accesses from the
display irq setup. Since we turned off the power well already before
runtime suspend, we've flagged display irqs as disabled during runtime
PM transitions. So we can just check that flag to see if we should do
skip display irqs during irq setup.

During driver load display irqs will be flagged as enabled since we've
turned on the power well already, however the power well code will have
skipped the display irq setup since irq support as a whole wasn't yet
enabled when the power well was enabled. So we'll want to do the display
irq setup in that case.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94164
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460382992-28728-4-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
2016-04-12 19:07:13 +03:00
Ville Syrjälä
ad22d10654 drm/i915: Fix up vlv/chv display irq setup
The vlv/chv display irq setup was a bit of mess after I ran out of steam
when working on it last. Fix it up so that we just have a _reset() and
_postinstall() hooks for the display irqs, and use those consistently.

v2: Clear out pipestat_irq_mask[] and PIPE_FIFO_UNDERRUN_STATUS in
    vlv_display_irq_reset() (Imre)

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com> (v1)
Link: http://patchwork.freedesktop.org/patch/msgid/1460476574-1921-1-git-send-email-ville.syrjala@linux.intel.com
2016-04-12 19:06:59 +03:00
Ville Syrjälä
93de68f940 drm/i915: Remove "VLV magic" from irq setup
No clue what this is supposed to achieve. I think it's been there since
the very beginning, so presumably some kind of kludge for very early
silicon. Let's just throw it out.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460382992-28728-2-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
2016-04-12 19:06:52 +03:00
Chris Wilson
12471ba87a drm/i915: Harden detection of missed interrupts
Only declare a missed interrupt if we find that the GPU is idle with
waiters and a hangcheck interval has passed in which no new user
interrupts have been raised.

v2: Clear the stuck interrupt marker between successful batches

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460195877-20520-3-git-send-email-chris@chris-wilson.co.uk
2016-04-09 12:09:29 +01:00
Chris Wilson
c04e0f3b4e drm/i915: Separate out the seqno-barrier from engine->get_seqno
In order to simplify future patches, extract the
lazy_coherency optimisation our of the engine->get_seqno() vfunc into
its own callback.

v2: Rename the barrier to engine->irq_seqno_barrier to try and better
reflect that the barrier is only required after the user interrupt before
reading the seqno (to ensure that the seqno update lands in time as we
do not have strict seqno-irq ordering on all platforms).

Reviewed-by: Dave Gordon <david.s.gordon@intel.com> [#v2]

v3: Comments for hangcheck paranoia. Mika wanted to keep the extra
barrier inside the hangcheck, just in case. I can argue that it doesn't
provide a barrier against anything, but the side-effects of applying the
barrier may prevent a false declaration of a hung GPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460195877-20520-2-git-send-email-chris@chris-wilson.co.uk
2016-04-09 12:09:05 +01:00
Chris Wilson
cffa781e59 drm/i915: Simplify check for idleness in hangcheck
Having fixed the tracking of the engine's last_submitted_seqno, we can
now rely on it for detecting when the engine is idle (and not have to
touch the requests pointer).

Testcase: igt/gem_exec_whisper
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460010558-10705-9-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
2016-04-08 11:45:05 +01:00
Chris Wilson
7c90b7de73 drm/i915: Apply a mb between emitting the request and hangcheck
Seal the request and mark it as pending execution before we submit it to
hardware. We assume that the actual submission cannot fail (that
guarantee is provided by preallocating space in the request for the
submission). As we may inspect this state without holding any locks
during hangcheck we should apply a barrier to ensure that we do
not see a more recent value in the HWS than we are tracking.

Based on a patch by Mika Kuoppala.

Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460010558-10705-8-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
2016-04-08 11:44:48 +01:00
Joonas Lahtinen
2d1fe07340 drm/i915: Do not use {HAS_*, IS_*, INTEL_INFO}(dev_priv->dev)
dev_priv is what the macro works hard to extract, pass it directly.

> sed 's/\([A-Z].*(dev_priv\)->dev)/\1)/g'

v2:
- Include all wrapper macros too (Chris)

v3:
- Include sed cmdline (Chris)

v4:
- Break long line
- Rebase

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460016485-8089-1-git-send-email-joonas.lahtinen@linux.intel.com
2016-04-07 14:50:26 +03:00
Shubhangi Shrivastava
d252bf68b7 drm/i915: Set invert bit for hpd based on VBT
This patch sets the invert bit for hpd detection for each port
based on VBT configuration. Since each AOB can be designed to
depend on invert bit or not, it is expected if an AOB requires
invert bit, the user will set respective bit in VBT.

v2: Separated VBT parsing from the rest of the logic. (Jani)

v3: Moved setting invert bit logic to bxt_hpd_irq_setup()
    and changed its logic to avoid looping twice. (Ville)

v4: Changed the logic to mask out the bits first and then
    set them to remove need of temporary variable. (Ville)

v5: Moved defines to existing set of defines for the register
    and added required breaks. (Ville)

Signed-off-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Durgadoss R <durgadoss.r@intel.com>
Signed-off-by: Shubhangi Shrivastava <shubhangi.shrivastava@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[Jani: fixed some checkpatch noise, added kernel-doc.]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459420907-11383-2-git-send-email-shubhangi.shrivastava@intel.com
2016-04-06 14:22:48 +03:00
Tvrtko Ursulin
27af5eea54 drm/i915: Move execlists irq handler to a bottom half
Doing a lot of work in the interrupt handler introduces huge
latencies to the system as a whole.

Most dramatic effect can be seen by running an all engine
stress test like igt/gem_exec_nop/all where, when the kernel
config is lean enough, the whole system can be brought into
multi-second periods of complete non-interactivty. That can
look for example like this:

 NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u8:3:143]
 Modules linked in: [redacted for brevity]
 CPU: 0 PID: 143 Comm: kworker/u8:3 Tainted: G     U       L  4.5.0-160321+ #183
 Hardware name: Intel Corporation Broadwell Client platform/WhiteTip Mountain 1
 Workqueue: i915 gen6_pm_rps_work [i915]
 task: ffff8800aae88000 ti: ffff8800aae90000 task.ti: ffff8800aae90000
 RIP: 0010:[<ffffffff8104a3c2>]  [<ffffffff8104a3c2>] __do_softirq+0x72/0x1d0
 RSP: 0000:ffff88014f403f38  EFLAGS: 00000206
 RAX: ffff8800aae94000 RBX: 0000000000000000 RCX: 00000000000006e0
 RDX: 0000000000000020 RSI: 0000000004208060 RDI: 0000000000215d80
 RBP: ffff88014f403f80 R08: 0000000b1b42c180 R09: 0000000000000022
 R10: 0000000000000004 R11: 00000000ffffffff R12: 000000000000a030
 R13: 0000000000000082 R14: ffff8800aa4d0080 R15: 0000000000000082
 FS:  0000000000000000(0000) GS:ffff88014f400000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fa53b90c000 CR3: 0000000001a0a000 CR4: 00000000001406f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Stack:
  042080601b33869f ffff8800aae94000 00000000fffc2678 ffff88010000000a
  0000000000000000 000000000000a030 0000000000005302 ffff8800aa4d0080
  0000000000000206 ffff88014f403f90 ffffffff8104a716 ffff88014f403fa8
 Call Trace:
  <IRQ>
  [<ffffffff8104a716>] irq_exit+0x86/0x90
  [<ffffffff81031e7d>] smp_apic_timer_interrupt+0x3d/0x50
  [<ffffffff814f3eac>] apic_timer_interrupt+0x7c/0x90
  <EOI>
  [<ffffffffa01c5b40>] ? gen8_write64+0x1a0/0x1a0 [i915]
  [<ffffffff814f2b39>] ? _raw_spin_unlock_irqrestore+0x9/0x20
  [<ffffffffa01c5c44>] gen8_write32+0x104/0x1a0 [i915]
  [<ffffffff8132c6a2>] ? n_tty_receive_buf_common+0x372/0xae0
  [<ffffffffa017cc9e>] gen6_set_rps_thresholds+0x1be/0x330 [i915]
  [<ffffffffa017eaf0>] gen6_set_rps+0x70/0x200 [i915]
  [<ffffffffa0185375>] intel_set_rps+0x25/0x30 [i915]
  [<ffffffffa01768fd>] gen6_pm_rps_work+0x10d/0x2e0 [i915]
  [<ffffffff81063852>] ? finish_task_switch+0x72/0x1c0
  [<ffffffff8105ab29>] process_one_work+0x139/0x350
  [<ffffffff8105b186>] worker_thread+0x126/0x490
  [<ffffffff8105b060>] ? rescuer_thread+0x320/0x320
  [<ffffffff8105fa64>] kthread+0xc4/0xe0
  [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
  [<ffffffff814f351f>] ret_from_fork+0x3f/0x70
  [<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170

I could not explain, or find a code path, which would explain
a +20 second lockup, but from some instrumentation it was
apparent the interrupts off proportion of time was between
10-25% under heavy load which is quite bad.

When a interrupt "cliff" is reached, which was >~320k irq/s on
my machine, the whole system goes into a terrible state of the
above described multi-second lockups.

By moving the GT interrupt handling to a tasklet in a most
simple way, the problem above disappears completely.

Testing the effect on sytem-wide latencies using
igt/gem_syslatency shows the following before this patch:

gem_syslatency: cycles=1532739, latency mean=416531.829us max=2499237us
gem_syslatency: cycles=1839434, latency mean=1458099.157us max=4998944us
gem_syslatency: cycles=1432570, latency mean=2688.451us max=1201185us
gem_syslatency: cycles=1533543, latency mean=416520.499us max=2498886us

This shows that the unrelated process is experiencing huge
delays in its wake-up latency. After the patch the results
look like this:

gem_syslatency: cycles=808907, latency mean=53.133us max=1640us
gem_syslatency: cycles=862154, latency mean=62.778us max=2117us
gem_syslatency: cycles=856039, latency mean=58.079us max=2123us
gem_syslatency: cycles=841683, latency mean=56.914us max=1667us

Showing a huge improvement in the unrelated process wake-up
latency. It also shows an approximate halving in the number
of total empty batches submitted during the test. This may
not be worrying since the test puts the driver under
a very unrealistic load with ncpu threads doing empty batch
submission to all GPU engines each.

Another benefit compared to the hard-irq handling is that now
work on all engines can be dispatched in parallel since we can
have up to number of CPUs active tasklets. (While previously
a single hard-irq would serially dispatch on one engine after
another.)

More interesting scenario with regards to throughput is
"gem_latency -n 100" which  shows 25% better throughput and
CPU usage, and 14% better dispatch latencies.

I did not find any gains or regressions with Synmark2 or
GLbench under light testing. More benchmarking is certainly
required.

v2:
   * execlists_lock should be taken as spin_lock_bh when
     queuing work from userspace now. (Chris Wilson)
   * uncore.lock must be taken with spin_lock_irq when
     submitting requests since that now runs from either
     softirq or process context.

v3:
   * Expanded commit message with more testing data;
   * converted missed locking sites to _bh;
   * added execlist_lock comment. (Chris Wilson)

v4:
   * Mention dispatch parallelism in commit. (Chris Wilson)
   * Do not hold uncore.lock over MMIO reads since the block
     is already serialised per-engine via the tasklet itself.
     (Chris Wilson)
   * intel_lrc_irq_handler should be static. (Chris Wilson)
   * Cancel/sync the tasklet on GPU reset. (Chris Wilson)
   * Document and WARN that tasklet cannot be active/pending
     on engine cleanup. (Chris Wilson/Imre Deak)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Testcase: igt/gem_exec_nop/all
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94350
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1459768316-6670-1-git-send-email-tvrtko.ursulin@linux.intel.com
2016-04-04 14:08:52 +01:00