Thanks to Ville for suggesting this as a potential solution to pipe
underruns on Skylake.
On Skylake all of the registers for configuring planes, including the
registers for configuring their watermarks, are double buffered. New
values written to them won't take effect until said registers are
"armed", which is done by writing to the PLANE_SURF (or in the case of
cursor planes, the CURBASE register) register.
With this in mind, up until now we've been updating watermarks on skl
like this:
non-modeset {
- calculate (during atomic check phase)
- finish_atomic_commit:
- intel_pre_plane_update:
- intel_update_watermarks()
- {vblank happens; new watermarks + old plane values => underrun }
- drm_atomic_helper_commit_planes_on_crtc:
- start vblank evasion
- write new plane registers
- end vblank evasion
}
or
modeset {
- calculate (during atomic check phase)
- finish_atomic_commit:
- crtc_enable:
- intel_update_watermarks()
- {vblank happens; new watermarks + old plane values => underrun }
- drm_atomic_helper_commit_planes_on_crtc:
- start vblank evasion
- write new plane registers
- end vblank evasion
}
Now we update watermarks atomically like this:
non-modeset {
- calculate (during atomic check phase)
- finish_atomic_commit:
- intel_pre_plane_update:
- intel_update_watermarks() (wm values aren't written yet)
- drm_atomic_helper_commit_planes_on_crtc:
- start vblank evasion
- write new plane registers
- write new wm values
- end vblank evasion
}
modeset {
- calculate (during atomic check phase)
- finish_atomic_commit:
- crtc_enable:
- intel_update_watermarks() (actual wm values aren't written
yet)
- drm_atomic_helper_commit_planes_on_crtc:
- start vblank evasion
- write new plane registers
- write new wm values
- end vblank evasion
}
So this patch moves all of the watermark writes into the right place;
inside of the vblank evasion where we update all of the registers for
each plane. While this patch doesn't fix everything, it does allow us to
update the watermark values in the way the hardware expects us to.
Changes since original patch series:
- Remove mutex_lock/mutex_unlock since they don't do anything and we're
not touching global state
- Move skl_write_cursor_wm/skl_write_plane_wm functions into
intel_pm.c, make externally visible
- Add skl_write_plane_wm calls to skl_update_plane
- Fix conditional for for loop in skl_write_plane_wm (level < max_level
should be level <= max_level)
- Make diagram in commit more accurate to what's actually happening
- Add Fixes:
Changes since v1:
- Use IS_GEN9() instead of IS_SKYLAKE() since these fixes apply to more
then just Skylake
- Update description to make it clear this patch doesn't fix everything
- Check if pipes were actually changed before writing watermarks
Changes since v2:
- Write PIPE_WM_LINETIME during vblank evasion
Changes since v3:
- Rebase against new SAGV patch changes
Changes since v4:
- Add a parameter to choose what skl_wm_values struct to use when
writing new plane watermarks
Changes since v5:
- Remove cursor ddb entry write in skl_write_cursor_wm(), defer until
patch 6
- Write WM_LINETIME in intel_begin_crtc_commit()
Changes since v6:
- Remove redundant dirty_pipes check in skl_write_plane_wm (we check
this in all places where we call this function, and it was supposed
to have been removed earlier anyway)
- In i9xx_update_cursor(), use dev_priv->info.gen >= 9 instead of
IS_GEN9(dev_priv). We do this everywhere else and I'd imagine this
needs to be done for gen10 as well
Changes since v7:
- Fix rebase fail (unused variable obj)
- Make struct skl_wm_values *wm const
- Fix indenting
- Use INTEL_GEN() instead of dev_priv->info.gen
Changes since v8:
- Don't forget calls to skl_write_plane_wm() when disabling planes
- Use INTEL_GEN(), not INTEL_INFO()->gen in intel_begin_crtc_commit()
Fixes: 2d41c0b59a ("drm/i915/skl: SKL Watermark Computation")
Signed-off-by: Lyude <cpaul@redhat.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Cc: stable@vger.kernel.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471884608-10671-1-git-send-email-cpaul@redhat.com
Link: http://patchwork.freedesktop.org/patch/msgid/1471884608-10671-1-git-send-email-cpaul@redhat.com
Now that conn_state is passed in as argument to compute_config, it's
guaranteed that there is a connector for the argument. The code that
looks for the connector is now dead, and completely unused. Delete it.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470755054-32699-8-git-send-email-maarten.lankhorst@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is mostly code churn, with exception of a few places:
- intel_display.c has changes in intel_sanitize_encoder
- intel_ddi.c has intel_ddi_fdi_disable calling intel_ddi_post_disable,
and required a function change. Also affects intel_display.c
- intel_dp_mst.c passes a NULL crtc_state and conn_state to
intel_ddi_post_disable for shutting down the real encoder.
If we would pass conn_state, then conn_state->connector !=
intel_dig_port->connector and conn_state->best_encoder !=
to_intel_encoder(intel_dig_port).
We also shouldn't pass crtc_state, because in that case the
disabling sequence may potentially be different depending on
which crtc is disabled last. Nice way to introduce bugs.
No other functional changes are done, diff stat is already huge.
Each encoder type will need to be fixed to use the atomic states
separately.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470755054-32699-6-git-send-email-maarten.lankhorst@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This cleans up another possible use of the connector list,
encoder->crtc is legacy state and should not be used.
With the atomic state as argument it's easy to find the encoder from
the connector it belongs to.
intel_opregion_notify_encoder is a noop for !HAS_DDI, so it's harmless
to unconditionally include it in encoder enable/disable.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470755054-32699-5-git-send-email-maarten.lankhorst@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is required for supporting nonblocking modesets. Iterating over
the connector list will no longer be allowed when we don't hold
connection_mutex, so we have to use the atomic state.
Fix disable_noatomic by populating the minimal state required to
disable a connector.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470755054-32699-3-git-send-email-maarten.lankhorst@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we do many register reads within a very short period of time, hold
the GMBUS powerwell from start to finish.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819164503.17845-1-chris@chris-wilson.co.uk
Reviewed-by: David Weinehall <david.weinehall@linux.intel.com>
If the engine isn't being retired (worker starvation?) then it is
possible for us to repeatedly observe that between consecutive
hangchecks the seqno on the ring to be the same and there remain
unretired requests. Ignore these completely and only regard the engine
as busy for the purpose of hang detection (not stall detection) if there
are outstanding breadcrumbs.
In recent history we have looked at using both the request and seqno as
indication of activity on the engine, but that was reduced to just
inspecting seqno in commit cffa781e59 ("drm/i915: Simplify check for
idleness in hangcheck"). However, in commit dcff85c844 ("drm/i915:
Enable i915_gem_wait_for_idle() without holding struct_mutex"), I made
the decision to use the new common lockless function, under the
assumption that request retirement was more frequent than hangcheck and
so we would not have a stuck busy check. The flaw there was in
forgetting that we accumulate the hang score, and so successive checks
seeing a stuck request, albeit with the GPU advancing elsewhere and so
not necessary the same stuck request, would eventually trigger the hang.
Fixes: dcff85c844 ("drm/i915: Enable i915_gem_wait_for_idle()...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160820145408.32180-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
As we never need to directly access the pages we allocate for scratch and
the pagetables, and always remap them into the GTT through the dma
remapper, we do not need to limit the allocations to lowmem i.e. we can
pass in the __GFP_HIGHMEM flag to the page allocation.
For backwards compatibility, e.g. certain old GPUs not liking highmem
for certain functions that may be accidentally mapped to the scratch
page by userspace, keep the GMCH probe as only allocating from DMA32.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822074431.26872-3-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
As the scratch page is no longer shared between all VM, and each has
their own, forgo the small allocation and simply embed the scratch page
struct into the i915_address_space.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822074431.26872-2-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@intel.com>
Just like with sysfs, we do some major overhaul.
Pass dev_priv instead of dev to all feature macros (IS_, HAS_,
INTEL_, etc.). This has the side effect that a bunch of functions
now get dev_priv passed instead of dev.
All calls to INTEL_INFO()->gen have been replaced with
INTEL_GEN().
We want access to to_i915(node->minor->dev) in a lot of places,
so add the node_to_i915() helper to accommodate for this.
Finally, we have quite a few cases where we get a void * pointer,
and need to cast it to drm_device *, only to run to_i915() on it.
Add cast_to_i915() to do this.
v2: Don't introduce extra dev (Chris)
v3: Make pipe_crc_info have a pointer to drm_i915_private instead of
drm_device. This saves a bit of space, since we never use
drm_device anywhere in these functions.
Also some minor fixup that I missed in the previous version.
v4: Changed the code a bit so that dev_priv is passed directly
to various functions, thus removing the need for the
cast_to_i915() helper. Also did some additional cleanup.
v5: Additional cleanup of newly introduced changes.
v6: Rebase again because of conflict.
Signed-off-by: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822105931.pcbe2lpsgzckzboa@boom
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
In an effort to simplify things for a future push of dev_priv instead
of dev wherever possible, always take pdev via dev_priv where
feasible, eliminating the direct access from dev. Right now this
only eliminates a few cases of dev, but it also obviates that we pass
dev into a lot of functions where dev_priv would be the more obvious
choice.
v2: Fixed one more place missing in the previous patch set
Signed-off-by: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822103245.24069-5-david.weinehall@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Various cleanup for i915_sysfs.c; we now use dev_priv whenever
possible. The kdev_to_drm_minor() helper function has been
replaced by one that converts from struct device *
to struct drm_i915_private *.
We already have a seemingly identical helper (kdev_to_i915())
in i915_drv.h. But that one cannot be used here.
Unlike the version in i915_drv.h, this helper
reaches i915 through drm_minor.
v2: Rename kdev_to_i915_dm() to kdev_minor_to_i915() (Chris)
Signed-off-by: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822103245.24069-4-david.weinehall@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
We currently have a mix of struct device *device, struct device *kdev,
and struct device *dev (the latter forcing us to refer to
struct drm_device as something else than the normal dev).
To simplify things, always use kdev when referring to struct device.
v2: Replace the dev_to_drm_minor() macro with the inline function
kdev_to_drm_minor().
Signed-off-by: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822103245.24069-3-david.weinehall@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If we're enabling a pipe, we'll need to modify the watermarks on all
active planes. Since those planes won't be added to the state on
their own, we need to add them ourselves.
Signed-off-by: Lyude <cpaul@redhat.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Cc: stable@vger.kernel.org
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471463761-26796-6-git-send-email-cpaul@redhat.com
When we write watermark values to the hardware, those values are stored
in dev_priv->wm.skl_hw. However with recent watermark changes, the
results structure we're copying from only contains valid watermark and
DDB values for the pipes that are actually changing; the values for
other pipes remain 0. Thus a blind copy of the entire skl_wm_values
structure will clobber the values for unchanged pipes...we need to be
more selective and only copy over the values for the changing pipes.
This mistake was hidden until recently due to another bug that caused us
to erroneously re-calculate watermarks for all active pipes rather than
changing pipes. Only when that bug was fixed was the impact of this bug
discovered (e.g., modesets failing with "Requested display configuration
exceeds system watermark limitations" messages and leaving watermarks
non-functional, even ones initiated by intel_fbdev_restore_mode).
Changes since v1:
- Add a function for copying a pipe's wm values
(skl_copy_wm_for_pipe()) so we can reuse this later
Fixes: 734fa01f3a ("drm/i915/gen9: Calculate watermarks during atomic 'check' (v2)")
Fixes: 9b61302274 ("drm/i915/gen9: Re-allocate DDB only for changed pipes")
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Lyude <cpaul@redhat.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Cc: stable@vger.kernel.org
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471463761-26796-4-git-send-email-cpaul@redhat.com
Since the watermark calculations for Skylake are still broken, we're apt
to hitting underruns very easily under multi-monitor configurations.
While it would be lovely if this was fixed, it's not. Another problem
that's been coming from this however, is the mysterious issue of
underruns causing full system hangs. An easy way to reproduce this with
a skylake system:
- Get a laptop with a skylake GPU, and hook up two external monitors to
it
- Move the cursor from the built-in LCD to one of the external displays
as quickly as you can
- You'll get a few pipe underruns, and eventually the entire system will
just freeze.
After doing a lot of investigation and reading through the bspec, I
found the existence of the SAGV, which is responsible for adjusting the
system agent voltage and clock frequencies depending on how much power
we need. According to the bspec:
"The display engine access to system memory is blocked during the
adjustment time. SAGV defaults to enabled. Software must use the
GT-driver pcode mailbox to disable SAGV when the display engine is not
able to tolerate the blocking time."
The rest of the bspec goes on to explain that software can simply leave
the SAGV enabled, and disable it when we use interlaced pipes/have more
then one pipe active.
Sure enough, with this patchset the system hangs resulting from pipe
underruns on Skylake have completely vanished on my T460s. Additionally,
the bspec mentions turning off the SAGV with more then one pipe enabled
as a workaround for display underruns. While this patch doesn't entirely
fix that, it looks like it does improve the situation a little bit so
it's likely this is going to be required to make watermarks on Skylake
fully functional.
This will still need additional work in the future: we shouldn't be
enabling the SAGV if any of the currently enabled planes can't enable WM
levels that introduce latencies >= 30 µs.
Changes since v11:
- Add skl_can_enable_sagv()
- Make sure we don't enable SAGV when not all planes can enable
watermarks >= the SAGV engine block time. I was originally going to
save this for later, but I recently managed to run into a machine
that was having problems with a single pipe configuration + SAGV.
- Make comparisons to I915_SKL_SAGV_NOT_CONTROLLED explicit
- Change I915_SAGV_DYNAMIC_FREQ to I915_SAGV_ENABLE
- Move printks outside of mutexes
- Don't print error messages twice
Changes since v10:
- Apparently sandybridge_pcode_read actually writes values and reads
them back, despite it's misleading function name. This means we've
been doing this mostly wrong and have been writing garbage to the
SAGV control. Because of this, we no longer attempt to read the SAGV
status during initialization (since there are no helpers for this).
- mlankhorst noticed that this patch was breaking on some very early
pre-release Skylake machines, which apparently don't allow you to
disable the SAGV. To prevent machines from failing tests due to SAGV
errors, if the first time we try to control the SAGV results in the
mailbox indicating an invalid command, we just disable future attempts
to control the SAGV state by setting dev_priv->skl_sagv_status to
I915_SKL_SAGV_NOT_CONTROLLED and make a note of it in dmesg.
- Move mutex_unlock() a little higher in skl_enable_sagv(). This
doesn't actually fix anything, but lets us release the lock a little
sooner since we're finished with it.
Changes since v9:
- Only enable/disable sagv on Skylake
Changes since v8:
- Add intel_state->modeset guard to the conditional for
skl_enable_sagv()
Changes since v7:
- Remove GEN9_SAGV_LOW_FREQ, replace with GEN9_SAGV_IS_ENABLED (that's
all we use it for anyway)
- Use GEN9_SAGV_IS_ENABLED instead of 0x1 for clarification
- Fix a styling error that snuck past me
Changes since v6:
- Protect skl_enable_sagv() with intel_state->modeset conditional in
intel_atomic_commit_tail()
Changes since v5:
- Don't use is_power_of_2. Makes things confusing
- Don't use the old state to figure out whether or not to
enable/disable the sagv, use the new one
- Split the loop in skl_disable_sagv into it's own function
- Move skl_sagv_enable/disable() calls into intel_atomic_commit_tail()
Changes since v4:
- Use is_power_of_2 against active_crtcs to check whether we have > 1
pipe enabled
- Fix skl_sagv_get_hw_state(): (temp & 0x1) indicates disabled, 0x0
enabled
- Call skl_sagv_enable/disable() from pre/post-plane updates
Changes since v3:
- Use time_before() to compare timeout to jiffies
Changes since v2:
- Really apply minor style nitpicks to patch this time
Changes since v1:
- Added comments about this probably being one of the requirements to
fixing Skylake's watermark issues
- Minor style nitpicks from Matt Roper
- Disable these functions on Broxton, since it doesn't have an SAGV
Signed-off-by: Lyude <cpaul@redhat.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471463761-26796-3-git-send-email-cpaul@redhat.com
[mlankhorst: ENOSYS -> ENXIO, whitespace fixes]
In order to add proper support for the SAGV, we need to be able to know
what the cause of a failure to change the SAGV through the pcode mailbox
was. The reasoning for this is that some very early pre-release Skylake
machines don't actually allow you to control the SAGV on them, and
indicate an invalid mailbox command was sent.
This also might come in handy in the future for debugging.
Changes since v1:
- Add functions for interpreting gen6 mailbox error codes along with
gen7+ error codes, and actually interpret those codes properly
- Renamed patch to reflect new behavior
Signed-off-by: Lyude <cpaul@redhat.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471463761-26796-2-git-send-email-cpaul@redhat.com
[mlankhorst: -ENOSYS -> -ENXIO for checkpatch]
intel_fbc_pre_update() depends upon the new state being already pinned
in place in the Global GTT (primarily for both fencing which wants both
an offset and a fence register, if assigned). This requires the call to
intel_fbc_pre_update() be after intel_pin_and_fence_fb() - but commit
e8216e502a ("drm/i915/fbc: call intel_fbc_pre_update earlier during
page flips") moved the code way too much up in its attempt to call it
before the page flip.
v2 (from Paulo):
- Point the original bad commit.
- Add a comment to maybe prevent further regressions.
Fixes: e8216e502a ("drm/i915/fbc: call intel_fbc_pre_update earlier...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Cc: drm-intel-fixes@lists.freedesktop.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1471462904-842-1-git-send-email-paulo.r.zanoni@intel.com
Cc: stable@vger.kernel.org
This issue here is (I think) purely theoretical, since a compiler
would need to be especially foolish to recompute the value of
i915_gem_request_completed right after it was already used. Hence the
additional barrier() is also not really a restriction.
But I believe this to be at least permissible, and since our rcu
trickery is a beast it's worth to annotate all the corner cases.
Chris proposed to instead just wrap a READ_ONCE around
request->fence.seqno in i915_gem_request_completed. But that has a
measurable impact on code size, and everywhere we hold a full
reference to the underlying request it's also not needed. And
personally I'd like to have just enough barriers and locking needed
for correctness, but not more - it makes it much easier in the future
to understand what's going on.
Since the busy ioctl has now fully embraced it's races there's no
point annotating it there too. We really only need it in
active_get_rcu, since that function _must_ deliver a correct snapshot
of the active fences (and not chase something else).
v2: Polish the comment a bit more (Chris).
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1471856122-466-1-git-send-email-daniel.vetter@ffwll.ch
Since by design, if not entirely by practice, nothing is allowed to
access the scratch page we use to background fill the VM, then we do not
need to ensure that it is coherent between the CPU and GPU.
set_pages_uc() does a stop_machine() after changing the PAT, and that
significantly impacts upon context creation throughput.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822074431.26872-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The passed in flag that distinguishes i915_gem_pin_display from
i915_gem_gtt is from node->info_ent->data not the data function
parameter.
Fixes: 6da8482936 ("drm/i915: Focus debugfs/i915_gem_pinned to show...")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819115625.17688-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Very old numbers indicate this is a 66% improvement when remapping the
entire object for fence contention - due to the elimination of
track_pfn_insert and its strcmp.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Testcase: igt/gem_fence_upload/performance
Testcase: igt/gem_mmap_gtt
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-6-chris@chris-wilson.co.uk
As io_mapping.h now always allocates the struct, we can avoid that
allocation and extra pointer dance by embedding the struct inside
drm_i915_private
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-5-chris@chris-wilson.co.uk
Currently, we only allocate a structure to hold metadata if we need to
allocate an ioremap for every access, such as on x86-32. However, it
would be useful to store basic information about the io-mapping, such as
its page protection, on all platforms.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: linux-mm@kvack.org
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-4-chris@chris-wilson.co.uk
Only fbc1 is tied to using a fence. Later iterations of fbc are more
flexible and allow operation on unfenced frontbuffers.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: "Zanoni, Paulo R" <paulo.r.zanoni@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-3-chris@chris-wilson.co.uk
If the frontbuffer doesn't have an associated fence, it will have a
fence reg of -1. If we attempt to OR in this register into the FBC
control register we end up setting all control bits, oops!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: "Zanoni, Paulo R" <paulo.r.zanoni@intel.com>
Reviwed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160819155428.1670-2-chris@chris-wilson.co.uk
In the recent patch
bc3d674 drm/i915: Allow userspace to request no-error-capture upon ...
the final version moved the flags and the associated #defines around
so they were adjacent; unfortunately, they ended up between a comment
and the thing (hw_id) to which the comment applies :(
So this patch reshuffles the comment and subject back together.
Also, as we're touching 'hw_id', let's change it from just 'unsigned'
to a fully-specified 'unsigned int', because some code checking tools
(including checkpatch) object to plain 'unsigned'.
Fixes: bc3d674462 ("drm/i915: Allow userspace to request no-error-capture...")
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1471616622-6919-1-git-send-email-david.s.gordon@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
A significant proportion of the cmdparsing time for some batches is the
cost to find the register in the mmiotable. We ensure that those tables
are in ascending order such that we could do a binary search if it was
ever merited. It is.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-38-chris@chris-wilson.co.uk
On the blitter (and in test code), we see long sequences of repeated
commands, e.g. XY_PIXEL_BLT, XY_SCANLINE_BLT, or XY_SRC_COPY. For these,
we can skip the hashtable lookup by remembering the previous command
descriptor and doing a straightforward compare of the command header.
The corollary is that we need to do one extra comparison before lookup
up new commands.
v2: Less magic mask (ok, it is still magic, but now you cannot see!)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-36-chris@chris-wilson.co.uk
The existing code's hashfunction is very suboptimal (most 3D commands
use the same bucket degrading the hash to a long list). The code even
acknowledge that the issue was known and the fix simple:
/*
* If we attempt to generate a perfect hash, we should be able to look at bits
* 31:29 of a command from a batch buffer and use the full mask for that
* client. The existing INSTR_CLIENT_MASK/SHIFT defines can be used for this.
*/
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-35-chris@chris-wilson.co.uk
For simplicity, we want to continue using a contiguous mapping of the
command buffer, but we can reduce the number of vmappings we hold by
switching over to a page-by-page copy from the user batch buffer to the
shadow. The cost for saving one linear mapping is about 5% in trivial
workloads - which is more or less the overhead in calling kmap_atomic().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-34-chris@chris-wilson.co.uk
The single largest factor in the overhead of parsing the commands is the
setup of the virtual mapping to provide a continuous block for the batch
buffer. If we keep those vmappings around (against the better judgement
of mm/vmalloc.c, which we offset by handwaving and looking suggestively
at the shrinker) we can dramatically improve the performance of the
parser for small batches (such as media workloads). Furthermore, we can
use the prepare shmem read/write functions to determine how best we
need to clflush the range (rather than every page of the object).
The impact of caching both src/dst vmaps is +80% on ivb and +140% on byt
for the throughput on small batches. (Caching just the dst vmap and
iterating over the src, doing a page by page copy is roughly 5% slower
on both platforms. That may be an acceptable trade-off to eliminate one
cached vmapping, and we may be able to reduce the per-page copying overhead
further.) For *this* simple test case, the cmdparser is now within a
factor of 2 of ideal performance.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-33-chris@chris-wilson.co.uk