Seems that only skl needs to have SAGV turned off
for multipipe scenarios, so lets do it this way.
If anything blows up - we can always revert this patch.
v2: Changed if condition to look better (Ville).
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
[vsyrjala: wrapped long line to appease checkpatch]
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200513093816.11466-4-stanislav.lisovskiy@intel.com
Introduce platform dependent SAGV checking in
combination with bandwidth state pipe SAGV mask.
This is preparation to adding TGL support, which
requires different way of SAGV checking.
v2, v3, v4, v5, v6: Fix rebase conflict
v7: - Nuke icl specific function, use skl
for icl as well, gen specific active_pipes
check to be added in the next patch(Ville)
v8: - Use more generic intel_crtc_can_enable_sagv
for checking(Ville)
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200513093816.11466-3-stanislav.lisovskiy@intel.com
For future Gen12 SAGV implementation we need to
seemlessly alter wm levels calculated, depending
on whether we are allowed to enable SAGV or not.
So this accessor will give additional flexibility
to do that.
Currently this accessor is still simply working
as "pass-through" function. This will be changed
in next coming patches from this series.
v2: - plane_id -> plane->id(Ville Syrjälä)
- Moved wm_level var to have more local scope
(Ville Syrjälä)
- Renamed yuv to color_plane(Ville Syrjälä) in
skl_plane_wm_level
v3: - plane->id -> plane_id(this time for real, Ville Syrjälä)
- Changed colorplane id type from boolean to int as index
(Ville Syrjälä)
- Moved crtc_state param so that it is first now
(Ville Syrjälä)
- Moved wm_level declaration to tigher scope in
skl_write_plane_wm(Ville Syrjälä)
v4: - Started to use enum values for color plane
- Do sizeof for a type what we are memset'ing
- Zero out wm_uv as well(Ville Syrjälä)
v5: - Fixed rebase conflict caused by COLOR_PLANE_*
enum removal
v6: - Do not use skl_plane_wm_level accessor in skl_allocate_pipe_ddb
v7: - Get rid of wm_uv, which is not used in skl_plane_write_wm(Ville)
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200513093816.11466-2-stanislav.lisovskiy@intel.com
Upon gt resume, we first poison then sanitize the engine. However, our
testing shows that gen9 will very rarely retain the poisoned value from
the HWSP mappings of the execlists status registers. This suggests that
it is reading back from the HWSP, so rejig the register reset.
v2: Maybe RING_CONTEXT_STATUS_PTR is write masked. It is.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/1812
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200513100120.11617-1-chris@chris-wilson.co.uk
i915_gem_evict_something() is charged with finding a slot within the GTT
that we may reuse. Since our goal is not to stall, we first look for a
slot that only overlaps idle vma. To this end, on the first pass we move
any active vma to the end of the search list. However, we only stopped
moving active vma after we see the first active vma twice. If during the
search, that first active vma completed, we would not notice and keep on
extending the search list.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1746
Fixes: 2850748ef8 ("drm/i915: Pull i915_vma_pin under the vm->mutex")
Fixes: b1e3177bd1 ("drm/i915: Coordinate i915_active with its own mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200509115217.26853-1-chris@chris-wilson.co.uk
We recorded the dependencies for WAIT_FOR_SUBMIT in order that we could
correctly perform priority inheritance from the parallel branches to the
common trunk. However, for the purpose of timeslicing and reset
handling, the dependency is weak -- as we the pair of requests are
allowed to run in parallel and not in strict succession.
The real significance though is that this allows us to rearrange
groups of WAIT_FOR_SUBMIT linked requests along the single engine, and
so can resolve user level inter-batch scheduling dependencies from user
semaphores.
Fixes: c81471f5e9 ("drm/i915: Copy across scheduler behaviour flags across submit fences")
Testcase: igt/gem_exec_fence/submit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200507155109.8892-1-chris@chris-wilson.co.uk
(cherry picked from commit 6b6cd2ebd8)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We need to preserve fatal errors from fences that are being terminated
as we hook them up.
Fixes: ef46884975 ("drm/i915: Propagate fence errors")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200506162136.3325-1-chris@chris-wilson.co.uk
(cherry picked from commit 24fe5f2ab2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
This reverts commit 0b718ba1e8.
There are still some residual issues with asynchronous binding and
execution, but since commit 92581f9fb9 ("drm/i915: Immediately execute
the fenced work") we prefer not to use asynchronous binds, and the
remaining issues do not seem restricted to Cherryview [at least the ones
seen over a few dozen CI runs, less frequent issues are sure to be
discovered!]
These issues seem to be mitigated, if not eliminated entirely, by the
previous commit 84eac0c659 ("drm/i915/gt: Force pte cacheline to main
memory").
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200510102431.21959-3-chris@chris-wilson.co.uk
We have problems of tgl not seeing a valid pte entry when iommu is
enabled. Add heavy handed flushing of entry modification by flushing the
cpu, cacheline and then wcb. This forces the pte out to main memory past
this point regarless of promises of coherency.
This is an evolution of an experimental patch from Chris Wilson of adding
wmb for coherent partners, by adding a clflush to force the cache->memory
step.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1840
Testcase: igt/gem_exec_fence/parallel
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200511160803.15407-1-mika.kuoppala@linux.intel.com
Be consistent, and even when we know we had used a WC, flush the mapped
object after writing into it. The flush understands the mapping type and
will only clflush if !I915_MAP_WC, but will always insert a wmb [sfence]
so that we can be sure that all writes are visible.
v2: Add the unconditional wmb so we are know that we always flush the
writes to memory/HW at that point.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200511141304.599-1-chris@chris-wilson.co.uk
Be consistent and ensure that we always emit the asynchronous waits
prior to issuing instructions that use the address. This ensures that if
we do emit GPU commands to do the await, they are before our use!
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200510102431.21959-1-chris@chris-wilson.co.uk
Get rid of several platform specific variants of
intel_digital_port_connected() and just use the ISR bits we've
stashed away.
v2: Duplicate stuff to avoid exposing platform specific
functions across files (Jani)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200311155422.3043-4-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
Instead of constnantly having to figure out which hpd status bit
array to use let's store them under dev_priv.
Should perhaps take this further and stash even more stuff to
make the hpd handling more abstract yet.
v2: Remeber cnp (Imre)
Add MISSING_CASE() for unknown PCHs (Imre)
Cc: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200507114808.6150-1-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
Let's get rid of the platform if ladders in
intel_digital_port_connected() and make it a vfunc. Now the if
ladders are at the encoder initialization which makes them a bit
less convoluted.
v2: Add forward decl for intel_encoder in intel_tc.h
v3: Duplicate stuff to avoid exposing platform specific
functions across files (Jani)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200311155422.3043-2-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com>
GLK wants the +1 adjustement for the "blocks per line" value
for x-tile/y-tile, just like cnl+.
Also the x-tile and linear cases are almost identical. The only
difference is this +1 which is always done for glk+, and only
done for linear on skl/bxt. Let's unify it to a single branch
with a special case for the +1, just like we do for y-tile.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200430125822.21985-1-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Just tidy up the return handling for completed dma-fences. While it may
return errors for invalid fence, we already know that we have a good
fence and the only error will be an already signaled fence.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200511075722.13483-5-chris@chris-wilson.co.uk
Commit fb5970da1b ("drm/i915/gt: Use the kernel_context to measure the
breadcrumb size") removed the last external user for intel_timeline_init.
Mark it static.
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200511102201.9275-1-mika.kuoppala@linux.intel.com
As i915 won't allocate extra PDP for current default PML4 table,
so for 3-level ppgtt guest, we would hit kernel pointer access
failure on extra PDP pointers. So this trys to bypass that now.
It won't impact real shadow PPGTT setup, so guest context still
works.
This is verified on 4.15 guest kernel with i915.enable_ppgtt=1
to force on old aliasing ppgtt behavior.
Fixes: 4f15665ccb ("drm/i915: Add ppgtt to GVT GEM context")
Reviewed-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200506095918.124913-1-zhenyuw@linux.intel.com
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200507185408.GA14561@embeddedor
Expose the hardcoded timeout for unsignaled foreign fences as a Kconfig
option, primarily to allow brave systems to disable the timeout and
solely rely on correct signaling.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200509105021.12542-1-chris@chris-wilson.co.uk
Init value of some display vregs rea inherited from host pregs. When
host display in different status, i.e. all monitors unpluged, different
display configurations, etc., GVT virtual display setup don't consistent
thus may lead to guest driver consider display goes malfunctional.
The added init vreg values are based on PRMs and fixed by calcuation
from current configuration (only PIPE_A) and the virtual EDID.
Fixes: 04d348ae3f ("drm/i915/gvt: vGPU display virtualization")
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Colin Xu <colin.xu@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200508060506.216250-1-colin.xu@intel.com
The downside of using semaphores is that we lose metadata passing
along the signaling chain. This is particularly nasty when we
need to pass along a fatal error such as EFAULT or EDEADLK. For
fatal errors we want to scrub the request before it is executed,
which means that we cannot preload the request onto HW and have
it wait upon a semaphore.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200508092933.738-3-chris@chris-wilson.co.uk
To allow faster engine to engine synchronization, peel the layer of
dma-fence-chain to expose potential i915 fences so that the
i915_request code can emit HW semaphore wait/signal operations in the
ring which is faster than waking up the host to submit unblocked
workloads after interrupt notification.
This is similar to the peeling we do for e.g. dma_fence_array.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200508185448.29709-1-chris@chris-wilson.co.uk
As a means for a small code consolidation, but primarily to start
thinking more carefully about internal-vs-external linkage, pull the
pair of i915_sw_fence_await_dma_fence() calls into a common routine.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200508092933.738-2-chris@chris-wilson.co.uk
While we ordinarily do not skip submit-fences due to the accompanying
hook that we want to callback on execution, a submit-fence on the same
timeline is meaningless.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200508092933.738-1-chris@chris-wilson.co.uk
UAPI Changes:
Cross-subsystem Changes:
* MAINTAINERS: restore alphabetical order; update cirrus driver
* Dcomuentation: document visionix, chronteli, ite vendor prefices; update
documentation for Chrontel CH7033, IT6505, IVO, BOE,
Panasonic, Chunghwa, AUO bindings; convert dw_mipi_dsi.txt
to YAML; remove todo item for drm_display_mode.hsync removal;
Core Changes:
* drm: add devm_drm_dev_alloc() for managed allocations of drm_device;
use DRM_MODESET_LOCK_ALL_*() in mode-object code; remove
drm_display_mode.hsync; small cleanups of unused variables,
compiler warnings and static functions
* drm/client: dual-lincensing: GPL-2.0 or MIT
* drm/mm: optimize tree searches in rb_hole_addr()
Driver Changes:
* drm/{many}: use devm_drm_dev_alloc(); don't use drm_device.dev_private
* drm/ast: don't double-assign to drm_crtc_funcs.set_config; drop
drm_connector_register()
* drm/bochs: drop drm_connector_register()
* drm/bridge: add support for Chrontel ch7033; fix stack usage with
old gccs; return error pointer in drm_panel_bridge_add()
* drm/cirrus: Move to tiny
* drm/dp_mst: don't use 2nd sideband tx slot; revert "Remove single tx
msg restriction"
* drm/lima: support runtime PM;
* drm/meson: limit modes wrt chipset
* drm/panel: add support for Visionox rm69299; fix clock on
boe-tv101wum-n16; fix panel type for AUO G101EVN10;
add support for Ivo M133NFW4 R0; add support for BOE
NV133FHM-N61; add support for AUO G121EAN01.4, G156XTN01.0,
G190EAN01
* drm/pl111: improve vexpress init; fix module auto-loading
* drm/stm: read number of endpoints from device tree
* drm/vboxvideo: use managed PCI functions; drop DRM_MTRR_WC
* drm/vkms: fix use-after-free in vkms_gem_create(); enable cursor
support by default
* fbdev: use boolean values in several drivers
* fbdev/controlfb: fix COMPILE_TEST
* fbdev/w100fb: fix double-free bug
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEchf7rIzpz2NEoWjlaA3BHVMLeiMFAl6ztuYACgkQaA3BHVML
eiMWaAgAvYpXvBqBfE/5jOTa1zWQUX9VMbmMDUqSAgec5r/HVHjmlbsDWkLff2XG
rH7T2shMNJwluoCleHfRFYUfyUWMmXoMgri1elnf4EgddmekLZ6vNQjtAYMRFUfL
7VegcFw5D5Z28uxK5YGnaaSQob34LvD8UTgvnCVfDQzCjcEzGYi1d9XELm7kTHje
dF2FOWPeF8FTqP0XHM8RrHNXW/pUPy9qErpmb2bWYBQp7ebkI2Q4p6IZIpRQm757
bnKPIdMEcy4BWMk1xtzjEYEb4bGSZhozJUWKFwpm/AHNvTrQCt9hn0GSVrXiqBcR
UKb42BRxEVR29C/4n10pUcnnQmmi+Q==
=yzZ9
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2020-05-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for 5.8:
UAPI Changes:
Cross-subsystem Changes:
* MAINTAINERS: restore alphabetical order; update cirrus driver
* Dcomuentation: document visionix, chronteli, ite vendor prefices; update
documentation for Chrontel CH7033, IT6505, IVO, BOE,
Panasonic, Chunghwa, AUO bindings; convert dw_mipi_dsi.txt
to YAML; remove todo item for drm_display_mode.hsync removal;
Core Changes:
* drm: add devm_drm_dev_alloc() for managed allocations of drm_device;
use DRM_MODESET_LOCK_ALL_*() in mode-object code; remove
drm_display_mode.hsync; small cleanups of unused variables,
compiler warnings and static functions
* drm/client: dual-lincensing: GPL-2.0 or MIT
* drm/mm: optimize tree searches in rb_hole_addr()
Driver Changes:
* drm/{many}: use devm_drm_dev_alloc(); don't use drm_device.dev_private
* drm/ast: don't double-assign to drm_crtc_funcs.set_config; drop
drm_connector_register()
* drm/bochs: drop drm_connector_register()
* drm/bridge: add support for Chrontel ch7033; fix stack usage with
old gccs; return error pointer in drm_panel_bridge_add()
* drm/cirrus: Move to tiny
* drm/dp_mst: don't use 2nd sideband tx slot; revert "Remove single tx
msg restriction"
* drm/lima: support runtime PM;
* drm/meson: limit modes wrt chipset
* drm/panel: add support for Visionox rm69299; fix clock on
boe-tv101wum-n16; fix panel type for AUO G101EVN10;
add support for Ivo M133NFW4 R0; add support for BOE
NV133FHM-N61; add support for AUO G121EAN01.4, G156XTN01.0,
G190EAN01
* drm/pl111: improve vexpress init; fix module auto-loading
* drm/stm: read number of endpoints from device tree
* drm/vboxvideo: use managed PCI functions; drop DRM_MTRR_WC
* drm/vkms: fix use-after-free in vkms_gem_create(); enable cursor
support by default
* fbdev: use boolean values in several drivers
* fbdev/controlfb: fix COMPILE_TEST
* fbdev/w100fb: fix double-free bug
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20200507072503.GA10979@linux-uq9g
The PPGTT in context image can be overridden by LRI cmd with another
PPGTT's pdps. In such case, the load mm is used instead of the one in
the context image. So we need to load its shadow mm in GVT and replace
ppgtt pointers in command.
This feature is used by guest IGD driver to share gfx VM between
different contexts. Verified by IGT "gem_ctx_clone" test.
v4:
- consolidate shadow mm handlers (Yan)
- fix cmd shadow mm pin error path
v3: (Zhenyu Wang)
- Cleanup PDP register offset check
- Add debug check for guest context ppgtt update
- Skip 3-level ppgtt guest handling code. The reason is that all
guests now use 4-level ppgtt table and the only left case for
3-level table is ancient aliasing ppgtt case. But those guest
kernel has no use of PPGTT LRI command. So 3-level ppgtt guest
for this feature becomes simply un-testable.
v2: (Zhenyu Wang)
- Change to list for handling possible multiple ppgtt table loads
in one submission. Make sure shadow mm is to replace for each one.
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200508031409.2562-1-zhenyuw@linux.intel.com
To let execlist.c only handle execlist handling and keep other
workload cleanup function in scheduler.c to align with other
workload specific handling there. This doesn't change current
code behavior.
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20200506094318.105604-1-zhenyuw@linux.intel.com
All engines, exception being blitter as it does not
care about the form, can access compressed surfaces.
So we need to add forced aux table invalidates
for those engines.
v2: virtual instance masking (Chris)
v3: bug on if not found (Chris)
References: d248b371f7 ("drm/i915/gen12: Invalidate aux table entries forcibly")
References bspec#43904, hsdes#1809175790
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chuansheng Liu <chuansheng.liu@intel.com>
Cc: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200507142045.8668-1-mika.kuoppala@linux.intel.com
Upon waiting a request (when asked), we gave that request a small
priority boost, not enough for it to cause preemption, but enough for it
to be scheduled next before all equals. We also used that bit to give
new clients a small priority boost, similar to FQ_CODEL, such that we
favoured short interactive tasks ahead of long running streams.
However, this is causing lots of complications with timeslicing where we
both want to honour the boost and yet ignore it. Those complications
cause unexpected user behaviour (tasks not being timesliced and run
concurrently as epxected), and the easiest way to resolve that is to
remove the boost. Hopefully, we can find a compromise again if we need
to, but in theory timeslicing itself and future more advanced schedulers
should give us the interactivity boost we seek.
Testcase: igt/gem_exec_schedule/lateslice
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200507152338.7452-3-chris@chris-wilson.co.uk
We recorded the dependencies for WAIT_FOR_SUBMIT in order that we could
correctly perform priority inheritance from the parallel branches to the
common trunk. However, for the purpose of timeslicing and reset
handling, the dependency is weak -- as we the pair of requests are
allowed to run in parallel and not in strict succession.
The real significance though is that this allows us to rearrange
groups of WAIT_FOR_SUBMIT linked requests along the single engine, and
so can resolve user level inter-batch scheduling dependencies from user
semaphores.
Fixes: c81471f5e9 ("drm/i915: Copy across scheduler behaviour flags across submit fences")
Testcase: igt/gem_exec_fence/submit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v5.6+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200507155109.8892-1-chris@chris-wilson.co.uk
Aux table invalidation can fail on update. So
next access may cause memory access to be into stale entry.
Proposed workaround is to invalidate entries between
all batchbuffers.
v2: correct register address (Yang)
v3: respect the order (Chris)
References bspec#43904, hsdes#1809175790
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chuansheng Liu <chuansheng.liu@intel.com>
Cc: Rafael Antognolli <rafael.antognolli@intel.com>
Cc: Yang A Shi <yang.a.shi@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200506165310.1239-1-mika.kuoppala@linux.intel.com
HDC pipeline flush is bit on the first dword of
the PIPE_CONTROL, not the second. Make it so.
v2: function naming (Chris)
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200506144734.29297-2-mika.kuoppala@linux.intel.com
The presumption is that by using a circular counter that is twice as
large as the maximum ELSP submission, we would never reuse the same CCID
for two inflight contexts.
However, if we continually preempt an active context such that it always
remains inflight, it can be resubmitted with an arbitrary number of
paired contexts. As each of its paired contexts will use a new CCID,
eventually it will wrap and submit two ELSP with the same CCID.
Rather than use a simple circular counter, switch over to a small bitmap
of inflight ids so we can avoid reusing one that is still potentially
active.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796
Fixes: 2935ed5339 ("drm/i915: Remove logical HW ID")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-2-chris@chris-wilson.co.uk
(cherry picked from commit 5c4a53e3b1)
(cherry picked from commit 134711240307d5586ae8e828d2699db70a8b74f2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The bspec is confusing on the nature of the upper 32bits of the LRC
descriptor. Once upon a time, it said that it uses the upper 32b to
decide if it should perform a lite-restore, and so we must ensure that
each unique context submitted to HW is given a unique CCID [for the
duration of it being on the HW]. Currently, this is achieved by using
a small circular tag, and assigning every context submitted to HW a
new id. However, this tag is being cleared on repinning an inflight
context such that we end up re-using the 0 tag for multiple contexts.
To avoid accidentally clearing the CCID in the upper 32bits of the LRC
descriptor, split the descriptor into two dwords so we can update the
GGTT address separately from the CCID.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796
Fixes: 2935ed5339 ("drm/i915: Remove logical HW ID")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-1-chris@chris-wilson.co.uk
(cherry picked from commit 2632f174a2)
(cherry picked from commit a4b70fcc587860f4b972f68217d8ebebe295ec15)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
If we find ourselves waiting on a MI_SEMAPHORE_WAIT, either within the
user batch or in our own preamble, the engine raises a
GT_WAIT_ON_SEMAPHORE interrupt. We can unmask that interrupt and so
respond to a semaphore wait by yielding the timeslice, if we have
another context to yield to!
The only real complication is that the interrupt is only generated for
the start of the semaphore wait, and is asynchronous to our
process_csb() -- that is, we may not have registered the timeslice before
we see the interrupt. To ensure we don't miss a potential semaphore
blocking forward progress (e.g. selftests/live_timeslice_preempt) we mark
the interrupt and apply it to the next timeslice regardless of whether it
was active at the time.
v2: We use semaphores in preempt-to-busy, within the timeslicing
implementation itself! Ergo, when we do insert a preemption due to an
expired timeslice, the new context may start with the missed semaphore
flagged by the retired context and be yielded, ad infinitum. To avoid
this, read the context id at the time of the semaphore interrupt and
only yield if that context is still active.
Fixes: 8ee36e048c ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200407130811.17321-1-chris@chris-wilson.co.uk
(cherry picked from commit c4e8ba7390)
(cherry picked from commit cd60e4ac4738a6921592c4f7baf87f9a3499f0e2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We need to preserve fatal errors from fences that are being terminated
as we hook them up.
Fixes: ef46884975 ("drm/i915: Propagate fence errors")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200506162136.3325-1-chris@chris-wilson.co.uk
Unmask/enable AUX interrupts on all ports on TGL+. So far the interrupts
worked only on port A, which meant each transaction on other ports took
10ms.
Cc: <stable@vger.kernel.org> # v5.4+
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200504075828.20348-1-imre.deak@intel.com
(cherry picked from commit 054318c7e3)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We need to toggle a SDE chicken bit on and then off as the final
step when disabling interrupts in preparation for runtime suspend.
Bspec: 33450
Bspec: 8402
Cc: Bob Paauwe <bob.j.paauwe@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200501213701.371443-1-matthew.d.roper@intel.com
Reviewed-by: Bob Paauwe <bob.j.paauwe@intel.com>
As we only restore the default context state upon banning a context, we
only need enough of the state to run the ring and nothing more. That is
we only need our bare protocontext.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200504180745.15645-1-chris@chris-wilson.co.uk
If we cannot trust the reset will flush out the CS event queue such that
process_csb() reports an accurate view of HW, we will need to search the
active and pending contexts to determine which was actually running at
the time we issued the reset.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200505084629.31365-1-chris@chris-wilson.co.uk
We need a new PCode request commands and reply codes
to be added as a prepartion patch for QGV points
restricting for new SAGV support.
v2: - Extracted those changes into separate patch
(Ville Syrjälä)
v3: - Moved new PCode masks to another place from
PCode commands(Ville)
v4: - Moved new PCode masks to correspondent PCode
command, with identation(Ville)
- Changed naming to ICL_ instead of GEN11_
to fit more nicely into existing definition
style.
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200505102247.32452-5-stanislav.lisovskiy@intel.com
Unmask/enable AUX interrupts on all ports on TGL+. So far the interrupts
worked only on port A, which meant each transaction on other ports took
10ms.
Cc: <stable@vger.kernel.org> # v5.4+
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200504075828.20348-1-imre.deak@intel.com
This was sort of annoying me:
random:~$ dmesg | tail -1
[523884.039227] [drm] Reducing the compressed framebuffer size. This may lead to less power savings than a non-reduced-size. Try to increase stolen memory size if available in BIOS.
random:~$ dmesg | grep -c "Reducing the compressed"
47
This patch makes it DRM_INFO_ONCE() just like the similar message
farther down in that function is pr_info_once().
Cc: stable@vger.kernel.org
Signed-off-by: Peter Jones <pjones@redhat.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1745
Link: https://patchwork.freedesktop.org/patch/msgid/20180706190424.29194-1-pjones@redhat.com
[vsyrjala: Rebase due to per-device logging]
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
(cherry picked from commit 6b7fc6a3e6)
[Rodrigo: port back to DRM_INFO_ONCE]
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
In commit 5a7d202b15, a logical AND was erroneously changed to an OR,
causing WaIncreaseLatencyIPCEnabled to be enabled unconditionally for
kabylake and coffeelake, even when IPC is disabled. Fix the logic so
that WaIncreaseLatencyIPCEnabled is only used when IPC is enabled.
Fixes: 5a7d202b15 ("drm/i915: Drop WaIncreaseLatencyIPCEnabled/1140 for cnl")
Cc: stable@vger.kernel.org # 5.3.x+
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200430214654.51314-1-sultan@kerneltoast.com
(cherry picked from commit 690d22dafa)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
In order to allow userspace to rely on timeslicing to reorder their
batches, we must support preemption of those user batches. Declare
timeslicing as an explicit property that is a combination of having the
kernel support and HW support.
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 8ee36e048c ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200501122249.12417-1-chris@chris-wilson.co.uk
(cherry picked from commit a211da9c77)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
In commit 5a7d202b15, a logical AND was erroneously changed to an OR,
causing WaIncreaseLatencyIPCEnabled to be enabled unconditionally for
kabylake and coffeelake, even when IPC is disabled. Fix the logic so
that WaIncreaseLatencyIPCEnabled is only used when IPC is enabled.
Fixes: 5a7d202b15 ("drm/i915: Drop WaIncreaseLatencyIPCEnabled/1140 for cnl")
Cc: stable@vger.kernel.org # 5.3.x+
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200430214654.51314-1-sultan@kerneltoast.com
All these ROUNDING_FACTORs and whatnot are making this thing hard to
read. Get rid of them. And let's massage some of the fractions to
give us less questionable intermediate results and perhaps less
divisions.
Also looks like a good helping of 64bit math stuff is needed to
avoid some of overflows present in the current code. There
might still be a few overflows, namely when calculating
link_clks_available/samples_room (would require a huge hblank
though), and potentially when calculating hblank_rise (not sure
how large link_clks_active can get).
It looks like we're still not calculating exactly what the spec says
since we truncate tu_data and tu_line early. But I'm too lazy to
figure out if we could avoid that.
v2: Fix typo in commit msg (Uma)
Remove ROUNDING_FACTOR define (Uma)
s/5*link_clk+5*cdclk/5*(link_clk+cdclk)/ (Chris)
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429185457.26235-3-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Since the code seems insistent on using the variable names from the
bspec formulat, let's be consistent and use those names for all
the things. For some reason 'link_clk' and 'lanes' were left out
in the code until now.
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429185457.26235-2-ville.syrjala@linux.intel.com
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
mode.vrefresh is rounded to the nearest integer. You don't want to use
it anywhere that requires precision. Also I want to nuke it.
vtotal*vrefresh == 1000*clock/htotal, so let's use the latter.
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429185457.26235-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Remove all the stepping dependent cnl workarounds. Bspec lists
more steppings than this so presumably these are classed as
pre-production. And this is cnl after all so no one should
really care anyway.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200430125822.21985-2-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Display WA #1105 says that FBC requires PLANE_STRIDE to be a multiple
of 512 bytes on gen9 and glk.
This is definitely true for glk as certain tests (such as
igt/kms_big_fb/linear-16bpp-rotate-0) are now failing when the
display resolution results in a plane stride which is not a
multiple of 512 bytes.
Curiously I was not able to reproduce this on a KBL. First I
suspected that our use of the FBC override stride explain this,
but after trying to use the override stride on glk the test
still failed. I did try both the old CHICKEN_MISC_4 way and
the new FBC_STRIDE way, neither had any effect on the result.
Anyways, we need this at least on glk. But let's trust the spec
and apply the w/a for all gen9 as well, despite being unable to
reproduce the problem.
v2: s/FBC_CHICKEN/FBC_STRIDE/ in commit msg
Cc: José Roberto de Souza <jose.souza@intel.com>
Fixes: 691f7ba58d ("drm/i915/display/fbc: Make fences a nice-to-have for GEN9+")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429101034.8208-2-ville.syrjala@linux.intel.com
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
That is a preparation patch before next one where we
introduce old_bw_state and a bunch of other changes
as well.
In a review comment it was suggested to split out
at least that renaming into a separate patch, what
is done here.
v2: Removed spurious space
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200423075902.21892-8-stanislav.lisovskiy@intel.com
We need to calculate SAGV mask also in a non-modeset
commit, however currently active_pipes are only calculated
for modesets in global atomic state, thus now we will be
tracking those also in bw_state in order to be able to
properly access global data.
v2: - Removed pre/post plane SAGV updates from modeset(Ville)
- Now tracking active pipes in intel_can_enable_sagv(Ville)
v3: - lock global state if active_pipes change as well(Ville)
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200430195634.7666-1-stanislav.lisovskiy@intel.com
Future platforms require per-crtc SAGV evaluation
and serializing global state when those are changed
from different commits.
v2: - Add has_sagv check to intel_crtc_can_enable_sagv
so that it sets bit in reject mask.
- Use bw_state in intel_pre/post_plane_enable_sagv
instead of atomic state
v3: - Fixed rebase conflict, now using
intel_atomic_crtc_state_for_each_plane_state in
order to call it from atomic check
v4: - Use fb modifier from plane state
v5: - Make intel_has_sagv static again(Ville)
- Removed unnecessary NULL assignments(Ville)
- Removed unnecessary SAGV debug(Ville)
- Call intel_compute_sagv_mask only for modesets(Ville)
- Serialize global state only if sagv results change, but
not mask itself(Ville)
v6: - use lock global state instead of serialize(Ville)
v7: - use both global state lock and serialize depending on
if we need to change only global state or access hw
(Ville)
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Cc: Ville Syrjälä <ville.syrjala@intel.com>
Cc: James Ausmus <james.ausmus@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200430191757.18206-1-stanislav.lisovskiy@intel.com
The older arches did not convert MI_STORE_DATA_IMM to using the GTT, but
left them writing to a physical address. The notes suggest that the
primary reason would be so that the writes were cache coherent, as the
CPU cache uses physical tagging. As such we did not implement the
legacy variant of MI_STORE_DATA_IMM and so left all the relocations
synchronous -- but with a small function to convert from the vma address
into the physical address, we can implement asynchronous relocs on these
older arches, fixing up a few tests that require them.
In order to be able to test the legacy paths, refactor the gpu
relocations so that we can hook them up to a selftest.
v2: Use an array of offsets not enum labels for the selftest
v3: Refactor the common igt_hexdump()
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/757
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200504140629.28240-1-chris@chris-wilson.co.uk
It is required that a chained batch be in the same address domain as its
parent, and also that must be specified in the command for earlier gen
as it is not inferred from the chaining until gen6.
Fixes: 964a9b0f61 ("drm/i915/gem: Use chained reloc batches")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200504125149.4396-1-chris@chris-wilson.co.uk
We only need the device wakeref on freeing the objects if we have to
unbind the object from the global GTT, or otherwise update device
information. If the objects are clean, we never need the wakeref, so
avoid taking until required.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200503171513.18704-1-chris@chris-wilson.co.uk
If at first we don't succeed, try try again.
Not all engines may support the MI ops we need to perform asynchronous
relocation patching, and so we end up falling back to a synchronous
operation that has a liability of blocking. However, Tvrtko pointed out
we don't need to use the same engine to perform the relocations as we
are planning to execute the execbuf on, and so if we switch over to a
working engine, we can perform the relocation asynchronously. The user
execbuf will be queued after the relocations by virtue of fencing.
This patch creates a new context per execbuf requiring asynchronous
relocations on an unusable engines. This is perhaps a bit excessive and
can be ameliorated by a small context cache, but for the moment we only
need it for working around a little used engine on Sandybridge, and only
if relocations are actually required to an active batch buffer.
Now we just need to teach the relocation code to handle physical
addressing for gen2/3, and we should then have universal support!
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Testcase: igt/gem_exec_reloc/basic-spin # snb
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200501192945.22215-3-chris@chris-wilson.co.uk
As we can now keep chaining together a relocation batch to process any
number of relocations, we can keep building that relocation batch for
all of the target vma. This avoiding emitting a new request into the
ring for each target, consuming precious ring space and a potential
stall.
v2: Propagate the failure from submitting the relocation batch.
Testcase: igt/gem_exec_reloc/basic-wide-active
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200501192945.22215-2-chris@chris-wilson.co.uk
The ring is a precious resource: we anticipate to only use a few hundred
bytes for a request, and only try to reserve that before we start. If we
go beyond our guess in building the request, then instead of waiting at
the start of execbuf before we hold any locks or other resources, we
may trigger a wait inside a critical region. One example is in using gpu
relocations, where currently we emit a new MI_BB_START from the ring
every time we overflow a page of relocation entries. However, instead of
insert the command into the precious ring, we can chain the next page of
relocation entries as MI_BB_START from the end of the previous.
v2: Delay the emit_bb_start until after all the chained vma
synchronisation is complete. Since the buffer pool batches are idle, this
_should_ be a no-op, but one day we may some fancy async GPU bindings
for new vma!
v3: Use pool/batch consitently, once we start thinking in terms of the
batch vma, use batch->obj.
v4: Explain the magic number 4.
Tvrtko spotted that we lose propagation of the error for failing to
submit the relocation request; that's easier to fix up in the next
patch.
Testcase: igt/gem_exec_reloc/basic-many-active
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200501192945.22215-1-chris@chris-wilson.co.uk
gdb uses ptrace() to peek and poke bytes of the target's address space.
The driver must implement an vm_ops->access() handler or else gdb will
be unable to inspect the pointer and report it as out-of-bounds.
Worse than useless as it causes immediate suspicion of the valid GTT
pointer, distracting the poor programmer trying to find his bug.
v2: Write-protect readonly objects (Matthew).
Testcase: igt/gem_mmap_gtt/ptrace
Testcase: igt/gem_mmap_offset/ptrace
Suggested-by: Kristian H. Kristensen <hoegsberg@google.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200501145120.18830-1-chris@chris-wilson.co.uk
In order to allow userspace to rely on timeslicing to reorder their
batches, we must support preemption of those user batches. Declare
timeslicing as an explicit property that is a combination of having the
kernel support and HW support.
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 8ee36e048c ("drm/i915/execlists: Minimalistic timeslicing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200501122249.12417-1-chris@chris-wilson.co.uk
When i915_gem_execbuffer2_ioctl() is using user_access_begin(),
that's only to perform unsafe_put_user() so use
user_write_access_begin() in order to only open write access.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ebf1250b6d4f351469fb339e5399d8b92aa8a1c1.1585898438.git.christophe.leroy@c-s.fr
Since the introduction of 'soft-rc6', we aim to park the device quickly
and that results in frequent idling of the whole device. Currently upon
idling we free the batch buffer pool, and so this renders the cache
ineffective for many workloads. If we want to have an effective cache of
recently allocated buffers available for reuse, we need to decouple that
cache from the engine powermanagement and make it timer based. As there
is no reason then to keep it within the engine (where it once made
retirement order easier to track), we can move it up the hierarchy to the
owner of the memory allocations.
v2: Hook up to debugfs/drop_caches to clear the cache on demand.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200430111819.10262-2-chris@chris-wilson.co.uk
Extend coverage of the blitter client by exercising conversion to and
from tiled sources. In the process we perform spot checks to verify that
the tiling/detiling is being applied correctly, along with position
invariance of the tiling parameters.
Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200430064957.14942-1-chris@chris-wilson.co.uk
We reduced the clocks slowly after a boost event based on the
observation that the smoothness of animations suffered. However, since
reducing the evalution intervals, we should be able to respond to the
rapidly fluctuating workload of a simple desktop animation and so
restore the more aggressive downclocking.
References: 2a8862d2f3 ("drm/i915: Reduce the RPS shock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429205446.3259-6-chris@chris-wilson.co.uk
We treat parking as a manual RPS timeout event, and downclock the GPU
for the next unpark and batch execution. However, having restored the
aggressive downclocking and observed that we have very light workloads
whose only interaction is through the manual parking events, carry over
the aggressive downclocking to the fake RPS events.
References: 21abf0bf16 ("drm/i915/gt: Treat idling as a RPS downclock event")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429205446.3259-5-chris@chris-wilson.co.uk
As with the realisation for soft-rc6, we respond to idling the engines
within microseconds, far faster than the response times for HW RC6 and
RPS. Furthermore, our fast parking upon idle, prevents HW RPS from
running for many desktop workloads, as the RPS evaluation intervals are
on the order of tens of milliseconds, but the typical workload is just a
couple of milliseconds, but yet we still need to determine the best
frequency for user latency versus power.
Recognising that the HW evaluation intervals are a poor fit, and that
they were deprecated [in bspec at least] from gen10, start to wean
ourselves off them and replace the EI with a timer and our accurate
busy-stats. The principle benefit of manually evaluating RPS intervals
is that we can be more responsive for better performance and powersaving
for both spiky workloads and steady-state.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1698
Fixes: 98479ada42 ("drm/i915/gt: Treat idling as a RPS downclock event")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429205446.3259-4-chris@chris-wilson.co.uk
In the near future, we will utilize the busy-stats on each engine to
approximate the C0 cycles of each, and use that as an input to a manual
RPS mechanism. That entails having busy-stats always enabled and so we
can remove the enable/disable routines and simplify the pmu setup. As a
consequence of always having the stats enabled, we can also show the
current active time via sysfs/engine/xcs/active_time_ns.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429205446.3259-1-chris@chris-wilson.co.uk
We need to keep the default context state around to instantiate new
contexts (aka golden rendercontext), and we also keep it pinned while
the engine is active so that we can quickly reset a hanging context.
However, the default contexts are large enough to merit keeping in
swappable memory as opposed to kernel memory, so we store them inside
shmemfs. Currently, we use the normal GEM objects to create the default
context image, but we can throw away all but the shmemfs file.
This greatly simplifies the tricky power management code which wants to
run underneath the normal GT locking, and we definitely do not want to
use any high level objects that may appear to recurse back into the GT.
Though perhaps the primary advantage of the complex GEM object is that
we aggressively cache the mapping, but here we are recreating the
vm_area everytime time we unpark. At the worst, we add a lightweight
cache, but first find a microbenchmark that is impacted.
Having started to create some utility functions to make working with
shmemfs objects easier, we can start putting them to wider use, where
GEM objects are overkill, such as storing persistent error state.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429172429.6054-1-chris@chris-wilson.co.uk
Let's just calculate the hsync rate on demand. No point in wasting
space storing it and risking the cached value getting out of sync
with reality.
v2: Move drm_mode_hsync() next to its only users
Drop the TODO
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> #v1
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428171940.19552-2-ville.syrjala@linux.intel.com
If intel_context_create() fails then it leads to an error pointer
dereference. I shuffled things around to make error handling easier.
Fixes: 1dd47b54ba ("drm/i915: Add live selftests for indirect ctx batchbuffers")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429132425.GE815283@mwanda
When building with clang + -Wuninitialized:
drivers/gpu/drm/i915/gt/debugfs_gt_pm.c:407:7: warning: variable
'rpcurupei' is uninitialized when used here [-Wuninitialized]
rpcurupei,
^~~~~~~~~
drivers/gpu/drm/i915/gt/debugfs_gt_pm.c:304:16: note: initialize the
variable 'rpcurupei' to silence this warning
u32 rpcurupei, rpcurup, rpprevup;
^
= 0
1 warning generated.
rpupei is assigned twice; based on the second argument to
intel_uncore_read, it seems this one should have been assigned to
rpcurupei.
Fixes: 9c878557b1 ("drm/i915/gt: Use the RPM config register to determine clk frequencies")
Link: https://github.com/ClangBuiltLinux/linux/issues/1016
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200429030051.920203-1-natechancellor@gmail.com
The IRQ postinstall handling had open-coded pipe fault mask selection
that never got updated for gen11. Switch it to use
gen8_de_pipe_fault_mask() to ensure we don't miss updates for new
platforms.
Cc: José Roberto de Souza <jose.souza@intel.com>
Fixes: d506a65d56 ("drm/i915: Catch GTT fault errors for gen11+ planes")
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200424231423.4065231-1-matthew.d.roper@intel.com
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
(cherry picked from commit 869129ee0c)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The presumption is that by using a circular counter that is twice as
large as the maximum ELSP submission, we would never reuse the same CCID
for two inflight contexts.
However, if we continually preempt an active context such that it always
remains inflight, it can be resubmitted with an arbitrary number of
paired contexts. As each of its paired contexts will use a new CCID,
eventually it will wrap and submit two ELSP with the same CCID.
Rather than use a simple circular counter, switch over to a small bitmap
of inflight ids so we can avoid reusing one that is still potentially
active.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796
Fixes: 2935ed5339 ("drm/i915: Remove logical HW ID")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-2-chris@chris-wilson.co.uk
The bspec is confusing on the nature of the upper 32bits of the LRC
descriptor. Once upon a time, it said that it uses the upper 32b to
decide if it should perform a lite-restore, and so we must ensure that
each unique context submitted to HW is given a unique CCID [for the
duration of it being on the HW]. Currently, this is achieved by using
a small circular tag, and assigning every context submitted to HW a
new id. However, this tag is being cleared on repinning an inflight
context such that we end up re-using the 0 tag for multiple contexts.
To avoid accidentally clearing the CCID in the upper 32bits of the LRC
descriptor, split the descriptor into two dwords so we can update the
GGTT address separately from the CCID.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796
Fixes: 2935ed5339 ("drm/i915: Remove logical HW ID")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-1-chris@chris-wilson.co.uk
The IRQ postinstall handling had open-coded pipe fault mask selection
that never got updated for gen11. Switch it to use
gen8_de_pipe_fault_mask() to ensure we don't miss updates for new
platforms.
Cc: José Roberto de Souza <jose.souza@intel.com>
Fixes: d506a65d56 ("drm/i915: Catch GTT fault errors for gen11+ planes")
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200424231423.4065231-1-matthew.d.roper@intel.com
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
While the ggtt vma are protected by their object lifetime, the list
continues until it hits a non-ggtt vma, and that vma is not protected
and may be freed as we inspect it. Hence, we require the obj->vma.lock
to protect the list as we iterate.
An example of forgetting to hold the obj->vma.lock is
[1642834.464973] general protection fault, probably for non-canonical address 0xdead000000000122: 0000 [#1] SMP PTI
[1642834.464977] CPU: 3 PID: 1954 Comm: Xorg Not tainted 5.6.0-300.fc32.x86_64 #1
[1642834.464979] Hardware name: LENOVO 20ARS25701/20ARS25701, BIOS GJET94WW (2.44 ) 09/14/2017
[1642834.465021] RIP: 0010:i915_gem_object_set_tiling+0x2c0/0x3e0 [i915]
[1642834.465024] Code: 8b 84 24 18 01 00 00 f6 c4 80 74 59 49 8b 94 24 a0 00 00 00 49 8b 84 24 e0 00 00 00 49 8b 74 24 10 48 8b 92 30 01 00 00 89 c7 <80> ba 0a 06 00 00 03 0f 87 86 00 00 00 ba 00 00 08 00 b9 00 00 10
[1642834.465025] RSP: 0018:ffffa98780c77d60 EFLAGS: 00010282
[1642834.465028] RAX: ffff8d232bfb2578 RBX: 0000000000000002 RCX: ffff8d25873a0000
[1642834.465029] RDX: dead000000000122 RSI: fffff0af8ac6e408 RDI: 000000002bfb2578
[1642834.465030] RBP: ffff8d25873a0000 R08: ffff8d252bfb5638 R09: 0000000000000000
[1642834.465031] R10: 0000000000000000 R11: ffff8d252bfb5640 R12: ffffa987801cb8f8
[1642834.465032] R13: 0000000000001000 R14: ffff8d233e972e50 R15: ffff8d233e972d00
[1642834.465034] FS: 00007f6a3d327f00(0000) GS:ffff8d25926c0000(0000) knlGS:0000000000000000
[1642834.465036] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1642834.465037] CR2: 00007f6a2064d000 CR3: 00000002fb57c001 CR4: 00000000001606e0
[1642834.465038] Call Trace:
[1642834.465083] i915_gem_set_tiling_ioctl+0x122/0x230 [i915]
[1642834.465121] ? i915_gem_object_set_tiling+0x3e0/0x3e0 [i915]
[1642834.465151] drm_ioctl_kernel+0x86/0xd0 [drm]
[1642834.465156] ? avc_has_perm+0x3b/0x160
[1642834.465178] drm_ioctl+0x206/0x390 [drm]
[1642834.465216] ? i915_gem_object_set_tiling+0x3e0/0x3e0 [i915]
[1642834.465221] ? selinux_file_ioctl+0x122/0x1c0
[1642834.465226] ? __do_munmap+0x24b/0x4d0
[1642834.465231] ksys_ioctl+0x82/0xc0
[1642834.465235] __x64_sys_ioctl+0x16/0x20
[1642834.465238] do_syscall_64+0x5b/0xf0
[1642834.465243] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1642834.465245] RIP: 0033:0x7f6a3d7b047b
[1642834.465247] Code: 0f 1e fa 48 8b 05 1d aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ed a9 0c 00 f7 d8 64 89 01 48
[1642834.465249] RSP: 002b:00007ffe71adba28 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[1642834.465251] RAX: ffffffffffffffda RBX: 000055f99048fa40 RCX: 00007f6a3d7b047b
[1642834.465253] RDX: 00007ffe71adba30 RSI: 00000000c0106461 RDI: 000000000000000e
[1642834.465254] RBP: 0000000000000002 R08: 000055f98f3f1798 R09: 0000000000000002
[1642834.465255] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000080
[1642834.465257] R13: 000055f98f3f1690 R14: 00000000c0106461 R15: 00007ffe71adba30
Now to take the spinlock during the list iteration, we need to break it
down into two phases. In the first phase under the lock, we cannot sleep
and so must defer the actual work to a second list, protected by the
ggtt->mutex.
We also need to hold the spinlock during creation of a new vma to
serialise with updates of the tiling on the object.
Reported-by: Dave Airlie <airlied@redhat.com>
Fixes: 2850748ef8 ("drm/i915: Pull i915_vma_pin under the vm->mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200422072805.17340-1-chris@chris-wilson.co.uk
(cherry picked from commit cb593e5d2b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
igt_ppgtt_pin_update() invokes i915_gem_context_get_vm_rcu(), which
returns a reference of the i915_address_space object to "vm" with
increased refcount.
When igt_ppgtt_pin_update() returns, "vm" becomes invalid, so the
refcount should be decreased to keep refcount balanced.
The reference counting issue happens in two exception handling paths of
igt_ppgtt_pin_update(). When i915_gem_object_create_internal() returns
IS_ERR, the refcnt increased by i915_gem_context_get_vm_rcu() is not
decreased, causing a refcnt leak.
Fix this issue by jumping to "out_vm" label when
i915_gem_object_create_internal() returns IS_ERR.
Fixes: a4e7ccdac3 ("drm/i915: Move context management under GEM")
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1587361342-83494-1-git-send-email-xiyuyang19@fudan.edu.cn
(cherry picked from commit e07c7606a0)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The bspec lists both the clock frequency and the effective interval. The
interval corresponds to observed behaviour, so adjust the frequency to
match.
v2: Mika rightfully asked if we could measure the clock frequency from a
selftest.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200427154554.12736-1-chris@chris-wilson.co.uk
We see that if the HW doesn't actually sleep, the HW may eat the poison
we set in its write-only HWSP during sanitize:
intel_gt_resume.part.8: 0000:00:02.0
__gt_unpark: 0000:00:02.0
gt_sanitize: 0000:00:02.0 force:yes
process_csb: 0000:00:02.0 vcs0: cs-irq head=5, tail=90
process_csb: 0000:00:02.0 vcs0: csb[0]: status=0x5a5a5a5a:0x5a5a5a5a
assert_pending_valid: Nothing pending for promotion!
The CS TAIL pointer should have been reset by reset_csb_pointers(), so
in this case it is likely that we have read back from the CPU cache and
so we must clflush our control over that page. In doing so, push the
sanitisation to the start of the GT sequence so that our poisoning is
assuredly before we start talking to the HW.
References: https://gitlab.freedesktop.org/drm/intel/-/issues/1794
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200427084000.10999-1-chris@chris-wilson.co.uk
We evaluate *active, which is a pointer into execlists->inflight[]
during dequeue to decide how long a preempt-timeout we need to apply.
However, as soon as we do the submit_ports, the HW may send its ACK
interrupt causing us to promote execlists->pending[] tp
execlists->inflight[], overwriting the value of *active. We know *active
is only stable until we submit (as we only submit when there is no
pending promotion).
[ 16.102328] BUG: KCSAN: data-race in execlists_dequeue+0x1449/0x1600 [i915]
[ 16.102356]
[ 16.102375] race at unknown origin, with read to 0xffff8881e9500488 of 8 bytes by task 429 on cpu 1:
[ 16.102780] execlists_dequeue+0x1449/0x1600 [i915]
[ 16.103160] __execlists_submission_tasklet+0x48/0x60 [i915]
[ 16.103540] execlists_submit_request+0x38e/0x3c0 [i915]
[ 16.103940] submit_notify+0x8f/0xc0 [i915]
[ 16.104308] __i915_sw_fence_complete+0x61/0x420 [i915]
[ 16.104683] i915_sw_fence_complete+0x58/0x80 [i915]
[ 16.105054] i915_sw_fence_commit+0x16/0x20 [i915]
[ 16.105457] __i915_request_queue+0x60/0x70 [i915]
[ 16.105843] i915_gem_do_execbuffer+0x2d6b/0x4230 [i915]
[ 16.106227] i915_gem_execbuffer2_ioctl+0x2b0/0x580 [i915]
[ 16.106257] drm_ioctl_kernel+0xe9/0x130
[ 16.106279] drm_ioctl+0x27d/0x45e
[ 16.106311] ksys_ioctl+0x89/0xb0
[ 16.106336] __x64_sys_ioctl+0x42/0x60
[ 16.106370] do_syscall_64+0x6e/0x2c0
[ 16.106397] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200426094231.21995-1-chris@chris-wilson.co.uk
Use indirect ctx bb to load cmd buffer control value
from context image to avoid corruption.
v2: add to lrc layout (Chris)
v3: end to a cacheline (Chris)
v4: add to lrc fixed (Chris)
v5: value in offset+1
Testcase: igt/i915_selftest/gt_lrc
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200424230632.30333-1-mika.kuoppala@linux.intel.com
Indirect ctx batchbuffers are a hw feature of which
batch can be run, by hardware, during context restoration stage.
Driver can setup a batchbuffer and also an offset into the
context image. When context image is marshalled from
memory to registers, and when the offset from the start of
context register state is equal of what driver pre-determined,
batch will run. So one can manipulate context restoration
process at cacheline granularity, given some limitations,
as you need to have rudimentaries in place before you can
run a batch.
Add selftest which will write the ring start register
to a canary spot. This will test that hardware will run a
batchbuffer for the context in question.
v2: request wait fix, naming (Chris)
v3: test order (Chris)
v4: rebase
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200424214841.28076-3-mika.kuoppala@linux.intel.com
Restoration of a previous timestamp can collide
with updating the timestamp, causing a value corruption.
Combat this issue by using indirect ctx bb to
modify the context image during restoring process.
We can preload value into scratch register. From which
we then do the actual write with LRR. LRR is faster and
thus less error prone as probability of race drops.
v2: tidying (Chris)
v3: lrr for all engines
v4: grp
v5: reg bit
v6: wa_bb_offset, virtual engines (Chris)
References: HSDES#16010904313
Testcase: igt/i915_selftest/gt_lrc
Suggested-by: Joseph Koston <joseph.koston@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200424230546.30271-1-mika.kuoppala@linux.intel.com
We only hold the active spinlock while dumping the error state, and this
does not prevent another thread from retiring the request -- as it is
quite possible that despite us capturing the current state, the GPU has
completed the request. As such, it is dangerous to dereference state
below the request as it may already be freed, and the simplest way to
avoid the danger is not include it in the error state.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1788
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200424191410.27570-1-chris@chris-wilson.co.uk
Rename DPM_FLAG_NEVER_SKIP to DPM_FLAG_NO_DIRECT_COMPLETE which
matches its purpose more closely.
No functional impact.
Suggested-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # for PCI parts
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
For many configuration details within RC6 and RPS we are programming
intervals for the internal clocks. From gen11, these clocks are
configuration via the RPM_CONFIG and so for convenience, we would like
to convert to/from more natural units (ns).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200424162805.25920-2-chris@chris-wilson.co.uk
Add tracek to the RPS events (interrupts, worker, enabling, threshold
selection, frequency setting), so that if we have to debug reticent HW
we have some traces to start from.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200424162805.25920-1-chris@chris-wilson.co.uk
The RPS DOWN_TIMEOUT interrupt is signaled after a period of rc6, and
upon receipt of that interrupt we reprogram the GPU clocks down to the
next idle notch [to help convserve power during rc6]. However, on
execlists, we benefit from soft-rc6 immediately parking the GPU and
setting idle frequencies upon idling [within a jiffie], and here the
interrupt prevents us from restarting from our last frequency.
In the process, we can simply opt for a static pm_events mask and rely
on the enable/disable interrupts to flush the worker on parking.
This will reduce the amount of oscillation observed during steady
workloads with microsleeps, as each time the rc6 timeout occurs we
immediately follow with a waitboost for a dropped frame.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200422001703.1697-1-chris@chris-wilson.co.uk
Pass the entire connector state to intel_{gmch,pch}_panel_fitting().
For now we just need to get at .scaling_mode but in the future we'll
want access to the margin properties as well.
v2: Deal with intel_dp_ycbcr420_config()
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200422161917.17389-5-ville.syrjala@linux.intel.com
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Make things a bit more abstract by replacing the pch_pfit.pos/size
raw register values with a drm_rect. Makes it slighly more convenient
to eg. compute the scaling factors.
v2: Use drm_rect_init()
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200422161917.17389-3-ville.syrjala@linux.intel.com
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Most of the pfit functions are of the form:
func()
{
if (pfit_enabled) {
...
}
}
Flip the pfit_enabled check around to flatten the functions.
And while we're touching all this let's do the usual
s/pipe_config/crtc_state/ replacement.
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200422161917.17389-2-ville.syrjala@linux.intel.com
Fix skl_update_scaler_crtc() to deal with different scaling
modes correctly. The current implementation assumes
DRM_MODE_SCALE_FULLSCREEN. Fortunately we don't expose any
border properties currently so the code does actually end
up doing the right thing (assigning a scaler for pfit).
The code does need to be fixed before any borders are
exposed.
Also we have redundant calls to skl_update_scaler_crtc() in
dp/hdmi .compute_config() which can be nuked. They were anyway
called before we had even computed the pfit state so were
basically nonsense. The real call we need to keep is in
intel_crtc_atomic_check().
v2: Deal witrh skl_update_scaler_crtc() in intel_dp_ycbcr420_config()
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200422161917.17389-1-ville.syrjala@linux.intel.com
The history of i915_vma_close() is confusing, as is its use. As the
lifetime of the i915_vma is currently bounded by the object it is
attached to, we needed a means of identify when a vma was no longer in
use by userspace (via the user's fd). This is further complicated by
that only ppgtt vma should be closed at the user's behest, as the ggtt
were always shared.
Now that we attach the vma to a lut on the user's context, the open
count does indicate how many unique and open context/vm are referencing
this vma from the user. As such, we can and should just use the
open_count to track when the vma is still in use by userspace.
It's a poor man's replacement for reference counting.
Closes: https://gitlab.freedesktop.org/drm/intel/issues/1193
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200422190558.30509-1-chris@chris-wilson.co.uk
Under ideal circumstances, the driver should be able to keep the GPU
fully saturated with work. Measure how close to ideal we get under the
harshest of conditions with no user payload.
v2: Also measure throughput using only one thread.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200422074203.9799-1-chris@chris-wilson.co.uk
intel_gt_wait_for_idle() tries to wait until all the outstanding requests
are retired and the GPU is idle. As a side effect of retiring requests,
we may submit more work to flush any pm barriers, and so the
wait-for-idle tries to flush the background pm work and catch the new
requests. However, if the work completed in the background before we
were able to flush, it would queue the extra barrier request without us
noticing -- and so we would return from wait-for-idle with one request
remaining. (This breaks e.g. record_default_state where we need to wait
until that barrier is retired, and it may slow suspend down by causing
us to wait on the background retirement worker as opposed to immediately
retiring the barrier.)
However, since we track if there has been a submission since the engine
pm barrier, we can very quickly detect if the idle barrier is still
outstanding.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1763
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200423085940.28168-1-chris@chris-wilson.co.uk
During the virtual engine's submission tasklet, we take the request and
insert into the submission queue on each of our siblings. This seems
quite simply, and so no problems with ordering. However, the sibling
execlists' submission tasklets may run concurrently with the virtual
engine's tasklet, submitting the request to HW before the virtual
finishes its task of telling all the siblings. If this happens, the
sibling tasklet may *reorder* the ve->sibling[] array that the virtual
engine tasklet is processing. This can *only* reorder within the
elements already processed by the virtual engine, nevertheless the
race is detected by KCSAN:
[ 185.580014] BUG: KCSAN: data-race in execlists_dequeue [i915] / virtual_submission_tasklet [i915]
[ 185.580054]
[ 185.580076] write to 0xffff8881f1919860 of 8 bytes by interrupt on cpu 2:
[ 185.580553] execlists_dequeue+0x6ad/0x1600 [i915]
[ 185.581044] __execlists_submission_tasklet+0x48/0x60 [i915]
[ 185.581517] execlists_submission_tasklet+0xd3/0x170 [i915]
[ 185.581554] tasklet_action_common.isra.0+0x42/0x90
[ 185.581585] __do_softirq+0xc8/0x206
[ 185.581613] run_ksoftirqd+0x15/0x20
[ 185.581641] smpboot_thread_fn+0x15a/0x270
[ 185.581669] kthread+0x19a/0x1e0
[ 185.581695] ret_from_fork+0x1f/0x30
[ 185.581717]
[ 185.581736] read to 0xffff8881f1919860 of 8 bytes by interrupt on cpu 0:
[ 185.582231] virtual_submission_tasklet+0x10e/0x5c0 [i915]
[ 185.582265] tasklet_action_common.isra.0+0x42/0x90
[ 185.582291] __do_softirq+0xc8/0x206
[ 185.582315] run_ksoftirqd+0x15/0x20
[ 185.582340] smpboot_thread_fn+0x15a/0x270
[ 185.582368] kthread+0x19a/0x1e0
[ 185.582395] ret_from_fork+0x1f/0x30
[ 185.582417]
We can prevent this race by checking for the ve->request after looking
up the sibling array.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200423115315.26825-1-chris@chris-wilson.co.uk
Fix the check for when an AUX power well enabling timeout is expected on
a legacy TypeC port.
Fixes: 89e01caac6 ("drm/i915: Use single set of AUX powerwell ops for gen11+")
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200422123440.19522-1-imre.deak@intel.com
When we migrated to execlists, one of the conditions we wanted to test
for was whether the breadcrumb seqno was being written before the
breadcumb interrupt was delivered. This was following on from issues
observed on previous generations which were not so strongly ordered. With
the removal of the missed interrupt detection, we have not reliable
means of detecting the out-of-order seqno/interrupt but instead tried to
assert that the relationship between the CS event interrupt and the
breadwrite should be strongly ordered. However, Icelake proves it is
possible for the HW implementation to forget about minor little details
such as write ordering and so the order between *processing* the CS
event and the breadcrumb is unreliable.
Remove the unreliable assertion, but leave a debug telltale in case we
have reason to suspect.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1658
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200422141749.28709-1-chris@chris-wilson.co.uk
While the ggtt vma are protected by their object lifetime, the list
continues until it hits a non-ggtt vma, and that vma is not protected
and may be freed as we inspect it. Hence, we require the obj->vma.lock
to protect the list as we iterate.
An example of forgetting to hold the obj->vma.lock is
[1642834.464973] general protection fault, probably for non-canonical address 0xdead000000000122: 0000 [#1] SMP PTI
[1642834.464977] CPU: 3 PID: 1954 Comm: Xorg Not tainted 5.6.0-300.fc32.x86_64 #1
[1642834.464979] Hardware name: LENOVO 20ARS25701/20ARS25701, BIOS GJET94WW (2.44 ) 09/14/2017
[1642834.465021] RIP: 0010:i915_gem_object_set_tiling+0x2c0/0x3e0 [i915]
[1642834.465024] Code: 8b 84 24 18 01 00 00 f6 c4 80 74 59 49 8b 94 24 a0 00 00 00 49 8b 84 24 e0 00 00 00 49 8b 74 24 10 48 8b 92 30 01 00 00 89 c7 <80> ba 0a 06 00 00 03 0f 87 86 00 00 00 ba 00 00 08 00 b9 00 00 10
[1642834.465025] RSP: 0018:ffffa98780c77d60 EFLAGS: 00010282
[1642834.465028] RAX: ffff8d232bfb2578 RBX: 0000000000000002 RCX: ffff8d25873a0000
[1642834.465029] RDX: dead000000000122 RSI: fffff0af8ac6e408 RDI: 000000002bfb2578
[1642834.465030] RBP: ffff8d25873a0000 R08: ffff8d252bfb5638 R09: 0000000000000000
[1642834.465031] R10: 0000000000000000 R11: ffff8d252bfb5640 R12: ffffa987801cb8f8
[1642834.465032] R13: 0000000000001000 R14: ffff8d233e972e50 R15: ffff8d233e972d00
[1642834.465034] FS: 00007f6a3d327f00(0000) GS:ffff8d25926c0000(0000) knlGS:0000000000000000
[1642834.465036] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1642834.465037] CR2: 00007f6a2064d000 CR3: 00000002fb57c001 CR4: 00000000001606e0
[1642834.465038] Call Trace:
[1642834.465083] i915_gem_set_tiling_ioctl+0x122/0x230 [i915]
[1642834.465121] ? i915_gem_object_set_tiling+0x3e0/0x3e0 [i915]
[1642834.465151] drm_ioctl_kernel+0x86/0xd0 [drm]
[1642834.465156] ? avc_has_perm+0x3b/0x160
[1642834.465178] drm_ioctl+0x206/0x390 [drm]
[1642834.465216] ? i915_gem_object_set_tiling+0x3e0/0x3e0 [i915]
[1642834.465221] ? selinux_file_ioctl+0x122/0x1c0
[1642834.465226] ? __do_munmap+0x24b/0x4d0
[1642834.465231] ksys_ioctl+0x82/0xc0
[1642834.465235] __x64_sys_ioctl+0x16/0x20
[1642834.465238] do_syscall_64+0x5b/0xf0
[1642834.465243] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1642834.465245] RIP: 0033:0x7f6a3d7b047b
[1642834.465247] Code: 0f 1e fa 48 8b 05 1d aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ed a9 0c 00 f7 d8 64 89 01 48
[1642834.465249] RSP: 002b:00007ffe71adba28 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[1642834.465251] RAX: ffffffffffffffda RBX: 000055f99048fa40 RCX: 00007f6a3d7b047b
[1642834.465253] RDX: 00007ffe71adba30 RSI: 00000000c0106461 RDI: 000000000000000e
[1642834.465254] RBP: 0000000000000002 R08: 000055f98f3f1798 R09: 0000000000000002
[1642834.465255] R10: 0000000000001000 R11: 0000000000000246 R12: 0000000000000080
[1642834.465257] R13: 000055f98f3f1690 R14: 00000000c0106461 R15: 00007ffe71adba30
Now to take the spinlock during the list iteration, we need to break it
down into two phases. In the first phase under the lock, we cannot sleep
and so must defer the actual work to a second list, protected by the
ggtt->mutex.
We also need to hold the spinlock during creation of a new vma to
serialise with updates of the tiling on the object.
Reported-by: Dave Airlie <airlied@redhat.com>
Fixes: 2850748ef8 ("drm/i915: Pull i915_vma_pin under the vm->mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200422072805.17340-1-chris@chris-wilson.co.uk
Since batch buffers dominant execution time, most preemption requests
should naturally occur during execution of a batch buffer. We wish to
verify that should a preemption occur within a batch buffer, when we
come to restart that batch buffer, it occurs at the interrupted
instruction and most importantly does not rollback to an earlier point.
v2: Do not clear the GPR at the start of the batch, but rely on them
being clear for new contexts.
Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200422100903.25216-1-chris@chris-wilson.co.uk
kernel + tools/perf:
Alexey Budankov:
- Introduce CAP_PERFMON to kernel and user space.
callchains:
Adrian Hunter:
- Allow using Intel PT to synthesize callchains for regular events.
Kan Liang:
- Stitch LBR records from multiple samples to get deeper backtraces,
there are caveats, see the csets for details.
perf script:
Andreas Gerstmayr:
- Add flamegraph.py script
BPF:
Jiri Olsa:
- Synthesize bpf_trampoline/dispatcher ksymbol events.
perf stat:
Arnaldo Carvalho de Melo:
- Honour --timeout for forked workloads.
Stephane Eranian:
- Force error in fallback on :k events, to avoid counting nothing when
the user asks for kernel events but is not allowed to.
perf bench:
Ian Rogers:
- Add event synthesis benchmark.
tools api fs:
Stephane Eranian:
- Make xxx__mountpoint() more scalable
libtraceevent:
He Zhe:
- Handle return value of asprintf.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXp2LlQAKCRCyPKLppCJ+
J95oAP0ZihVUhESv/gdeX0IDE5g6Rd2V6LNcRj+jb7gX9NlQkwD/UfS454WV1ftQ
qTwrkKPzY/5Tm2cLuVE7r7fJ6naDHgU=
=FHm4
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo-5.8-20200420' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core fixes and improvements from Arnaldo Carvalho de Melo:
kernel + tools/perf:
Alexey Budankov:
- Introduce CAP_PERFMON to kernel and user space.
callchains:
Adrian Hunter:
- Allow using Intel PT to synthesize callchains for regular events.
Kan Liang:
- Stitch LBR records from multiple samples to get deeper backtraces,
there are caveats, see the csets for details.
perf script:
Andreas Gerstmayr:
- Add flamegraph.py script
BPF:
Jiri Olsa:
- Synthesize bpf_trampoline/dispatcher ksymbol events.
perf stat:
Arnaldo Carvalho de Melo:
- Honour --timeout for forked workloads.
Stephane Eranian:
- Force error in fallback on :k events, to avoid counting nothing when
the user asks for kernel events but is not allowed to.
perf bench:
Ian Rogers:
- Add event synthesis benchmark.
tools api fs:
Stephane Eranian:
- Make xxx__mountpoint() more scalable
libtraceevent:
He Zhe:
- Handle return value of asprintf.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>