Once the Windows guest is shutdown, the display pipe will be disabled
and intel_gvt_check_vblank_emulation will be called to check if the
vblank timer is turned off. Given the scenario of creating VM1 ,VM2,
destoying VM2 in current code, VM1 has pipe enabled and continues to
check VM2, the flag have_enabled_pipe is always false since all the VM2
pipes are disabled, so the vblank timer will be canceled and TDR happens
in Windows VM1 guest due to the vsync timeout.
In this patch the vblank timer will be never canceled once one pipe is
enabled.
v2:
- remove have_enabled_pipe flag and check pipe enabled directly. (Zhenyu)
Cc: Wang Hongbo <hongbo.wang@intel.com>
Signed-off-by: fred gao <fred.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
The req->fence.error will be set if this request caused GPU hang so
we can use this value to workload->status to indicate whether this
GVT request caused any problem. If it caused GPU hang, we shouldn't
trigger any context switch back to the guest.
v2:
- only take -EIO from fence->error. (Zhenyu)
Fixes: 8f1117abb4 (drm/i915/gvt: handle workload lifecycle properly)
Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
For the vGPU workloads, now GVT-g use per vGPU scheduler, the per-ring
work_thread only pick workload belongs to the current vGPU. And with time
slice based scheduler, it waits all the engines become idle before do vGPU
switch. So we can run free dispatch in per-ring work_thread, different ring
running in different 'vGPU' won't happen.
For the workloads between vGPU and Host, this scheduler_mutex can't block
host to dispatch workload into other ring engines.
Here remove this mutex since it impacts the performance when applications
use more than 1 ring engines in 1 vgpu.
ring0 running in vGPU1, ring1 running in Host. Will happen.
ring0 running in vGPU1, ring1 running in vGPU2. Won't happen.
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
This reverts commit 62d02fd1f8.
The rwsem recursive trace should not be fixed from kvmgt side by using
a workqueue and it is an issue should be fixed in VFIO. So this one
should be reverted.
Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
The command buffer address in context like ring buffer base address
and wa_ctx address need to be audit to make sure they are in the
valid GGTT range.
Signed-off-by: Ping Gao <ping.a.gao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
It will causes memory leak, if the function setup_spt_oos() fail,
in the function intel_gvt_init_gtt(),
which allocated by get_zeroed_page() and mapped by dma_map_page().
Unmap and free the page, after STP oos initialize fail,
it will fix this issue.
Signed-off-by: Zhou, Wenjia <zhiyuan_zhu@htc.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Commit fabef82562 ("drm/i915: Drop struct_mutex around frontbuffer
flushes") adds a dependency to ifbdev->vma when flushing the framebufer,
but the checks are only against the existence of the ifbdev->fb and not
against ifbdev->vma. This leaves a window of opportunity where we may
try to operate on the fbdev prior to it being probed (thanks to
asynchronous booting).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101534
Fixes: fabef82562 ("drm/i915: Drop struct_mutex around frontbuffer flushes")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170622160211.783-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: stable@vger.kernel.org
(cherry picked from commit 15727ed0d9)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
OA buffer initialization involves access to HW registers to set
the OA base, head and tail. Ensure device is awake while setting
these. With this, all oa.ops are covered under RPM and forcewake
wakelock.
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1498585181-23048-1-git-send-email-sagar.a.kamble@intel.com
Fixes: d79651522e ("drm/i915: Enable i915 perf stream for Haswell OA unit")
Cc: <stable@vger.kernel.org> # v4.11+
(cherry picked from commit 987f8c444a)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
The Cursor Coeff is lower 6 bits in the PORT_TX_DW4 register
and hence the CURSOR_COEFF_MASK should be (0x3F << 0)
Fixes: 04416108cc ("drm/i915/cnl: Add registers related to voltage
swing sequences.")
Signed-off-by: Manasi Navare <manasi.d.navare@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1498785241-21138-1-git-send-email-manasi.d.navare@intel.com
(cherry picked from commit fcace3b9b7)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
During the review of Coffee Lake workarounds Mika pointed out
that WaDisableKillLogic and GEN9_DISABLE_OCL_OOB_SUPPRESS_LOGIC
should be removed from CFL and with that I should carry the rv-b.
However when doing the v2 I removed another Workaround that should
remain because although not mentioned by spec the history of hangs
around it advocates on its favor.
On some follow-up patches I continued operating on the wrong
workardound, but Ville noticed that, so here is the fix for the
current CFL code that is upstream already.
Fixes: 46c26662d2 ("drm/i915/cfl: Introduce Coffee Lake workarounds.")
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
(cherry picked from commit 98eed3d1ad)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
When computing a hash for looking up relocation target handles in an
execbuf, we start with a large size for the hashtable and proceed to
halve it until the allocation succeeds. The final attempt is with an
order of 0 (i.e. a single element). This means that we then pass bits=0
to hash_32() which then computes "hash >> (32 - 0)" to lookup the single
element. Right shifting a value by the width of the operand is
undefined, so limit the smallest hash table we use to order 1.
v2: Keep the retry allocation flag for the final pass
Fixes: 4ff4b44cbb ("drm/i915: Store a direct lookup from object handle to vma")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170629150425.27508-1-chris@chris-wilson.co.uk
(cherry picked from commit 4d470f7359)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
There are still cases on these platforms where an attempt is made to
configure the CDCLK while the power domain is off, like when coming back
from a suspend. So the workaround below is still needed.
This effectively reverts commit 63ff304425 ("drm/i915: Nuke the
VLV/CHV PFI programming power domain workaround").
Cc: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101517
Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170628210605.4994-1-krisman@collabora.co.uk
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
(cherry picked from commit 886015a0ad)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
'dma_buf_vmap' returns NULL on error, not an error pointer.
Fixes: 6cca22ede8 ("drm/i915: Add some mock tests for dmabuf interop")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: http://patchwork.freedesktop.org/patch/msgid/20170627053854.21152-1-christophe.jaillet@wanadoo.fr
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit 7c3f5317b8)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
We have pretty clear evidence that MSIs are getting lost on g4x and
somehow the interrupt logic doesn't seem to recover from that state
even if we try hard to clear the IIR.
Disabling IER around the normal IIR clearing in the irq handler isn't
sufficient to avoid this, so the problem really seems to be further
up the interrupt chain. This should guarantee that there's always
an edge if any IIR bits are set after the interrupt handler is done,
which should normally guarantee that the CPU interrupt is generated.
That approach seems to work perfectly on VLV/CHV, but apparently
not on g4x.
MSI is documented to be broken on 965gm at least. The chipset spec
says MSI is defeatured because interrupts can be delayed or lost,
which fits well with what we're seeing on g4x. Previously we've
already disabled GMBUS interrupts on g4x because somehow GMBUS
manages to raise legacy interrupts even when MSI is enabled.
Since there's such widespread MSI breakahge all over in the pre-gen5
land let's just give up on MSI on these platforms.
Seqno reporting might be negatively affected by this since the legcy
interrupts aren't guaranteed to be ordered with the seqno writes,
whereas MSI interrupts may be? But an occasioanlly missed seqno
seems like a small price to pay for generally working interrupts.
Cc: stable@vger.kernel.org
Cc: Diego Viola <diego.viola@gmail.com>
Tested-by: Diego Viola <diego.viola@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101261
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170626203051.28480-1-ville.syrjala@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit e38c2da01f)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
The dpy_reg_mmio_read_x functions directly copy 4 bytes data to the
target address with considering the length. If may cause the target
memory corrupted if the requested length less than 4 bytes. Fix it
for safety even we already have some checking to avoid this happen.
And for convince, the 3 functions are merged.
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
When host connects a crt screen, linux guest will detect two
screens: crt and dp. This is wrong as linux guest has only
one dp.
In order to avoid guest get host crt screen, we should set
ADPA_CRT_HOTPLUG_MONITOR to none. But MMIO_RO(PCH_ADPA) prevent
from that. So MMIO_DH should be used instead of MMIO_RO.
v2: Clear its staus to none at initialize, so guest don't
get host crt.(Zhangyu)
v3: SKL doesn't have this register, limit it to pre_skl.(xiong)
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
On BDW, when host physical screen and guest virtual screen aren't on
the same DDI port, guest i915 driver prints the following error and
stop running.
[ 6.775873] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000068
[ 6.775928] IP: intel_ddi_clock_get+0x81/0x430 [i915]
[ 6.776206] Call Trace:
[ 6.776233] ? vgpu_read32+0x4f/0x100 [i915]
[ 6.776264] intel_ddi_get_config+0x11c/0x230 [i915]
[ 6.776298] intel_modeset_setup_hw_state+0x313/0xd40 [i915]
[ 6.776334] intel_modeset_init+0xe49/0x18d0 [i915]
[ 6.776368] ? vgpu_write32+0x53/0x100 [i915]
[ 6.776731] ? intel_i2c_reset+0x42/0x50 [i915]
[ 6.777085] ? intel_setup_gmbus+0x32a/0x350 [i915]
[ 6.777427] i915_driver_load+0xabc/0x14d0 [i915]
[ 6.777768] i915_pci_probe+0x4f/0x70 [i915]
The null pointer is guest intel_crtc_state->shared_dpll which is
setted in haswell_get_ddi_pll(). When guest and host screen are
on different DDI port, host driver won't set PORT_CLK_SET(guest_port),
so haswell_get_ddi_pll() will return null and don't set
pipe_config->shared_dpll, once the following program refernce this
structure, it will print the above error.
This patch set the initial val of guest PORT_CLK_SEL(guest_port) to
LCPLL_810. And guest i915 driver will reset this value according to
guest screen mode.
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
commit 2889caa923 ("drm/i915: Eliminate lots of iterations over the
execobjects array") jiggled around the error handling and replace a test
that we cleaned up properly after ourselves with an assertion. That
assertion failed because in the release function (moments after the
assertion) we were indeed forgetting to mark the vma as cleared. The
consequence was when testing an invalid relocation address, we would try
to release the vma twice (following the couple of attempts to verify the
address) and on the second release notice that the first release was
incomplete.
Testcase: igt/gem_reloc_overflow/invalid-address
Fixes: 2889caa923 ("drm/i915: Eliminate lots of iterations over the execobjects array")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170622104722.2583-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
(cherry picked from commit 51d05e1b29)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Paulo noticed that we were missing few bits clear
before writing values back to the register on
these RMW MMIO operations.
v2: Remove "POST_" from CURSOR_COEFF_MASK. (Paulo).
v3: Remove unnecessary braces. (Jani).
Fixes: cf54ca8bc5 ("drm/i915/cnl: Implement voltage swing sequence.")
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Manasi Navare <manasi.d.navare@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497897572-22520-1-git-send-email-rodrigo.vivi@intel.com
(cherry picked from commit 1f588aeb60)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There are two kinds of locking sequence.
One is in the thread which is started by vfio ioctl to do
the iommu unmapping. The locking sequence is:
down_read(&group_lock) ----> mutex_lock(&cached_lock)
The other is in the vfio release thread which will unpin all
the cached pages. The lock sequence is:
mutex_lock(&cached_lock) ---> down_read(&group_lock)
And, the cache_lock is used to protect the rb tree of the cache
node and doing vfio unpin doesn't require this lock. Move the
vfio unpin out of the cache_lock protected region.
v2:
- use for style instead of do{}while(1). (Zhenyu)
Fixes: f30437c5e7 ("drm/i915/gvt: add KVMGT support")
Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Final pile of features for 4.13
New uabi:
- batch bo in first slot, for faster execbuf assembly in userspace
(Chris Wilson)
- (sub)slice getparam, needed for mesa perf support (Robert Bragg)
First pile of patches for cnl/cfl support, maintained by Rodrigo but
with lots of contributions from others. Still incomplete since public
review still ongoing.
Features/refactoring:
- Make execbuf faster (Chris Wilson), a pile of series to make execbuf
buffer handling have fewer passes, use less list walking, postpone
more work to async workers and shuffle buffers less, all to make the
common case much faster (in some cases at least).
- cold boot support for glk dsi (Madhav Chauhan)
- Clean up pipe A quirk and related old platform hacks (Ville)
- perf sampling support for kbl/glk (Lionel)
- perf cleanups (Robert Bragg)
- wire atomic state to backlight code, to avoid pipe lookup hacks
(Maarten)
- reduce request waiting latency/overhead to remove the spinning and
associated cpu cycle wasting (Chris)
- fix 90/270 rotation wm computation (Ville)
- new ddb allocation algo for skl (Kumar Mahesh)
- fix regression due to system suspend optimiazatino (Imre)
- the usual pile of small cleanups and refactors all over
GVT updates contained in this tag:
- optimization for per-VM mmio save/restore (Changbin)
- optimization for mmio hash table (Changbin)
- scheduler optimization with event (Ping)
- vGPU reset refinement (Fred)
- other misc refactor and cleanups, etc.
* tag 'drm-intel-next-2017-06-19' of git://anongit.freedesktop.org/git/drm-intel: (170 commits)
drm/i915: Update DRIVER_DATE to 20170619
drm/i915/cfl: Introduce Coffee Lake workarounds.
drm/i915: Store 9 bits of PCI Device ID for platforms with a LP PCH
drm/i915: Stash a pointer to the obj's resv in the vma
drm/i915: Async GPU relocation processing
drm/i915: Allow execbuffer to use the first object as the batch
drm/i915: Wait upon userptr get-user-pages within execbuffer
drm/i915: First try the previous execbuffer location
drm/i915: Store a persistent reference for an object in the execbuffer cache
drm/i915: Eliminate lots of iterations over the execobjects array
drm/i915: Disable EXEC_OBJECT_ASYNC when doing relocations
drm/i915: Pass vma to relocate entry
drm/i915: Store a direct lookup from object handle to vma
drm/i915: Fix retrieval of hangcheck stats
drm/i915: Store i915_gem_object_is_coherent() as a bit next to cache-dirty
drm/i915: Mark CPU cache as dirty on every transition for CPU writes
drm/i915: Make i915_vma_destroy() static
drm/i915: Actually attach the tv_format property to the SDVO connector
Revert "drm/i915/skl: New ddb allocation algorithm"
drm/i915/glk: Add cold boot sequence for GLK DSI
...
This time around, the biggest thing is a bunch of GEM rework for more
fine grained locking and prep work to handle multiple address spaces
(ie. per-process pagetables). Also some HDMI fixes for 8x96
(snapdragon 820).
One unrelated bus patch, for something that seems to get merged
through whatever random tree (and has all the right ack's).
* tag 'drm-msm-next-2017-06-20' of git://people.freedesktop.org/~robclark/linux:
drm/msm: Fix potential buffer overflow issue
bus: SIMPLE_PM_BUS does not depend on ARCH_RENESAS
drm/msm: Separate locking of buffer resources from struct_mutex
drm/msm/hdmi: Fix HDMI pink strip issue seen on 8x96
drm/msm/hdmi: 8996 PLL: Populate unprepare
drm/msm/hdmi: Use bitwise operators when building register values
drm/msm: update generated headers
drm/msm: remove address-space id
drm/msm: support for an arbitrary number of address spaces
drm/msm: refactor how we handle vram carveout buffers
drm/msm: pass address-space to _get_iova() and friends
drm/msm/mdp4+5: move aspace/id to base class
drm/msm/mdp5: kill pipe_lock
drm/msm: fix locking inconsistency for gpu->hw_init()
drm/msm: Remove memptrs->wptr
drm/msm: Add a struct to pass configuration to msm_gpu_init()
drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA
drm/msm: Remove idle function hook
drm/msm: Remove DRM_MSM_NUM_IOCTLS
drm/msm: gpu: Enable zap shader for A5XX
A few more things for 4.13:
- Semaphore support using sync objects
- Drop fb location programming
- Optimize bo list ioctl
* 'drm-next-4.13' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: Optimize mutex usage (v4)
drm/amdgpu: Optimization of AMDGPU_BO_LIST_OP_CREATE (v2)
amdgpu: use drm sync objects for shared semaphores (v6)
amdgpu/cs: split out fence dependency checking (v2)
drm/amdgpu: don't check the default value for vm size
Here are the Mali DP driver changes. They include the mali-dp specific
changes from Jose Abreu on crtc->mode_valid() as well as a couple of
patches for fixing the sharing of IRQ lines and use of DRM CMA helper
for framebuffer physical address calculation. Please pull!
* 'for-upstream/mali-dp' of git://linux-arm.org/linux-ld:
drm/arm: mali-dp: Use CMA helper for plane buffer address calculation
drm/mali-dp: Check PM status when sharing interrupt lines
drm/arm: malidp: Use crtc->mode_valid() callback
- HDMI stereoscopic support
- Rework of display code to properly support SOR pad macro routing on
>=GM20x GPUs
- Various other fixes/improvements.
* 'linux-4.13' of git://github.com/skeggsb/linux: (73 commits)
drm/nouveau/disp/nv50-: avoid creating ORs that aren't present on HW
drm/nouveau: use proper prototype in nouveau_pmops_runtime() definition
drm/nouveau: Skip vga_fini on non-PCI device
drm/nouveau/tegra: Don't leave GPU in reset
drm/nouveau/tegra: Skip manual unpowergating when not necessary
drm/nouveau/hwmon: Change permissions to numeric
drm/nouveau/hwmon: expose the auto_point and pwm_min/max attrs
drm/nouveau/hwmon: Remove old code, add .write/.read operations
drm/nouveau/hwmon: Add nouveau_hwmon_ops structure with .is_visible/.read_string
drm/nouveau/hwmon: Add config for all sensors and their settings
drm/nouveau/disp/gm200-: allow non-identity mapping of SOR <-> macro links
drm/nouveau/disp/nv50-: implement a common supervisor 3.0
drm/nouveau/disp/nv50-: implement a common supervisor 2.2
drm/nouveau/disp/nv50-: implement a common supervisor 2.1
drm/nouveau/disp/nv50-: implement a common supervisor 2.0
drm/nouveau/disp/nv50-: implement a common supervisor 1.0
drm/nouveau/disp/nv50-gt21x: remove workaround for dp->tmds hotplug issues
drm/nouveau/disp/dp: use new devinit script interpreter entry-point
drm/nouveau/disp/dp: determine link bandwidth requirements from head state
drm/nouveau/disp: introduce acquire/release display path methods
...
This starts off with the addition of more documentation for the host1x
and DRM drivers and finishes with a slew of fixes and enhancements for
the staging IOCTLs as a result of the awesome work done by Dmitry and
Erik on the grate reverse-engineering effort.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEiOrDCAFJzPfAjcif3SOs138+s6EFAllDjYUTHHRyZWRpbmdA
bnZpZGlhLmNvbQAKCRDdI6zXfz6zoS2hD/90X7glXgD2PReNqaopGI6o9f5Zdhqv
YULoVoMUAkDRESxPGtGSwLsNXXFBCxshYHT79bygoEabk0xccV7CWMxgenZ56S3s
JbkwdFoFeJyRVOPhcLgfHk3vjhf4nFFoTtny4ahe43JJZjSC7i+mY9b9VhrCAOg5
FhhexSHwLqRIxe/jvIYarypBFVk38iFa4GUrvkYO1fDbi+zJyOA3Od6OwbEWJ+HZ
DyVF3xJB+He5uZ7zn+Q465QKtIIyUstpS2aZYAmkJG054USKZ9RczeppsNUkmyIC
LnoGnIpw/PvKjAWvchdybjDUX2dv4/oZs2JPa3pDIgyXeyTFAu9K2i/ScW8D1WHu
Hl1dL0vVNSvwsuqCPrqZQioN3aLefazp9iccjd80Lrg47x9wHgijzyTAiN3Heswn
CY7/uuDmoXPTci1h4sti8XfpPnkWPuwgY23J/XCNJFZjDKZiiKg5sWDV0DLCnIQi
l4BemypsQOO+ye4vt72YJo2TQKJUM212TzC6KbimWPorJANr05L/fXlRDAF8RZ/c
nXdGoSGVL457M3PZJQWlwM+pKGqu1Uec/p6JYBQ9m2Nt4I7Oi9NpbEPdoNFEf6Tg
c8oqQiw3d4jp8WOfWsucttgvqsFhr13dBtPIVpTfpudQ6cit1pl6hxlOrFiL1gmX
xNxekgTrdNuwBg==
=zz/Z
-----END PGP SIGNATURE-----
Merge tag 'drm/tegra/for-4.13-rc1' of git://anongit.freedesktop.org/tegra/linux into drm-next
drm/tegra: Changes for v4.13-rc1
This starts off with the addition of more documentation for the host1x
and DRM drivers and finishes with a slew of fixes and enhancements for
the staging IOCTLs as a result of the awesome work done by Dmitry and
Erik on the grate reverse-engineering effort.
* tag 'drm/tegra/for-4.13-rc1' of git://anongit.freedesktop.org/tegra/linux:
gpu: host1x: At first try a non-blocking allocation for the gather copy
gpu: host1x: Refactor channel allocation code
gpu: host1x: Remove unused host1x_cdma_stop() definition
gpu: host1x: Remove unused 'struct host1x_cmdbuf'
gpu: host1x: Check waits in the firewall
gpu: host1x: Correct swapped arguments in the is_addr_reg() definition
gpu: host1x: Forbid unrelated SETCLASS opcode in the firewall
gpu: host1x: Forbid RESTART opcode in the firewall
gpu: host1x: Forbid relocation address shifting in the firewall
gpu: host1x: Do not leak BO's phys address to userspace
gpu: host1x: Correct host1x_job_pin() error handling
gpu: host1x: Initialize firewall class to the job's one
drm/tegra: dc: Disable plane if it is invisible
drm/tegra: dc: Apply clipping to the plane
drm/tegra: dc: Avoid reset asserts on Tegra20
drm/tegra: Check syncpoint ID in the 'submit' IOCTL
drm/tegra: Correct copying of waitchecks and disable them in the 'submit' IOCTL
drm/tegra: Check for malformed offsets and sizes in the 'submit' IOCTL
drm/tegra: Add driver documentation
gpu: host1x: Flesh out kerneldoc
In function submit_create, if nr_cmds or nr_bos is assigned with
negative value, the allocated buffer may be small than intended.
Using this buffer will lead to buffer overflow issue.
Signed-off-by: Kasin Li <donglil@codeaurora.org>
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
In original function amdgpu_bo_list_get, the waiting
for result->lock can be quite long while mutex
bo_list_lock was holding. It can make other tasks
waiting for bo_list_lock for long period.
Secondly, this patch allows several tasks(readers of idr)
to proceed at the same time.
v2: use rcu and kref (Dave Airlie and Christian König)
v3: update v1 commit message (Michel Dänzer)
v4: rebase on upstream (Alex Deucher)
Signed-off-by: Alex Xie <AlexBin.Xie@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
v2: Remove duplication of zeroing of bo list (Christian König)
Move idr_alloc function to end of ioctl (Christian König)
Call kfree bo_list when amdgpu_bo_list_set return error.
Combine the previous two patches into this patch.
Add amdgpu_bo_list_set function prototype.
Signed-off-by: Alex Xie <AlexBin.Xie@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Buffer object specific resources like pages, domains, sg list
need not be protected with struct_mutex. They can be protected
with a buffer object level lock. This simplifies locking and
makes it easier to avoid potential recursive locking scenarios
for SVM involving mmap_sem and struct_mutex. This also removes
unnecessary serialization when creating buffer objects, and also
between buffer object creation and GPU command submission.
Signed-off-by: Sushmita Susheelendra <ssusheel@codeaurora.org>
[robclark: squash in handling new locking for shrinker]
Signed-off-by: Rob Clark <robdclark@gmail.com>
Coffee Lake inherit most of Kabylake production
workarounds.
v2: Fix typo on commit message and remove
WaDisableKillLogic and GEN9_DISABLE_OCL_OOB_SUPPRESS_LOGIC,
since as Mika pointed out they shouldn't be here for cfl
according to BSpec.
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497653398-15722-1-git-send-email-rodrigo.vivi@intel.com
This creates a new command submission chunk for amdgpu
to add in and out sync objects around the submission.
Sync objects are managed via the drm syncobj ioctls.
The command submission interface is enhanced with two new
chunks, one for syncobj pre submission dependencies,
and one for post submission sync obj signalling,
and just takes a list of handles for each.
This is based on work originally done by David Zhou at AMD,
with input from Christian Konig on what things should look like.
In theory VkFences could be backed with sync objects and
just get passed into the cs as syncobj handles as well.
NOTE: this interface addition needs a version bump to expose
it to userspace.
TODO: update to dep_sync when rebasing onto amdgpu master.
(with this - r-b from Christian)
v1.1: keep file reference on import.
v2: move to using syncobjs
v2.1: change some APIs to just use p pointer.
v3: make more robust against CS failures, we now add the
wait sems but only remove them once the CS job has been
submitted.
v4: rewrite names of API and base on new syncobj code.
v5: move post deps earlier, rename some apis
v6: lookup post deps earlier, and just replace fences
in post deps stage (Christian)
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This just splits out the fence depenency checking into it's
own function to make it easier to add semaphore dependencies.
v2: rebase onto other changes.
v1-Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Avoids printing spurious messages like this:
[ 3.102059] amdgpu 0000:01:00.0: VM size (-1) must be a power of 2
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Although we use 9 bits of Device ID for identifying PCH, only 8 bits are
stored in dev_priv->pch_id. This makes HAS_PCH_CNP_LP() and
HAS_PCH_SPT_LP() incorrect. Fix this by storing all the 9 bits for the
platforms with LP PCH.
v2: Drop PCH_LPT_LP change (Imre)
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Fixes: commit ec7e0bb35f ("drm/i915/cnp: Add PCI ID for Cannonpoint LP PCH")
Reported-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1497641774-29104-1-git-send-email-dhinakaran.pandiyan@intel.com
During execbuf, a mandatory step is that we add this request (this
fence) to each object's reservation_object. Inside execbuf, we track the
vma, and to add the fence to the reservation_object then means having to
first chase the obj, incurring another cache miss. We can reduce the
number of cache misses by stashing a pointer to the reservation_object
in the vma itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170616140525.6394-1-chris@chris-wilson.co.uk
If the user requires patching of their batch or auxiliary buffers, we
currently make the alterations on the cpu. If they are active on the GPU
at the time, we wait under the struct_mutex for them to finish executing
before we rewrite the contents. This happens if shared relocation trees
are used between different contexts with separate address space (and the
buffers then have different addresses in each), the 3D state will need
to be adjusted between execution on each context. However, we don't need
to use the CPU to do the relocation patching, as we could queue commands
to the GPU to perform it and use fences to serialise the operation with
the current activity and future - so the operation on the GPU appears
just as atomic as performing it immediately. Performing the relocation
rewrites on the GPU is not free, in terms of pure throughput, the number
of relocations/s is about halved - but more importantly so is the time
under the struct_mutex.
v2: Break out the request/batch allocation for clearer error flow.
v3: A few asserts to ensure rq ordering is maintained
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Currently, the last object in the execlist is the always the batch.
However, when building the batch buffer we often know the batch object
first and if we can use the first slot in the execlist we can emit
relocation instructions relative to it immediately and avoid a separate
pass to adjust the relocations to point to the last execlist slot.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
This simply hides the EAGAIN caused by userptr when userspace causes
resource contention. However, it is quite beneficial with highly
contended userptr users as we avoid repeating the setup costs and
kernel-user context switches.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
When choosing a slot for an execbuffer, we ideally want to use the same
address as last time (so that we don't have to rebind it) and the same
address as expected by the user (so that we don't have to fixup any
relocations pointing to it). If we first try to bind the incoming
execbuffer->offset from the user, or the currently bound offset that
should hopefully achieve the goal of avoiding the rebind cost and the
relocation penalty. However, if the object is not currently bound there
we don't want to arbitrarily unbind an object in our chosen position and
so choose to rebind/relocate the incoming object instead. After we
report the new position back to the user, on the next pass the
relocations should have settled down.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtien@linux.intel.com>
If we take a reference to the object/vma when it is first used in an
execbuf, we can keep that reference until the object's file-local handle
is closed. Thereby saving a frequent ref/unref pair.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
The major scaling bottleneck in execbuffer is the processing of the
execobjects. Creating an auxiliary list is inefficient when compared to
using the execobject array we already have allocated.
Reservation is then split into phases. As we lookup up the VMA, we
try and bind it back into active location. Only if that fails, do we add
it to the unbound list for phase 2. In phase 2, we try and add all those
objects that could not fit into their previous location, with fallback
to retrying all objects and evicting the VM in case of severe
fragmentation. (This is the same as before, except that phase 1 is now
done inline with looking up the VMA to avoid an iteration over the
execobject array. In the ideal case, we eliminate the separate reservation
phase). During the reservation phase, we only evict from the VM between
passes (rather than currently as we try to fit every new VMA). In
testing with Unreal Engine's Atlantis demo which stresses the eviction
logic on gen7 class hardware, this speed up the framerate by a factor of
2.
The second loop amalgamation is between move_to_gpu and move_to_active.
As we always submit the request, even if incomplete, we can use the
current request to track active VMA as we perform the flushes and
synchronisation required.
The next big advancement is to avoid copying back to the user any
execobjects and relocations that are not changed.
v2: Add a Theory of Operation spiel.
v3: Fall back to slow relocations in preparation for flushing userptrs.
v4: Document struct members, factor out eb_validate_vma(), add a few
more comments to explain some magic and hide other magic behind macros.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
If we write a relocation into the buffer, we require our own implicit
synchronisation added after the start of the execbuf, outside of the
user's control. As we may end up clflushing, or doing the patch itself
on the GPU, asynchronously we need to look at the implicit serialisation
on obj->resv and hence need to disable EXEC_OBJECT_ASYNC for this
object.
If the user does trigger a stall for relocations, we make sure the stall
is complete enough so that the batch is not submitted before we complete
those relocations.
Fixes: 77ae995789 ("drm/i915: Enable userspace to opt-out of implicit fencing")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>