linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-21 08:00:16 +07:00

Author	SHA1	Message	Date
Jani Nikula	642525d871	MAINTAINERS: drop dri-devel list for i915 In practice, none of the i915 developers Cc dri-devel for strictly i915 specific patches. Make MAINTAINERS reflect reality, and reduce random i915 specific noise on dri-devel. Also, we have a fairly large crowd reading and responding on intel-gfx, and we're pretty good at involving dri-devel when that is appropriate. Cc: dri-devel@lists.freedesktop.org Acked-by: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1477498292-9808-1-git-send-email-jani.nikula@intel.com	2016-10-29 15:45:18 +03:00
Lyude	b64b540931	drm/i915/vlv: Prevent enabling hpd polling in late suspend One of the CI machines began to run into issues with the hpd poller suddenly waking up in the midst of the late suspend phase. It looks like this is getting caused by the fact we now deinitialize power wells in late suspend, which means that intel_hpd_poll_init() gets called in late suspend causing polling to get re-enabled. So, when deinitializing power wells on valleyview we now refrain from enabling polling in the midst of suspend. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98040 Fixes: `19625e85c6` ("drm/i915: Enable polling when we don't have hpd") Signed-off-by: Lyude <lyude@redhat.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Jani Saarinen <jani.saarinen@intel.com> Cc: Petry Latvala <petri.latvala@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1477499769-1966-1-git-send-email-lyude@redhat.com	2016-10-29 14:55:43 +03:00
Chris Wilson	80b204bce8	drm/i915: Enable multiple timelines With the infrastructure converted over to tracking multiple timelines in the GEM API whilst preserving the efficiency of using a single execution timeline internally, we can now assign a separate timeline to every context with full-ppgtt. v2: Add a comment to indicate the xfer between timelines upon submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-35-chris@chris-wilson.co.uk	2016-10-28 20:53:57 +01:00
Chris Wilson	f2d13290e3	drm/i915: Defer setting of global seqno on request to submission Defer the assignment of the global seqno on a request to its submission. In the next patch, we will only allocate the global seqno at that time, here we are just enabling the wait-for-submission before wait-for-seqno paths. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-34-chris@chris-wilson.co.uk	2016-10-28 20:53:56 +01:00
Chris Wilson	28176ef4cf	drm/i915: Reserve space in the global seqno during request allocation A restriction on our global seqno is that they cannot wrap, and that we cannot use the value 0. This allows us to detect when a request has not yet been submitted, its global seqno is still 0, and ensures that hardware semaphores are monotonic as required by older hardware. To meet these restrictions when we defer the assignment of the global seqno, we must check that we have an available slot in the global seqno space during request construction. If that test fails, we wait for all requests to be completed and reset the hardware back to 0. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-33-chris@chris-wilson.co.uk	2016-10-28 20:53:56 +01:00
Chris Wilson	f6168e3304	drm/i915: Convert breadcrumbs spinlock to be irqsafe The breadcrumbs are about to be used from within IRQ context sections (e.g. nouveau signals a fence from an interrupt handler causing us to submit a new request) and/or from bottom-half tasklets (i.e. intel_lrc_irq_handler), therefore we need to employ the irqsafe spinlock variants. For example, deferring the request submission to the intel_lrc_irq_handler generates this trace: [ 66.388639] ================================= [ 66.388650] [ INFO: inconsistent lock state ] [ 66.388663] 4.9.0-rc2+ #56 Not tainted [ 66.388672] --------------------------------- [ 66.388682] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 66.388695] swapper/1/0 [HC0[0]:SC1[1]:HE0:SE0] takes: [ 66.388706] (&(&b->lock)->rlock){+.?...} , at: [<ffffffff81401c88>] intel_engine_enable_signaling+0x78/0x150 [ 66.388761] {SOFTIRQ-ON-W} state was registered at: [ 66.388772] [ 66.388783] [<ffffffff810bd842>] __lock_acquire+0x682/0x1870 [ 66.388795] [ 66.388803] [<ffffffff810bedbc>] lock_acquire+0x6c/0xb0 [ 66.388814] [ 66.388824] [<ffffffff8161753a>] _raw_spin_lock+0x2a/0x40 [ 66.388835] [ 66.388845] [<ffffffff81401e41>] intel_engine_reset_breadcrumbs+0x21/0xb0 [ 66.388857] [ 66.388866] [<ffffffff81403ae7>] gen8_init_common_ring+0x67/0x100 [ 66.388878] [ 66.388887] [<ffffffff81403b92>] gen8_init_render_ring+0x12/0x60 [ 66.388903] [ 66.388912] [<ffffffff813f8707>] i915_gem_init_hw+0xf7/0x2a0 [ 66.388927] [ 66.388936] [<ffffffff813f899b>] i915_gem_init+0xbb/0xf0 [ 66.388950] [ 66.388959] [<ffffffff813b4980>] i915_driver_load+0x7e0/0x1330 [ 66.388978] [ 66.388988] [<ffffffff813c09d8>] i915_pci_probe+0x28/0x40 [ 66.389003] [ 66.389013] [<ffffffff812fa0db>] pci_device_probe+0x8b/0xf0 [ 66.389028] [ 66.389037] [<ffffffff8147737e>] driver_probe_device+0x21e/0x430 [ 66.389056] [ 66.389065] [<ffffffff8147766e>] __driver_attach+0xde/0xe0 [ 66.389080] [ 66.389090] [<ffffffff814751ad>] bus_for_each_dev+0x5d/0x90 [ 66.389105] [ 66.389113] [<ffffffff81477799>] driver_attach+0x19/0x20 [ 66.389134] [ 66.389144] [<ffffffff81475ced>] bus_add_driver+0x15d/0x260 [ 66.389159] [ 66.389168] [<ffffffff81477e3b>] driver_register+0x5b/0xd0 [ 66.389183] [ 66.389281] [<ffffffff812fa19b>] __pci_register_driver+0x5b/0x60 [ 66.389301] [ 66.389312] [<ffffffff81aed333>] i915_init+0x3e/0x45 [ 66.389326] [ 66.389336] [<ffffffff81ac2ffa>] do_one_initcall+0x8b/0x118 [ 66.389350] [ 66.389359] [<ffffffff81ac323a>] kernel_init_freeable+0x1b3/0x23b [ 66.389378] [ 66.389387] [<ffffffff8160fc39>] kernel_init+0x9/0x100 [ 66.389402] [ 66.389411] [<ffffffff816180e7>] ret_from_fork+0x27/0x40 [ 66.389426] irq event stamp: 315865 [ 66.389438] hardirqs last enabled at (315864): [<ffffffff816178f1>] _raw_spin_unlock_irqrestore+0x31/0x50 [ 66.389469] hardirqs last disabled at (315865): [<ffffffff816176b3>] _raw_spin_lock_irqsave+0x13/0x50 [ 66.389499] softirqs last enabled at (315818): [<ffffffff8107a04c>] _local_bh_enable+0x1c/0x50 [ 66.389530] softirqs last disabled at (315819): [<ffffffff8107a50e>] irq_exit+0xbe/0xd0 [ 66.389559] [ 66.389559] other info that might help us debug this: [ 66.389580] Possible unsafe locking scenario: [ 66.389580] [ 66.389598] CPU0 [ 66.389609] ---- [ 66.389620] lock(&(&b->lock)->rlock); [ 66.389650] <Interrupt> [ 66.389661] lock(&(&b->lock)->rlock); [ 66.389690] [ 66.389690] * DEADLOCK * [ 66.389690] [ 66.389715] 2 locks held by swapper/1/0: [ 66.389728] #0: (&(&tl->lock)->rlock){..-...}, at: [<ffffffff81403e01>] intel_lrc_irq_handler+0x201/0x3c0 [ 66.389785] #1: (&(&req->lock)->rlock/1){..-...}, at: [<ffffffff813fc0af>] __i915_gem_request_submit+0x8f/0x170 [ 66.389854] [ 66.389854] stack backtrace: [ 66.389959] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.9.0-rc2+ #56 [ 66.389976] Hardware name: / , BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015 [ 66.389999] ffff88027fd03c58 ffffffff812beae5 ffff88027696e680 ffffffff822afe20 [ 66.390036] ffff88027fd03ca8 ffffffff810bb420 0000000000000001 0000000000000000 [ 66.390070] 0000000000000000 0000000000000006 0000000000000004 ffff88027696ee10 [ 66.390104] Call Trace: [ 66.390117] <IRQ> [ 66.390128] [<ffffffff812beae5>] dump_stack+0x68/0x93 [ 66.390147] [<ffffffff810bb420>] print_usage_bug+0x1d0/0x1e0 [ 66.390164] [<ffffffff810bb8a0>] mark_lock+0x470/0x4f0 [ 66.390181] [<ffffffff810ba9d0>] ? print_shortest_lock_dependencies+0x1b0/0x1b0 [ 66.390203] [<ffffffff810bd75d>] __lock_acquire+0x59d/0x1870 [ 66.390221] [<ffffffff810bedbc>] lock_acquire+0x6c/0xb0 [ 66.390237] [<ffffffff810bedbc>] ? lock_acquire+0x6c/0xb0 [ 66.390255] [<ffffffff81401c88>] ? intel_engine_enable_signaling+0x78/0x150 [ 66.390273] [<ffffffff8161753a>] _raw_spin_lock+0x2a/0x40 [ 66.390291] [<ffffffff81401c88>] ? intel_engine_enable_signaling+0x78/0x150 [ 66.390309] [<ffffffff81401c88>] intel_engine_enable_signaling+0x78/0x150 [ 66.390327] [<ffffffff813fc170>] __i915_gem_request_submit+0x150/0x170 [ 66.390345] [<ffffffff81403e8b>] intel_lrc_irq_handler+0x28b/0x3c0 [ 66.390363] [<ffffffff81079d97>] tasklet_action+0x57/0xc0 [ 66.390380] [<ffffffff8107a249>] __do_softirq+0x119/0x240 [ 66.390396] [<ffffffff8107a50e>] irq_exit+0xbe/0xd0 [ 66.390414] [<ffffffff8101afd5>] do_IRQ+0x65/0x110 [ 66.390431] [<ffffffff81618806>] common_interrupt+0x86/0x86 [ 66.390446] <EOI> [ 66.390457] [<ffffffff814ec6d1>] ? cpuidle_enter_state+0x151/0x200 [ 66.390480] [<ffffffff814ec7a2>] cpuidle_enter+0x12/0x20 [ 66.390498] [<ffffffff810b639e>] call_cpuidle+0x1e/0x40 [ 66.390516] [<ffffffff810b65ae>] cpu_startup_entry+0x10e/0x1f0 [ 66.390534] [<ffffffff81036133>] start_secondary+0x103/0x130 (This is split out of the defer global seqno allocation patch due to realisation that we need a more complete conversion if we want to defer request submission even further.) v2: lockdep was warning about mixed SOFTIRQ contexts not HARDIRQ contexts so we only need to use spin_lock_bh and not disable interrupts. v3: We need full irq protection as we may be called from a third party interrupt handler (via fences). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-32-chris@chris-wilson.co.uk	2016-10-28 20:53:55 +01:00
Chris Wilson	562f5d4550	drm/i915: Create a unique name for the context This will be used for communicating issues with this context to userspace, so we want to identify the parent process and the individual context. Note that the name isn't quite unique, it makes the presumption of there only being a single device fd per process. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-31-chris@chris-wilson.co.uk	2016-10-28 20:53:55 +01:00
Chris Wilson	85e17f5974	drm/i915: Move the global sync optimisation to the timeline Currently we try to reduce the number of synchronisations (now the number of requests we need to wait upon) by noting that if we have earlier waited upon a request, all subsequent requests in the timeline will be after the wait. This only applies to requests in this timeline, as other timelines will not be ordered by that waiter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-30-chris@chris-wilson.co.uk	2016-10-28 20:53:54 +01:00
Chris Wilson	caddfe7192	drm/i915: Defer breadcrumb emission Move the actual emission of the breadcrumb for closing the request from i915_add_request() to the submit callback. (It can be moved later when required.) This allows us to defer the allocation of the global_seqno from request construction to actual submission, allowing us to emit the requests out of order (wrt to the order of their construction, they still will only be executed one all of their dependencies are resolved including that all earlier requests on their timeline have been submitted.) We have to specialise how we then emit the request in order to write into the preallocated space, rather than at the tail of the ringbuffer (which will have been advanced by the addition of new requests). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-29-chris@chris-wilson.co.uk	2016-10-28 20:53:54 +01:00
Chris Wilson	98f29e8d90	drm/i915: Record space required for breadcrumb emission In the next patch, we will use deferred breadcrumb emission. That requires reserving sufficient space in the ringbuffer to emit the breadcrumb, which first requires us to know how large the breadcrumb is. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-28-chris@chris-wilson.co.uk	2016-10-28 20:53:53 +01:00
Chris Wilson	9b81d556b1	drm/i915: Rename ->emit_request to ->emit_breadcrumb Now that the emission of the request tail and its submission to hardware are two separate steps, engine->emit_request() is confusing. engine->emit_request() is called to emit the breadcrumb commands for the request into the ring, name it such (engine->emit_breadcrumb). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-27-chris@chris-wilson.co.uk	2016-10-28 20:53:53 +01:00
Chris Wilson	65e4760e39	drm/i915: Introduce a global_seqno for each request Though we will have multiple timelines, we still have a single timeline of execution. This we can use to provide an execution and retirement order of requests. This keeps tracking execution of requests simple, and vital for preserving a single waiter (i.e. so that we can order the waiters so that only the earliest to wakeup need be woken). To accomplish this we distinguish the seqno used to order requests per-context (external) and that used internally for execution. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-26-chris@chris-wilson.co.uk	2016-10-28 20:53:53 +01:00
Chris Wilson	4680816be3	drm/i915: Wait first for submission, before waiting for request completion In future patches, we will no longer be able to wait on a static global seqno and instead have to break our wait up into phases. First we wait for the global seqno assignment (upon submission to hardware), and once submitted we wait for the hardware to complete. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-25-chris@chris-wilson.co.uk	2016-10-28 20:53:52 +01:00
Chris Wilson	3033acab07	drm/i915: Queue the idling context switch after all other timelines Before suspend, we wait for the switch to the kernel context. In order for all the other context images to be complete upon suspend, that switch must be the last operation by the GPU (i.e. this idling request must not overtake any pending requests). To make this request execute last, we make it depend on every other inflight request. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-24-chris@chris-wilson.co.uk	2016-10-28 20:53:52 +01:00
Chris Wilson	73cb97010d	drm/i915: Combine seqno + tracking into a global timeline struct Our timelines are more than just a seqno. They also provide an ordered list of requests to be executed. Due to the restriction of handling individual address spaces, we are limited to a timeline per address space but we use a fence context per engine within. Our first step to introducing independent timelines per context (i.e. to allow each context to have a queue of requests to execute that have a defined set of dependencies on other requests) is to provide a timeline abstraction for the global execution queue. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-23-chris@chris-wilson.co.uk	2016-10-28 20:53:51 +01:00
Chris Wilson	c004a90b72	drm/i915: Restore nonblocking awaits for modesetting After combining the dma-buf reservation object and the GEM reservation object, we lost the ability to do a nonblocking wait on the i915 request (as we blocked upon the reservation object during prepare_fb). We can instead convert the reservation object into a fence upon which we can asynchronously wait (including a forced timeout in case the DMA fence is never signaled). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-22-chris@chris-wilson.co.uk	2016-10-28 20:53:51 +01:00
Chris Wilson	d07f0e59b2	drm/i915: Move GEM activity tracking into a common struct reservation_object In preparation to support many distinct timelines, we need to expand the activity tracking on the GEM object to handle more than just a request per engine. We already use the struct reservation_object on the dma-buf to handle many fence contexts, so integrating that into the GEM object itself is the preferred solution. (For example, we can now share the same reservation_object between every consumer/producer using this buffer and skip the manual import/export via dma-buf.) v2: Reimplement busy-ioctl (by walking the reservation object), postpone the ABI change for another day. Similarly use the reservation object to find the last_write request (if active and from i915) for choosing display CS flips. Caveats: * busy-ioctl: busy-ioctl only reports on the native fences, it will not warn of stalls (in set-domain-ioctl, pread/pwrite etc) if the object is being rendered to by external fences. It also will not report the same busy state as wait-ioctl (or polling on the dma-buf) in the same circumstances. On the plus side, it does retain reporting of which i915 engines are engaged with this object. * non-blocking atomic modesets take a step backwards as the wait for render completion blocks the ioctl. This is fixed in a subsequent patch to use a fence instead for awaiting on the rendering, see "drm/i915: Restore nonblocking awaits for modesetting" * dynamic array manipulation for shared-fences in reservation is slower than the previous lockless static assignment (e.g. gem_exec_lut_handle runtime on ivb goes from 42s to 66s), mainly due to atomic operations (maintaining the fence refcounts). * loss of object-level retirement callbacks, emulated by VMA retirement tracking. * minor loss of object-level last activity information from debugfs, could be replaced with per-vma information if desired Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-21-chris@chris-wilson.co.uk	2016-10-28 20:53:50 +01:00
Chris Wilson	f0cd518206	drm/i915: Use lockless object free Having moved the locked phase of freeing an object to a separate worker, we can now declare to the core that we only need the unlocked variant of driver->gem_free_object, and can use the simple unreference internally. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-20-chris@chris-wilson.co.uk	2016-10-28 20:53:50 +01:00
Chris Wilson	fbbd37b36f	drm/i915: Move object release to a freelist + worker We want to hide the latency of releasing objects and their backing storage from the submission, so we move the actual free to a worker. This allows us to switch to struct_mutex freeing of the object in the next patch. Furthermore, if we know that the object we are dereferencing remains valid for the duration of our access, we can forgo the usual synchronisation barriers and atomic reference counting. To ensure this we defer freeing an object til after an RCU grace period, such that any lookup of the object within an RCU read critical section will remain valid until after we exit that critical section. We also employ this delay for rate-limiting the serialisation on reallocation - we have to slow down object creation in order to prevent resource starvation (in particular, files). v2: Return early in i915_gem_tiling() ioctl to skip over superfluous work on error. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-19-chris@chris-wilson.co.uk	2016-10-28 20:53:49 +01:00
Chris Wilson	40e62d5d6b	drm/i915: Acquire the backing storage outside of struct_mutex in set-domain As we can locklessly (well struct_mutex-lessly) acquire the backing storage, do so in set-domain-ioctl to reduce the contention on the struct_mutex. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-18-chris@chris-wilson.co.uk	2016-10-28 20:53:49 +01:00
Chris Wilson	fe115628d5	drm/i915: Implement pwrite without struct-mutex We only need struct_mutex within pwrite for a brief window where we need to serialise with rendering and control our cache domains. Elsewhere we can rely on the backing storage being pinned, and forgive userspace any races against us. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-17-chris@chris-wilson.co.uk	2016-10-28 20:53:48 +01:00
Chris Wilson	bb6dc8d96b	drm/i915: Implement pread without struct-mutex We only need struct_mutex within pread for a brief window where we need to serialise with rendering and control our cache domains. Elsewhere we can rely on the backing storage being pinned, and forgive userspace any races against us. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-16-chris@chris-wilson.co.uk	2016-10-28 20:53:48 +01:00
Chris Wilson	7dd737f377	drm/i915/dmabuf: Acquire the backing storage outside of struct_mutex Use the per-object mm.lock to allocate the backing storage (and hold a reference to it across the dmabuf access) without resorting to struct_mutex. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-15-chris@chris-wilson.co.uk	2016-10-28 20:53:48 +01:00
Chris Wilson	1233e2db19	drm/i915: Move object backing storage manipulation to its own locking Break the allocation of the backing storage away from struct_mutex into a per-object lock. This allows parallel page allocation, provided we can do so outside of struct_mutex (i.e. set-domain-ioctl, pwrite, GTT fault), i.e. before execbuf! The increased cost of the atomic counters are hidden behind i915_vma_pin() for the typical case of execbuf, i.e. as the object is typically bound between execbufs, the page_pin_count is static. The cost will be felt around set-domain and pwrite, but offset by the improvement from reduced struct_mutex contention. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-14-chris@chris-wilson.co.uk	2016-10-28 20:53:47 +01:00
Chris Wilson	03ac84f183	drm/i915: Pass around sg_table to get_pages/put_pages backend The plan is to move obj->pages out from under the struct_mutex into its own per-object lock. We need to prune any assumption of the struct_mutex from the get_pages/put_pages backends, and to make it easier we pass around the sg_table to operate on rather than indirectly via the obj. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-13-chris@chris-wilson.co.uk	2016-10-28 20:53:47 +01:00
Chris Wilson	a4f5ea64f0	drm/i915: Refactor object page API The plan is to make obtaining the backing storage for the object avoid struct_mutex (i.e. use its own locking). The first step is to update the API so that normal users only call pin/unpin whilst working on the backing storage. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-12-chris@chris-wilson.co.uk	2016-10-28 20:53:46 +01:00
Chris Wilson	d2a84a76a3	drm/i915: Use radixtree to jump start intel_partial_pages() We can use the radixtree index of the obj->pages to find the start position of the desired partial range. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-11-chris@chris-wilson.co.uk	2016-10-28 20:53:46 +01:00
Chris Wilson	96d7763452	drm/i915: Use a radixtree for random access to the object's backing storage A while ago we switched from a contiguous array of pages into an sglist, for that was both more convenient for mapping to hardware and avoided the requirement for a vmalloc array of pages on every object. However, certain GEM API calls (like pwrite, pread as well as performing relocations) do desire access to individual struct pages. A quick hack was to introduce a cache of the last access such that finding the following page was quick - this works so long as the caller desired sequential access. Walking backwards, or multiple callers, still hits a slow linear search for each page. One solution is to store each successful lookup in a radix tree. v2: Rewrite building the radixtree for clarity, hopefully. v3: Rearrange execbuf to avoid calling i915_gem_object_get_sg() from within an atomic section and so relax the allocation context to a simple GFP_KERNEL and mutex. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-10-chris@chris-wilson.co.uk	2016-10-28 20:53:45 +01:00
Chris Wilson	4c7d62c6b8	drm/i915: Markup GEM API with lockdep asserts Add lockdep_assert_held(struct_mutex) to the API preamble of the internal GEM interfaces. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-9-chris@chris-wilson.co.uk	2016-10-28 20:53:45 +01:00
Chris Wilson	4e50f082ac	drm/i915: Reuse the active golden render state batch The golden render state is constant, but we recreate the batch setting it up for every new context. If we keep that batch in a volatile cache we can safely reuse it whenever we need to initialise a new context. We mark the pages as purgeable and use the shrinker to recover pages from the batch whenever we face memory pressues, recreating that batch afresh on the next new context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtien@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-8-chris@chris-wilson.co.uk	2016-10-28 20:53:44 +01:00
Chris Wilson	920cf41949	drm/i915: Introduce an internal allocator for disposable private objects Quite a few of our objects used for internal hardware programming do not benefit from being swappable or from being zero initialised. As such they do not benefit from using a shmemfs backing storage and since they are internal and never directly exposed to the user, we do not need to worry about providing a filp. For these we can use an drm_i915_gem_object wrapper around a sg_table of plain struct page. They are not swap backed and not automatically pinned. If they are reaped by the shrinker, the pages are released and the contents discarded. For the internal use case, this is fine as for example, ringbuffers are pinned from being written by a request to be read by the hardware. Once they are idle, they can be discarded entirely. As such they are a good match for execlist ringbuffers and a small variety of other internal objects. In the first iteration, this is limited to the scratch batch buffers we use (for command parsing and state initialisation). v2: Allocate physically contiguous pages, where possible. v3: Reduce maximum order on subsequent requests following an allocation failure. v4: Fix up mismatch between swiotlb segment size and page count (it counts in 2k units, not 4k pages) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-7-chris@chris-wilson.co.uk	2016-10-28 20:53:44 +01:00
Chris Wilson	e95433c73a	drm/i915: Rearrange i915_wait_request() accounting with callers Our low-level wait routine has evolved from our generic wait interface that handled unlocked, RPS boosting, waits with time tracking. If we push our GEM fence tracking to use reservation_objects (required for handling multiple timelines), we lose the ability to pass the required information down to i915_wait_request(). However, if we push the extra functionality from i915_wait_request() to the individual callsites (i915_gem_object_wait_rendering and i915_gem_wait_ioctl) that make use of those extras, we can both simplify our low level wait and prepare for extending the GEM interface for use of reservation_objects. v2: Rewrite i915_wait_request() kerneldocs Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-4-chris@chris-wilson.co.uk	2016-10-28 20:53:43 +01:00
Chris Wilson	f8a7fde456	drm/i915: Defer active reference until required We only need the active reference to keep the object alive after the handle has been deleted (so as to prevent a synchronous gem_close). Why then pay the price of a kref on every execbuf when we can insert that final active ref just in time for the handle deletion? Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-6-chris@chris-wilson.co.uk	2016-10-28 20:53:43 +01:00
Chris Wilson	2e36991a8a	drm/i915: Remove unused i915_gem_active_wait() in favour of _unlocked() Since we only use the more generic unlocked variant, just rename it as the normal i915_gem_active_wait(). The temporary cost is that we need to always acquire the reference in a RCU safe manner, but the benefit is that we will combine the common paths. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-5-chris@chris-wilson.co.uk	2016-10-28 20:53:43 +01:00
Chris Wilson	c92ac094a9	drm/i915: Remove superfluous wait_for_error() from throttle-ioctl The throttle-ioctl never touches the struct_mutex. It does, however, as part of its ABI report whether the hardware is terminally wedged. For that purposes, it only has to report the current state and not incur the cost of checking/waiting every invocation, as we do not have to wait for a reset before waiting on a request to ensure completion (that is baked into the wait request implementation). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-3-chris@chris-wilson.co.uk	2016-10-28 20:53:42 +01:00
Chris Wilson	7e941861c9	drm/i915: Allow i915_sw_fence_await_sw_fence() to allocate In forthcoming patches, we want to be able to dynamically allocate the wait_queue_t used whilst awaiting. This is more convenient if we extend the i915_sw_fence_await_sw_fence() to perform the allocation for us if we pass in a gfp mask as an alternative than a preallocated struct. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-2-chris@chris-wilson.co.uk	2016-10-28 20:53:42 +01:00
Chris Wilson	b52992c06c	drm/i915: Support asynchronous waits on struct fence from i915_gem_request We will need to wait on DMA completion (as signaled via struct fence) before executing our i915_gem_request. Therefore we want to expose a method for adding the await on the fence itself to the request. v2: Add a comment detailing a failure to handle a signal-on-any fence-array. v3: Pretend that magic numbers don't exist. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-1-chris@chris-wilson.co.uk	2016-10-28 20:53:41 +01:00
Chris Wilson	fc0990903c	drm/i915: Remove insert-page shortcut from execbuf relocate_iomap() We are not allowed to touch the GTT entries underneath an atomic section, as they take a rpm wakelock (which is illegal from atomic context) and in the near future acquiring the DMA address for a page within an object may sleep for an allocation. This makes the current shortcircuit in relocation_iomap() for performing a second relocation on an adjacent page illegal, and we need to release the atomic iomapping, lookup the DMA, insert it into the GTT before reentering the atomic iomap section. As it happens, this is precisely what we do on if we are using an iomapping over the full object and not just a single page and by removing the shortcut, we do the right thing. Fixes: `9c870d0367` ("drm/i915: Use RPM as the barrier for controlling...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/20161028142756.3850-1-chris@chris-wilson.co.uk Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>	2016-10-28 20:53:31 +01:00
Matt Roper	2c4b49a0f7	drm/i915: Use macro in place of open-coded for_each_universal_plane loop This was the only use of (misleadingly-named) intel_num_planes() function, so we can remove it as well. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1477522291-10874-3-git-send-email-matthew.d.roper@intel.com Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2016-10-28 11:27:19 -07:00
Matt Roper	8b364b41ce	drm/i915: Rename for_each_plane -> for_each_universal_plane This macro's name is a bit misleading; it doesn't actually iterate over all planes since it omits the cursor plane. Its only uses are in gen9 code which is using it to iterate over the universal planes (which we treat as primary+sprites); in these cases the legacy cursor registers are programmed independently if necessary. The macro's iterator value (0 for primary plane, spritenum+1 for each secondary plane) also isn't meaningful outside the gen9 context where the hardware considers them to all be "universal" planes that follow this numbering. This is just a renaming/clarification patch with no functional change. However it will make the subsequent patches more clear. Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1477522291-10874-2-git-send-email-matthew.d.roper@intel.com Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>	2016-10-28 11:26:36 -07:00
Navare, Manasi D	40dba34112	drm/i915: Change the placement of some static functions in intel_dp.c These static helper functions are required to be used during fallback link rate implemnetation so they need to be placed at the top of the file. v3: * Add cleanup to other patch (Mika Kahola) v2: * Dont move around functions declared in intel_drv.h (Rodrigo Vivi) Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Reviewed-by: Mika Kahola <mika.kahola@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1477524358-16563-4-git-send-email-manasi.d.navare@intel.com	2016-10-28 15:06:31 +03:00
Ido Yariv	bd768e1466	KVM: x86: fix wbinvd_dirty_mask use-after-free vcpu->arch.wbinvd_dirty_mask may still be used after freeing it, corrupting memory. For example, the following call trace may set a bit in an already freed cpu mask: kvm_arch_vcpu_load vcpu_load vmx_free_vcpu_nested vmx_free_vcpu kvm_arch_vcpu_free Fix this by deferring freeing of wbinvd_dirty_mask. Cc: stable@vger.kernel.org Signed-off-by: Ido Yariv <ido@wizery.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2016-10-28 11:35:21 +02:00
Ander Conselvan de Oliveira	ed37892e6d	drm/i915: Address broxton phy registers based on phy and channel number The port registers related to the phys in broxton map to different channels and specific phys. Make that mapping explicit. v2: Pass enum dpio_phy to macros instead of mmio base. (Imre) v3: Fix typo in macros. (Imre) v4: Also change variables from u32 to enum dpio_phy. (Imre) Remove leftovers from previous version. (Imre) v5: Actually git add the changes. Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1476863940-6019-1-git-send-email-ander.conselvan.de.oliveira@intel.com	2016-10-28 12:25:24 +03:00
Ander Conselvan de Oliveira	e7583f7b10	drm/i915: Add location of the Rcomp resistor to bxt_ddi_phy_info Use struct bxt_ddi_phy_info to hold information of where the Rcomp resistor is located, instead of hard coding it in the init sequence. Note that this moves the enabling of the phy with the Rcomp resistor out of the power well enable code. That should be safe since bxt_ddi_phy_init() is called while the power domains lock is held, and that is the only way that function gets called, so there is no possibility of a concurrent phy enable caused by a power domain get call. v2: Replace comment about lock with lockdep_assert_held() (Imre) Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/62d209950ad48484564f3e793cf247cf62572a39.1475770848.git-series.ander.conselvan.de.oliveira@intel.com	2016-10-28 12:25:03 +03:00
Ander Conselvan de Oliveira	842d416654	drm/i915: Create a struct to hold information about the broxton phys Information about which phy is dual channel is hardcoded in the phy init sequence. Split that to a separate struct so the init sequence is more generic. v2: Restore mangled part that ended up in following patch. (Imre) Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/9102f4c984044126057e4fdd1b91a615ff25fae6.1475770848.git-series.ander.conselvan.de.oliveira@intel.com	2016-10-28 12:24:51 +03:00
Ander Conselvan de Oliveira	b6e08203cc	drm/i915: Move broxton vswing sequence to intel_dpio_phy.c The vswing sequence is related to the DPIO phy, so move it closer to the rest of DPIO phy related code. Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/59aa5c85a115c5cbed81e793f20cd7b9f8de694b.1475770848.git-series.ander.conselvan.de.oliveira@intel.com	2016-10-28 12:24:45 +03:00
Ander Conselvan de Oliveira	f38861b814	drm/i915: Move DPIO phy documentation section to intel_dpio_phy.c Move the DPIO phy documentation section to intel_dpio_phy.c, since that is a more suitable place now that there is a source file dedicated for those phys. Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/55a2d38c15c06a8c5bce498b28decc03948f0224.1475770848.git-series.ander.conselvan.de.oliveira@intel.com	2016-10-28 12:24:37 +03:00
Ander Conselvan de Oliveira	47a6bc61b8	drm/i915: Move broxton phy code to intel_dpio_phy.c The phy in broxton is also a dpio phy, similar to cherryview but with programming through MMIO. So move the code together with the other similar phys. Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/d611de6d256593cf904172db7ff27f164480c228.1475770848.git-series.ander.conselvan.de.oliveira@intel.com	2016-10-28 12:24:01 +03:00
Ander Conselvan de Oliveira	b284eedaf7	drm/i915: Pass lane count to bxt_ddi_phy_calc_lane_optmin_mask() Pass lane count to bxt_ddi_phy_calc_lane_optmin_mask() instead of having it extract that number from a pipe_config to decouple the phy code from intel_crtc_state. Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/a4977e0207e594953c4f9d1b5f2ef972a8679e74.1475770848.git-series.ander.conselvan.de.oliveira@intel.com	2016-10-28 12:23:53 +03:00
Ander Conselvan de Oliveira	362624c9ba	drm/i915: Explicitly map broxton DPIO power wells to phys The mapping from the BXT_DPIO_CMN_* power wells to their respective phys required a detour implemented in the bxt_power_well_to_phy() function. Instead, embed that information directly into the power_well struct, by resurrecting the data field. Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/7fe97582fa08c7340ce6a3b6b0ea3e72a73182d7.1475770848.git-series.ander.conselvan.de.oliveira@intel.com	2016-10-28 12:23:45 +03:00

1 2 3 4 5 ...

634202 Commits