linux_dsm_epyc7002/drivers/gpu/drm/i915/Kconfig.profile

config DRM_I915_USERFAULT_AUTOSUSPEND
	int "Runtime autosuspend delay for userspace GGTT mmaps (ms)"
	default 250 # milliseconds
	help
	  On runtime suspend, as we suspend the device, we have to revoke
	  userspace GGTT mmaps and force userspace to take a pagefault on
	  their next access. The revocation and subsequent recreation of
	  the GGTT mmap can be very slow and so we impose a small hysteresis
	  that complements the runtime-pm autosuspend and provides a lower
	  floor on the autosuspend delay.

	  May be 0 to disable the extra delay and solely use the device-level
	  runtime-pm autosuspend delay tunable.

config DRM_I915_PREEMPT_TIMEOUT
	int "Preempt timeout (ms, jiffy granularity)"
	default 100 # milliseconds
	help
	  How long to wait (in milliseconds) for a preemption event to occur
	  when submitting a new context via execlists. If the current context
	  does not hit an arbitration point and yield to HW before the timer
	  expires, the HW will be reset to allow the more important context
	  to execute.

	  May be 0 to disable the timeout.

config DRM_I915_SPIN_REQUEST
	int "Busywait for request completion (us)"
	default 5 # microseconds
	help
	  Before sleeping waiting for a request (GPU operation) to complete,
	  we may spend some time polling for its completion. As the IRQ may
	  take a non-negligible time to set up, we do a short spin first to
	  check if the request will complete in the time it would have taken
	  us to enable the interrupt.

	  May be 0 to disable the initial spin. In practice, we estimate
	  the cost of enabling the interrupt (if currently disabled) to be
	  a few microseconds.

config DRM_I915_STOP_TIMEOUT
	int "How long to wait for an engine to quiesce gracefully before reset (ms)"
	default 100 # milliseconds
	help
	  By stopping submission and sleeping for a short time before resetting
	  the GPU, we allow the other, innocent contexts on the system to quiesce.
	  It is then less likely for a hanging context to cause collateral
	  damage as the system is reset in order to recover. The corollary is
	  that the reset itself may take longer and so be more disruptive to
	  interactive or low latency workloads.
drm/i915: Keep user GGTT alive for a minimum of 250ms

Do not allow runtime pm autosuspend to remove userspace GGTT mmaps too quickly. For example, igt sets the autosuspend delay to 0, and so we immediately attempt to perform runtime suspend upon releasing the wakeref. Unfortunately, that involves tearing down GGTT mmaps as they require an active device.

Override the autosuspend for GGTT mmaps, by keeping the wakeref around for 250ms after populating the PTE for a fresh mmap.

v2: Prefer refcount_t for its under/overflow error detection
v3: Flush the user runtime autosuspend prior to system suspend.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190527115114.13448-1-chris@chris-wilson.co.uk
2019-05-27 18:51:14 +07:00
			`config DRM_I915_USERFAULT_AUTOSUSPEND`
			`int "Runtime autosuspend delay for userspace GGTT mmaps (ms)"`
			`default 250 # milliseconds`
			`help`
			`On runtime suspend, as we suspend the device, we have to revoke`
			`userspace GGTT mmaps and force userspace to take a pagefault on`
			`their next access. The revocation and subsequent recreation of`
			`the GGTT mmap can be very slow and so we impose a small hysteresis`
			`that complements the runtime-pm autosuspend and provides a lower`
			`floor on the autosuspend delay.`

			`May be 0 to disable the extra delay and solely use the device-level`
			`runtime-pm autosuspend delay tunable.`
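
The "lower floor" wording above can be read as taking the larger of two delays. The following is a minimal sketch of that reading only, not the driver's code: the function name and parameters are hypothetical, and the real driver holds a wakeref rather than computing a single number.

```c
#include <assert.h>

/*
 * Illustrative model only (hypothetical helper, not i915 code):
 * the time userspace GGTT mmaps are kept alive is taken to be the
 * larger of the device-level autosuspend delay and the userfault
 * floor. A floor of 0 leaves the device-level delay in charge.
 */
unsigned int effective_mmap_delay_ms(unsigned int device_delay_ms,
                                     unsigned int userfault_floor_ms)
{
    return device_delay_ms > userfault_floor_ms ? device_delay_ms
                                                : userfault_floor_ms;
}
```

Under this model, a test harness that sets the device-level delay to 0 would still see mmaps survive for the 250 ms default, while setting the option to 0 hands control back to the device-level tunable.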

drm/i915/execlists: Force preemption

If the preempted context takes too long to relinquish control, e.g. it is stuck inside a shader with arbitration disabled, evict that context with an engine reset. This ensures that preemptions are reasonably responsive, providing a tighter QoS for the more important context at the cost of flagging unresponsive contexts more frequently (i.e. instead of using an ~10s hangcheck, we now evict at ~100ms).

The challenge lies in picking a timeout that can be reasonably serviced by HW for typical workloads, balancing the existing clients against the need for responsiveness.

Note that coupled with timeslicing, this will lead to rapid GPU "hang" detection with multiple active contexts vying for GPU time.

The forced preemption mechanism can be compiled out with

	./scripts/config --set-val DRM_I915_PREEMPT_TIMEOUT 0

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191023133108.21401-2-chris@chris-wilson.co.uk
2019-10-23 20:31:05 +07:00
			`config DRM_I915_PREEMPT_TIMEOUT`
			`int "Preempt timeout (ms, jiffy granularity)"`
			`default 100 # milliseconds`
			`help`
			`How long to wait (in milliseconds) for a preemption event to occur`
			`when submitting a new context via execlists. If the current context`
			`does not hit an arbitration point and yield to HW before the timer`
			`expires, the HW will be reset to allow the more important context`
			`to execute.`

			`May be 0 to disable the timeout.`
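
The "(ms, jiffy granularity)" note in the prompt implies that the timeout is serviced in whole timer ticks. A minimal sketch of that rounding follows, assuming a round-up conversion and a tick rate passed in as `hz`. Both helper names are hypothetical and are not kernel API.

```c
#include <assert.h>

/*
 * Illustrative helper (not the kernel's own conversion routine):
 * convert a millisecond timeout into whole timer ticks, rounding up,
 * for a tick rate of `hz` ticks per second.
 */
unsigned long timeout_ms_to_ticks(unsigned long ms, unsigned long hz)
{
    return (ms * hz + 999) / 1000;
}

/* Duration covered by `ticks` timer ticks, in milliseconds. */
unsigned long ticks_to_ms(unsigned long ticks, unsigned long hz)
{
    return ticks * 1000 / hz;
}
```

With a 250 Hz tick, the default of 100 ms maps to 25 ticks, whereas a setting of 1 ms would still wait one full tick of 4 ms, so values finer than the tick period gain nothing in this model.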

drm/i915: Expose the busyspin durations for i915_wait_request

An interesting discussion regarding "hybrid interrupt polling" for NVMe came to the conclusion that the ideal busyspin before sleeping was half of the expected request latency (and better if it was already halfway through that request). This suggested that we too should look again at our tradeoff between spinning and waiting. Currently, our spin simply tries to hide the cost of enabling the interrupt, which is good to avoid penalising nop requests (i.e. test throughput) and not much else. Studying real world workloads suggests that a spin of up to 500us can dramatically boost performance, but the suggestion is that this is not from avoiding interrupt latency per se, but from secondary effects of sleeping such as allowing the CPU to reduce cstate and context switch away.

In a truly hybrid interrupt polling scheme, we would aim to sleep until just before the request completed and then wake up in advance of the interrupt and do a quick poll to handle completion. This is tricky for ourselves at the moment as we are not recording request times, and since we allow preemption, our requests are not on as nicely ordered a timeline as IO. However, the idea is interesting, for it will certainly help us decide when busyspinning is worthwhile.

v2: Expose the spin setting via Kconfig options for easier adjustment and testing.
v3: Don't get caught sneaking in a change to the busyspin parameters.
v4: Explain more about the "hybrid interrupt polling" scheme that we want to migrate towards.

Suggested-by: Sagar Kamble <sagar.a.kamble@intel.com>
References: http://events.linuxfoundation.org/sites/events/files/slides/lemoal-nvme-polling-vault-2017-final_0.pdf
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Sagar Kamble <sagar.a.kamble@intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Sagar Kamble <sagar.a.kamble@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190419182625.11186-1-chris@chris-wilson.co.uk
2019-04-20 01:26:25 +07:00
			`config DRM_I915_SPIN_REQUEST`
drm/i915: Add a label for config DRM_I915_SPIN_REQUEST

If we don't give it a label, it does not appear as a configuration option.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190612093111.11684-9-chris@chris-wilson.co.uk
2019-06-12 16:31:11 +07:00
			`int "Busywait for request completion (us)"`
			`default 5 # microseconds`
			`help`
			`Before sleeping waiting for a request (GPU operation) to complete,`
			`we may spend some time polling for its completion. As the IRQ may`
			`take a non-negligible time to set up, we do a short spin first to`
			`check if the request will complete in the time it would have taken`
			`us to enable the interrupt.`

			`May be 0 to disable the initial spin. In practice, we estimate`
			`the cost of enabling the interrupt (if currently disabled) to be`
			`a few microseconds.`
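
The spin-before-sleep idea described above can be sketched in userspace C. This is an illustration under stated assumptions, not the i915 implementation: `spin_for_completion`, `now_us`, and the `done` flag are hypothetical stand-ins, and a POSIX monotonic clock is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in microseconds (POSIX clock_gettime assumed). */
static uint64_t now_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/*
 * Poll `*done` for at most `spin_us` microseconds.
 * Returns true if completion was observed during the spin. A false
 * return means the caller should fall back to enabling the interrupt
 * and sleeping. `done` is a hypothetical completion flag standing in
 * for the request state.
 */
bool spin_for_completion(const volatile bool *done, unsigned int spin_us)
{
    uint64_t deadline;

    if (spin_us == 0)
        return false; /* initial spin disabled */

    deadline = now_us() + spin_us;
    do {
        if (*done)
            return true;
    } while (now_us() < deadline);

    return false;
}
```

A caller would try `spin_for_completion(&done, 5)` first and only arm the interrupt if it returns false, which mirrors the default of 5 microseconds given above.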
drm/i915/gt: Try to more gracefully quiesce the system before resets

If we are doing a normal GPU reset triggered after detecting a long period of stalled work, we can take our time and allow the engines to quiesce. Since we've stopped submission to the engine, if we wait long enough an innocent context should complete, leaving the engine idle. So by waiting a short amount of time, we should prevent clobbering other users when resetting a stuck context.

Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Suggested-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191023133108.21401-1-chris@chris-wilson.co.uk
2019-10-23 20:31:04 +07:00
			`config DRM_I915_STOP_TIMEOUT`
			`int "How long to wait for an engine to quiesce gracefully before reset (ms)"`
			`default 100 # milliseconds`
			`help`
			`By stopping submission and sleeping for a short time before resetting`
			`the GPU, we allow the other, innocent contexts on the system to quiesce.`
			`It is then less likely for a hanging context to cause collateral`
			`damage as the system is reset in order to recover. The corollary is`
			`that the reset itself may take longer and so be more disruptive to`
			`interactive or low latency workloads.`