linux_dsm_epyc7002/drivers/gpu/drm/i915/Kconfig.profile

config DRM_I915_FENCE_TIMEOUT
	int "Timeout for unsignaled foreign fences (ms, jiffy granularity)"
	default 10000 # milliseconds
	help
	  When listening to a foreign fence, we install a supplementary timer
	  to ensure that we are always signaled and our userspace is able to
	  make forward progress. This value specifies the timeout used for an
	  unsignaled foreign fence.

	  May be 0 to disable the timeout, and rely on the foreign fence being
	  eventually signaled.

config DRM_I915_USERFAULT_AUTOSUSPEND
	int "Runtime autosuspend delay for userspace GGTT mmaps (ms)"
	default 250 # milliseconds
	help
	  On runtime suspend, as we suspend the device, we have to revoke
	  userspace GGTT mmaps and force userspace to take a pagefault on
	  their next access. The revocation and subsequent recreation of
	  the GGTT mmap can be very slow and so we impose a small hysteresis
	  that complements the runtime-pm autosuspend and provides a floor
	  on the autosuspend delay.

	  May be 0 to disable the extra delay and solely use the device level
	  runtime pm autosuspend delay tunable.

config DRM_I915_HEARTBEAT_INTERVAL
	int "Interval between heartbeat pulses (ms)"
	default 2500 # milliseconds
	help
	  The driver sends a periodic heartbeat down all active engines to
	  check the health of the GPU and undertake regular house-keeping of
	  internal driver state.

	  This is adjustable via
	  /sys/class/drm/card?/engine/*/heartbeat_interval_ms

	  May be 0 to disable heartbeats and therefore disable automatic GPU
	  hang detection.

config DRM_I915_PREEMPT_TIMEOUT
	int "Preempt timeout (ms, jiffy granularity)"
	default 640 # milliseconds
	help
	  How long to wait (in milliseconds) for a preemption event to occur
	  when submitting a new context via execlists. If the current context
	  does not hit an arbitration point and yield to HW before the timer
	  expires, the HW will be reset to allow the more important context
	  to execute.

	  This is adjustable via
	  /sys/class/drm/card?/engine/*/preempt_timeout_ms

	  May be 0 to disable the timeout.

	  The compiled-in default may be overridden at driver probe time on
	  certain platforms and certain engines, which will be reflected in
	  the sysfs control.

config DRM_I915_MAX_REQUEST_BUSYWAIT
	int "Busywait for request completion limit (ns)"
	default 8000 # nanoseconds
	help
	  Before sleeping waiting for a request (GPU operation) to complete,
	  we may spend some time polling for its completion. As the IRQ may
	  take a non-negligible time to set up, we do a short spin first to
	  check if the request will complete in the time it would have taken
	  us to enable the interrupt.

	  This is adjustable via
	  /sys/class/drm/card?/engine/*/max_busywait_duration_ns

	  May be 0 to disable the initial spin. In practice, we estimate
	  the cost of enabling the interrupt (if currently disabled) to be
	  a few microseconds.

config DRM_I915_STOP_TIMEOUT
	int "How long to wait for an engine to quiesce gracefully before reset (ms)"
	default 100 # milliseconds
	help
	  By stopping submission and sleeping for a short time before resetting
	  the GPU, we allow the other, innocent contexts on the system to
	  quiesce. It is then less likely for a hanging context to cause
	  collateral damage as the system is reset in order to recover. The
	  corollary is that the reset itself may take longer and so be more
	  disruptive to interactive or low latency workloads.

	  This is adjustable via
	  /sys/class/drm/card?/engine/*/stop_timeout_ms

config DRM_I915_TIMESLICE_DURATION
	int "Scheduling quantum for userspace batches (ms, jiffy granularity)"
	default 1 # milliseconds
	help
	  When two user batches of equal priority are executing, we will
	  alternate execution of each batch to ensure forward progress of
	  all users. This is necessary in some cases where there may be
	  an implicit dependency between those batches that requires
	  concurrent execution in order for them to proceed, e.g. they
	  interact with each other via userspace semaphores. Each context
	  is scheduled for execution for the timeslice duration, before
	  switching to the next context.

	  This is adjustable via
	  /sys/class/drm/card?/engine/*/timeslice_duration_ms

	  May be 0 to disable timeslicing.
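
# Illustrative sketch, not part of the original file. Assuming the usual
# Kconfig convention that each symbol is written to .config with a CONFIG_
# prefix, a build that keeps every default listed in this file would
# contain entries along these lines:
#
#   CONFIG_DRM_I915_FENCE_TIMEOUT=10000
#   CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND=250
#   CONFIG_DRM_I915_HEARTBEAT_INTERVAL=2500
#   CONFIG_DRM_I915_PREEMPT_TIMEOUT=640
#   CONFIG_DRM_I915_MAX_REQUEST_BUSYWAIT=8000
#   CONFIG_DRM_I915_STOP_TIMEOUT=100
#   CONFIG_DRM_I915_TIMESLICE_DURATION=1
#
# Per the help texts, a value of 0 disables the corresponding mechanism for
# every option here except DRM_I915_STOP_TIMEOUT, whose help text does not
# describe a special meaning for 0.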