================================
PSI - Pressure Stall Information
================================

:Date: April, 2018
:Author: Johannes Weiner <hannes@cmpxchg.org>

When CPU, memory or IO devices are contended, workloads experience
latency spikes, throughput losses, and run the risk of OOM kills.

Without an accurate measure of such contention, users are forced to
either play it safe and under-utilize their hardware resources, or
roll the dice and frequently suffer the disruptions resulting from
excessive overcommit.

The psi feature identifies and quantifies the disruptions caused by
such resource crunches and the time impact they have on complex
workloads or even entire systems.

Having an accurate measure of productivity losses caused by resource
scarcity aids users in sizing workloads to hardware, or in
provisioning hardware according to workload demand.

As psi aggregates this information in real time, systems can be
managed dynamically using techniques such as load shedding, migrating
jobs to other systems or data centers, or strategically pausing or
killing low-priority or restartable batch jobs.

This makes it possible to maximize hardware utilization without
sacrificing workload health or risking major disruptions such as OOM
kills.

Pressure interface
==================

Pressure information for each resource is exported through the
respective file in /proc/pressure/ -- cpu, memory, and io.

The format for CPU is as follows::

	some avg10=0.00 avg60=0.00 avg300=0.00 total=0

and for memory and IO::

	some avg10=0.00 avg60=0.00 avg300=0.00 total=0
	full avg10=0.00 avg60=0.00 avg300=0.00 total=0

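As an illustration of the format above, here is a minimal C sketch that parses pressure lines with sscanf(3). The struct psi_line type and the psi_parse_line() and psi_read_file() functions are hypothetical names chosen for this example. They are not a kernel or libc API. The sketch assumes every line follows exactly the layout shown, with "total" as an unsigned integer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of a /proc/pressure/<resource> file (hypothetical type). */
struct psi_line {
	char kind[5];                 /* "some" or "full" plus NUL */
	double avg10, avg60, avg300;  /* percentages of time stalled */
	unsigned long long total;     /* cumulative stall time */
};

/* Parse one line; returns 0 on success, -1 if it does not match. */
int psi_parse_line(const char *line, struct psi_line *out)
{
	assert(line != NULL && out != NULL);
	int n = sscanf(line, "%4s avg10=%lf avg60=%lf avg300=%lf total=%llu",
		       out->kind, &out->avg10, &out->avg60, &out->avg300,
		       &out->total);
	if (n != 5)
		return -1;
	if (strcmp(out->kind, "some") != 0 && strcmp(out->kind, "full") != 0)
		return -1;
	return 0;
}

/* Read up to two records ("some" and, if present, "full") from a pressure
 * file. Returns the number of records parsed, or -1 if it cannot be opened. */
int psi_read_file(const char *path, struct psi_line lines[2])
{
	FILE *f = fopen(path, "r");
	if (f == NULL)
		return -1;

	char buf[256];
	int count = 0;
	while (count < 2 && fgets(buf, sizeof(buf), f) != NULL) {
		/* Skip anything that does not match the expected layout. */
		if (psi_parse_line(buf, &lines[count]) == 0)
			count++;
	}
	fclose(f);
	return count;
}
```

On a kernel with psi enabled, psi_read_file("/proc/pressure/memory", lines) would fill lines[0] with the "some" record and lines[1] with the "full" record. For /proc/pressure/cpu in the format shown above, which carries only a "some" line, it would return 1.
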
The "some" line indicates the share of time in which at least some
tasks are stalled on a given resource.

The "full" line indicates the share of time in which all non-idle
tasks are stalled on a given resource simultaneously. In this state,
actual CPU cycles are going to waste, and a workload that spends
extended time in this state is considered to be thrashing. This has a
severe impact on performance, and it's useful to distinguish this
situation from a state where some tasks are stalled but the CPU is
still doing productive work. As such, time spent in this subset of the
stall state is tracked separately and exported in the "full" averages.

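The following sketch makes the difference between the two lines concrete. It applies the "some" and "full" definitions to a made-up sequence of equal-length time slices. Each slice records how many tasks were non-idle and how many of those were stalled. This is a deliberately simplified model for illustration only: the struct slice type and the stall_shares() function are hypothetical, and this is not how the kernel samples or aggregates task states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of one time slice. */
struct slice {
	unsigned int nonidle;  /* tasks running or wanting to run */
	unsigned int stalled;  /* non-idle tasks stalled on the resource */
};

/* Toy model: a slice counts toward "some" if at least one task is stalled,
 * and toward "full" if every non-idle task is stalled. Results in percent. */
void stall_shares(const struct slice *s, size_t n,
		  double *some_pct, double *full_pct)
{
	assert(n > 0);
	size_t some = 0, full = 0;

	for (size_t i = 0; i < n; i++) {
		assert(s[i].stalled <= s[i].nonidle);
		if (s[i].stalled > 0) {
			some++;
			/* "full" is a subset of "some". */
			if (s[i].stalled == s[i].nonidle)
				full++;
		}
	}
	*some_pct = 100.0 * some / n;
	*full_pct = 100.0 * full / n;
}
```

Consider the four slices {2, 0}, {2, 1}, {2, 2} and {0, 0}. Two slices have at least one stalled task, and one has every non-idle task stalled. The function therefore reports 50% "some" and 25% "full". Because every "full" slice also counts as "some", the "full" figure can never exceed the "some" figure.
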
The ratios are reported as percentages and tracked as recent trends
over ten-, sixty-, and three-hundred-second windows (avg10, avg60, and
avg300), which gives insight into short-term events as well as medium-
and long-term trends. The total absolute stall time (in microseconds)
is tracked and exported as well, to allow detection of latency spikes
that wouldn't necessarily make a dent in the time averages, or to
average trends over custom time frames.
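
Averaging over a custom time frame reduces to simple arithmetic on the cumulative "total" field. Sample it at the start and end of the window, then divide the difference by the window length. The following minimal sketch makes a few assumptions:

* Both samples come from the same line of the same pressure file.
* The elapsed time is measured in microseconds (the same unit as "total"), for example with clock_gettime(CLOCK_MONOTONIC).

stall_pct_between() is a hypothetical helper, not an existing API.

```c
#include <assert.h>

/* Percentage of a window spent stalled, given two samples of the cumulative
 * "total" field taken elapsed_us microseconds apart (hypothetical helper). */
double stall_pct_between(unsigned long long total_before,
			 unsigned long long total_after,
			 unsigned long long elapsed_us)
{
	assert(elapsed_us > 0);
	/* "total" only ever grows, so the delta is the stall time in the window. */
	assert(total_after >= total_before);
	return 100.0 * (double)(total_after - total_before) / (double)elapsed_us;
}
```

For example, if "total" grows from 1000 to 251000 over a one-second (1000000 us) window, 25% of that second was spent stalled. Sampling at short intervals and flagging deltas above a chosen threshold is one way to catch brief latency spikes that barely move avg10. The threshold itself is a policy decision left to the user.
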