mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-11-24 09:40:58 +07:00
Merge branches 'pm-em' and 'pm-core'
* pm-em: OPP: refactor dev_pm_opp_of_register_em() and update related drivers Documentation: power: update Energy Model description PM / EM: change name of em_pd_energy to em_cpu_energy PM / EM: remove em_register_perf_domain PM / EM: add support for other devices than CPUs in Energy Model PM / EM: update callback structure and add device pointer PM / EM: introduce em_dev_register_perf_domain function PM / EM: change naming convention from 'capacity' to 'performance' * pm-core: mmc: jz4740: Use pm_ptr() macro PM: Make *_DEV_PM_OPS macros use __maybe_unused PM: core: introduce pm_ptr() macro
This commit is contained in:
commit
5b5642075c
@ -1,15 +1,17 @@
|
||||
====================
|
||||
Energy Model of CPUs
|
||||
====================
|
||||
.. SPDX-License-Identifier: GPL-2.0
|
||||
|
||||
=======================
|
||||
Energy Model of devices
|
||||
=======================
|
||||
|
||||
1. Overview
|
||||
-----------
|
||||
|
||||
The Energy Model (EM) framework serves as an interface between drivers knowing
|
||||
the power consumed by CPUs at various performance levels, and the kernel
|
||||
the power consumed by devices at various performance levels, and the kernel
|
||||
subsystems willing to use that information to make energy-aware decisions.
|
||||
|
||||
The source of the information about the power consumed by CPUs can vary greatly
|
||||
The source of the information about the power consumed by devices can vary greatly
|
||||
from one platform to another. These power costs can be estimated using
|
||||
devicetree data in some cases. In others, the firmware will know better.
|
||||
Alternatively, userspace might be best positioned. And so on. In order to avoid
|
||||
@ -25,7 +27,7 @@ framework, and interested clients reading the data from it::
|
||||
+---------------+ +-----------------+ +---------------+
|
||||
| Thermal (IPA) | | Scheduler (EAS) | | Other |
|
||||
+---------------+ +-----------------+ +---------------+
|
||||
| | em_pd_energy() |
|
||||
| | em_cpu_energy() |
|
||||
| | em_cpu_get() |
|
||||
+---------+ | +---------+
|
||||
| | |
|
||||
@ -35,7 +37,7 @@ framework, and interested clients reading the data from it::
|
||||
| Framework |
|
||||
+---------------------+
|
||||
^ ^ ^
|
||||
| | | em_register_perf_domain()
|
||||
| | | em_dev_register_perf_domain()
|
||||
+----------+ | +---------+
|
||||
| | |
|
||||
+---------------+ +---------------+ +--------------+
|
||||
@ -47,12 +49,12 @@ framework, and interested clients reading the data from it::
|
||||
| Device Tree | | Firmware | | ? |
|
||||
+--------------+ +---------------+ +--------------+
|
||||
|
||||
The EM framework manages power cost tables per 'performance domain' in the
|
||||
system. A performance domain is a group of CPUs whose performance is scaled
|
||||
together. Performance domains generally have a 1-to-1 mapping with CPUFreq
|
||||
policies. All CPUs in a performance domain are required to have the same
|
||||
micro-architecture. CPUs in different performance domains can have different
|
||||
micro-architectures.
|
||||
In case of CPU devices the EM framework manages power cost tables per
|
||||
'performance domain' in the system. A performance domain is a group of CPUs
|
||||
whose performance is scaled together. Performance domains generally have a
|
||||
1-to-1 mapping with CPUFreq policies. All CPUs in a performance domain are
|
||||
required to have the same micro-architecture. CPUs in different performance
|
||||
domains can have different micro-architectures.
|
||||
|
||||
|
||||
2. Core APIs
|
||||
@ -70,14 +72,16 @@ CONFIG_ENERGY_MODEL must be enabled to use the EM framework.
|
||||
Drivers are expected to register performance domains into the EM framework by
|
||||
calling the following API::
|
||||
|
||||
int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
|
||||
struct em_data_callback *cb);
|
||||
int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
|
||||
struct em_data_callback *cb, cpumask_t *cpus);
|
||||
|
||||
Drivers must specify the CPUs of the performance domains using the cpumask
|
||||
argument, and provide a callback function returning <frequency, power> tuples
|
||||
for each capacity state. The callback function provided by the driver is free
|
||||
Drivers must provide a callback function returning <frequency, power> tuples
|
||||
for each performance state. The callback function provided by the driver is free
|
||||
to fetch data from any relevant location (DT, firmware, ...), and by any mean
|
||||
deemed necessary. See Section 3. for an example of driver implementing this
|
||||
deemed necessary. Only for CPU devices, drivers must specify the CPUs of the
|
||||
performance domains using cpumask. For other devices than CPUs the last
|
||||
argument must be set to NULL.
|
||||
See Section 3. for an example of driver implementing this
|
||||
callback, and kernel/power/energy_model.c for further documentation on this
|
||||
API.
|
||||
|
||||
@ -85,13 +89,20 @@ API.
|
||||
2.3 Accessing performance domains
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
There are two API functions which provide the access to the energy model:
|
||||
em_cpu_get() which takes CPU id as an argument and em_pd_get() with device
|
||||
pointer as an argument. It depends on the subsystem which interface it is
|
||||
going to use, but in case of CPU devices both functions return the same
|
||||
performance domain.
|
||||
|
||||
Subsystems interested in the energy model of a CPU can retrieve it using the
|
||||
em_cpu_get() API. The energy model tables are allocated once upon creation of
|
||||
the performance domains, and kept in memory untouched.
|
||||
|
||||
The energy consumed by a performance domain can be estimated using the
|
||||
em_pd_energy() API. The estimation is performed assuming that the schedutil
|
||||
CPUfreq governor is in use.
|
||||
em_cpu_energy() API. The estimation is performed assuming that the schedutil
|
||||
CPUfreq governor is in use in case of CPU device. Currently this calculation is
|
||||
not provided for other type of devices.
|
||||
|
||||
More details about the above APIs can be found in include/linux/energy_model.h.
|
||||
|
||||
@ -106,42 +117,46 @@ EM framework::
|
||||
|
||||
-> drivers/cpufreq/foo_cpufreq.c
|
||||
|
||||
01 static int est_power(unsigned long *mW, unsigned long *KHz, int cpu)
|
||||
02 {
|
||||
03 long freq, power;
|
||||
04
|
||||
05 /* Use the 'foo' protocol to ceil the frequency */
|
||||
06 freq = foo_get_freq_ceil(cpu, *KHz);
|
||||
07 if (freq < 0);
|
||||
08 return freq;
|
||||
09
|
||||
10 /* Estimate the power cost for the CPU at the relevant freq. */
|
||||
11 power = foo_estimate_power(cpu, freq);
|
||||
12 if (power < 0);
|
||||
13 return power;
|
||||
14
|
||||
15 /* Return the values to the EM framework */
|
||||
16 *mW = power;
|
||||
17 *KHz = freq;
|
||||
18
|
||||
19 return 0;
|
||||
20 }
|
||||
21
|
||||
22 static int foo_cpufreq_init(struct cpufreq_policy *policy)
|
||||
23 {
|
||||
24 struct em_data_callback em_cb = EM_DATA_CB(est_power);
|
||||
25 int nr_opp, ret;
|
||||
26
|
||||
27 /* Do the actual CPUFreq init work ... */
|
||||
28 ret = do_foo_cpufreq_init(policy);
|
||||
29 if (ret)
|
||||
30 return ret;
|
||||
31
|
||||
32 /* Find the number of OPPs for this policy */
|
||||
33 nr_opp = foo_get_nr_opp(policy);
|
||||
34
|
||||
35 /* And register the new performance domain */
|
||||
36 em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
|
||||
37
|
||||
38 return 0;
|
||||
39 }
|
||||
01 static int est_power(unsigned long *mW, unsigned long *KHz,
|
||||
02 struct device *dev)
|
||||
03 {
|
||||
04 long freq, power;
|
||||
05
|
||||
06 /* Use the 'foo' protocol to ceil the frequency */
|
||||
07 freq = foo_get_freq_ceil(dev, *KHz);
|
||||
08 if (freq < 0);
|
||||
09 return freq;
|
||||
10
|
||||
11 /* Estimate the power cost for the dev at the relevant freq. */
|
||||
12 power = foo_estimate_power(dev, freq);
|
||||
13 if (power < 0);
|
||||
14 return power;
|
||||
15
|
||||
16 /* Return the values to the EM framework */
|
||||
17 *mW = power;
|
||||
18 *KHz = freq;
|
||||
19
|
||||
20 return 0;
|
||||
21 }
|
||||
22
|
||||
23 static int foo_cpufreq_init(struct cpufreq_policy *policy)
|
||||
24 {
|
||||
25 struct em_data_callback em_cb = EM_DATA_CB(est_power);
|
||||
26 struct device *cpu_dev;
|
||||
27 int nr_opp, ret;
|
||||
28
|
||||
29 cpu_dev = get_cpu_device(cpumask_first(policy->cpus));
|
||||
30
|
||||
31 /* Do the actual CPUFreq init work ... */
|
||||
32 ret = do_foo_cpufreq_init(policy);
|
||||
33 if (ret)
|
||||
34 return ret;
|
||||
35
|
||||
36 /* Find the number of OPPs for this policy */
|
||||
37 nr_opp = foo_get_nr_opp(policy);
|
||||
38
|
||||
39 /* And register the new performance domain */
|
||||
40 em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus);
|
||||
41
|
||||
42 return 0;
|
||||
43 }
|
||||
|
@ -279,7 +279,7 @@ static int cpufreq_init(struct cpufreq_policy *policy)
|
||||
policy->cpuinfo.transition_latency = transition_latency;
|
||||
policy->dvfs_possible_from_any_cpu = true;
|
||||
|
||||
dev_pm_opp_of_register_em(policy->cpus);
|
||||
dev_pm_opp_of_register_em(cpu_dev, policy->cpus);
|
||||
|
||||
return 0;
|
||||
|
||||
|
@ -193,7 +193,7 @@ static int imx6q_cpufreq_init(struct cpufreq_policy *policy)
|
||||
policy->clk = clks[ARM].clk;
|
||||
cpufreq_generic_init(policy, freq_table, transition_latency);
|
||||
policy->suspend_freq = max_freq;
|
||||
dev_pm_opp_of_register_em(policy->cpus);
|
||||
dev_pm_opp_of_register_em(cpu_dev, policy->cpus);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
@ -448,7 +448,7 @@ static int mtk_cpufreq_init(struct cpufreq_policy *policy)
|
||||
policy->driver_data = info;
|
||||
policy->clk = info->cpu_clk;
|
||||
|
||||
dev_pm_opp_of_register_em(policy->cpus);
|
||||
dev_pm_opp_of_register_em(info->cpu_dev, policy->cpus);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
@ -131,7 +131,7 @@ static int omap_cpu_init(struct cpufreq_policy *policy)
|
||||
|
||||
/* FIXME: what's the actual transition time? */
|
||||
cpufreq_generic_init(policy, freq_table, 300 * 1000);
|
||||
dev_pm_opp_of_register_em(policy->cpus);
|
||||
dev_pm_opp_of_register_em(mpu_dev, policy->cpus);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
@ -238,7 +238,7 @@ static int qcom_cpufreq_hw_cpu_init(struct cpufreq_policy *policy)
|
||||
goto error;
|
||||
}
|
||||
|
||||
dev_pm_opp_of_register_em(policy->cpus);
|
||||
dev_pm_opp_of_register_em(cpu_dev, policy->cpus);
|
||||
|
||||
policy->fast_switch_possible = true;
|
||||
|
||||
|
@ -103,17 +103,12 @@ scmi_get_sharing_cpus(struct device *cpu_dev, struct cpumask *cpumask)
|
||||
}
|
||||
|
||||
static int __maybe_unused
|
||||
scmi_get_cpu_power(unsigned long *power, unsigned long *KHz, int cpu)
|
||||
scmi_get_cpu_power(unsigned long *power, unsigned long *KHz,
|
||||
struct device *cpu_dev)
|
||||
{
|
||||
struct device *cpu_dev = get_cpu_device(cpu);
|
||||
unsigned long Hz;
|
||||
int ret, domain;
|
||||
|
||||
if (!cpu_dev) {
|
||||
pr_err("failed to get cpu%d device\n", cpu);
|
||||
return -ENODEV;
|
||||
}
|
||||
|
||||
domain = handle->perf_ops->device_domain_id(cpu_dev);
|
||||
if (domain < 0)
|
||||
return domain;
|
||||
@ -200,7 +195,7 @@ static int scmi_cpufreq_init(struct cpufreq_policy *policy)
|
||||
|
||||
policy->fast_switch_possible = true;
|
||||
|
||||
em_register_perf_domain(policy->cpus, nr_opp, &em_cb);
|
||||
em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus);
|
||||
|
||||
return 0;
|
||||
|
||||
|
@ -167,7 +167,7 @@ static int scpi_cpufreq_init(struct cpufreq_policy *policy)
|
||||
|
||||
policy->fast_switch_possible = false;
|
||||
|
||||
dev_pm_opp_of_register_em(policy->cpus);
|
||||
dev_pm_opp_of_register_em(cpu_dev, policy->cpus);
|
||||
|
||||
return 0;
|
||||
|
||||
|
@ -450,7 +450,7 @@ static int ve_spc_cpufreq_init(struct cpufreq_policy *policy)
|
||||
policy->freq_table = freq_table[cur_cluster];
|
||||
policy->cpuinfo.transition_latency = 1000000; /* 1 ms */
|
||||
|
||||
dev_pm_opp_of_register_em(policy->cpus);
|
||||
dev_pm_opp_of_register_em(cpu_dev, policy->cpus);
|
||||
|
||||
if (is_bL_switching_enabled())
|
||||
per_cpu(cpu_last_req_freq, policy->cpu) =
|
||||
|
@ -1108,24 +1108,18 @@ static int jz4740_mmc_remove(struct platform_device *pdev)
|
||||
return 0;
|
||||
}
|
||||
|
||||
#ifdef CONFIG_PM_SLEEP
|
||||
|
||||
static int jz4740_mmc_suspend(struct device *dev)
|
||||
static int __maybe_unused jz4740_mmc_suspend(struct device *dev)
|
||||
{
|
||||
return pinctrl_pm_select_sleep_state(dev);
|
||||
}
|
||||
|
||||
static int jz4740_mmc_resume(struct device *dev)
|
||||
static int __maybe_unused jz4740_mmc_resume(struct device *dev)
|
||||
{
|
||||
return pinctrl_select_default_state(dev);
|
||||
}
|
||||
|
||||
static SIMPLE_DEV_PM_OPS(jz4740_mmc_pm_ops, jz4740_mmc_suspend,
|
||||
jz4740_mmc_resume);
|
||||
#define JZ4740_MMC_PM_OPS (&jz4740_mmc_pm_ops)
|
||||
#else
|
||||
#define JZ4740_MMC_PM_OPS NULL
|
||||
#endif
|
||||
|
||||
static struct platform_driver jz4740_mmc_driver = {
|
||||
.probe = jz4740_mmc_probe,
|
||||
@ -1133,7 +1127,7 @@ static struct platform_driver jz4740_mmc_driver = {
|
||||
.driver = {
|
||||
.name = "jz4740-mmc",
|
||||
.of_match_table = of_match_ptr(jz4740_mmc_of_match),
|
||||
.pm = JZ4740_MMC_PM_OPS,
|
||||
.pm = pm_ptr(&jz4740_mmc_pm_ops),
|
||||
},
|
||||
};
|
||||
|
||||
|
@ -1209,20 +1209,19 @@ EXPORT_SYMBOL_GPL(dev_pm_opp_get_of_node);
|
||||
|
||||
/*
|
||||
* Callback function provided to the Energy Model framework upon registration.
|
||||
* This computes the power estimated by @CPU at @kHz if it is the frequency
|
||||
* This computes the power estimated by @dev at @kHz if it is the frequency
|
||||
* of an existing OPP, or at the frequency of the first OPP above @kHz otherwise
|
||||
* (see dev_pm_opp_find_freq_ceil()). This function updates @kHz to the ceiled
|
||||
* frequency and @mW to the associated power. The power is estimated as
|
||||
* P = C * V^2 * f with C being the CPU's capacitance and V and f respectively
|
||||
* the voltage and frequency of the OPP.
|
||||
* P = C * V^2 * f with C being the device's capacitance and V and f
|
||||
* respectively the voltage and frequency of the OPP.
|
||||
*
|
||||
* Returns -ENODEV if the CPU device cannot be found, -EINVAL if the power
|
||||
* calculation failed because of missing parameters, 0 otherwise.
|
||||
* Returns -EINVAL if the power calculation failed because of missing
|
||||
* parameters, 0 otherwise.
|
||||
*/
|
||||
static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz,
|
||||
int cpu)
|
||||
static int __maybe_unused _get_power(unsigned long *mW, unsigned long *kHz,
|
||||
struct device *dev)
|
||||
{
|
||||
struct device *cpu_dev;
|
||||
struct dev_pm_opp *opp;
|
||||
struct device_node *np;
|
||||
unsigned long mV, Hz;
|
||||
@ -1230,11 +1229,7 @@ static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz,
|
||||
u64 tmp;
|
||||
int ret;
|
||||
|
||||
cpu_dev = get_cpu_device(cpu);
|
||||
if (!cpu_dev)
|
||||
return -ENODEV;
|
||||
|
||||
np = of_node_get(cpu_dev->of_node);
|
||||
np = of_node_get(dev->of_node);
|
||||
if (!np)
|
||||
return -EINVAL;
|
||||
|
||||
@ -1244,7 +1239,7 @@ static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz,
|
||||
return -EINVAL;
|
||||
|
||||
Hz = *kHz * 1000;
|
||||
opp = dev_pm_opp_find_freq_ceil(cpu_dev, &Hz);
|
||||
opp = dev_pm_opp_find_freq_ceil(dev, &Hz);
|
||||
if (IS_ERR(opp))
|
||||
return -EINVAL;
|
||||
|
||||
@ -1264,30 +1259,38 @@ static int __maybe_unused _get_cpu_power(unsigned long *mW, unsigned long *kHz,
|
||||
|
||||
/**
|
||||
* dev_pm_opp_of_register_em() - Attempt to register an Energy Model
|
||||
* @cpus : CPUs for which an Energy Model has to be registered
|
||||
* @dev : Device for which an Energy Model has to be registered
|
||||
* @cpus : CPUs for which an Energy Model has to be registered. For
|
||||
* other type of devices it should be set to NULL.
|
||||
*
|
||||
* This checks whether the "dynamic-power-coefficient" devicetree property has
|
||||
* been specified, and tries to register an Energy Model with it if it has.
|
||||
* Having this property means the voltages are known for OPPs and the EM
|
||||
* might be calculated.
|
||||
*/
|
||||
void dev_pm_opp_of_register_em(struct cpumask *cpus)
|
||||
int dev_pm_opp_of_register_em(struct device *dev, struct cpumask *cpus)
|
||||
{
|
||||
struct em_data_callback em_cb = EM_DATA_CB(_get_cpu_power);
|
||||
int ret, nr_opp, cpu = cpumask_first(cpus);
|
||||
struct device *cpu_dev;
|
||||
struct em_data_callback em_cb = EM_DATA_CB(_get_power);
|
||||
struct device_node *np;
|
||||
int ret, nr_opp;
|
||||
u32 cap;
|
||||
|
||||
cpu_dev = get_cpu_device(cpu);
|
||||
if (!cpu_dev)
|
||||
return;
|
||||
if (IS_ERR_OR_NULL(dev)) {
|
||||
ret = -EINVAL;
|
||||
goto failed;
|
||||
}
|
||||
|
||||
nr_opp = dev_pm_opp_get_opp_count(cpu_dev);
|
||||
if (nr_opp <= 0)
|
||||
return;
|
||||
nr_opp = dev_pm_opp_get_opp_count(dev);
|
||||
if (nr_opp <= 0) {
|
||||
ret = -EINVAL;
|
||||
goto failed;
|
||||
}
|
||||
|
||||
np = of_node_get(cpu_dev->of_node);
|
||||
if (!np)
|
||||
return;
|
||||
np = of_node_get(dev->of_node);
|
||||
if (!np) {
|
||||
ret = -EINVAL;
|
||||
goto failed;
|
||||
}
|
||||
|
||||
/*
|
||||
* Register an EM only if the 'dynamic-power-coefficient' property is
|
||||
@ -1298,9 +1301,20 @@ void dev_pm_opp_of_register_em(struct cpumask *cpus)
|
||||
*/
|
||||
ret = of_property_read_u32(np, "dynamic-power-coefficient", &cap);
|
||||
of_node_put(np);
|
||||
if (ret || !cap)
|
||||
return;
|
||||
if (ret || !cap) {
|
||||
dev_dbg(dev, "Couldn't find proper 'dynamic-power-coefficient' in DT\n");
|
||||
ret = -EINVAL;
|
||||
goto failed;
|
||||
}
|
||||
|
||||
em_register_perf_domain(cpus, nr_opp, &em_cb);
|
||||
ret = em_dev_register_perf_domain(dev, nr_opp, &em_cb, cpus);
|
||||
if (ret)
|
||||
goto failed;
|
||||
|
||||
return 0;
|
||||
|
||||
failed:
|
||||
dev_dbg(dev, "Couldn't register Energy Model %d\n", ret);
|
||||
return ret;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(dev_pm_opp_of_register_em);
|
||||
|
@ -333,18 +333,18 @@ static inline bool em_is_sane(struct cpufreq_cooling_device *cpufreq_cdev,
|
||||
return false;
|
||||
|
||||
policy = cpufreq_cdev->policy;
|
||||
if (!cpumask_equal(policy->related_cpus, to_cpumask(em->cpus))) {
|
||||
if (!cpumask_equal(policy->related_cpus, em_span_cpus(em))) {
|
||||
pr_err("The span of pd %*pbl is misaligned with cpufreq policy %*pbl\n",
|
||||
cpumask_pr_args(to_cpumask(em->cpus)),
|
||||
cpumask_pr_args(em_span_cpus(em)),
|
||||
cpumask_pr_args(policy->related_cpus));
|
||||
return false;
|
||||
}
|
||||
|
||||
nr_levels = cpufreq_cdev->max_level + 1;
|
||||
if (em->nr_cap_states != nr_levels) {
|
||||
pr_err("The number of cap states in pd %*pbl (%u) doesn't match the number of cooling levels (%u)\n",
|
||||
cpumask_pr_args(to_cpumask(em->cpus)),
|
||||
em->nr_cap_states, nr_levels);
|
||||
if (em_pd_nr_perf_states(em) != nr_levels) {
|
||||
pr_err("The number of performance states in pd %*pbl (%u) doesn't match the number of cooling levels (%u)\n",
|
||||
cpumask_pr_args(em_span_cpus(em)),
|
||||
em_pd_nr_perf_states(em), nr_levels);
|
||||
return false;
|
||||
}
|
||||
|
||||
|
@ -13,6 +13,7 @@
|
||||
#define _DEVICE_H_
|
||||
|
||||
#include <linux/dev_printk.h>
|
||||
#include <linux/energy_model.h>
|
||||
#include <linux/ioport.h>
|
||||
#include <linux/kobject.h>
|
||||
#include <linux/klist.h>
|
||||
@ -560,6 +561,10 @@ struct device {
|
||||
struct dev_pm_info power;
|
||||
struct dev_pm_domain *pm_domain;
|
||||
|
||||
#ifdef CONFIG_ENERGY_MODEL
|
||||
struct em_perf_domain *em_pd;
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN
|
||||
struct irq_domain *msi_domain;
|
||||
#endif
|
||||
|
@ -2,6 +2,7 @@
|
||||
#ifndef _LINUX_ENERGY_MODEL_H
|
||||
#define _LINUX_ENERGY_MODEL_H
|
||||
#include <linux/cpumask.h>
|
||||
#include <linux/device.h>
|
||||
#include <linux/jump_label.h>
|
||||
#include <linux/kobject.h>
|
||||
#include <linux/rcupdate.h>
|
||||
@ -10,13 +11,15 @@
|
||||
#include <linux/types.h>
|
||||
|
||||
/**
|
||||
* em_cap_state - Capacity state of a performance domain
|
||||
* @frequency: The CPU frequency in KHz, for consistency with CPUFreq
|
||||
* @power: The power consumed by 1 CPU at this level, in milli-watts
|
||||
* em_perf_state - Performance state of a performance domain
|
||||
* @frequency: The frequency in KHz, for consistency with CPUFreq
|
||||
* @power: The power consumed at this level, in milli-watts (by 1 CPU or
|
||||
by a registered device). It can be a total power: static and
|
||||
dynamic.
|
||||
* @cost: The cost coefficient associated with this level, used during
|
||||
* energy calculation. Equal to: power * max_frequency / frequency
|
||||
*/
|
||||
struct em_cap_state {
|
||||
struct em_perf_state {
|
||||
unsigned long frequency;
|
||||
unsigned long power;
|
||||
unsigned long cost;
|
||||
@ -24,102 +27,119 @@ struct em_cap_state {
|
||||
|
||||
/**
|
||||
* em_perf_domain - Performance domain
|
||||
* @table: List of capacity states, in ascending order
|
||||
* @nr_cap_states: Number of capacity states
|
||||
* @cpus: Cpumask covering the CPUs of the domain
|
||||
* @table: List of performance states, in ascending order
|
||||
* @nr_perf_states: Number of performance states
|
||||
* @cpus: Cpumask covering the CPUs of the domain. It's here
|
||||
* for performance reasons to avoid potential cache
|
||||
* misses during energy calculations in the scheduler
|
||||
* and simplifies allocating/freeing that memory region.
|
||||
*
|
||||
* A "performance domain" represents a group of CPUs whose performance is
|
||||
* scaled together. All CPUs of a performance domain must have the same
|
||||
* micro-architecture. Performance domains often have a 1-to-1 mapping with
|
||||
* CPUFreq policies.
|
||||
* In case of CPU device, a "performance domain" represents a group of CPUs
|
||||
* whose performance is scaled together. All CPUs of a performance domain
|
||||
* must have the same micro-architecture. Performance domains often have
|
||||
* a 1-to-1 mapping with CPUFreq policies. In case of other devices the @cpus
|
||||
* field is unused.
|
||||
*/
|
||||
struct em_perf_domain {
|
||||
struct em_cap_state *table;
|
||||
int nr_cap_states;
|
||||
struct em_perf_state *table;
|
||||
int nr_perf_states;
|
||||
unsigned long cpus[];
|
||||
};
|
||||
|
||||
#define em_span_cpus(em) (to_cpumask((em)->cpus))
|
||||
|
||||
#ifdef CONFIG_ENERGY_MODEL
|
||||
#define EM_CPU_MAX_POWER 0xFFFF
|
||||
#define EM_MAX_POWER 0xFFFF
|
||||
|
||||
struct em_data_callback {
|
||||
/**
|
||||
* active_power() - Provide power at the next capacity state of a CPU
|
||||
* @power : Active power at the capacity state in mW (modified)
|
||||
* @freq : Frequency at the capacity state in kHz (modified)
|
||||
* @cpu : CPU for which we do this operation
|
||||
* active_power() - Provide power at the next performance state of
|
||||
* a device
|
||||
* @power : Active power at the performance state in mW
|
||||
* (modified)
|
||||
* @freq : Frequency at the performance state in kHz
|
||||
* (modified)
|
||||
* @dev : Device for which we do this operation (can be a CPU)
|
||||
*
|
||||
* active_power() must find the lowest capacity state of 'cpu' above
|
||||
* active_power() must find the lowest performance state of 'dev' above
|
||||
* 'freq' and update 'power' and 'freq' to the matching active power
|
||||
* and frequency.
|
||||
*
|
||||
* The power is the one of a single CPU in the domain, expressed in
|
||||
* milli-watts. It is expected to fit in the [0, EM_CPU_MAX_POWER]
|
||||
* range.
|
||||
* In case of CPUs, the power is the one of a single CPU in the domain,
|
||||
* expressed in milli-watts. It is expected to fit in the
|
||||
* [0, EM_MAX_POWER] range.
|
||||
*
|
||||
* Return 0 on success.
|
||||
*/
|
||||
int (*active_power)(unsigned long *power, unsigned long *freq, int cpu);
|
||||
int (*active_power)(unsigned long *power, unsigned long *freq,
|
||||
struct device *dev);
|
||||
};
|
||||
#define EM_DATA_CB(_active_power_cb) { .active_power = &_active_power_cb }
|
||||
|
||||
struct em_perf_domain *em_cpu_get(int cpu);
|
||||
int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
|
||||
struct em_data_callback *cb);
|
||||
struct em_perf_domain *em_pd_get(struct device *dev);
|
||||
int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
|
||||
struct em_data_callback *cb, cpumask_t *span);
|
||||
void em_dev_unregister_perf_domain(struct device *dev);
|
||||
|
||||
/**
|
||||
* em_pd_energy() - Estimates the energy consumed by the CPUs of a perf. domain
|
||||
* em_cpu_energy() - Estimates the energy consumed by the CPUs of a
|
||||
performance domain
|
||||
* @pd : performance domain for which energy has to be estimated
|
||||
* @max_util : highest utilization among CPUs of the domain
|
||||
* @sum_util : sum of the utilization of all CPUs in the domain
|
||||
*
|
||||
* This function must be used only for CPU devices. There is no validation,
|
||||
* i.e. if the EM is a CPU type and has cpumask allocated. It is called from
|
||||
* the scheduler code quite frequently and that is why there is not checks.
|
||||
*
|
||||
* Return: the sum of the energy consumed by the CPUs of the domain assuming
|
||||
* a capacity state satisfying the max utilization of the domain.
|
||||
*/
|
||||
static inline unsigned long em_pd_energy(struct em_perf_domain *pd,
|
||||
static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
|
||||
unsigned long max_util, unsigned long sum_util)
|
||||
{
|
||||
unsigned long freq, scale_cpu;
|
||||
struct em_cap_state *cs;
|
||||
struct em_perf_state *ps;
|
||||
int i, cpu;
|
||||
|
||||
/*
|
||||
* In order to predict the capacity state, map the utilization of the
|
||||
* most utilized CPU of the performance domain to a requested frequency,
|
||||
* like schedutil.
|
||||
* In order to predict the performance state, map the utilization of
|
||||
* the most utilized CPU of the performance domain to a requested
|
||||
* frequency, like schedutil.
|
||||
*/
|
||||
cpu = cpumask_first(to_cpumask(pd->cpus));
|
||||
scale_cpu = arch_scale_cpu_capacity(cpu);
|
||||
cs = &pd->table[pd->nr_cap_states - 1];
|
||||
freq = map_util_freq(max_util, cs->frequency, scale_cpu);
|
||||
ps = &pd->table[pd->nr_perf_states - 1];
|
||||
freq = map_util_freq(max_util, ps->frequency, scale_cpu);
|
||||
|
||||
/*
|
||||
* Find the lowest capacity state of the Energy Model above the
|
||||
* Find the lowest performance state of the Energy Model above the
|
||||
* requested frequency.
|
||||
*/
|
||||
for (i = 0; i < pd->nr_cap_states; i++) {
|
||||
cs = &pd->table[i];
|
||||
if (cs->frequency >= freq)
|
||||
for (i = 0; i < pd->nr_perf_states; i++) {
|
||||
ps = &pd->table[i];
|
||||
if (ps->frequency >= freq)
|
||||
break;
|
||||
}
|
||||
|
||||
/*
|
||||
* The capacity of a CPU in the domain at that capacity state (cs)
|
||||
* The capacity of a CPU in the domain at the performance state (ps)
|
||||
* can be computed as:
|
||||
*
|
||||
* cs->freq * scale_cpu
|
||||
* cs->cap = -------------------- (1)
|
||||
* ps->freq * scale_cpu
|
||||
* ps->cap = -------------------- (1)
|
||||
* cpu_max_freq
|
||||
*
|
||||
* So, ignoring the costs of idle states (which are not available in
|
||||
* the EM), the energy consumed by this CPU at that capacity state is
|
||||
* estimated as:
|
||||
* the EM), the energy consumed by this CPU at that performance state
|
||||
* is estimated as:
|
||||
*
|
||||
* cs->power * cpu_util
|
||||
* ps->power * cpu_util
|
||||
* cpu_nrg = -------------------- (2)
|
||||
* cs->cap
|
||||
* ps->cap
|
||||
*
|
||||
* since 'cpu_util / cs->cap' represents its percentage of busy time.
|
||||
* since 'cpu_util / ps->cap' represents its percentage of busy time.
|
||||
*
|
||||
* NOTE: Although the result of this computation actually is in
|
||||
* units of power, it can be manipulated as an energy value
|
||||
@ -129,55 +149,64 @@ static inline unsigned long em_pd_energy(struct em_perf_domain *pd,
|
||||
* By injecting (1) in (2), 'cpu_nrg' can be re-expressed as a product
|
||||
* of two terms:
|
||||
*
|
||||
* cs->power * cpu_max_freq cpu_util
|
||||
* ps->power * cpu_max_freq cpu_util
|
||||
* cpu_nrg = ------------------------ * --------- (3)
|
||||
* cs->freq scale_cpu
|
||||
* ps->freq scale_cpu
|
||||
*
|
||||
* The first term is static, and is stored in the em_cap_state struct
|
||||
* as 'cs->cost'.
|
||||
* The first term is static, and is stored in the em_perf_state struct
|
||||
* as 'ps->cost'.
|
||||
*
|
||||
* Since all CPUs of the domain have the same micro-architecture, they
|
||||
* share the same 'cs->cost', and the same CPU capacity. Hence, the
|
||||
* share the same 'ps->cost', and the same CPU capacity. Hence, the
|
||||
* total energy of the domain (which is the simple sum of the energy of
|
||||
* all of its CPUs) can be factorized as:
|
||||
*
|
||||
* cs->cost * \Sum cpu_util
|
||||
* ps->cost * \Sum cpu_util
|
||||
* pd_nrg = ------------------------ (4)
|
||||
* scale_cpu
|
||||
*/
|
||||
return cs->cost * sum_util / scale_cpu;
|
||||
return ps->cost * sum_util / scale_cpu;
|
||||
}
|
||||
|
||||
/**
|
||||
* em_pd_nr_cap_states() - Get the number of capacity states of a perf. domain
|
||||
* em_pd_nr_perf_states() - Get the number of performance states of a perf.
|
||||
* domain
|
||||
* @pd : performance domain for which this must be done
|
||||
*
|
||||
* Return: the number of capacity states in the performance domain table
|
||||
* Return: the number of performance states in the performance domain table
|
||||
*/
|
||||
static inline int em_pd_nr_cap_states(struct em_perf_domain *pd)
|
||||
static inline int em_pd_nr_perf_states(struct em_perf_domain *pd)
|
||||
{
|
||||
return pd->nr_cap_states;
|
||||
return pd->nr_perf_states;
|
||||
}
|
||||
|
||||
#else
|
||||
struct em_data_callback {};
|
||||
#define EM_DATA_CB(_active_power_cb) { }
|
||||
|
||||
static inline int em_register_perf_domain(cpumask_t *span,
|
||||
unsigned int nr_states, struct em_data_callback *cb)
|
||||
static inline
|
||||
int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
|
||||
struct em_data_callback *cb, cpumask_t *span)
|
||||
{
|
||||
return -EINVAL;
|
||||
}
|
||||
static inline void em_dev_unregister_perf_domain(struct device *dev)
|
||||
{
|
||||
}
|
||||
static inline struct em_perf_domain *em_cpu_get(int cpu)
|
||||
{
|
||||
return NULL;
|
||||
}
|
||||
static inline unsigned long em_pd_energy(struct em_perf_domain *pd,
|
||||
static inline struct em_perf_domain *em_pd_get(struct device *dev)
|
||||
{
|
||||
return NULL;
|
||||
}
|
||||
static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
|
||||
unsigned long max_util, unsigned long sum_util)
|
||||
{
|
||||
return 0;
|
||||
}
|
||||
static inline int em_pd_nr_cap_states(struct em_perf_domain *pd)
|
||||
static inline int em_pd_nr_perf_states(struct em_perf_domain *pd)
|
||||
{
|
||||
return 0;
|
||||
}
|
||||
|
@ -351,7 +351,7 @@ struct dev_pm_ops {
|
||||
* to RAM and hibernation.
|
||||
*/
|
||||
#define SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn) \
|
||||
const struct dev_pm_ops name = { \
|
||||
const struct dev_pm_ops __maybe_unused name = { \
|
||||
SET_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
|
||||
}
|
||||
|
||||
@ -369,11 +369,17 @@ const struct dev_pm_ops name = { \
|
||||
* .runtime_resume(), respectively (and analogously for hibernation).
|
||||
*/
|
||||
#define UNIVERSAL_DEV_PM_OPS(name, suspend_fn, resume_fn, idle_fn) \
|
||||
const struct dev_pm_ops name = { \
|
||||
const struct dev_pm_ops __maybe_unused name = { \
|
||||
SET_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
|
||||
SET_RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn) \
|
||||
}
|
||||
|
||||
#ifdef CONFIG_PM
|
||||
#define pm_ptr(_ptr) (_ptr)
|
||||
#else
|
||||
#define pm_ptr(_ptr) NULL
|
||||
#endif
|
||||
|
||||
/*
|
||||
* PM_EVENT_ messages
|
||||
*
|
||||
|
@ -11,6 +11,7 @@
|
||||
#ifndef __LINUX_OPP_H__
|
||||
#define __LINUX_OPP_H__
|
||||
|
||||
#include <linux/energy_model.h>
|
||||
#include <linux/err.h>
|
||||
#include <linux/notifier.h>
|
||||
|
||||
@ -373,7 +374,11 @@ struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev);
|
||||
struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp);
|
||||
int of_get_required_opp_performance_state(struct device_node *np, int index);
|
||||
int dev_pm_opp_of_find_icc_paths(struct device *dev, struct opp_table *opp_table);
|
||||
void dev_pm_opp_of_register_em(struct cpumask *cpus);
|
||||
int dev_pm_opp_of_register_em(struct device *dev, struct cpumask *cpus);
|
||||
static inline void dev_pm_opp_of_unregister_em(struct device *dev)
|
||||
{
|
||||
em_dev_unregister_perf_domain(dev);
|
||||
}
|
||||
#else
|
||||
static inline int dev_pm_opp_of_add_table(struct device *dev)
|
||||
{
|
||||
@ -413,7 +418,13 @@ static inline struct device_node *dev_pm_opp_get_of_node(struct dev_pm_opp *opp)
|
||||
return NULL;
|
||||
}
|
||||
|
||||
static inline void dev_pm_opp_of_register_em(struct cpumask *cpus)
|
||||
static inline int dev_pm_opp_of_register_em(struct device *dev,
|
||||
struct cpumask *cpus)
|
||||
{
|
||||
return -ENOTSUPP;
|
||||
}
|
||||
|
||||
static inline void dev_pm_opp_of_unregister_em(struct device *dev)
|
||||
{
|
||||
}
|
||||
|
||||
|
@ -1,9 +1,10 @@
|
||||
// SPDX-License-Identifier: GPL-2.0
|
||||
/*
|
||||
* Energy Model of CPUs
|
||||
* Energy Model of devices
|
||||
*
|
||||
* Copyright (c) 2018, Arm ltd.
|
||||
* Copyright (c) 2018-2020, Arm ltd.
|
||||
* Written by: Quentin Perret, Arm ltd.
|
||||
* Improvements provided by: Lukasz Luba, Arm ltd.
|
||||
*/
|
||||
|
||||
#define pr_fmt(fmt) "energy_model: " fmt
|
||||
@ -15,30 +16,32 @@
|
||||
#include <linux/sched/topology.h>
|
||||
#include <linux/slab.h>
|
||||
|
||||
/* Mapping of each CPU to the performance domain to which it belongs. */
|
||||
static DEFINE_PER_CPU(struct em_perf_domain *, em_data);
|
||||
|
||||
/*
|
||||
* Mutex serializing the registrations of performance domains and letting
|
||||
* callbacks defined by drivers sleep.
|
||||
*/
|
||||
static DEFINE_MUTEX(em_pd_mutex);
|
||||
|
||||
static bool _is_cpu_device(struct device *dev)
|
||||
{
|
||||
return (dev->bus == &cpu_subsys);
|
||||
}
|
||||
|
||||
#ifdef CONFIG_DEBUG_FS
|
||||
static struct dentry *rootdir;
|
||||
|
||||
static void em_debug_create_cs(struct em_cap_state *cs, struct dentry *pd)
|
||||
static void em_debug_create_ps(struct em_perf_state *ps, struct dentry *pd)
|
||||
{
|
||||
struct dentry *d;
|
||||
char name[24];
|
||||
|
||||
snprintf(name, sizeof(name), "cs:%lu", cs->frequency);
|
||||
snprintf(name, sizeof(name), "ps:%lu", ps->frequency);
|
||||
|
||||
/* Create per-cs directory */
|
||||
/* Create per-ps directory */
|
||||
d = debugfs_create_dir(name, pd);
|
||||
debugfs_create_ulong("frequency", 0444, d, &cs->frequency);
|
||||
debugfs_create_ulong("power", 0444, d, &cs->power);
|
||||
debugfs_create_ulong("cost", 0444, d, &cs->cost);
|
||||
debugfs_create_ulong("frequency", 0444, d, &ps->frequency);
|
||||
debugfs_create_ulong("power", 0444, d, &ps->power);
|
||||
debugfs_create_ulong("cost", 0444, d, &ps->cost);
|
||||
}
|
||||
|
||||
static int em_debug_cpus_show(struct seq_file *s, void *unused)
|
||||
@ -49,22 +52,30 @@ static int em_debug_cpus_show(struct seq_file *s, void *unused)
|
||||
}
|
||||
DEFINE_SHOW_ATTRIBUTE(em_debug_cpus);
|
||||
|
||||
static void em_debug_create_pd(struct em_perf_domain *pd, int cpu)
|
||||
static void em_debug_create_pd(struct device *dev)
|
||||
{
|
||||
struct dentry *d;
|
||||
char name[8];
|
||||
int i;
|
||||
|
||||
snprintf(name, sizeof(name), "pd%d", cpu);
|
||||
|
||||
/* Create the directory of the performance domain */
|
||||
d = debugfs_create_dir(name, rootdir);
|
||||
d = debugfs_create_dir(dev_name(dev), rootdir);
|
||||
|
||||
debugfs_create_file("cpus", 0444, d, pd->cpus, &em_debug_cpus_fops);
|
||||
if (_is_cpu_device(dev))
|
||||
debugfs_create_file("cpus", 0444, d, dev->em_pd->cpus,
|
||||
&em_debug_cpus_fops);
|
||||
|
||||
/* Create a sub-directory for each capacity state */
|
||||
for (i = 0; i < pd->nr_cap_states; i++)
|
||||
em_debug_create_cs(&pd->table[i], d);
|
||||
/* Create a sub-directory for each performance state */
|
||||
for (i = 0; i < dev->em_pd->nr_perf_states; i++)
|
||||
em_debug_create_ps(&dev->em_pd->table[i], d);
|
||||
|
||||
}
|
||||
|
||||
static void em_debug_remove_pd(struct device *dev)
|
||||
{
|
||||
struct dentry *debug_dir;
|
||||
|
||||
debug_dir = debugfs_lookup(dev_name(dev), rootdir);
|
||||
debugfs_remove_recursive(debug_dir);
|
||||
}
|
||||
|
||||
static int __init em_debug_init(void)
|
||||
@ -76,58 +87,55 @@ static int __init em_debug_init(void)
|
||||
}
|
||||
core_initcall(em_debug_init);
|
||||
#else /* CONFIG_DEBUG_FS */
|
||||
static void em_debug_create_pd(struct em_perf_domain *pd, int cpu) {}
|
||||
static void em_debug_create_pd(struct device *dev) {}
|
||||
static void em_debug_remove_pd(struct device *dev) {}
|
||||
#endif
|
||||
static struct em_perf_domain *em_create_pd(cpumask_t *span, int nr_states,
|
||||
struct em_data_callback *cb)
|
||||
|
||||
static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
|
||||
int nr_states, struct em_data_callback *cb)
|
||||
{
|
||||
unsigned long opp_eff, prev_opp_eff = ULONG_MAX;
|
||||
unsigned long power, freq, prev_freq = 0;
|
||||
int i, ret, cpu = cpumask_first(span);
|
||||
struct em_cap_state *table;
|
||||
struct em_perf_domain *pd;
|
||||
struct em_perf_state *table;
|
||||
int i, ret;
|
||||
u64 fmax;
|
||||
|
||||
if (!cb->active_power)
|
||||
return NULL;
|
||||
|
||||
pd = kzalloc(sizeof(*pd) + cpumask_size(), GFP_KERNEL);
|
||||
if (!pd)
|
||||
return NULL;
|
||||
|
||||
table = kcalloc(nr_states, sizeof(*table), GFP_KERNEL);
|
||||
if (!table)
|
||||
goto free_pd;
|
||||
return -ENOMEM;
|
||||
|
||||
/* Build the list of capacity states for this performance domain */
|
||||
/* Build the list of performance states for this performance domain */
|
||||
for (i = 0, freq = 0; i < nr_states; i++, freq++) {
|
||||
/*
|
||||
* active_power() is a driver callback which ceils 'freq' to
|
||||
* lowest capacity state of 'cpu' above 'freq' and updates
|
||||
* lowest performance state of 'dev' above 'freq' and updates
|
||||
* 'power' and 'freq' accordingly.
|
||||
*/
|
||||
ret = cb->active_power(&power, &freq, cpu);
|
||||
ret = cb->active_power(&power, &freq, dev);
|
||||
if (ret) {
|
||||
pr_err("pd%d: invalid cap. state: %d\n", cpu, ret);
|
||||
goto free_cs_table;
|
||||
dev_err(dev, "EM: invalid perf. state: %d\n",
|
||||
ret);
|
||||
goto free_ps_table;
|
||||
}
|
||||
|
||||
/*
|
||||
* We expect the driver callback to increase the frequency for
|
||||
* higher capacity states.
|
||||
* higher performance states.
|
||||
*/
|
||||
if (freq <= prev_freq) {
|
||||
pr_err("pd%d: non-increasing freq: %lu\n", cpu, freq);
|
||||
goto free_cs_table;
|
||||
dev_err(dev, "EM: non-increasing freq: %lu\n",
|
||||
freq);
|
||||
goto free_ps_table;
|
||||
}
|
||||
|
||||
/*
|
||||
* The power returned by active_state() is expected to be
|
||||
* positive, in milli-watts and to fit into 16 bits.
|
||||
*/
|
||||
if (!power || power > EM_CPU_MAX_POWER) {
|
||||
pr_err("pd%d: invalid power: %lu\n", cpu, power);
|
||||
goto free_cs_table;
|
||||
if (!power || power > EM_MAX_POWER) {
|
||||
dev_err(dev, "EM: invalid power: %lu\n",
|
||||
power);
|
||||
goto free_ps_table;
|
||||
}
|
||||
|
||||
table[i].power = power;
|
||||
@ -141,12 +149,12 @@ static struct em_perf_domain *em_create_pd(cpumask_t *span, int nr_states,
|
||||
*/
|
||||
opp_eff = freq / power;
|
||||
if (opp_eff >= prev_opp_eff)
|
||||
pr_warn("pd%d: hertz/watts ratio non-monotonically decreasing: em_cap_state %d >= em_cap_state%d\n",
|
||||
cpu, i, i - 1);
|
||||
dev_dbg(dev, "EM: hertz/watts ratio non-monotonically decreasing: em_perf_state %d >= em_perf_state%d\n",
|
||||
i, i - 1);
|
||||
prev_opp_eff = opp_eff;
|
||||
}
|
||||
|
||||
/* Compute the cost of each capacity_state. */
|
||||
/* Compute the cost of each performance state. */
|
||||
fmax = (u64) table[nr_states - 1].frequency;
|
||||
for (i = 0; i < nr_states; i++) {
|
||||
table[i].cost = div64_u64(fmax * table[i].power,
|
||||
@ -154,39 +162,94 @@ static struct em_perf_domain *em_create_pd(cpumask_t *span, int nr_states,
|
||||
}
|
||||
|
||||
pd->table = table;
|
||||
pd->nr_cap_states = nr_states;
|
||||
cpumask_copy(to_cpumask(pd->cpus), span);
|
||||
pd->nr_perf_states = nr_states;
|
||||
|
||||
em_debug_create_pd(pd, cpu);
|
||||
return 0;
|
||||
|
||||
return pd;
|
||||
|
||||
free_cs_table:
|
||||
free_ps_table:
|
||||
kfree(table);
|
||||
free_pd:
|
||||
kfree(pd);
|
||||
|
||||
return NULL;
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
static int em_create_pd(struct device *dev, int nr_states,
|
||||
struct em_data_callback *cb, cpumask_t *cpus)
|
||||
{
|
||||
struct em_perf_domain *pd;
|
||||
struct device *cpu_dev;
|
||||
int cpu, ret;
|
||||
|
||||
if (_is_cpu_device(dev)) {
|
||||
pd = kzalloc(sizeof(*pd) + cpumask_size(), GFP_KERNEL);
|
||||
if (!pd)
|
||||
return -ENOMEM;
|
||||
|
||||
cpumask_copy(em_span_cpus(pd), cpus);
|
||||
} else {
|
||||
pd = kzalloc(sizeof(*pd), GFP_KERNEL);
|
||||
if (!pd)
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
ret = em_create_perf_table(dev, pd, nr_states, cb);
|
||||
if (ret) {
|
||||
kfree(pd);
|
||||
return ret;
|
||||
}
|
||||
|
||||
if (_is_cpu_device(dev))
|
||||
for_each_cpu(cpu, cpus) {
|
||||
cpu_dev = get_cpu_device(cpu);
|
||||
cpu_dev->em_pd = pd;
|
||||
}
|
||||
|
||||
dev->em_pd = pd;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
/**
|
||||
* em_pd_get() - Return the performance domain for a device
|
||||
* @dev : Device to find the performance domain for
|
||||
*
|
||||
* Returns the performance domain to which @dev belongs, or NULL if it doesn't
|
||||
* exist.
|
||||
*/
|
||||
struct em_perf_domain *em_pd_get(struct device *dev)
|
||||
{
|
||||
if (IS_ERR_OR_NULL(dev))
|
||||
return NULL;
|
||||
|
||||
return dev->em_pd;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(em_pd_get);
|
||||
|
||||
/**
|
||||
* em_cpu_get() - Return the performance domain for a CPU
|
||||
* @cpu : CPU to find the performance domain for
|
||||
*
|
||||
* Return: the performance domain to which 'cpu' belongs, or NULL if it doesn't
|
||||
* Returns the performance domain to which @cpu belongs, or NULL if it doesn't
|
||||
* exist.
|
||||
*/
|
||||
struct em_perf_domain *em_cpu_get(int cpu)
|
||||
{
|
||||
return READ_ONCE(per_cpu(em_data, cpu));
|
||||
struct device *cpu_dev;
|
||||
|
||||
cpu_dev = get_cpu_device(cpu);
|
||||
if (!cpu_dev)
|
||||
return NULL;
|
||||
|
||||
return em_pd_get(cpu_dev);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(em_cpu_get);
|
||||
|
||||
/**
|
||||
* em_register_perf_domain() - Register the Energy Model of a performance domain
|
||||
* @span : Mask of CPUs in the performance domain
|
||||
* @nr_states : Number of capacity states to register
|
||||
* em_dev_register_perf_domain() - Register the Energy Model (EM) for a device
|
||||
* @dev : Device for which the EM is to register
|
||||
* @nr_states : Number of performance states to register
|
||||
* @cb : Callback functions providing the data of the Energy Model
|
||||
* @cpus : Pointer to cpumask_t, which in case of a CPU device is
|
||||
* obligatory. It can be taken from i.e. 'policy->cpus'. For other
|
||||
* type of devices this should be set to NULL.
|
||||
*
|
||||
* Create Energy Model tables for a performance domain using the callbacks
|
||||
* defined in cb.
|
||||
@ -196,14 +259,13 @@ EXPORT_SYMBOL_GPL(em_cpu_get);
|
||||
*
|
||||
* Return 0 on success
|
||||
*/
|
||||
int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
|
||||
struct em_data_callback *cb)
|
||||
int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
|
||||
struct em_data_callback *cb, cpumask_t *cpus)
|
||||
{
|
||||
unsigned long cap, prev_cap = 0;
|
||||
struct em_perf_domain *pd;
|
||||
int cpu, ret = 0;
|
||||
int cpu, ret;
|
||||
|
||||
if (!span || !nr_states || !cb)
|
||||
if (!dev || !nr_states || !cb)
|
||||
return -EINVAL;
|
||||
|
||||
/*
|
||||
@ -212,47 +274,79 @@ int em_register_perf_domain(cpumask_t *span, unsigned int nr_states,
|
||||
*/
|
||||
mutex_lock(&em_pd_mutex);
|
||||
|
||||
for_each_cpu(cpu, span) {
|
||||
/* Make sure we don't register again an existing domain. */
|
||||
if (READ_ONCE(per_cpu(em_data, cpu))) {
|
||||
ret = -EEXIST;
|
||||
goto unlock;
|
||||
}
|
||||
|
||||
/*
|
||||
* All CPUs of a domain must have the same micro-architecture
|
||||
* since they all share the same table.
|
||||
*/
|
||||
cap = arch_scale_cpu_capacity(cpu);
|
||||
if (prev_cap && prev_cap != cap) {
|
||||
pr_err("CPUs of %*pbl must have the same capacity\n",
|
||||
cpumask_pr_args(span));
|
||||
ret = -EINVAL;
|
||||
goto unlock;
|
||||
}
|
||||
prev_cap = cap;
|
||||
}
|
||||
|
||||
/* Create the performance domain and add it to the Energy Model. */
|
||||
pd = em_create_pd(span, nr_states, cb);
|
||||
if (!pd) {
|
||||
ret = -EINVAL;
|
||||
if (dev->em_pd) {
|
||||
ret = -EEXIST;
|
||||
goto unlock;
|
||||
}
|
||||
|
||||
for_each_cpu(cpu, span) {
|
||||
/*
|
||||
* The per-cpu array can be read concurrently from em_cpu_get().
|
||||
* The barrier enforces the ordering needed to make sure readers
|
||||
* can only access well formed em_perf_domain structs.
|
||||
*/
|
||||
smp_store_release(per_cpu_ptr(&em_data, cpu), pd);
|
||||
if (_is_cpu_device(dev)) {
|
||||
if (!cpus) {
|
||||
dev_err(dev, "EM: invalid CPU mask\n");
|
||||
ret = -EINVAL;
|
||||
goto unlock;
|
||||
}
|
||||
|
||||
for_each_cpu(cpu, cpus) {
|
||||
if (em_cpu_get(cpu)) {
|
||||
dev_err(dev, "EM: exists for CPU%d\n", cpu);
|
||||
ret = -EEXIST;
|
||||
goto unlock;
|
||||
}
|
||||
/*
|
||||
* All CPUs of a domain must have the same
|
||||
* micro-architecture since they all share the same
|
||||
* table.
|
||||
*/
|
||||
cap = arch_scale_cpu_capacity(cpu);
|
||||
if (prev_cap && prev_cap != cap) {
|
||||
dev_err(dev, "EM: CPUs of %*pbl must have the same capacity\n",
|
||||
cpumask_pr_args(cpus));
|
||||
|
||||
ret = -EINVAL;
|
||||
goto unlock;
|
||||
}
|
||||
prev_cap = cap;
|
||||
}
|
||||
}
|
||||
|
||||
pr_debug("Created perf domain %*pbl\n", cpumask_pr_args(span));
|
||||
ret = em_create_pd(dev, nr_states, cb, cpus);
|
||||
if (ret)
|
||||
goto unlock;
|
||||
|
||||
em_debug_create_pd(dev);
|
||||
dev_info(dev, "EM: created perf domain\n");
|
||||
|
||||
unlock:
|
||||
mutex_unlock(&em_pd_mutex);
|
||||
|
||||
return ret;
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(em_register_perf_domain);
|
||||
EXPORT_SYMBOL_GPL(em_dev_register_perf_domain);
|
||||
|
||||
/**
|
||||
* em_dev_unregister_perf_domain() - Unregister Energy Model (EM) for a device
|
||||
* @dev : Device for which the EM is registered
|
||||
*
|
||||
* Unregister the EM for the specified @dev (but not a CPU device).
|
||||
*/
|
||||
void em_dev_unregister_perf_domain(struct device *dev)
|
||||
{
|
||||
if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
|
||||
return;
|
||||
|
||||
if (_is_cpu_device(dev))
|
||||
return;
|
||||
|
||||
/*
|
||||
* The mutex separates all register/unregister requests and protects
|
||||
* from potential clean-up/setup issues in the debugfs directories.
|
||||
* The debugfs directory name is the same as device's name.
|
||||
*/
|
||||
mutex_lock(&em_pd_mutex);
|
||||
em_debug_remove_pd(dev);
|
||||
|
||||
kfree(dev->em_pd->table);
|
||||
kfree(dev->em_pd);
|
||||
dev->em_pd = NULL;
|
||||
mutex_unlock(&em_pd_mutex);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(em_dev_unregister_perf_domain);
|
||||
|
@ -6501,7 +6501,7 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
|
||||
max_util = max(max_util, cpu_util);
|
||||
}
|
||||
|
||||
return em_pd_energy(pd->em_pd, max_util, sum_util);
|
||||
return em_cpu_energy(pd->em_pd, max_util, sum_util);
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -272,10 +272,10 @@ static void perf_domain_debug(const struct cpumask *cpu_map,
|
||||
printk(KERN_DEBUG "root_domain %*pbl:", cpumask_pr_args(cpu_map));
|
||||
|
||||
while (pd) {
|
||||
printk(KERN_CONT " pd%d:{ cpus=%*pbl nr_cstate=%d }",
|
||||
printk(KERN_CONT " pd%d:{ cpus=%*pbl nr_pstate=%d }",
|
||||
cpumask_first(perf_domain_span(pd)),
|
||||
cpumask_pr_args(perf_domain_span(pd)),
|
||||
em_pd_nr_cap_states(pd->em_pd));
|
||||
em_pd_nr_perf_states(pd->em_pd));
|
||||
pd = pd->next;
|
||||
}
|
||||
|
||||
@ -313,26 +313,26 @@ static void sched_energy_set(bool has_eas)
|
||||
*
|
||||
* The complexity of the Energy Model is defined as:
|
||||
*
|
||||
* C = nr_pd * (nr_cpus + nr_cs)
|
||||
* C = nr_pd * (nr_cpus + nr_ps)
|
||||
*
|
||||
* with parameters defined as:
|
||||
* - nr_pd: the number of performance domains
|
||||
* - nr_cpus: the number of CPUs
|
||||
* - nr_cs: the sum of the number of capacity states of all performance
|
||||
* - nr_ps: the sum of the number of performance states of all performance
|
||||
* domains (for example, on a system with 2 performance domains,
|
||||
* with 10 capacity states each, nr_cs = 2 * 10 = 20).
|
||||
* with 10 performance states each, nr_ps = 2 * 10 = 20).
|
||||
*
|
||||
* It is generally not a good idea to use such a model in the wake-up path on
|
||||
* very complex platforms because of the associated scheduling overheads. The
|
||||
* arbitrary constraint below prevents that. It makes EAS usable up to 16 CPUs
|
||||
* with per-CPU DVFS and less than 8 capacity states each, for example.
|
||||
* with per-CPU DVFS and less than 8 performance states each, for example.
|
||||
*/
|
||||
#define EM_MAX_COMPLEXITY 2048
|
||||
|
||||
extern struct cpufreq_governor schedutil_gov;
|
||||
static bool build_perf_domains(const struct cpumask *cpu_map)
|
||||
{
|
||||
int i, nr_pd = 0, nr_cs = 0, nr_cpus = cpumask_weight(cpu_map);
|
||||
int i, nr_pd = 0, nr_ps = 0, nr_cpus = cpumask_weight(cpu_map);
|
||||
struct perf_domain *pd = NULL, *tmp;
|
||||
int cpu = cpumask_first(cpu_map);
|
||||
struct root_domain *rd = cpu_rq(cpu)->rd;
|
||||
@ -384,15 +384,15 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
|
||||
pd = tmp;
|
||||
|
||||
/*
|
||||
* Count performance domains and capacity states for the
|
||||
* Count performance domains and performance states for the
|
||||
* complexity check.
|
||||
*/
|
||||
nr_pd++;
|
||||
nr_cs += em_pd_nr_cap_states(pd->em_pd);
|
||||
nr_ps += em_pd_nr_perf_states(pd->em_pd);
|
||||
}
|
||||
|
||||
/* Bail out if the Energy Model complexity is too high. */
|
||||
if (nr_pd * (nr_cs + nr_cpus) > EM_MAX_COMPLEXITY) {
|
||||
if (nr_pd * (nr_ps + nr_cpus) > EM_MAX_COMPLEXITY) {
|
||||
WARN(1, "rd %*pbl: Failed to start EAS, EM complexity is too high\n",
|
||||
cpumask_pr_args(cpu_map));
|
||||
goto free;
|
||||
|
Loading…
Reference in New Issue
Block a user