Commit Graph

164 Commits

Author SHA1 Message Date
Rafael J. Wysocki
8c8f77fd07 cpufreq: governor: Move per-CPU data to the common code
After previous changes there is only one piece of code in the
ondemand governor making references to per-CPU data structures,
but it can be easily modified to avoid doing that, so modify it
accordingly and move the definition of per-CPU data used by the
ondemand and conservative governors to the common code.  Next,
change that code to access the per-CPU data structures directly
rather than via a governor callback.

This causes the ->get_cpu_cdbs governor callback to become
unnecessary, so drop it along with the macro and function
definitions related to it.

Finally, drop the definitions of struct od_cpu_dbs_info_s and
struct cs_cpu_dbs_info_s that aren't necessary any more.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:09 +01:00
Rafael J. Wysocki
7d5a9956af cpufreq: governor: Make governor private data per-policy
Some fields in struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s
are only used for a limited set of CPUs.  Namely, if a policy is
shared between multiple CPUs, those fields will only be used for one
of them (policy->cpu).  This means that they really are per-policy
rather than per-CPU and holding room for them in per-CPU data
structures is generally wasteful.  Also moving those fields into
per-policy data structures will allow some significant simplifications
to be made going forward.

For this reason, introduce struct cs_policy_dbs_info and
struct od_policy_dbs_info to hold those fields.  Define each of the
new structures as an extension of struct policy_dbs_info (such that
struct policy_dbs_info is embedded in each of them) and introduce
new ->alloc and ->free governor callbacks to allocate and free
those structures, respectively, such that ->alloc() will return
a pointer to the struct policy_dbs_info embedded in the allocated
data structure and ->free() will take that pointer as its argument.

With that, modify the code accessing the data fields in question
in per-CPU data objects to look for them in the new structures
via the struct policy_dbs_info pointer available to it and drop
them from struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:08 +01:00
Rafael J. Wysocki
d1db75fffc cpufreq: ondemand: Rework the handling of powersave bias updates
The ondemand_powersave_bias_init() function used for resetting data
fields related to the powersave bias tunable of the ondemand governor
works by walking all of the online CPUs in the system and updating the
od_cpu_dbs_info_s structures for all of them.

However, if governor tunables are per policy, the update should not
touch the CPUs that are not associated with the given dbs_data.

Moreover, since the data fields in question are only ever used for
policy->cpu in each policy governed by ondemand, the update can be
limited to those specific CPUs.

Rework the code to take the above observations into account.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:08 +01:00
Rafael J. Wysocki
a33cce1c6c cpufreq: governor: Fix CPU load information updates via ->store
The ->store() callbacks of some tunable sysfs attributes of the
ondemand and conservative governors trigger immediate updates of
the CPU load information for all CPUs "governed" by the given
dbs_data by walking the cpu_dbs_info structures for all online
CPUs in the system and updating them.

This is questionable for two reasons.  First, it may lead to a lot of
extra overhead on a system with many CPUs if the given dbs_data is
only associated with a few of them.  Second, if governor tunables are
per-policy, the CPUs associated with the other sets of governor
tunables should not be updated.

To address this issue, use the observation that in all of the places
in question the update operation may be carried out in the same way
(because all of the tunables involved are now located in struct
dbs_data and readily available to the common code) and make the
code in those places invoke the same (new) helper function that
will carry out the update correctly.

That new function always checks the ignore_nice_load tunable value
and updates the CPUs' prev_cpu_nice data fields if that's set, which
wasn't done by the original code in store_io_is_busy(), but it
should have been done in there too.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:08 +01:00
Rafael J. Wysocki
76c5f66aa1 cpufreq: ondemand: Drop one more callback from struct od_ops
The ->powersave_bias_init_cpu callback in struct od_ops is only used
in one place and that invocation may be replaced with a direct call
to the function pointed to by that callback, so change the code
accordingly and drop the callback.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:07 +01:00
Rafael J. Wysocki
8434dadbb4 cpufreq: governor: Drop unused governor callback and data fields
After some previous changes, the ->get_cpu_dbs_info_s governor
callback and the "governor" field in struct dbs_governor (whose
value represents the governor type) are not used any more, so
drop them.

Also drop the unused gov_ops field from struct dbs_governor.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:07 +01:00
Rafael J. Wysocki
702c9e542a cpufreq: governor: Add a ->start callback for governors
To avoid having to check the governor type explicitly in the common
code in order to initialize data structures specific to the governor
type properly, add a ->start callback to struct dbs_governor and
use it to initialize those data structures for the ondemand and
conservative governors.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:07 +01:00
Rafael J. Wysocki
8847e038c1 cpufreq: governor: Move io_is_busy to struct dbs_data
The io_is_busy governor tunable is only used by the ondemand governor
and is located in the ondemand-specific data structure, but it is
looked at by the common governor code that has to do ugly things to
get to that value, so move it to struct dbs_data and modify ondemand
accordingly.

Since the conservative governor never touches that field, it will
be always 0 for that governor and it won't have any effect on the
results of computations in that case.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:06 +01:00
Rafael J. Wysocki
8eb055d3f5 cpufreq: ondemand: Drop unused callback from struct od_ops
The ->freq_increase callback in struct od_ops is never invoked,
so drop it.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:06 +01:00
Rafael J. Wysocki
a7f35cffb9 cpufreq: ondemand: Simplify od_update() slightly
Drop some lines of code from od_update() by arranging the statements
in there in a more logical way.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:05 +01:00
Rafael J. Wysocki
07aa4402a0 cpufreq: governor: Use microseconds in sample delay computations
Do not convert microseconds to jiffies and the other way around
in governor computations related to the sampling rate and sample
delay and drop delay_for_sampling_rate() which isn't of any use
then.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:05 +01:00
Rafael J. Wysocki
6e96c5b3ac cpufreq: ondemand: Simplify conditionals in od_dbs_timer()
Reduce the indentation level in the conditionals in od_dbs_timer()
and drop the delay variable from it.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:05 +01:00
Rafael J. Wysocki
57dc3bcd45 cpufreq: governor: Move rate_mult to struct policy_dbs
The rate_mult field in struct od_cpu_dbs_info_s is used by the code
shared with the conservative governor and to access it that code
has to do an ugly governor type check.  However, first of all it
is ever only used for policy->cpu, so it is per-policy rather than
per-CPU and second, it is initialized to 1 by cpufreq_governor_start(),
so if the conservative governor never modifies it, it will have no
effect on the results of any computations.

For these reasons, move rate_mult to struct policy_dbs_info (as a
common field).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:04 +01:00
Rafael J. Wysocki
4cccf75557 cpufreq: governor: Get rid of the ->gov_check_cpu callback
The way the ->gov_check_cpu governor callback is used by the ondemand
and conservative governors is not really straightforward.  Namely, the
governor calls dbs_check_cpu() that updates the load information for
the policy and the invokes ->gov_check_cpu() for the governor.

To get rid of that entanglement, notice that cpufreq_governor_limits()
doesn't need to call dbs_check_cpu() directly.  Instead, it can simply
reset the sample delay to 0 which will cause a sample to be taken
immediately.  The result of that is practically equivalent to calling
dbs_check_cpu() except that it will trigger a full update of governor
internal state and not just the ->gov_check_cpu() part.

Following that observation, make cpufreq_governor_limits() reset
the sample delay and turn dbs_check_cpu() into a function that will
simply evaluate the load and return the result called dbs_update().

That function can now be called by governors from the routines that
previously were pointed to by ->gov_check_cpu and those routines
can be called directly by each governor instead of dbs_check_cpu().
This way ->gov_check_cpu becomes unnecessary, so drop it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:41:04 +01:00
Viresh Kumar
a23d6d1809 cpufreq: ondemand: Rearrange od_dbs_timer() to avoid updating delay
Avoid extra checks in od_dbs_timer() by rearranging updates to the
local delay variable in it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:41:02 +01:00
Viresh Kumar
aded387b94 cpufreq: conservative: Update sample_delay_ns immediately
The ondemand governor already updates sample_delay_ns immediately on
updates to the sampling rate, but conservative doesn't do that.

It was left out earlier as the code was really too complex to get
that done easily.  Things are sorted out very well now, however, and
the conservative governor can be modified to follow ondemand in that
respect.

Moreover, since the code needed to implement that in the
conservative governor would be identical to the corresponding
ondemand governor's code, make that code common and change both
governors to use it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:41:01 +01:00
Viresh Kumar
c54df07184 cpufreq: governor: Create and traverse list of policy_dbs to avoid deadlock
The dbs_data_mutex lock is currently used in two places.  First,
cpufreq_governor_dbs() uses it to guarantee mutual exclusion between
invocations of governor operations from the core.  Second, it is used by
ondemand governor's update_sampling_rate() to ensure the stability of
data structures walked by it.

The second usage is quite problematic, because update_sampling_rate() is
called from a governor sysfs attribute's ->store callback and that leads
to a deadlock scenario involving cpufreq_governor_exit() which runs
under dbs_data_mutex.  Thus it is better to rework the code so
update_sampling_rate() doesn't need to acquire dbs_data_mutex.

To that end, rework update_sampling_rate() to walk a list of policy_dbs
objects supported by the dbs_data one it has been called for (instead of
walking cpu_dbs_info object for all CPUs).  The list manipulation is
protected with dbs_data->mutex which also is held around the execution
of update_sampling_rate(), it is not necessary to hold dbs_data_mutex in
that function any more.

Reported-by: Juri Lelli <juri.lelli@arm.com>
Reported-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:40:59 +01:00
Viresh Kumar
c443563036 cpufreq: governor: New sysfs show/store callbacks for governor tunables
The ondemand and conservative governors use the global-attr or freq-attr
structures to represent sysfs attributes corresponding to their tunables
(which of them is actually used depends on whether or not different
policy objects can use the same governor with different tunables at the
same time and, consequently, on where those attributes are located in
sysfs).

Unfortunately, in the freq-attr case, the standard cpufreq show/store
sysfs attribute callbacks are applied to the governor tunable attributes
and they always acquire the policy->rwsem lock before carrying out the
operation.  That may lead to an ABBA deadlock if governor tunable
attributes are removed under policy->rwsem while one of them is being
accessed concurrently (if sysfs attributes removal wins the race, it
will wait for the access to complete with policy->rwsem held while the
attribute callback will block on policy->rwsem indefinitely).

We attempted to address this issue by dropping policy->rwsem around
governor tunable attributes removal (that is, around invocations of the
->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT)
in cpufreq_set_policy(), but that opened up race conditions that had not
been possible with policy->rwsem held all the time.  Therefore
policy->rwsem cannot be dropped in cpufreq_set_policy() at any point,
but the deadlock situation described above must be avoided too.

To that end, use the observation that in principle governor tunables may
be represented by the same data type regardless of whether the governor
is system-wide or per-policy and introduce a new structure, struct
governor_attr, for representing them and new corresponding macros for
creating show/store sysfs callbacks for them.  Also make their parent
kobject use a new kobject type whose default show/store callbacks are
not related to the standard core cpufreq ones in any way (and they don't
acquire policy->rwsem in particular).

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[ rjw: Subject & changelog + rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:40:58 +01:00
Viresh Kumar
ff4b17895e cpufreq: governor: Move common tunables to 'struct dbs_data'
There are a few common tunables shared between the ondemand and
conservative governors.  Move them to struct dbs_data to simplify
code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:40:58 +01:00
Viresh Kumar
d0684d3b89 cpufreq: governor: Create generic macro for common tunables
Some tunables are present in governor-specific structures, whereas one
(min_sampling_rate) is located directly in struct dbs_data.

There is a special macro for creating its sysfs attribute and the
show/store callbacks, but since more tunables are going to be moved
to struct dbs_data, a new generic macro for such cases will be useful,
so add it and use it for min_sampling_rate.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Juri Lelli <juri.lelli@arm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-09 14:40:57 +01:00
Rafael J. Wysocki
bc505475b8 cpufreq: governor: Rearrange governor data structures
The struct policy_dbs_info objects representing per-policy governor
data are not accessible directly from the corresponding policy
objects.  To access them, one has to get a pointer to the
struct cpu_dbs_info of policy->cpu and use the policy_dbs field of
that which isn't really straightforward.

To address that rearrange the governor data structures so the
governor_data pointer in struct cpufreq_policy will point to
struct policy_dbs_info (instead of struct dbs_data) and that will
contain a pointer to struct dbs_data.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:56 +01:00
Rafael J. Wysocki
d10b5eb5fc cpufreq: governor: Drop cpu argument from dbs_check_cpu()
Since policy->cpu is always passed as the second argument to
dbs_check_cpu(), it is not really necessary to pass it, because
the function can obtain that value via its first argument just fine.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:55 +01:00
Rafael J. Wysocki
e40e7b255e cpufreq: governor: Rename cpu_common_dbs_info to policy_dbs_info
The struct cpu_common_dbs_info structure represents the per-policy
part of the governor data (for the ondemand and conservative
governors), but its name doesn't reflect its purpose.

Rename it to struct policy_dbs_info and rename variables related to
it accordingly.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:55 +01:00
Rafael J. Wysocki
ea59ee0dc9 cpufreq: governor: Drop the gov pointer from struct dbs_data
Since it is possible to obtain a pointer to struct dbs_governor
from a pointer to the struct governor embedded in it with the help
of container_of(), the additional gov pointer in struct dbs_data
isn't really necessary.

Drop that pointer and make the code using it reach the dbs_governor
object via policy->governor.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:55 +01:00
Rafael J. Wysocki
906a6e5aae cpufreq: governor: Rework cpufreq_governor_dbs()
Since it is possible to obtain a pointer to struct dbs_governor
from a pointer to the struct governor embedded in it via
container_of(), the second argument of cpufreq_governor_init()
is not necessary.  Accordingly, cpufreq_governor_dbs() doesn't
need its second argument either and the ->governor callbacks
for both the ondemand and conservative governors may be set
to cpufreq_governor_dbs() directly.  Make that happen.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:54 +01:00
Rafael J. Wysocki
7bdad34d08 cpufreq: governor: Rename some data types and variables
The ondemand and conservative governors are represented by
struct common_dbs_data whose name doesn't reflect the purpose it
is used for, so rename it to struct dbs_governor and rename
variables of that type accordingly.

No functional changes.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:54 +01:00
Rafael J. Wysocki
af92618523 cpufreq: governor: Put governor structure into common_dbs_data
For the ondemand and conservative governors (generally, governors
that use the common code in cpufreq_governor.c), there are two static
data structures representing the governor, the struct governor
structure (the interface to the cpufreq core) and the struct
common_dbs_data one (the interface to the cpufreq_governor.c code).

There's no fundamental reason why those two structures have to be
separate.  Moreover, if the struct governor one is included into
struct common_dbs_data, it will be possible to reach the latter from
the policy via its policy->governor pointer, so it won't be necessary
to pass a separate pointer to it around.  For this reason, embed
struct governor in struct common_dbs_data.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:54 +01:00
Rafael J. Wysocki
2bb8d94fb0 cpufreq: governor: Use common mutex for dbs_data protection
Every governor relying on the common code in cpufreq_governor.c
has to provide its own mutex in struct common_dbs_data.  However,
there actually is no need to have a separate mutex per governor
for this purpose, they may be using the same global mutex just
fine.  Accordingly, introduce a single common mutex for that and
drop the mutex field from struct common_dbs_data.

That at least will ensure that the mutex is always present and
initialized regardless of what the particular governors do.

Another benefit is that the common code does not need a pointer to
a governor-related structure to get to the mutex which sometimes
helps.

Finally, it makes the code generally easier to follow.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-03-09 14:40:53 +01:00
Rafael J. Wysocki
9be4fd2c77 cpufreq: governor: Replace timers with utilization update callbacks
Instead of using a per-CPU deferrable timer for queuing up governor
work items, register a utilization update callback that will be
invoked from the scheduler on utilization changes.

The sampling rate is still the same as what was used for the
deferrable timers and the added irq_work overhead should be offset by
the eliminated timers overhead, so in theory the functional impact of
this patch should not be significant.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
2016-03-09 14:40:53 +01:00
Rafael J. Wysocki
de1df26b7c cpufreq: Clean up default and fallback governor setup
The preprocessor magic used for setting the default cpufreq governor
(and for using the performance governor as a fallback one for that
matter) is really nasty, so replace it with __weak functions and
overrides.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-02-05 02:37:42 +01:00
Viresh Kumar
f08f638b9c cpufreq: ondemand: update update_sampling_rate() to make it more efficient
Currently update_sampling_rate() runs over each online CPU and
cancels/queues timers on all policy->cpus every time. This should be
done just once for any cpu belonging to a policy.

Create a cpumask and keep on clearing it as and when we process
policies, so that we don't have to traverse through all CPUs of the same
policy.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-09 22:26:12 +01:00
Viresh Kumar
70f43e5e79 cpufreq: governor: replace per-CPU delayed work with timers
cpufreq governors evaluate load at sampling rate and based on that they
update frequency for a group of CPUs belonging to the same cpufreq
policy.

This is required to be done in a single thread for all policy->cpus, but
because we don't want to wakeup idle CPUs to do just that, we use
deferrable work for this. If we would have used a single delayed
deferrable work for the entire policy, there were chances that the CPU
required to run the handler can be in idle and we might end up not
changing the frequency for the entire group with load variations.

And so we were forced to keep per-cpu works, and only the one that
expires first need to do the real work and others are rescheduled for
next sampling time.

We have been using the more complex solution until now, where we used a
delayed deferrable work for this, which is a combination of a timer and
a work.

This could be made lightweight by keeping per-cpu deferred timers with a
single work item, which is scheduled by the first timer that expires.

This patch does just that and here are important changes:
- The timer handler will run in irq context and so we need to use a
  spin_lock instead of the timer_mutex. And so a separate timer_lock is
  created. This also makes the use of the mutex and lock quite clear, as
  we know what exactly they are protecting.
- A new field 'skip_work' is added to track when the timer handlers can
  queue a work. More comments present in code.

Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-09 22:26:00 +01:00
Viresh Kumar
affde5d06a cpufreq: governor: Pass policy as argument to ->gov_dbs_timer()
Pass 'policy' as argument to ->gov_dbs_timer() instead of cdbs and
dbs_data.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-07 02:20:22 +01:00
Viresh Kumar
e68fe18c5b cpufreq: ondemand: Work is guaranteed to be pending
We are guaranteed to have works scheduled for policy->cpus, as the
policy isn't stopped yet. And so there is no need to check that again.
Drop it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-07 02:20:21 +01:00
Viresh Kumar
e128c86407 cpufreq: ondemand: Update sampling rate only for concerned policies
We are comparing policy->governor against cpufreq_gov_ondemand to make
sure that we update sampling rate only for the concerned CPUs. But that
isn't enough.

In case of governor_per_policy, there can be multiple instances of
ondemand governor and we will always end up updating all of them with
current code. What we rather need to do, is to compare dbs_data with
poilcy->governor_data, which will match only for the policies governed
by dbs_data.

This code is also racy as the governor might be getting stopped at that
time and we may end up scheduling work for a policy, which we have just
disabled.

Fix that by protecting the entire function with &od_dbs_cdata.mutex,
which will prevent against races with policy START/STOP/etc.

After these locks are in place, we can safely get the policy via per-cpu
dbs_info.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-07 02:20:10 +01:00
Viresh Kumar
083701b13c cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
'timer_mutex' is required to sync work-handlers of policy->cpus.
update_sampling_rate() is just canceling the works and queuing them
again. This isn't protecting anything at all in update_sampling_rate()
and is not gonna be of any use.

Even if a work-handler is already running for a CPU,
cancel_delayed_work_sync() will wait for it to finish.

Drop these unnecessary locks.

Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-28 09:20:04 +01:00
Viresh Kumar
43e0ee361e cpufreq: governor: split out common part of {cs|od}_dbs_timer()
Some part of cs_dbs_timer() and od_dbs_timer() is exactly same and is
unnecessarily duplicated.

Create the real work-handler in cpufreq_governor.c and put the common
code in this routine (dbs_timer()).

Shouldn't make any functional change.

Reviewed-and-tested-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-07-21 01:12:01 +02:00
Viresh Kumar
44152cb82d cpufreq: governor: Keep single copy of information common to policy->cpus
Some information is common to all CPUs belonging to a policy, but are
kept on per-cpu basis. Lets keep that in another structure common to all
policy->cpus. That will make updates/reads to that less complex and less
error prone.

The memory for cpu_common_dbs_info is allocated/freed at INIT/EXIT, so
that it we don't reallocate it for STOP/START sequence. It will be also
be used (in next patch) while the governor is stopped and so must not be
freed that early.

Reviewed-and-tested-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-07-21 01:12:01 +02:00
Viresh Kumar
42994af63c cpufreq: governor: rename cur_policy as policy
Just call it 'policy', cur_policy is unnecessarily long and doesn't
have any special meaning.

Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-07-17 23:46:48 +02:00
Viresh Kumar
386d46e6d5 cpufreq: governor: Name delayed-work as dwork
Delayed work was named as 'work' and to access work within it we do
work.work. Not much readable. Rename delayed_work as 'dwork'.

Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-07-17 23:46:47 +02:00
Viresh Kumar
732b6d617a cpufreq: governor: Serialize governor callbacks
There are several races reported in cpufreq core around governors (only
ondemand and conservative) by different people.

There are at least two race scenarios present in governor code:
 (a) Concurrent access/updates of governor internal structures.

 It is possible that fields such as 'dbs_data->usage_count', etc.  are
 accessed simultaneously for different policies using same governor
 structure (i.e. CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag unset). And
 because of this we can dereference bad pointers.

 For example consider a system with two CPUs with separate 'struct
 cpufreq_policy' instances. CPU0 governor: ondemand and CPU1: powersave.
 CPU0 switching to powersave and CPU1 to ondemand:
	CPU0				CPU1

	store*				store*

	cpufreq_governor_exit()		cpufreq_governor_init()
					dbs_data = cdata->gdbs_data;

	if (!--dbs_data->usage_count)
		kfree(dbs_data);

					dbs_data->usage_count++;
					*Bad pointer dereference*

 There are other races possible between EXIT and START/STOP/LIMIT as
 well. Its really complicated.

 (b) Switching governor state in bad sequence:

 For example trying to switch a governor to START state, when the
 governor is in EXIT state. There are some checks present in
 __cpufreq_governor() but they aren't sufficient as they compare events
 against 'policy->governor_enabled', where as we need to take governor's
 state into account, which can be used by multiple policies.

These two issues need to be solved separately and the responsibility
should be properly divided between cpufreq and governor core.

The first problem is more about the governor core, as it needs to
protect its structures properly. And the second problem should be fixed
in cpufreq core instead of governor, as its all about sequence of
events.

This patch is trying to solve only the first problem.

There are two types of data we need to protect,
- 'struct common_dbs_data': No matter what, there is going to be a
  single copy of this per governor.
- 'struct dbs_data': With CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag set, we
  will have per-policy copy of this data, otherwise a single copy.

Because of such complexities, the mutex present in 'struct dbs_data' is
insufficient to solve our problem. For example we need to protect
fetching of 'dbs_data' from different structures at the beginning of
cpufreq_governor_dbs(), to make sure it isn't currently being updated.

This can be fixed if we can guarantee serialization of event parsing
code for an individual governor. This is best solved with a mutex per
governor, and the placeholder for that is 'struct common_dbs_data'.

And so this patch moves the mutex from 'struct dbs_data' to 'struct
common_dbs_data' and takes it at the beginning and drops it at the end
of cpufreq_governor_dbs().

Tested with and without following configuration options:

CONFIG_LOCKDEP_SUPPORT=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_PI_LIST=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
CONFIG_DEBUG_ATOMIC_SLEEP=y

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-15 15:42:53 +02:00
Viresh Kumar
8e0484d2b3 cpufreq: governor: register notifier from cs_init()
Notifiers are required only for conservative governor and the common
governor code is unnecessarily polluted with that. Handle that from
cs_init/exit() instead of cpufreq_governor_dbs().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-06-15 15:37:12 +02:00
Stratos Karafotis
6393d6a102 cpufreq: ondemand: Eliminate the deadband effect
Currently, ondemand calculates the target frequency proportional to load
using the formula:
	Target frequency = C * load
	where C = policy->cpuinfo.max_freq / 100

Though, in many cases, the minimum available frequency is pretty high and
the above calculation introduces a dead band from load 0 to
100 * policy->cpuinfo.min_freq / policy->cpuinfo.max_freq where the target
frequency is always calculated to less than policy->cpuinfo.min_freq and
the minimum frequency is selected.

For example: on Intel i7-3770 @ 3.4GHz the policy->cpuinfo.min_freq = 1600000
and the policy->cpuinfo.max_freq = 3400000 (without turbo). Thus, the CPU
starts to scale up at a load above 47.
On quad core 1500MHz Krait the policy->cpuinfo.min_freq = 384000
and the policy->cpuinfo.max_freq = 1512000. Thus, the CPU starts to scale
at load above 25.

Change the calculation of target frequency to eliminate the above effect using
the formula:

	Target frequency = A + B * load
	where A = policy->cpuinfo.min_freq and
	      B = (policy->cpuinfo.max_freq - policy->cpuinfo->min_freq) / 100

This will map load values 0 to 100 linearly to cpuinfo.min_freq to
cpuinfo.max_freq.

Also, use the CPUFREQ_RELATION_C in __cpufreq_driver_target to select the
closest frequency in frequency_table. This is necessary to avoid selection
of minimum frequency only when load equals to 0. It will also help for selection
of frequencies using a more 'fair' criterion.

Tables below show the difference in selected frequency for specific values
of load without and with this patch. On Intel i7-3770 @ 3.40GHz:
	Without			With
Load	Target	Selected	Target	Selected
0	0	1600000		1600000	1600000
5	170050	1600000		1690050	1700000
10	340100	1600000		1780100	1700000
15	510150	1600000		1870150	1900000
20	680200	1600000		1960200	2000000
25	850250	1600000		2050250	2100000
30	1020300	1600000		2140300	2100000
35	1190350	1600000		2230350	2200000
40	1360400	1600000		2320400	2400000
45	1530450	1600000		2410450	2400000
50	1700500	1900000		2500500	2500000
55	1870550	1900000		2590550	2600000
60	2040600	2100000		2680600	2600000
65	2210650	2400000		2770650	2800000
70	2380700	2400000		2860700	2800000
75	2550750	2600000		2950750	3000000
80	2720800	2800000		3040800	3000000
85	2890850	2900000		3130850	3100000
90	3060900	3100000		3220900	3300000
95	3230950	3300000		3310950	3300000
100	3401000	3401000		3401000	3401000

On ARM quad core 1500MHz Krait:
	Without			With
Load	Target	Selected	Target	Selected
0	0	384000		384000	384000
5	75600	384000		440400	486000
10	151200	384000		496800	486000
15	226800	384000		553200	594000
20	302400	384000		609600	594000
25	378000	384000		666000	702000
30	453600	486000		722400	702000
35	529200	594000		778800	810000
40	604800	702000		835200	810000
45	680400	702000		891600	918000
50	756000	810000		948000	918000
55	831600	918000		1004400	1026000
60	907200	918000		1060800	1026000
65	982800	1026000		1117200	1134000
70	1058400	1134000		1173600	1134000
75	1134000	1134000		1230000	1242000
80	1209600	1242000		1286400	1242000
85	1285200	1350000		1342800	1350000
90	1360800	1458000		1399200	1350000
95	1436400	1458000		1455600	1458000
100	1512000	1512000		1512000	1512000

Tested on Intel i7-3770 CPU @ 3.40GHz and on ARM quad core 1500MHz Krait
(Android smartphone).
Benchmarks on Intel i7 shows a performance improvement on low and medium
work loads with lower power consumption. Specifics:

Phoronix Linux Kernel Compilation 3.1:
Time: -0.40%, energy: -0.07%
Phoronix Apache:
Time: -4.98%, energy: -2.35%
Phoronix FFMPEG:
Time: -6.29%, energy: -4.02%

Also, running mp3 decoding (very low load) shows no differences with and
without this patch.

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21 13:43:19 +02:00
Stratos Karafotis
880eef0416 cpufreq: ondemand: Remove redundant return statement
After commit dfa5bb6225 (cpufreq: ondemand: Change the calculation
of target frequency), this return statement is no longer needed.

Reported-by: Henrik Nilsson <Karl.Henrik.Nilsson@gmail.com>
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-11-01 00:44:34 +01:00
Stratos Karafotis
934dac1ea0 cpufreq: governors: Remove duplicate check of target freq in supported range
Function __cpufreq_driver_target() checks if target_freq is within
policy->min and policy->max range. generic_powersave_bias_target() also
checks if target_freq is valid via a cpufreq_frequency_table_target()
call. So, drop the unnecessary duplicate check in *_check_cpu().

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-28 22:03:02 +02:00
Rafael J. Wysocki
c49a089c3e Merge back earlier 'pm-cpufreq' material 2013-08-14 22:21:16 +02:00
Viresh Kumar
d5b73cd870 cpufreq: Use sizeof(*ptr) convetion for computing sizes
Chapter 14 of Documentation/CodingStyle says:

The preferred form for passing a size of a struct is the following:

	p = kmalloc(sizeof(*p), ...);

The alternative form where struct name is spelled out hurts
readability and introduces an opportunity for a bug when the pointer
variable type is changed but the corresponding sizeof that is passed
to a memory allocator is not.

This wasn't followed consistently in drivers/cpufreq, let's make it
more consistent by always following this rule.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:34:10 +02:00
Viresh Kumar
3a3e9e06d0 cpufreq: Give consistent names to cpufreq_policy objects
They are called policy, cur_policy, new_policy, data, etc.  Just call
them policy wherever possible.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:34:10 +02:00
Viresh Kumar
5ff0a26803 cpufreq: Clean up header files included in the core
This patch addresses the following issues in the header files in the
cpufreq core:
 - Include headers in ascending order, so that we don't add same
   many times by mistake.
 - <asm/> must be included after <linux/>, so that they override
   whatever they need to.
 - Remove unnecessary includes.
 - Don't include files already included by cpufreq.h or
   cpufreq_governor.h.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:34:09 +02:00
Viresh Kumar
6c4640c3ad cpufreq: rename ignore_nice as ignore_nice_load
This sysfs file was called ignore_nice_load earlier and commit
4d5dcc4 (cpufreq: governor: Implement per policy instances of
governors) changed its name to ignore_nice by mistake.

Lets get it renamed back to its original name.

Reported-by: Martin von Gagern <Martin.vGagern@gmx.net>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 22:25:06 +02:00