Commit Graph

450 Commits

Author SHA1 Message Date
Viresh Kumar
d9f354460d cpufreq: remove CPUFREQ_UPDATE_POLICY_CPU notifications
CPUFREQ_UPDATE_POLICY_CPU notifications were used only from cpufreq-stats which
doesn't use it anymore. Remove them.

This also decrements values of other notification macros defined after
CPUFREQ_UPDATE_POLICY_CPU by 1 to remove gaps. Hopefully all users are using
macro's instead of direct numbers and so they wouldn't break as macro values are
changed now.

Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23 23:06:44 +01:00
Viresh Kumar
7c418ff099 cpufreq: Remove (now) unused 'last_cpu' from struct cpufreq_policy
'last_cpu' was used only from cpufreq-stats and isn't used anymore. Get rid of
it.

Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23 23:06:44 +01:00
Viresh Kumar
818c57126e cpufreq: move some initialization stuff to cpufreq_policy_alloc()
We need to initialize completion and work only on policy allocation and not
really on the policy restore side and so we better move this piece of code to
cpufreq_policy_alloc().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23 22:49:35 +01:00
Viresh Kumar
ce1bcfe94d cpufreq: check cpufreq_policy_list instead of scanning policies for all CPUs
CPUFREQ_STICKY flag is set by drivers which don't want to get unregistered
even if cpufreq-core isn't able to initialize policy for any CPU.

When this flag isn't set, we try to unregister the driver. To find out
which CPUs are registered and which are not, we try to check per_cpu
cpufreq_cpu_data for all CPUs. Because we have a list of valid policies
available now, we better check if the list is empty or not instead of
the 'for' loop. That will be much more efficient.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23 22:49:35 +01:00
Viresh Kumar
39c132eebd cpufreq: limit the scope of l_p_j variables
These variables are just used within adjust_jiffies() and so must be
local to it. Also there is no need of a dummy routine for CONFIG_SMP
case as we can take care of all that with help of macros in the same
routine. It doesn't look that ugly.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23 22:49:34 +01:00
Viresh Kumar
d7a9771c1a cpufreq: use light-weight cpufreq_cpu_get_raw() in __cpufreq_add_dev()
We just need to check if a 'policy' is already present for the cpu we are
adding. We don't need to take all the locks and do kobject usage updates. Use
the light-weight cpufreq_cpu_get_raw() routine instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23 22:49:34 +01:00
Viresh Kumar
7f0c020ab6 cpufreq: get rid of 'tpolicy' from __cpufreq_add_dev()
There is no need of this separate variable, use 'policy' instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23 22:49:34 +01:00
Viresh Kumar
22a7cfb014 cpufreq: get rid of CONFIG_{HOTPLUG_CPU|SMP} mess
These are messing up more than the benefit they provide. It isn't
a lot of code anyway, that we will compile without them.

Kill them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23 22:49:34 +01:00
Viresh Kumar
bc68b7dfda cpufreq: update driver_data->flags only if we are registering driver
We should first check if a cpufreq driver is already registered or not
before updating driver_data->flags.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23 22:49:34 +01:00
Viresh Kumar
d92d50a462 cpufreq: pass policy to __cpufreq_get()
There is no point finding out the 'policy' again within __cpufreq_get()
when all the callers already have it. Just make them pass policy instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23 22:49:34 +01:00
Viresh Kumar
a1e1dc41c4 cpufreq: pass policy to cpufreq_out_of_sync
There is no point finding out the 'policy' again within cpufreq_out_of_sync()
when all the callers already have it. Just make them pass policy instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23 22:49:34 +01:00
Viresh Kumar
2e1cc3a5d7 cpufreq: No need to check for has_target()
Either we can be setpolicy or target type, nothing else. And so the
else part of setpolicy will automatically be of has_target() type.
And so we don't need to check it again.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23 22:49:34 +01:00
Viresh Kumar
42f91fa116 cpufreq: s/__find_governor/find_governor
Remove unnecessary from find_governor's name.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23 22:49:33 +01:00
Viresh Kumar
db5f299574 cpufreq: merge 'if' blocks in __cpufreq_remove_dev_prepare()
There are two 'if' blocks here, checking for !cpufreq_driver->setpolicy and
has_target(). Both are actually doing the same thing, merge them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23 22:49:33 +01:00
Viresh Kumar
09347b2905 cpufreq: don't need line break in show_scaling_cur_freq()
No need of an unnecessary line break.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23 22:49:33 +01:00
Viresh Kumar
f13f1184a1 cpufreq: remove extra parenthesis
We can live without it and so we should.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23 22:49:33 +01:00
Viresh Kumar
e673f1639b cpufreq: remove dangling comment
It doesn't make any sense at all and is a leftover of some earlier commit.
Remove it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23 22:49:33 +01:00
Doug Anderson
90de2a4aa9 cpufreq: suspend cpufreq governors on shutdown
We should stop cpufreq governors when we shut down the system.  If we
don't do this, we can end up with this deadlock:

1. cpufreq governor may be running on a CPU other than CPU0.
2. In machine_restart() we call smp_send_stop() which stops CPUs.
   If one of these CPUs was actively running a cpufreq governor
   then it may have the mutex / spinlock needed to access the main
   PMIC in the system (perhaps over I2C)
3. If a machine needs access to the main PMIC in order to shutdown
   then it will never get it since the mutex was lost when the other
   CPU stopped.
4. We'll hang (possibly eventually hitting the hard lockup detector).

Let's avoid the problem by stopping the cpufreq governor at shutdown,
which is a sensible thing to do anyway.

Signed-off-by: Doug Anderson <dianders@chromium.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-01-23 22:20:30 +01:00
Ethan Zhao
cb57720bf7 cpufreq: fix a NULL pointer dereference in __cpufreq_governor()
If ACPI _PPC changed notification happens before governor was initiated
while kernel is booting, a NULL pointer dereference will be triggered:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
 IP: [<ffffffff81470453>] __cpufreq_governor+0x23/0x1e0
 PGD 0
 Oops: 0000 [#1] SMP
 ... ...
 RIP: 0010:[<ffffffff81470453>]  [<ffffffff81470453>]
 __cpufreq_governor+0x23/0x1e0
 RSP: 0018:ffff881fcfbcfbb8  EFLAGS: 00010286
 RAX: 0000000000000000 RBX: ffff881fd11b3980 RCX: ffff88407fc20000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff881fd11b3980
 RBP: ffff881fcfbcfbd8 R08: 0000000000000000 R09: 000000000000000f
 R10: ffffffff818068d0 R11: 0000000000000043 R12: 0000000000000004
 R13: 0000000000000000 R14: ffffffff8196cae0 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffff881fffc00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000030 CR3: 00000000018ae000 CR4: 00000000000407f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process kworker/0:3 (pid: 750, threadinfo ffff881fcfbce000, task
 ffff881fcf556400)
 Stack:
  ffff881fffc17d00 ffff881fcfbcfc18 ffff881fd11b3980 0000000000000000
  ffff881fcfbcfc08 ffffffff81470d08 ffff881fd11b3980 0000000000000007
  ffff881fcfbcfc18 ffff881fffc17d00 ffff881fcfbcfd28 ffffffff81472e9a
 Call Trace:
  [<ffffffff81470d08>] __cpufreq_set_policy+0x1b8/0x2e0
  [<ffffffff81472e9a>] cpufreq_update_policy+0xca/0x150
  [<ffffffff81472f20>] ? cpufreq_update_policy+0x150/0x150
  [<ffffffff81324a96>] acpi_processor_ppc_has_changed+0x71/0x7b
  [<ffffffff81320bcd>] acpi_processor_notify+0x55/0x115
  [<ffffffff812f9c29>] acpi_device_notify+0x19/0x1b
  [<ffffffff813084ca>] acpi_ev_notify_dispatch+0x41/0x5f
  [<ffffffff812f64a4>] acpi_os_execute_deferred+0x27/0x34

The root cause is a race conditon -- cpufreq core and acpi-cpufreq driver
were initiated, but cpufreq_governor wasn't and _PPC changed notification
happened, __cpufreq_governor() was called within acpi_os_execute_deferred
kernel thread context.

To fix this panic issue, add pointer checking code in __cpufreq_governor()
before pointer policy->governor is to be dereferenced.

Signed-off-by: Ethan Zhao <ethan.zhao@oracle.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-12-19 22:49:07 +01:00
Viresh Kumar
7c45cf31b3 cpufreq: Introduce ->ready() callback for cpufreq drivers
Currently there is no callback for cpufreq drivers which is called once the
policy is ready to be used. There are some requirements where such a callback is
required.

One of them is registering a cooling device with the help of
of_cpufreq_cooling_register(). This routine tries to get 'struct cpufreq_policy'
for CPUs which isn't yet initialed at the time ->init() is called and so we face
issues while registering the cooling device.

Because we can't register cooling device from ->init(), we need a callback that
is called after the policy is ready to be used and hence we introduce ->ready()
callback.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Eduardo Valentin <edubezval@gmail.com>
Tested-by: Eduardo Valentin <edubezval@gmail.com>
Reviewed-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-29 23:38:38 +01:00
Tomeu Vizoso
6d4e81ed89 cpufreq: Ref the policy object sooner
Do it before it's assigned to cpufreq_cpu_data, otherwise when a driver
tries to get the cpu frequency during initialization the policy kobj is
referenced and we get this warning:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 64 at include/linux/kref.h:47 kobject_get+0x64/0x70()
Modules linked in:
CPU: 1 PID: 64 Comm: irq/77-tegra-ac Not tainted 3.18.0-rc4-next-20141114ccu-00050-g3eff942 #326
[<c0016fac>] (unwind_backtrace) from [<c001272c>] (show_stack+0x10/0x14)
[<c001272c>] (show_stack) from [<c06085d8>] (dump_stack+0x98/0xd8)
[<c06085d8>] (dump_stack) from [<c002892c>] (warn_slowpath_common+0x84/0xb4)
[<c002892c>] (warn_slowpath_common) from [<c00289f8>] (warn_slowpath_null+0x1c/0x24)
[<c00289f8>] (warn_slowpath_null) from [<c0220290>] (kobject_get+0x64/0x70)
[<c0220290>] (kobject_get) from [<c03e944c>] (cpufreq_cpu_get+0x88/0xc8)
[<c03e944c>] (cpufreq_cpu_get) from [<c03e9500>] (cpufreq_get+0xc/0x64)
[<c03e9500>] (cpufreq_get) from [<c0285288>] (actmon_thread_isr+0x134/0x198)
[<c0285288>] (actmon_thread_isr) from [<c0069008>] (irq_thread_fn+0x1c/0x40)
[<c0069008>] (irq_thread_fn) from [<c0069324>] (irq_thread+0x134/0x174)
[<c0069324>] (irq_thread) from [<c0040290>] (kthread+0xdc/0xf4)
[<c0040290>] (kthread) from [<c000f4b8>] (ret_from_fork+0x14/0x3c)
---[ end trace b7bd64a81b340c59 ]---

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-25 22:44:17 +01:00
Rafael J. Wysocki
bd2a0f6754 Merge back cpufreq material for 3.19-rc1. 2014-11-18 01:22:29 +01:00
Vince Hsu
619c144c84 cpufreq: respect the min/max settings from user space
When the user space tries to set scaling_(max|min)_freq through
sysfs, the cpufreq_set_policy() asks other driver's opinions
for the max/min frequencies. Some device drivers, like Tegra
CPU EDP which is not upstreamed yet though, may constrain the
CPU maximum frequency dynamically because of board design.
So if the user space access happens and some driver is capping
the cpu frequency at the same time, the user_policy->(max|min)
is overridden by the capped value, and that's not expected by
the user space. And if the user space is not invoked again,
the CPU will always be capped by the user_policy->(max|min)
even no drivers limit the CPU frequency any more.

This patch preserves the user specified min/max settings, so that
every time the cpufreq policy is updated, the new max/min can
be re-evaluated correctly based on the user's expection and
the present device drivers' status.

Signed-off-by: Vince Hsu <vinceh@nvidia.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-11 23:04:28 +01:00
Geert Uytterhoeven
09712f557b cpufreq: Avoid crash in resume on SMP without OPP
When resuming from s2ram on an SMP system without cpufreq operating
points (e.g. there's no "operating-points" property for the CPU node in
DT, or the platform doesn't use DT yet), the kernel crashes when
bringing CPU 1 online:

    Enabling non-boot CPUs ...
    CPU1: Booted secondary processor
    Unable to handle kernel NULL pointer dereference at virtual address 0000003c
    pgd = ee5e6b00
    [0000003c] *pgd=6e579003, *pmd=6e588003, *pte=00000000
    Internal error: Oops: a07 [#1] SMP ARM
    Modules linked in:
    CPU: 0 PID: 1246 Comm: s2ram Tainted: G        W      3.18.0-rc3-koelsch-01614-g0377af242bb175c8-dirty #589
    task: eeec5240 ti: ee704000 task.ti: ee704000
    PC is at __cpufreq_add_dev.isra.24+0x24c/0x77c
    LR is at __cpufreq_add_dev.isra.24+0x244/0x77c
    pc : [<c0298efc>]    lr : [<c0298ef4>]    psr: 60000153
    sp : ee705d48  ip : ee705d48  fp : ee705d84
    r10: c04e0450  r9 : 00000000  r8 : 00000001
    r7 : c05426a8  r6 : 00000001  r5 : 00000001  r4 : 00000000
    r3 : 00000000  r2 : 00000000  r1 : 20000153  r0 : c0542734

Verify that policy is not NULL before dereferencing it to fix this.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Fixes: 8414809c6a (cpufreq: Preserve policy structure across suspend/resume)
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-11-08 02:10:04 +01:00
Dirk Brandewie
c034b02e21 cpufreq: expose scaling_cur_freq sysfs file for set_policy() drivers
Currently the core does not expose scaling_cur_freq for set_policy()
drivers this breaks some userspace monitoring tools.
Change the core to expose this file for all drivers and if the
set_policy() driver supports the get() callback use it to retrieve the
current frequency.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=73741
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-10-23 22:59:59 +02:00
Thomas Petazzoni
51315cdfa0 cpufreq: allow driver-specific data
This commit extends the cpufreq_driver structure with an additional
'void *driver_data' field that can be filled by the ->probe() function
of a cpufreq driver to pass additional custom information to the
driver itself.

A new function called cpufreq_get_driver_data() is added to allow a
cpufreq driver to retrieve those driver data, since they are typically
needed from a cpufreq_policy->init() callback, which does not have
access to the cpufreq_driver structure. This function call is similar
to the existing cpufreq_get_current_driver() function call.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-10-21 00:51:01 +02:00
Rafael J. Wysocki
6f1293ff74 Merge back cpufreq material for v3.18. 2014-10-03 15:41:16 +02:00
Viresh Kumar
b1b12babe3 cpufreq: update 'cpufreq_suspended' after stopping governors
Commit 8e30444e15 ("cpufreq: fix cpufreq suspend/resume for intel_pstate")
introduced a bug where the governors wouldn't be stopped anymore for
->target{_index}() drivers during suspend. This happens because
'cpufreq_suspended' is updated before stopping the governors during suspend
and due to this __cpufreq_governor() would return early due to this check:

	/* Don't start any governor operations if we are entering suspend */
	if (cpufreq_suspended)
		return 0;

Fixes: 8e30444e15 ("cpufreq: fix cpufreq suspend/resume for intel_pstate")
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+: 8e30444e15 "cpufreq: fix cpufreq suspend/resume for intel_pstate"
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-30 21:02:34 +02:00
Rasmus Villemoes
7c4f453970 cpufreq: Replace strnicmp with strncasecmp
The kernel used to contain two functions for length-delimited,
case-insensitive string comparison, strnicmp with correct semantics
and a slightly buggy strncasecmp. The latter is the POSIX name, so
strnicmp was renamed to strncasecmp, and strnicmp made into a wrapper
for the new strncasecmp to avoid breaking existing users.

To allow the compat wrapper strnicmp to be removed at some point in
the future, and to avoid the extra indirection cost, do
s/strnicmp/strncasecmp/g.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-29 15:53:04 +02:00
Preeti U Murthy
789ca24374 cpufreq: Allow stop CPU callback to be used by all cpufreq drivers
Commit 367dc4aa93 ("cpufreq: Add stop CPU callback to
cpufreq_driver interface") introduced the stop CPU callback for
intel_pstate drivers. During the CPU_DOWN_PREPARE stage, this
callback is invoked so that drivers can take some action on the
pstate of the cpu before it is taken offline. This callback was
assumed to be useful only for those drivers which have implemented
the set_policy CPU callback because they have no other way to take
action about the cpufreq of a CPU which is being hotplugged out
except in the exit callback which is called very late in the offline
process.

The drivers which implement the target/target_index callbacks were
expected to take care of requirements like the ones that commit
367dc4aa addresses in the GOV_STOP notification event. But there
are disadvantages to restricting the usage of stop CPU callback
to cpufreq drivers that implement the set_policy callbacks and who
want to take explicit action on the setting the cpufreq during a
hotplug operation.

1.GOV_STOP gets called for every CPU offline and drivers would usually
want to take action when the last cpu in the policy->cpus mask
is taken offline. As long as there is more than one cpu in the
policy->cpus mask, cpufreq core itself makes sure that the freq
for the other cpus in this mask is set according to the maximum load.
This is sensible and drivers which implement the target_index callback
would mostly not want to modify that. However the cpufreq core leaves a
loose end when the cpu in the policy->cpus mask is the last one to go offline;
it does nothing explicit to the frequency of the core. Drivers may need
a way to take some action here and stop CPU callback mechanism is the
best way to do it today.

2. We cannot implement driver specific actions in the GOV_STOP mechanism.
So we will need another driver callback which is invoked from here which is
unnecessary.

Therefore this patch extends the usage of stop CPU callback to be used
by all cpufreq drivers as long as they have this callback implemented
and irrespective of whether they are set_policy/target_index drivers.
The assumption is if the drivers find the GOV_STOP path to be a suitable
way of implementing what they want to do with the freq of the cpu
going offine,they will not implement the stop CPU callback at all.

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-29 15:47:12 +02:00
Prarit Bhargava
7106e02bae cpufreq: release policy->rwsem on error
While debugging a cpufreq-related hardware failure on a system I saw the
following lockdep warning:

 =========================
 [ BUG: held lock freed! ] 3.17.0-rc4+ #1 Tainted: G            E
 -------------------------
 insmod/2247 is freeing memory ffff88006e1b1400-ffff88006e1b17ff, with a lock still held there!
  (&policy->rwsem){+.+...}, at: [<ffffffff8156d37d>] __cpufreq_add_dev.isra.21+0x47d/0xb80
 3 locks held by insmod/2247:
  #0:  (subsys mutex#5){+.+.+.}, at: [<ffffffff81485579>] subsys_interface_register+0x69/0x120
  #1:  (cpufreq_rwsem){.+.+.+}, at: [<ffffffff8156cf73>] __cpufreq_add_dev.isra.21+0x73/0xb80
  #2:  (&policy->rwsem){+.+...}, at: [<ffffffff8156d37d>] __cpufreq_add_dev.isra.21+0x47d/0xb80

 stack backtrace:
 CPU: 0 PID: 2247 Comm: insmod Tainted: G            E  3.17.0-rc4+ #1
 Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 08/24/2013
  0000000000000000 000000008f3063c4 ffff88006f87bb30 ffffffff8171b358
  ffff88006bcf3750 ffff88006f87bb68 ffffffff810e09e1 ffff88006e1b1400
  ffffea0001b86c00 ffffffff8156d327 ffff880073003500 0000000000000246
 Call Trace:
  [<ffffffff8171b358>] dump_stack+0x4d/0x66
  [<ffffffff810e09e1>] debug_check_no_locks_freed+0x171/0x180
  [<ffffffff8156d327>] ? __cpufreq_add_dev.isra.21+0x427/0xb80
  [<ffffffff8121412b>] kfree+0xab/0x2b0
  [<ffffffff8156d327>] __cpufreq_add_dev.isra.21+0x427/0xb80
  [<ffffffff81724cf7>] ? _raw_spin_unlock+0x27/0x40
  [<ffffffffa003517f>] ? pcc_cpufreq_do_osc+0x17f/0x17f [pcc_cpufreq]
  [<ffffffff8156da8e>] cpufreq_add_dev+0xe/0x10
  [<ffffffff814855d1>] subsys_interface_register+0xc1/0x120
  [<ffffffff8156bcf2>] cpufreq_register_driver+0x112/0x340
  [<ffffffff8121415a>] ? kfree+0xda/0x2b0
  [<ffffffffa003517f>] ? pcc_cpufreq_do_osc+0x17f/0x17f [pcc_cpufreq]
  [<ffffffffa003562e>] pcc_cpufreq_init+0x4af/0xe81 [pcc_cpufreq]
  [<ffffffffa003517f>] ? pcc_cpufreq_do_osc+0x17f/0x17f [pcc_cpufreq]
  [<ffffffff81002144>] do_one_initcall+0xd4/0x210
  [<ffffffff811f7472>] ? __vunmap+0xd2/0x120
  [<ffffffff81127155>] load_module+0x1315/0x1b70
  [<ffffffff811222a0>] ? store_uevent+0x70/0x70
  [<ffffffff811229d9>] ? copy_module_from_fd.isra.44+0x129/0x180
  [<ffffffff81127b86>] SyS_finit_module+0xa6/0xd0
  [<ffffffff81725b69>] system_call_fastpath+0x16/0x1b
 cpufreq: __cpufreq_add_dev: ->get() failed
insmod: ERROR: could not insert module pcc-cpufreq.ko: No such device

The warning occurs in the __cpufreq_add_dev() code which does

        down_write(&policy->rwsem);
	...
        if (cpufreq_driver->get && !cpufreq_driver->setpolicy) {
                policy->cur = cpufreq_driver->get(policy->cpu);
                if (!policy->cur) {
                        pr_err("%s: ->get() failed\n", __func__);
                        goto err_get_freq;
                }

If cpufreq_driver->get(policy->cpu) returns an error we execute the
code at err_get_freq, which does not up the policy->rwsem.  This causes
the lockdep warning.

Trivial patch to up the policy->rwsem in the error path.

After the patch has been applied, and an error occurs in the
cpufreq_driver->get(policy->cpu) call we will now see

cpufreq: __cpufreq_add_dev: ->get() failed
cpufreq: __cpufreq_add_dev: ->get() failed
modprobe: ERROR: could not insert 'pcc_cpufreq': No such device

Fixes: 4e97b631f2 (cpufreq: Initialize governor for a new policy under policy->rwsem)
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-22 14:32:43 +02:00
Lan Tianyu
8e30444e15 cpufreq: fix cpufreq suspend/resume for intel_pstate
Cpufreq core introduces cpufreq_suspended flag to let cpufreq sysfs nodes
across S2RAM/S2DISK. But the flag is only set in the cpufreq_suspend()
for cpufreq drivers which have target or target_index callback. This
skips intel_pstate driver. This patch is to set the flag before checking
target or target_index callback.

Fixes: 2f0aea9363 (cpufreq: suspend governors on system suspend/hibernate)
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
[rjw: Subject]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-09-22 14:23:18 +02:00
Viresh Kumar
1bfb425b3b cpufreq: move policy kobj to update_policy_cpu()
We are calling kobject_move() from two separate places currently and both these
places share another routine update_policy_cpu() which is handling everything
around updating policy->cpu. Moving ownership of policy->kobj also lies under
the role of update_policy_cpu() routine and must be handled from there.

So, Lets move kobject_move() to update_policy_cpu() and get rid of
cpufreq_nominate_new_policy_cpu() as it doesn't have anything significant left.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21 13:43:20 +02:00
Viresh Kumar
41dfd908fc cpufreq: propagate error returned by kobject_move()
We are returning -EINVAL instead of the error returned from kobject_move() when
it fails. Propagate the actual error number.

Also add a meaningful print when sysfs_create_link() fails.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21 13:43:20 +02:00
Viresh Kumar
1461dc7d1c cpufreq: don't restore policy->cpus on failure to move kobj
While hot-unplugging policy->cpu, we call cpufreq_nominate_new_policy_cpu() to
nominate next owner of policy, i.e. policy->cpu. If we fail to move policy
kobject under the new policy->cpu, we try to update policy->cpus with the old
policy->cpu.

This would have been required in case old-CPU is removed from policy->cpus in
the first place. But its not done before calling
cpufreq_nominate_new_policy_cpu(), but during the POST_DEAD notification which
happens quite late in the hot-unplugging path.

So, this is just some useless code hanging around, get rid of it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-21 13:43:20 +02:00
Viresh Kumar
92c14bd947 cpufreq: move policy kobj to policy->cpu at resume
This is only relevant to implementations with multiple clusters, where clusters
have separate clock lines but all CPUs within a cluster share it.

Consider a dual cluster platform with 2 cores per cluster. During suspend we
start hot unplugging CPUs in order 1 to 3. When CPU2 is removed, policy->kobj
would be moved to CPU3 and when CPU3 goes down we wouldn't free policy or its
kobj as we want to retain permissions/values/etc.

Now on resume, we will get CPU2 before CPU3 and will call __cpufreq_add_dev().
We will recover the old policy and update policy->cpu from 3 to 2 from
update_policy_cpu().

But the kobj is still tied to CPU3 and isn't moved to CPU2. We wouldn't create a
link for CPU2, but would try that for CPU3 while bringing it online. Which will
report errors as CPU3 already has kobj assigned to it.

This bug got introduced with commit 42f921a, which overlooked this scenario.

To fix this, lets move kobj to the new policy->cpu while bringing first CPU of a
cluster back. Also do a WARN_ON() if kobject_move failed, as we would reach here
only for the first CPU of a non-boot cluster. And we can't recover from this
situation, if kobject_move() fails.

Fixes: 42f921a6f1 (cpufreq: remove sysfs files for CPUs which failed to come back after resume)
Cc:  3.13+ <stable@vger.kernel.org> # 3.13+
Reported-and-tested-by: Bu Yitian <ybu@qti.qualcomm.com>
Reported-by: Saravana Kannan <skannan@codeaurora.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa@mit.edu>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-17 14:23:22 +02:00
Aaron Plattner
fefa8ff810 cpufreq: unlock when failing cpufreq_update_policy()
Commit bd0fa9bb45 introduced a failure path to cpufreq_update_policy() if
cpufreq_driver->get(cpu) returns NULL.  However, it jumps to the 'no_policy'
label, which exits without unlocking any of the locks the function acquired
earlier.  This causes later calls into cpufreq to hang.

Fix this by creating a new 'unlock' label and jumping to that instead.

Fixes: bd0fa9bb45 ("cpufreq: Return error if ->get() failed in cpufreq_update_policy()")
Link: https://devtalk.nvidia.com/default/topic/751903/kernel-3-15-and-nv-drivers-337-340-failed-to-initialize-the-nvidia-kernel-module-gtx-550-ti-/
Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-18 21:52:20 +02:00
Viresh Kumar
1c03a2d04d cpufreq: add support for intermediate (stable) frequencies
Douglas Anderson, recently pointed out an interesting problem due to which
udelay() was expiring earlier than it should.

While transitioning between frequencies few platforms may temporarily switch to
a stable frequency, waiting for the main PLL to stabilize.

For example: When we transition between very low frequencies on exynos, like
between 200MHz and 300MHz, we may temporarily switch to a PLL running at 800MHz.
No CPUFREQ notification is sent for that. That means there's a period of time
when we're running at 800MHz but loops_per_jiffy is calibrated at between 200MHz
and 300MHz. And so udelay behaves badly.

To get this fixed in a generic way, introduce another set of callbacks
get_intermediate() and target_intermediate(), only for drivers with
target_index() and CPUFREQ_ASYNC_NOTIFICATION unset.

get_intermediate() should return a stable intermediate frequency platform wants
to switch to, and target_intermediate() should set CPU to that frequency,
before jumping to the frequency corresponding to 'index'. Core will take care of
sending notifications and driver doesn't have to handle them in
target_intermediate() or target_index().

NOTE: ->target_index() should restore to policy->restore_freq in case of
failures as core would send notifications for that.

Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-05 23:32:29 +02:00
Viresh Kumar
8d65775d17 cpufreq: handle calls to ->target_index() in separate routine
Handling calls to ->target_index() has got complex over time and might become
more complex. So, its better to take target_index() bits out in another routine
__target_index() for better code readability. Shouldn't have any functional
impact.

Tested-by: Stephen Warren <swarren@nvidia.com>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-29 01:27:38 +02:00
Stratos Karafotis
5eeaf1f189 cpufreq: Fix build error on some platforms that use cpufreq_for_each_*
On platforms that use cpufreq_for_each_* macros, build fails if
CONFIG_CPU_FREQ=n, e.g. ARM/shmobile/koelsch/non-multiplatform:

drivers/built-in.o: In function `clk_round_parent':
clkdev.c:(.text+0xcf168): undefined reference to `cpufreq_next_valid'
drivers/built-in.o: In function `clk_rate_table_find':
clkdev.c:(.text+0xcf820): undefined reference to `cpufreq_next_valid'
make[3]: *** [vmlinux] Error 1

Fix this making cpufreq_next_valid function inline and move it to
cpufreq.h.

Fixes: 27e289dce2 (cpufreq: Introduce macros for cpufreq_frequency_table iteration)
Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-08 13:10:56 +02:00
Srivatsa S. Bhat
ca654dc3a9 cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end
Some cpufreq drivers were redundantly invoking the _begin() and _end()
APIs around frequency transitions, and this double invocation (one from
the cpufreq core and the other from the cpufreq driver) used to result
in a self-deadlock, leading to system hangs during boot. (The _begin()
API makes contending callers wait until the previous invocation is
complete. Hence, the cpufreq driver would end up waiting on itself!).

Now all such drivers have been fixed, but debugging this issue was not
very straight-forward (even lockdep didn't catch this). So let us add a
debug infrastructure to the cpufreq core to catch such issues more easily
in the future.

We add a new field called 'transition_task' to the policy structure, to keep
track of the task which is performing the frequency transition. Using this
field, we make note of this task during _begin() and print a warning if we
find a case where the same task is calling _begin() again, before completing
the previous frequency transition using the corresponding _end().

We have left out ASYNC_NOTIFICATION drivers from this debug infrastructure
for 2 reasons:

1. At the moment, we have no way to avoid a particular scenario where this
   debug infrastructure can emit false-positive warnings for such drivers.
   The scenario is depicted below:

         Task A						Task B

    /* 1st freq transition */
    Invoke _begin() {
            ...
            ...
    }

    Change the frequency

    /* 2nd freq transition */
    Invoke _begin() {
	    ...	//waiting for B to
            ... //finish _end() for
	    ... //the 1st transition
	    ...	      |				Got interrupt for successful
	    ...	      |				change of frequency (1st one).
	    ...       |
	    ...	      |				/* 1st freq transition */
	    ...	      |				Invoke _end() {
	    ...	      |					...
	    ...	      V				}
	    ...
	    ...
    }

   This scenario is actually deadlock-free because, once Task A changes the
   frequency, it is Task B's responsibility to invoke the corresponding
   _end() for the 1st frequency transition. Hence it is perfectly legal for
   Task A to go ahead and attempt another frequency transition in the meantime.
   (Of course it won't be able to proceed until Task B finishes the 1st _end(),
   but this doesn't cause a deadlock or a hang).

   The debug infrastructure cannot handle this scenario and will treat it as
   a deadlock and print a warning. To avoid this, we exclude such drivers
   from the purview of this code.

2. Luckily, we don't _need_ this infrastructure for ASYNC_NOTIFICATION drivers
   at all! The cpufreq core does not automatically invoke the _begin() and
   _end() APIs during frequency transitions in such drivers. Thus, the driver
   alone is responsible for invoking _begin()/_end() and hence there shouldn't
   be any conflicts which lead to double invocations. So, we can skip these
   drivers, since the probability that such drivers will hit this problem is
   extremely low, as outlined above.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-07 00:32:39 +02:00
Stratos Karafotis
27e289dce2 cpufreq: Introduce macros for cpufreq_frequency_table iteration
Many cpufreq drivers need to iterate over the cpufreq_frequency_table
for various tasks.

This patch introduces two macros which can be used for iteration over
cpufreq_frequency_table keeping a common coding style across drivers:

- cpufreq_for_each_entry: iterate over each entry of the table
- cpufreq_for_each_valid_entry: iterate over each entry that contains
a valid frequency.

It should have no functional changes.

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-04-30 00:05:31 +02:00
Viresh Kumar
236a980052 cpufreq: Make cpufreq_notify_transition & cpufreq_notify_post_transition static
cpufreq_notify_transition() and cpufreq_notify_post_transition() shouldn't be
called directly by cpufreq drivers anymore and so these should be marked static.

Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-26 16:41:41 +01:00
Viresh Kumar
8fec051eea cpufreq: Convert existing drivers to use cpufreq_freq_transition_{begin|end}
CPUFreq core has new infrastructure that would guarantee serialized calls to
target() or target_index() callbacks. These are called
cpufreq_freq_transition_begin() and cpufreq_freq_transition_end().

This patch converts existing drivers to use these new set of routines.

Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-26 16:41:41 +01:00
Srivatsa S. Bhat
12478cf0c5 cpufreq: Make sure frequency transitions are serialized
Whenever we change the frequency of a CPU, we call the PRECHANGE and POSTCHANGE
notifiers. They must be serialized, i.e. PRECHANGE and POSTCHANGE notifiers
should strictly alternate, thereby preventing two different sets of PRECHANGE or
POSTCHANGE notifiers from interleaving arbitrarily.

The following examples illustrate why this is important:

Scenario 1:
-----------
A thread reading the value of cpuinfo_cur_freq, will call
__cpufreq_cpu_get()->cpufreq_out_of_sync()->cpufreq_notify_transition()

The ondemand governor can decide to change the frequency of the CPU at the same
time and hence it can end up sending the notifications via ->target().

If the notifiers are not serialized, the following sequence can occur:
- PRECHANGE Notification for freq A (from cpuinfo_cur_freq)
- PRECHANGE Notification for freq B (from target())
- Freq changed by target() to B
- POSTCHANGE Notification for freq B
- POSTCHANGE Notification for freq A

We can see from the above that the last POSTCHANGE Notification happens for freq
A but the hardware is set to run at freq B.

Where would we break then?: adjust_jiffies() in cpufreq.c & cpufreq_callback()
in arch/arm/kernel/smp.c (which also adjusts the jiffies). All the
loops_per_jiffy calculations will get messed up.

Scenario 2:
-----------
The governor calls __cpufreq_driver_target() to change the frequency. At the
same time, if we change scaling_{min|max}_freq from sysfs, it will end up
calling the governor's CPUFREQ_GOV_LIMITS notification, which will also call
__cpufreq_driver_target(). And hence we end up issuing concurrent calls to
->target().

Typically, platforms have the following logic in their ->target() routines:
(Eg: cpufreq-cpu0, omap, exynos, etc)

A. If new freq is more than old: Increase voltage
B. Change freq
C. If new freq is less than old: decrease voltage

Now, if the two concurrent calls to ->target() are X and Y, where X is trying to
increase the freq and Y is trying to decrease it, we get the following race
condition:

X.A: voltage gets increased for larger freq
Y.A: nothing happens
Y.B: freq gets decreased
Y.C: voltage gets decreased
X.B: freq gets increased
X.C: nothing happens

Thus we can end up setting a freq which is not supported by the voltage we have
set. That will probably make the clock to the CPU unstable and the system might
not work properly anymore.

This patch introduces a set of synchronization primitives to serialize frequency
transitions, which are to be used as shown below:

cpufreq_freq_transition_begin();

//Perform the frequency change

cpufreq_freq_transition_end();

The _begin() call sends the PRECHANGE notification whereas the _end() call sends
the POSTCHANGE notification. Also, all the necessary synchronization is handled
within these calls. In particular, even drivers which set the ASYNC_NOTIFICATION
flag can also use these APIs for performing frequency transitions (ie., you can
call _begin() from one task, and call the corresponding _end() from a different
task).

The actual synchronization underneath is not that complicated:

The key challenge is to allow drivers to begin the transition from one thread
and end it in a completely different thread (this is to enable drivers that do
asynchronous POSTCHANGE notification from bottom-halves, to also use the same
interface).

To achieve this, a 'transition_ongoing' flag, a 'transition_lock' spinlock and a
wait-queue are added per-policy. The flag and the wait-queue are used in
conjunction to create an "uninterrupted flow" from _begin() to _end(). The
spinlock is used to ensure that only one such "flow" is in flight at any given
time. Put together, this provides us all the necessary synchronization.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-26 16:41:40 +01:00
Viresh Kumar
0c5aa405a9 cpufreq: resume drivers before enabling governors
During suspend, we first stop governors and then suspend cpufreq drivers and
resume must be exactly opposite of that. i.e. resume drivers first and then
start governors.

But the current code in resume enables governors first and then resume drivers.
Fix it be changing code sequence there.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-26 16:37:18 +01:00
Dirk Brandewie
367dc4aa93 cpufreq: Add stop CPU callback to cpufreq_driver interface
This callback allows the driver to do clean up before the CPU is
completely down and its state cannot be modified.  This is used
by the intel_pstate driver to reduce the requested P state prior to
the core going away.  This is required because the requested P state
of the offline core is used to select the package P state. This
effectively sets the floor package P state to the requested P state on
the offline core.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
[rjw: Minor modifications]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-20 03:50:12 +01:00
Stratos Karafotis
bda9f552f9 cpufreq: Remove unnecessary braces
Remove unnecessary braces from a single statement.

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-20 03:40:48 +01:00
Stratos Karafotis
e5c87b7628 cpufreq: Fix checkpatch errors and warnings
Fix 2 checkpatch errors about using assignment in if condition,
1 checkpatch error about a required space after comma
and 3 warnings about line over 80 characters.

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-20 03:39:28 +01:00
Viresh Kumar
0b443ead71 cpufreq: remove unused notifier: CPUFREQ_{SUSPENDCHANGE|RESUMECHANGE}
Two cpufreq notifiers CPUFREQ_RESUMECHANGE and CPUFREQ_SUSPENDCHANGE have
not been used for some time, so remove them to clean up code a bit.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-19 14:10:24 +01:00
Rafael J. Wysocki
9832235f3f cpufreq: Do not allow ->setpolicy drivers to provide ->target
cpufreq drivers that provide the ->setpolicy() callback are supposed
to have integrated governors, so they don't need to set ->target()
or ->target_index() and may confuse the core if any of these callbacks
is present.

For this reason, add a check preventing ->setpolicy cpufreq drivers
from registering if they have non-NULL ->target or ->target_index.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2014-03-19 12:48:30 +01:00
Rafael J. Wysocki
15afee3aea Merge back earlier 'pm-cpufreq' material. 2014-03-17 13:51:39 +01:00
Rafael J. Wysocki
2ed99e39cb cpufreq: Skip current frequency initialization for ->setpolicy drivers
After commit da60ce9f2f (cpufreq: call cpufreq_driver->get() after
calling ->init()) __cpufreq_add_dev() sometimes fails for CPUs handled
by intel_pstate, because that driver may return 0 from its ->get()
callback if it has not run long enough to collect enough samples on the
given CPU.  That didn't happen before commit da60ce9f2f which added
policy->cur initialization to __cpufreq_add_dev() to help reduce code
duplication in other cpufreq drivers.

However, the code added by commit da60ce9f2f need not be executed
for cpufreq drivers having the ->setpolicy callback defined, because
the subsequent invocation of cpufreq_set_policy() will use that
callback to initialize the policy anyway and it doesn't need
policy->cur to be initialized upfront.  The analogous code in
cpufreq_update_policy() is also unnecessary for cpufreq drivers
having ->setpolicy set and may be skipped for them as well.

Since intel_pstate provides ->setpolicy, skipping the upfront
policy->cur initialization for cpufreq drivers with that callback
set will cover intel_pstate and the problem it's been having after
commit da60ce9f2f will be addressed.

Fixes: da60ce9f2f (cpufreq: call cpufreq_driver->get() after calling ->init())
References: https://bugzilla.kernel.org/show_bug.cgi?id=71931
Reported-and-tested-by: Patrik Lundquist <patrik.lundquist@gmail.com>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-13 00:37:16 +01:00
Viresh Kumar
96bbbe4a2a cpufreq: Remove unnecessary variable/parameter 'frozen'
We have used 'frozen' variable/function parameter at many places to
distinguish between CPU offline/online on suspend/resume vs sysfs
removals. We now have another variable cpufreq_suspended which can
be used in these cases, so we can get rid of all those variables or
function parameters.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-12 01:06:01 +01:00
Viresh Kumar
e0b3165ba5 cpufreq: add 'freq_table' in struct cpufreq_policy
freq table is not per CPU but per policy, so it makes more sense to
keep it within struct cpufreq_policy instead of a per-cpu variable.

This patch does it. Over that, there is no need to set policy->freq_table
to NULL in ->exit(), as policy structure is going to be freed soon.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-12 01:06:00 +01:00
Joe Perches
e837f9b58b cpufreq: Reformat printk() statements
- Add missing newlines
 - Coalesce format fragments
 - Convert printks to pr_<level>
 - Align arguments

Based-on-patch-by: Sören Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-12 00:49:22 +01:00
Viresh Kumar
e28867eab7 cpufreq: Implement cpufreq_generic_suspend()
Multiple platforms need to set CPUs to a particular frequency before
suspending the system, so provide a common infrastructure for them.

Those platforms only need to point their ->suspend callback pointers
to the generic routine.

Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-06 15:04:12 +01:00
Viresh Kumar
2f0aea9363 cpufreq: suspend governors on system suspend/hibernate
This patch adds cpufreq suspend/resume calls to dpm_{suspend|resume}()
for handling suspend/resume of cpufreq governors.

Lan Tianyu (Intel) & Jinhyuk Choi (Broadcom) found an issue where the
tunables configuration for clusters/sockets with non-boot CPUs was
lost after system suspend/resume, as we were notifying governors with
CPUFREQ_GOV_POLICY_EXIT on removal of the last CPU for that policy
which caused the tunables memory to be freed.

This is fixed by preventing any governor operations from being
carried out between the device suspend and device resume stages of
system suspend and resume, respectively.

We could have added these callbacks at dpm_{suspend|resume}_noirq()
level, but there is an additional problem that the majority of I/O
devices is already suspended at that point and if cpufreq drivers
want to change the frequency before suspending, then that not be
possible on some platforms (which depend on peripherals like i2c,
regulators, etc).

Reported-and-tested-by: Lan Tianyu <tianyu.lan@intel.com>
Reported-by: Jinhyuk Choi <jinchoi@broadcom.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-06 15:04:12 +01:00
viresh kumar
6e2c89d16d cpufreq: move call to __find_governor() to cpufreq_init_policy()
We call __find_governor() during the addition of the first CPU of
each policy from __cpufreq_add_dev() to find the last governor used
for this CPU before it was hot-removed.

After that we call cpufreq_parse_governor() in cpufreq_init_policy(),
either with this governor, or with the default governor. Right after
that policy->governor is set to NULL.

While that code is not functionally problematic, the structure of it
is suboptimal, because some of the code required in cpufreq_init_policy()
is being executed by its caller, __cpufreq_add_dev(). So, it would make
more sense to get all of it together in a single place to make code more
readable.

Accordingly, move the code needed for policy initialization to
cpufreq_init_policy() and initialize policy->governor to NULL at the
beginning.

In order to clean up the code a bit more, some of the #ifdefs for
CONFIG_HOTPLUG_CPU are dropped too.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-06 14:38:44 +01:00
Rafael J. Wysocki
3b4aff0472 Merge back earlier 'pm-cpufreq' material. 2014-03-06 13:25:59 +01:00
Viresh Kumar
4e97b631f2 cpufreq: Initialize governor for a new policy under policy->rwsem
policy->rwsem is used to lock access to all parts of code modifying
struct cpufreq_policy, but it's not used on a new policy created by
__cpufreq_add_dev().

Because of that, if cpufreq_update_policy() is called in a tight loop
on one CPU in parallel with offline/online of another CPU, then the
following crash can be triggered:

Unable to handle kernel NULL pointer dereference at virtual address 00000020
pgd = c0003000
[00000020] *pgd=80000000004003, *pmd=00000000
Internal error: Oops: 206 [#1] PREEMPT SMP ARM

PC is at __cpufreq_governor+0x10/0x1ac
LR is at cpufreq_update_policy+0x114/0x150

---[ end trace f23a8defea6cd706 ]---
Kernel panic - not syncing: Fatal exception
CPU0: stopping
CPU: 0 PID: 7136 Comm: mpdecision Tainted: G      D W    3.10.0-gd727407-00074-g979ede8 #396

[<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58)
[<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c)
[<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8)
[<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from [<c0803e7c>] (cpufreq_init_policy+0x30/0x98)
[<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4)
[<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84)
[<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from [<c0afe180>] (notifier_call_chain+0x40/0x68)
[<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02812dc>] (__cpu_notify+0x28/0x44)
[<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>] (_cpu_up+0xf4/0x1dc)
[<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>] (cpu_up+0x5c/0x78)
[<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>] (store_online+0x44/0x74)
[<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>] (sysfs_write_file+0x108/0x14c)
[<c03a40f4>] (sysfs_write_file+0x108/0x14c) from [<c03517d4>] (vfs_write+0xd0/0x180)
[<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>] (SyS_write+0x38/0x68)
[<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>] (ret_fast_syscall+0x0/0x30)

Fix that by taking locks at appropriate places in __cpufreq_add_dev()
as well.

Reported-by: Saravana Kannan <skannan@codeaurora.org>
Suggested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-06 13:25:30 +01:00
Viresh Kumar
5a7e56a5d2 cpufreq: Initialize policy before making it available for others to use
Policy must be fully initialized before it is being made available
for use by others. Otherwise cpufreq_cpu_get() would be able to grab
a half initialized policy structure that might not have affected_cpus
(for example) populated. Then, anybody accessing those fields will get
a wrong value and that will lead to unpredictable results.

In order to fix this, do all the necessary initialization before we
make the policy structure available via cpufreq_cpu_get(). That will
guarantee that any code accessing fields of the policy will get
correct data from them.

Reported-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-06 13:25:29 +01:00
Aaron Plattner
999976e0f6 cpufreq: use cpufreq_cpu_get() to avoid cpufreq_get() race conditions
If a module calls cpufreq_get while cpufreq is initializing, it's
possible for it to be called after cpufreq_driver is set but before
cpufreq_cpu_data is written during subsys_interface_register.  This
happens because cpufreq_get doesn't take the cpufreq_driver_lock
around its use of cpufreq_cpu_data.

Fix this by using cpufreq_cpu_get(cpu) to look up the policy rather
than reading it out of cpufreq_cpu_data directly.  cpufreq_cpu_get()
takes the appropriate locks to prevent this race from happening.

Since it's possible for policy to be NULL if the caller passes in an
invalid CPU number or calls the function before cpufreq is initialized,
delete the BUG_ON(!policy) and simply return 0.  Don't try to return
-ENOENT because that's negative and the function returns an unsigned
integer.

References: https://bbs.archlinux.org/viewtopic.php?id=177934
Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-06 13:25:16 +01:00
Viresh Kumar
bd0fa9bb45 cpufreq: Return error if ->get() failed in cpufreq_update_policy()
cpufreq_update_policy() calls cpufreq_driver->get() to get current
frequency of a CPU and it is not supposed to fail or return zero.
Return error in case that happens.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-01 01:39:30 +01:00
Rashika Kheria
8a5c74a175 cpufreq: Mark function as static in cpufreq.c
Mark function as static in cpufreq.c because it is not
used outside this file.

This eliminates the following warning in cpufreq.c:
drivers/cpufreq/cpufreq.c:355:9: warning: no previous prototype for ‘show_boost’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-27 00:49:36 +01:00
Viresh Kumar
1c0ca90207 cpufreq: don't call cpufreq_update_policy() on CPU addition
cpufreq_update_policy() is called from two places currently. From a
workqueue handled queued from cpufreq_bp_resume() for boot CPU and
from cpufreq_cpu_callback() whenever a CPU is added.

The first one makes sure that boot CPU is running on the frequency
present in policy->cpu. But we don't really need a call from
cpufreq_cpu_callback(), because we always call cpufreq_driver->init()
(which will set policy->cur correctly) whenever first CPU of any
policy is added back. And so every policy structure is guaranteed to
have the right frequency in policy->cur.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-24 13:37:43 +01:00
Rafael J. Wysocki
d9a789c7a0 cpufreq: Refactor cpufreq_set_policy()
Reduce the rampant usage of goto and the indentation level in
cpufreq_set_policy() to improve the readability of that code.

No functional changes should result from that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
2014-02-24 13:37:43 +01:00
viresh kumar
6964d91db2 cpufreq: remove sysfs link when a cpu != policy->cpu, is removed
Commit 42f921a (cpufreq: remove sysfs files for CPUs which failed to
come back after resume) tried to do this but missed this piece of code
to fix.

Currently we are getting this on suspend/resume:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 877 at fs/sysfs/dir.c:52 sysfs_warn_dup+0x68/0x84()
sysfs: cannot create duplicate filename '/devices/system/cpu/cpu1/cpufreq'
Modules linked in: brcmfmac brcmutil
CPU: 0 PID: 877 Comm: test-rtc-resume Not tainted 3.14.0-rc2-00259-g9398a10cd964 #12
[<c0015bac>] (unwind_backtrace) from [<c0011850>] (show_stack+0x10/0x14)
[<c0011850>] (show_stack) from [<c056e018>] (dump_stack+0x80/0xcc)
[<c056e018>] (dump_stack) from [<c0025e44>] (warn_slowpath_common+0x64/0x88)
[<c0025e44>] (warn_slowpath_common) from [<c0025efc>] (warn_slowpath_fmt+0x30/0x40)
[<c0025efc>] (warn_slowpath_fmt) from [<c012776c>] (sysfs_warn_dup+0x68/0x84)
[<c012776c>] (sysfs_warn_dup) from [<c0127a54>] (sysfs_do_create_link_sd+0xb0/0xb8)
[<c0127a54>] (sysfs_do_create_link_sd) from [<c038ef64>] (__cpufreq_add_dev.isra.27+0x2a8/0x814)
[<c038ef64>] (__cpufreq_add_dev.isra.27) from [<c038f548>] (cpufreq_cpu_callback+0x70/0x8c)
[<c038f548>] (cpufreq_cpu_callback) from [<c0043864>] (notifier_call_chain+0x44/0x84)
[<c0043864>] (notifier_call_chain) from [<c0025f60>] (__cpu_notify+0x28/0x44)
[<c0025f60>] (__cpu_notify) from [<c00261e8>] (_cpu_up+0xf0/0x140)
[<c00261e8>] (_cpu_up) from [<c0569eb8>] (enable_nonboot_cpus+0x68/0xb0)
[<c0569eb8>] (enable_nonboot_cpus) from [<c006339c>] (suspend_devices_and_enter+0x198/0x2dc)
[<c006339c>] (suspend_devices_and_enter) from [<c0063654>] (pm_suspend+0x174/0x1e8)
[<c0063654>] (pm_suspend) from [<c00624e0>] (state_store+0x6c/0xbc)
[<c00624e0>] (state_store) from [<c01fc200>] (kobj_attr_store+0x14/0x20)
[<c01fc200>] (kobj_attr_store) from [<c0126e50>] (sysfs_kf_write+0x44/0x48)
[<c0126e50>] (sysfs_kf_write) from [<c012a274>] (kernfs_fop_write+0xb4/0x14c)
[<c012a274>] (kernfs_fop_write) from [<c00d4818>] (vfs_write+0xa8/0x180)
[<c00d4818>] (vfs_write) from [<c00d4bb8>] (SyS_write+0x3c/0x70)
[<c00d4bb8>] (SyS_write) from [<c000e620>] (ret_fast_syscall+0x0/0x30)
---[ end trace 76969904b614c18f ]---

Fix this by removing sysfs link for cpufreq directory when cpu removed
isn't policy->cpu.

Revamps: 42f921a (cpufreq: remove sysfs files for CPUs which failed to come back after resume)
Reported-and-tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-19 01:04:40 +01:00
Lukasz Majewski
6f19efc0a1 cpufreq: Add boost frequency support in core
This commit adds boost frequency support in cpufreq core (Hardware &
Software). Some SoCs (like Exynos4 - e.g. 4x12) allow setting frequency
above its normal operation limits. Such mode shall be only used for a
short time.

Overclocking (boost) support is essentially provided by platform
dependent cpufreq driver.

This commit unifies support for SW and HW (Intel) overclocking solutions
in the core cpufreq driver. Previously the "boost" sysfs attribute was
defined in the ACPI processor driver code. By default boost is disabled.
One global attribute is available at: /sys/devices/system/cpu/cpufreq/boost.

It only shows up when cpufreq driver supports overclocking.
Under the hood frequencies dedicated for boosting are marked with a
special flag (CPUFREQ_BOOST_FREQ) at driver's frequency table.
It is the user's concern to enable/disable overclocking with a proper call
to sysfs.

The cpufreq_boost_trigger_state() function is defined non static on purpose.
It is used later with thermal subsystem to provide automatic enable/disable
of the BOOST feature.

Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-01-17 02:00:44 +01:00
Viresh Kumar
652ed95d5f cpufreq: introduce cpufreq_generic_get() routine
CPUFreq drivers that use clock frameworks interface,i.e. clk_get_rate(),
to get CPUs clk rate, have similar sort of code used in most of them.

This patch adds a generic ->get() which will do the same thing for them.
All those drivers are required to now is to set .get to cpufreq_generic_get()
and set their clk pointer in policy->clk during ->init().

Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-01-17 02:00:44 +01:00
Viresh Kumar
fcd7af917a cpufreq: stats: handle cpufreq_unregister_driver() and suspend/resume properly
There are several problems with cpufreq stats in the way it handles
cpufreq_unregister_driver() and suspend/resume..

 - We must not lose data collected so far when suspend/resume happens
   and so stats directories must not be removed/allocated during these
   operations, which is done currently.

 - cpufreq_stat has registered notifiers with both cpufreq and hotplug.
   It adds sysfs stats directory with a cpufreq notifier: CPUFREQ_NOTIFY
   and removes this directory with a notifier from hotplug core.

   In case cpufreq_unregister_driver() is called (on rmmod cpufreq driver),
   stats directories per cpu aren't removed as CPUs are still online. The
   only call cpufreq_stats gets is cpufreq_stats_update_policy_cpu() for
   all CPUs except the last of each policy. And pointer to stat information
   is stored in the entry for last CPU in the per-cpu cpufreq_stats_table.
   But policy structure would be freed inside cpufreq core and so that will
   result in memory leak inside cpufreq stats (as we are never freeing
   memory for stats).

   Now if we again insert the module cpufreq_register_driver() will be
   called and we will again allocate stats data and put it on for first
   CPU of every policy.  In case we only have a single CPU per policy, we
   will return with a error from cpufreq_stats_create_table() due to this
   code:

	if (per_cpu(cpufreq_stats_table, cpu))
		return -EBUSY;

   And so probably cpufreq stats directory would not show up anymore (as
   it was added inside last policies->kobj which doesn't exist anymore).
   I haven't tested it, though. Also the values in stats files wouldn't
   be refreshed as we are using the earlier stats structure.

 - CPUFREQ_NOTIFY is called from cpufreq_set_policy() which is called for
   scenarios where we don't really want cpufreq_stat_notifier_policy() to get
   called. For example whenever we are changing anything related to a policy:
   min/max/current freq, etc. cpufreq_set_policy() is called and so cpufreq
   stats is notified. Where we don't do any useful stuff other than simply
   returning with -EBUSY from cpufreq_stats_create_table(). And so this
   isn't the right notifier that cpufreq stats..

 Due to all above reasons this patch does following changes:
 - Add new notifiers CPUFREQ_CREATE_POLICY and CPUFREQ_REMOVE_POLICY,
   which are only called when policy is created/destroyed. They aren't
   called for suspend/resume paths..
 - Use these notifiers in cpufreq_stat_notifier_policy() to create/destory
   stats sysfs entries. And so cpufreq_unregister_driver() or suspend/resume
   shouldn't be a problem for cpufreq_stats.
 - Return early from cpufreq_stat_cpu_callback() for suspend/resume sequence,
   so that we don't free stats structure.

Acked-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-01-17 02:00:43 +01:00
Viresh Kumar
d3916691c9 cpufreq: Make sure CPU is running on a freq from freq-table
Sometimes boot loaders set CPU frequency to a value outside of frequency table
present with cpufreq core. In such cases CPU might be unstable if it has to run
on that frequency for long duration of time and so its better to set it to a
frequency which is specified in freq-table. This also makes cpufreq stats
inconsistent as cpufreq-stats would fail to register because current frequency
of CPU isn't found in freq-table.

Because we don't want this change to affect boot process badly, we go for the
next freq which is >= policy->cur ('cur' must be set by now, otherwise we will
end up setting freq to lowest of the table as 'cur' is initialized to zero).

In case current frequency doesn't match any frequency from freq-table, we throw
warnings to user, so that user can get this fixed in their bootloaders or
freq-tables.

Reported-by: Carlos Hernandez <ceh@ti.com>
Reported-and-tested-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-01-06 14:17:25 +01:00
Viresh Kumar
ab1b1c4e82 cpufreq: send new set of notification for transition failures
In the current code, if we fail during a frequency transition, we
simply send the POSTCHANGE notification with the old frequency. This
isn't enough.

One of the core users of these notifications is the code responsible
for keeping loops_per_jiffy aligned with frequency changes. And mostly
it is written as:

	if ((val == CPUFREQ_PRECHANGE  && freq->old < freq->new) ||
	    (val == CPUFREQ_POSTCHANGE && freq->old > freq->new)) {
		update-loops-per-jiffy...
	}

So, suppose we are changing to a higher frequency and failed during
transition, then following will happen:
- CPUFREQ_PRECHANGE notification with freq-new > freq-old
- CPUFREQ_POSTCHANGE notification with freq-new == freq-old

The first one will update loops_per_jiffy and second one will do
nothing. Even if we send the 2nd notification by exchanging values of
freq-new and old, some users of these notifications might get
unstable.

This can be fixed by simply calling cpufreq_notify_post_transition()
with error code and this routine will take care of sending
notifications in the correct order.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Folded 3 patches into one, rebased unicore2 changes]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-01-06 01:43:44 +01:00
Viresh Kumar
f7ba3b41e2 cpufreq: Introduce cpufreq_notify_post_transition()
This introduces a new routine cpufreq_notify_post_transition() which
can be used to send POSTCHANGE notification for new freq with or
without both {PRE|POST}CHANGE notifications for last freq. This is
useful at multiple places, especially for sending transition failure
notifications.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-01-06 01:43:44 +01:00
Jane Li
6f1e4efd88 cpufreq: Fix timer/workqueue corruption by protecting reading governor_enabled
When a CPU is hot removed we'll cancel all the delayed work items via
gov_cancel_work(). Sometimes the delayed work function determines that
it should adjust the delay for all other CPUs that the policy is
managing. If this scenario occurs, the canceling CPU will cancel its own
work but queue up the other CPUs works to run.

Commit 3617f2 (cpufreq: Fix timer/workqueue corruption due to double
queueing) has tried to fix this, but reading governor_enabled is not
protected by cpufreq_governor_lock. Even though od_dbs_timer() checks
governor_enabled before gov_queue_work(), this scenario may occur. For
example:

 CPU0                                        CPU1
 ----                                        ----
 cpu_down()
  ...                                        <work runs>
  __cpufreq_remove_dev()                     od_dbs_timer()
   __cpufreq_governor()                       policy->governor_enabled
    policy->governor_enabled = false;
    cpufreq_governor_dbs()
     case CPUFREQ_GOV_STOP:
      gov_cancel_work(dbs_data, policy);
       cpu0 work is canceled
        timer is canceled
        cpu1 work is canceled
        <waits for cpu1>
                                              gov_queue_work(*, *, true);
                                               cpu0 work queued
                                               cpu1 work queued
                                               cpu2 work queued
                                               ...
        cpu1 work is canceled
        cpu2 work is canceled
        ...

At the end of the GOV_STOP case cpu0 still has a work queued to
run although the code is expecting all of the works to be
canceled. __cpufreq_remove_dev() will then proceed to
re-initialize all the other CPUs works except for the CPU that is
going down. The CPUFREQ_GOV_START case in cpufreq_governor_dbs()
will trample over the queued work and debugobjects will spit out
a warning:

WARNING: at lib/debugobjects.c:260 debug_print_object+0x94/0xbc()
ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x14
Modules linked in:
CPU: 1 PID: 1205 Comm: sh Tainted: G        W    3.10.0 #200
[<c01144f0>] (unwind_backtrace+0x0/0xf8) from [<c0111d98>] (show_stack+0x10/0x14)
[<c0111d98>] (show_stack+0x10/0x14) from [<c01272cc>] (warn_slowpath_common+0x4c/0x68)
[<c01272cc>] (warn_slowpath_common+0x4c/0x68) from [<c012737c>] (warn_slowpath_fmt+0x30/0x40)
[<c012737c>] (warn_slowpath_fmt+0x30/0x40) from [<c034c640>] (debug_print_object+0x94/0xbc)
[<c034c640>] (debug_print_object+0x94/0xbc) from [<c034c7f8>] (__debug_object_init+0xc8/0x3c0)
[<c034c7f8>] (__debug_object_init+0xc8/0x3c0) from [<c01360e0>] (init_timer_key+0x20/0x104)
[<c01360e0>] (init_timer_key+0x20/0x104) from [<c04872ac>] (cpufreq_governor_dbs+0x1dc/0x68c)
[<c04872ac>] (cpufreq_governor_dbs+0x1dc/0x68c) from [<c04833a8>] (__cpufreq_governor+0x80/0x1b0)
[<c04833a8>] (__cpufreq_governor+0x80/0x1b0) from [<c0483704>] (__cpufreq_remove_dev.isra.12+0x22c/0x380)
[<c0483704>] (__cpufreq_remove_dev.isra.12+0x22c/0x380) from [<c0692f38>] (cpufreq_cpu_callback+0x48/0x5c)
[<c0692f38>] (cpufreq_cpu_callback+0x48/0x5c) from [<c014fb40>] (notifier_call_chain+0x44/0x84)
[<c014fb40>] (notifier_call_chain+0x44/0x84) from [<c012ae44>] (__cpu_notify+0x2c/0x48)
[<c012ae44>] (__cpu_notify+0x2c/0x48) from [<c068dd40>] (_cpu_down+0x80/0x258)
[<c068dd40>] (_cpu_down+0x80/0x258) from [<c068df40>] (cpu_down+0x28/0x3c)
[<c068df40>] (cpu_down+0x28/0x3c) from [<c068e4c0>] (store_online+0x30/0x74)
[<c068e4c0>] (store_online+0x30/0x74) from [<c03a7308>] (dev_attr_store+0x18/0x24)
[<c03a7308>] (dev_attr_store+0x18/0x24) from [<c0256fe0>] (sysfs_write_file+0x100/0x180)
[<c0256fe0>] (sysfs_write_file+0x100/0x180) from [<c01fec9c>] (vfs_write+0xbc/0x184)
[<c01fec9c>] (vfs_write+0xbc/0x184) from [<c01ff034>] (SyS_write+0x40/0x68)
[<c01ff034>] (SyS_write+0x40/0x68) from [<c010e200>] (ret_fast_syscall+0x0/0x48)

In gov_queue_work(), lock cpufreq_governor_lock before gov_queue_work,
and unlock it after __gov_queue_work(). In this way, governor_enabled
is guaranteed not changed in gov_queue_work().

Signed-off-by: Jane Li <jiel@marvell.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-01-06 01:22:02 +01:00
Viresh Kumar
08fd8c1cf0 cpufreq: preserve user_policy across suspend/resume
Prevent __cpufreq_add_dev() from overwriting the existing values of
user_policy.{min|max|policy|governor} with defaults during resume
from system suspend.

Fixes: 5302c3fb2e ("cpufreq: Perform light-weight init/teardown during suspend/resume")
Reported-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-12-29 15:31:21 +01:00
Rafael J. Wysocki
72368d122c cpufreq: Clean up after a failing light-weight initialization
If cpufreq_policy_restore() returns NULL during system resume,
__cpufreq_add_dev() should just fall back to the full initialization
instead of returning an error, because that may actually make things
work.  Moreover, it should not leave stale fallback data behind after
it has failed to restore a previously existing policy.

This change is based on Viresh Kumar's work.

Fixes: 5302c3fb2e ("cpufreq: Perform light-weight init/teardown during suspend/resume")
Reported-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
2013-12-29 15:30:36 +01:00
Jason Baron
a27a9ab706 cpufreq: Use CONFIG_CPU_FREQ_DEFAULT_* to set initial policy for setpolicy drivers
When configuring a default governor (via CONFIG_CPU_FREQ_DEFAULT_*) with the
intel_pstate driver, the desired default policy is not properly set. For
example, setting 'CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE' ends up with the
'powersave' policy being set.

Fix by configuring the correct default policy, if either 'powersave' or
'performance' are requested. Otherwise, fallback to what the driver originally
set via its 'init' routine.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-12-22 00:51:52 +01:00
Viresh Kumar
42f921a6f1 cpufreq: remove sysfs files for CPUs which failed to come back after resume
There are cases where cpufreq_add_dev() may fail for some CPUs
during system resume. With the current code we will still have
sysfs cpufreq files for those CPUs and struct cpufreq_policy
would be already freed for them. Hence any operation on those
sysfs files would result in kernel warnings.

Example of problems resulting from resume errors (from Bjørn Mork):

WARNING: CPU: 0 PID: 6055 at fs/sysfs/file.c:343 sysfs_open_file+0x77/0x212()
missing sysfs attribute operations for kobject: (null)
Modules linked in: [stripped as irrelevant]
CPU: 0 PID: 6055 Comm: grep Tainted: G      D      3.13.0-rc2 #153
Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
 0000000000000009 ffff8802327ebb78 ffffffff81380b0e 0000000000000006
 ffff8802327ebbc8 ffff8802327ebbb8 ffffffff81038635 0000000000000000
 ffffffff811823c7 ffff88021a19e688 ffff88021a19e688 ffff8802302f9310
Call Trace:
 [<ffffffff81380b0e>] dump_stack+0x55/0x76
 [<ffffffff81038635>] warn_slowpath_common+0x7c/0x96
 [<ffffffff811823c7>] ? sysfs_open_file+0x77/0x212
 [<ffffffff810386e3>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff81182dec>] ? sysfs_get_active+0x6b/0x82
 [<ffffffff81182382>] ? sysfs_open_file+0x32/0x212
 [<ffffffff811823c7>] sysfs_open_file+0x77/0x212
 [<ffffffff81182350>] ? sysfs_schedule_callback+0x1ac/0x1ac
 [<ffffffff81122562>] do_dentry_open+0x17c/0x257
 [<ffffffff8112267e>] finish_open+0x41/0x4f
 [<ffffffff81130225>] do_last+0x80c/0x9ba
 [<ffffffff8112dbbd>] ? inode_permission+0x40/0x42
 [<ffffffff81130606>] path_openat+0x233/0x4a1
 [<ffffffff81130b7e>] do_filp_open+0x35/0x85
 [<ffffffff8113b787>] ? __alloc_fd+0x172/0x184
 [<ffffffff811232ea>] do_sys_open+0x6b/0xfa
 [<ffffffff811233a7>] SyS_openat+0xf/0x11
 [<ffffffff8138c812>] system_call_fastpath+0x16/0x1b

To fix this, remove those sysfs files or put the associated kobject
in case of such errors. Also, to make it simple, remove the cpufreq
sysfs links from all the CPUs (except for the policy->cpu) during
suspend, as that operation won't result in a loss of sysfs file
permissions and we can create those links during resume just fine.

Fixes: 5302c3fb2e ("cpufreq: Perform light-weight init/teardown during suspend/resume")
Reported-and-tested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-12-22 00:47:46 +01:00
Rafael J. Wysocki
d4faadd5d5 Revert "cpufreq: fix garbage kobjects on errors during suspend/resume"
Commit 2167e2399d (cpufreq: fix garbage kobjects on errors during
suspend/resume) breaks suspend/resume on Martin Ziegler's system
(hard lockup during resume), so revert it.

Fixes: 2167e2399d (cpufreq: fix garbage kobjects on errors during suspend/resume)
References: https://bugzilla.kernel.org/show_bug.cgi?id=66751
Reported-by: Martin Ziegler <ziegler@uni-freiburg.de>
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-12-08 01:32:41 +01:00
Rafael J. Wysocki
12205a4b79 Revert "cpufreq: suspend governors on system suspend/hibernate"
Commit 5a87182aa2 (cpufreq: suspend governors on system
suspend/hibernate) causes hibernation problems to happen on
Bjørn Mork's and Paul Bolle's systems, so revert it.

Fixes: 5a87182aa2 (cpufreq: suspend governors on system suspend/hibernate)
Reported-by: Bjørn Mork <bjorn@mork.no>
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-12-08 01:04:17 +01:00
Bjørn Mork
2167e2399d cpufreq: fix garbage kobjects on errors during suspend/resume
This is effectively a revert of commit 5302c3fb2e ("cpufreq: Perform
light-weight init/teardown during suspend/resume"), which enabled
suspend/resume optimizations leaving the sysfs files in place.

Errors during suspend/resume are not handled properly, leaving
dead sysfs attributes in case of failures.  There are are number of
functions with special code for the "frozen" case, and all these
need to also have special error handling.

The problem is easy to demonstrate by making cpufreq_driver->init()
or cpufreq_driver->get() fail during resume.

The code is too complex for a simple fix, with split code paths
in multiple blocks within a number of functions.  It is therefore
best to revert the patch enabling this code until the error handling
is in place.

Examples of problems resulting from resume errors:

WARNING: CPU: 0 PID: 6055 at fs/sysfs/file.c:343 sysfs_open_file+0x77/0x212()
missing sysfs attribute operations for kobject: (null)
Modules linked in: [stripped as irrelevant]
CPU: 0 PID: 6055 Comm: grep Tainted: G      D      3.13.0-rc2 #153
Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
 0000000000000009 ffff8802327ebb78 ffffffff81380b0e 0000000000000006
 ffff8802327ebbc8 ffff8802327ebbb8 ffffffff81038635 0000000000000000
 ffffffff811823c7 ffff88021a19e688 ffff88021a19e688 ffff8802302f9310
Call Trace:
 [<ffffffff81380b0e>] dump_stack+0x55/0x76
 [<ffffffff81038635>] warn_slowpath_common+0x7c/0x96
 [<ffffffff811823c7>] ? sysfs_open_file+0x77/0x212
 [<ffffffff810386e3>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff81182dec>] ? sysfs_get_active+0x6b/0x82
 [<ffffffff81182382>] ? sysfs_open_file+0x32/0x212
 [<ffffffff811823c7>] sysfs_open_file+0x77/0x212
 [<ffffffff81182350>] ? sysfs_schedule_callback+0x1ac/0x1ac
 [<ffffffff81122562>] do_dentry_open+0x17c/0x257
 [<ffffffff8112267e>] finish_open+0x41/0x4f
 [<ffffffff81130225>] do_last+0x80c/0x9ba
 [<ffffffff8112dbbd>] ? inode_permission+0x40/0x42
 [<ffffffff81130606>] path_openat+0x233/0x4a1
 [<ffffffff81130b7e>] do_filp_open+0x35/0x85
 [<ffffffff8113b787>] ? __alloc_fd+0x172/0x184
 [<ffffffff811232ea>] do_sys_open+0x6b/0xfa
 [<ffffffff811233a7>] SyS_openat+0xf/0x11
 [<ffffffff8138c812>] system_call_fastpath+0x16/0x1b

The failure to restore cpufreq devices on cancelled hibernation is
not a new bug. It is caused by the ACPI _PPC call failing unless the
hibernate is completed. This makes the acpi_cpufreq driver fail its
init.

Previously, the cpufreq device could be restored by offlining the
cpu temporarily.  And as a complete hibernation cycle would do this,
it would be automatically restored most of the time.  But after
commit 5302c3fb2e the leftover sysfs attributes will block any
device add action.  Therefore offlining and onlining CPU 1 will no
longer restore the cpufreq object, and a complete suspend/resume
cycle will replace it with garbage.

Fixes: 5302c3fb2e ("cpufreq: Perform light-weight init/teardown during suspend/resume")
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-12-03 15:25:52 +01:00
Viresh Kumar
5a87182aa2 cpufreq: suspend governors on system suspend/hibernate
This patch adds cpufreq suspend/resume calls to dpm_{suspend|resume}_noirq()
for handling suspend/resume of cpufreq governors.

Lan Tianyu (Intel) & Jinhyuk Choi (Broadcom) found anr issue where
tunables configuration for clusters/sockets with non-boot CPUs was
getting lost after suspend/resume, as we were notifying governors
with CPUFREQ_GOV_POLICY_EXIT on removal of the last cpu for that
policy and so deallocating memory for tunables. This is fixed by
this patch as we don't allow any operation on governors after
device suspend and before device resume now.

Reported-and-tested-by: Lan Tianyu <tianyu.lan@intel.com>
Reported-by: Jinhyuk Choi <jinchoi@broadcom.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog, minor cleanups]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-11-28 14:47:31 +01:00
Viresh Kumar
d4019f0a92 cpufreq: move freq change notifications to cpufreq core
Most of the drivers do following in their ->target_index() routines:

	struct cpufreq_freqs freqs;
	freqs.old = old freq...
	freqs.new = new freq...

	cpufreq_notify_transition(policy, &freqs, CPUFREQ_PRECHANGE);

	/* Change rate here */

	cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);

This is replicated over all cpufreq drivers today and there doesn't exists a
good enough reason why this shouldn't be moved to cpufreq core instead.

There are few special cases though, like exynos5440, which doesn't do everything
on the call to ->target_index() routine and call some kind of bottom halves for
doing this work, work/tasklet/etc..

They may continue doing notification from their own code as flag:
CPUFREQ_ASYNC_NOTIFICATION is already set for them.

All drivers are also modified in this patch to avoid breaking 'git bisect', as
double notification would happen otherwise.

Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Russell King <linux@arm.linux.org.uk>
Acked-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Reviewed-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-31 00:11:08 +01:00
viresh kumar
ad7722dab7 cpufreq: create per policy rwsem instead of per CPU cpu_policy_rwsem
We have per-CPU cpu_policy_rwsem for cpufreq core, but we never use
all of them. We always use rwsem of policy->cpu and so we can
actually make this rwsem per policy instead.

This patch does this change. With this change other tricky situations
are also avoided now, like which lock to take while we are changing
policy->cpu, etc.

Suggested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-25 23:54:12 +02:00
Viresh Kumar
9c0ebcf78f cpufreq: Implement light weight ->target_index() routine
Currently, the prototype of cpufreq_drivers target routines is:

int target(struct cpufreq_policy *policy, unsigned int target_freq,
		unsigned int relation);

And most of the drivers call cpufreq_frequency_table_target() to get a valid
index of their frequency table which is closest to the target_freq. And they
don't use target_freq and relation after that.

So, it makes sense to just do this work in cpufreq core before calling
cpufreq_frequency_table_target() and simply pass index instead. But this can be
done only with drivers which expose their frequency table with cpufreq core. For
others we need to stick with the old prototype of target() until those drivers
are converted to expose frequency tables.

This patch implements the new light weight prototype for target_index() routine.
It looks like this:

int target_index(struct cpufreq_policy *policy, unsigned int index);

CPUFreq core will call cpufreq_frequency_table_target() before calling this
routine and pass index to it. Because CPUFreq core now requires to call routines
present in freq_table.c CONFIG_CPU_FREQ_TABLE must be enabled all the time.

This also marks target() interface as deprecated. So, that new drivers avoid
using it. And Documentation is updated accordingly.

It also converts existing .target() to newly defined light weight
.target_index() routine for many driver.

Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Russell King <linux@arm.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rjw@rjwysocki.net>
2013-10-25 22:42:24 +02:00
Srivatsa S. Bhat
99ec899eaf cpufreq: Detect spurious invocations of update_policy_cpu()
The function update_policy_cpu() is expected to be called when the policy->cpu
of a cpufreq policy is to be changed: ie., the new CPU nominated to become the
policy->cpu is different from the old one.

Print a warning if it is invoked with new_cpu == old_cpu, since such an
invocation might hint at a faulty logic in the caller.

Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-17 01:09:16 +02:00
Viresh Kumar
3bc28ab6da cpufreq: remove CONFIG_CPU_FREQ_TABLE
CONFIG_CPU_FREQ_TABLE will be always enabled when cpufreq framework is used, as
cpufreq core depends on it. So, we don't need this CONFIG option anymore as it
is not configurable. Remove CONFIG_CPU_FREQ_TABLE and update its users.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16 00:50:33 +02:00
Viresh Kumar
70e9e77833 cpufreq: create cpufreq_generic_init() routine
Many CPUFreq drivers for SMP system (where all cores share same clock lines), do
similar stuff in their ->init() part.

This patch creates a generic routine in cpufreq core which can be used by these
so that we can remove some redundant code.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16 00:50:33 +02:00
Viresh Kumar
da60ce9f2f cpufreq: call cpufreq_driver->get() after calling ->init()
Almost all drivers set policy->cur with current CPU frequency in their ->init()
part. This can be done for all of them at core level and so they wouldn't need
to do it.

This patch adds supporting code in cpufreq core for calling get() after we have
called init() for a policy.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16 00:50:28 +02:00
Viresh Kumar
0b981e7074 cpufreq: use cpufreq_driver->flags to mark CPUFREQ_HAVE_GOVERNOR_PER_POLICY
Use cpufreq_driver->flags to mark CPUFREQ_HAVE_GOVERNOR_PER_POLICY instead
of a separate field within cpufreq_driver. This will save some bytes of
memory.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16 00:50:23 +02:00
Viresh Kumar
037ce8397d cpufreq: rename __cpufreq_set_policy() as cpufreq_set_policy()
Earlier there used to be two functions named __cpufreq_set_policy() and
cpufreq_set_policy(), but now we only have a single routine lets name it
cpufreq_set_policy() instead of __cpufreq_set_policy().

This also removes some invalid comments or fixes some incorrect comments.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16 00:50:23 +02:00
Viresh Kumar
27a862e983 cpufreq: remove __cpufreq_remove_dev()
Nobody except cpufreq_remove_dev() calls __cpufreq_remove_dev() and
so we don't need two separate routines here. Merge code from
__cpufreq_remove_dev() into cpufreq_remove_dev() and get rid of
__cpufreq_remove_dev().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16 00:50:22 +02:00
Viresh Kumar
75949c9a1f cpufreq: don't break string in print statements
As a rule its better not to break string (quoted inside "") in a
print statement even if it crosses 80 column boundary as that may
introduce bugs and so this patch rewrites one of the print statements..

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16 00:50:22 +02:00
Viresh Kumar
bbdd04ab1f cpufreq: Remove extra blank line
We don't need a blank line just at start of a block, lets remove it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16 00:50:22 +02:00
Viresh Kumar
67a29e558b cpufreq: remove invalid comment from __cpufreq_remove_dev()
Some section of kerneldoc comment for __cpufreq_remove_dev() is invalid now.
Remove it.

Suggested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-16 00:50:21 +02:00
Viresh Kumar
1b750e3bda cpufreq: make return type of lock_policy_rwsem_{read|write}() as void
lock_policy_rwsem_{read|write}() currently has return type of int,
but it always returns zero and hence its return type should be void
instead. This patch makes that change and modifies all of the users
accordingly.

Reported-by: Jon Medhurst<tixy@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-10-10 02:51:14 +02:00
Viresh Kumar
26ca869434 cpufreq: check cpufreq driver is valid and cpufreq isn't disabled in cpufreq_get()
cpufreq_get() can be called from external drivers which might not be aware if
cpufreq driver is registered or not. And so we should actually check if cpufreq
driver is registered or not and also if cpufreq is active or disabled, at the
beginning of cpufreq_get().

Otherwise call to lock_policy_rwsem_read() might hit BUG_ON(!policy).

Reported-and-tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25 03:24:02 +02:00
Yinghai Lu
4dea5806d3 cpufreq: return EEXIST instead of EBUSY for second registering
On systems that support intel_pstate, acpi_cpufreq fails to load, and
udev keeps trying until trace gets filled up and kernel crashes.

The root cause is driver return ret from cpufreq_register_driver(),
because when some other driver takes over before, it will return
EBUSY and then udev will keep trying ...

cpufreq_register_driver() should return EEXIST instead so that the
system can boot without appending intel_pstate=disable and still use
intel_pstate.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-20 00:37:10 +02:00
Viresh Kumar
8efd57657d cpufreq: unlock correct rwsem while updating policy->cpu
Current code looks like this:

        WARN_ON(lock_policy_rwsem_write(cpu));
        update_policy_cpu(policy, new_cpu);
        unlock_policy_rwsem_write(cpu);

{lock|unlock}_policy_rwsem_write(cpu) takes/releases policy->cpu's rwsem.
Because cpu is changing with the call to update_policy_cpu(), the
unlock_policy_rwsem_write() will release the incorrect lock.

The right solution would be to release the same lock as was taken earlier. Also
update_policy_cpu() was also called from cpufreq_add_dev() without any locks and
so its better if we move this locking to inside update_policy_cpu().

This patch fixes a regression introduced in 3.12 by commit f9ba680d23
(cpufreq: Extract the handover of policy cpu to a helper function).

Reported-and-tested-by: Jon Medhurst<tixy@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-18 00:01:52 +02:00
Viresh Kumar
9c8f1ee40b cpufreq: Clear policy->cpus bits in __cpufreq_remove_dev_finish()
This broke after a recent change "cedb70a cpufreq: Split __cpufreq_remove_dev()
into two parts" from Srivatsa.

Consider a scenario where we have two CPUs in a policy (0 & 1) and we are
removing CPU 1. On the call to __cpufreq_remove_dev_prepare() we have cleared 1
from policy->cpus and now on a call to __cpufreq_remove_dev_finish() we read
cpumask_weight of policy->cpus, which will come as 1 and this code will behave
as if we are removing the last CPU from policy :)

Fix it by clearing the CPU mask in __cpufreq_remove_dev_finish() instead of
__cpufreq_remove_dev_prepare().

Tested-by: Stephen Warren <swarren@wwwdotorg.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-18 00:01:27 +02:00
Lan Tianyu
44871c9c7f cpufreq: Acquire the lock in cpufreq_policy_restore() for reading
In cpufreq_policy_restore() before system suspend policy is read from
percpu's cpufreq_cpu_data_fallback.  It's a read operation rather
than a write one, so take the lock for reading in there.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-11 23:30:03 +02:00
Srivatsa S. Bhat
cb38ed5cf1 cpufreq: Prevent problems in update_policy_cpu() if last_cpu == new_cpu
If update_policy_cpu() is invoked with the existing policy->cpu itself
as the new-cpu parameter, then a lot of things can go terribly wrong.

In its present form, update_policy_cpu() always assumes that the new-cpu
is different from policy->cpu and invokes other functions to perform their
respective updates. And those functions implement the actual update like
this:

per_cpu(..., new_cpu) = per_cpu(..., last_cpu);
per_cpu(..., last_cpu) = NULL;

Thus, when new_cpu == last_cpu, the final NULL assignment makes the per-cpu
references vanish into thin air! (memory leak). From there, it leads to more
problems: cpufreq_stats_create_table() now doesn't find the per-cpu reference
and hence tries to create a new sysfs-group; but sysfs already had created
the group earlier, so it complains that it cannot create a duplicate filename.
In short, the repercussions of a rather innocuous invocation of
update_policy_cpu() can turn out to be pretty nasty.

Ideally update_policy_cpu() should handle this situation (new == last)
gracefully, and not lead to such severe problems. So fix it by adding an
appropriate check.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-11 23:29:57 +02:00
Srivatsa S. Bhat
61173f256a cpufreq: Restructure if/else block to avoid unintended behavior
In __cpufreq_remove_dev_prepare(), the code which decides whether to remove
the sysfs link or nominate a new policy cpu, is governed by an if/else block
with a rather complex set of conditionals. Worse, they harbor a subtlety
which leads to certain unintended behavior.

The code looks like this:

        if (cpu != policy->cpu && !frozen) {
                sysfs_remove_link(&dev->kobj, "cpufreq");
        } else if (cpus > 1) {
		new_cpu = cpufreq_nominate_new_policy_cpu(...);
		...
		update_policy_cpu(..., new_cpu);
	}

The original intention was:
If the CPU going offline is not policy->cpu, just remove the link.
On the other hand, if the CPU going offline is the policy->cpu itself,
handover the policy->cpu job to some other surviving CPU in that policy.

But because the 'if' condition also includes the 'frozen' check, now there
are *two* possibilities by which we can enter the 'else' block:

1. cpu == policy->cpu (intended)
2. cpu != policy->cpu && frozen (unintended)

Due to the second (unintended) scenario, we end up spuriously nominating
a CPU as the policy->cpu, even when the existing policy->cpu is alive and
well. This can cause problems further down the line, especially when we end
up nominating the same policy->cpu as the new one (ie., old == new),
because it totally confuses update_policy_cpu().

To avoid this mess, restructure the if/else block to only do what was
originally intended, and thus prevent any unwelcome surprises.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-11 23:29:57 +02:00
Srivatsa S. Bhat
0d66b91ebf cpufreq: Fix crash in cpufreq-stats during suspend/resume
Stephen Warren reported that the cpufreq-stats code hits a NULL pointer
dereference during the second attempt to suspend a system. He also
pin-pointed the problem to commit 5302c3f "cpufreq: Perform light-weight
init/teardown during suspend/resume".

That commit actually ensured that the cpufreq-stats table and the
cpufreq-stats sysfs entries are *not* torn down (ie., not freed) during
suspend/resume, which makes it all the more surprising. However, it turns
out that the root-cause is not that we access an already freed memory, but
that the reference to the allocated memory gets moved around and we lose
track of that during resume, leading to the reported crash in a subsequent
suspend attempt.

In the suspend path, during CPU offline, the value of policy->cpu is updated
by choosing one of the surviving CPUs in that policy, as long as there is
atleast one CPU in that policy. And cpufreq_stats_update_policy_cpu() is
invoked to update the reference to the stats structure by assigning it to
the new CPU. However, in the resume path, during CPU online, we end up
assigning a fresh CPU as the policy->cpu, without letting cpufreq-stats
know about this. Thus the reference to the stats structure remains
(incorrectly) associated with the old CPU. So, in a subsequent suspend attempt,
during CPU offline, we end up accessing an incorrect location to get the
stats structure, which eventually leads to the NULL pointer dereference.

Fix this by letting cpufreq-stats know about the update of the policy->cpu
during CPU online in the resume path. (Also, move the update_policy_cpu()
function higher up in the file, so that __cpufreq_add_dev() can invoke
it).

Reported-and-tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-11 23:29:57 +02:00
Rafael J. Wysocki
798282a871 Revert "cpufreq: make sure frequency transitions are serialized"
Commit 7c30ed5 (cpufreq: make sure frequency transitions are
serialized) attempted to serialize frequency transitions by
adding checks to the CPUFREQ_PRECHANGE and CPUFREQ_POSTCHANGE
notifications.  However, it assumed that the notifications will
always originate from the driver's .target() callback, but they
also can be triggered by cpufreq_out_of_sync() and that leads to
warnings like this on some systems:

 WARNING: CPU: 0 PID: 14543 at drivers/cpufreq/cpufreq.c:317
 __cpufreq_notify_transition+0x238/0x260()
 In middle of another frequency transition

accompanied by a call trace similar to this one:

 [<ffffffff81720daa>] dump_stack+0x46/0x58
 [<ffffffff8106534c>] warn_slowpath_common+0x8c/0xc0
 [<ffffffff815b8560>] ? acpi_cpufreq_target+0x320/0x320
 [<ffffffff81065436>] warn_slowpath_fmt+0x46/0x50
 [<ffffffff815b1ec8>] __cpufreq_notify_transition+0x238/0x260
 [<ffffffff815b33be>] cpufreq_notify_transition+0x3e/0x70
 [<ffffffff815b345d>] cpufreq_out_of_sync+0x6d/0xb0
 [<ffffffff815b370c>] cpufreq_update_policy+0x10c/0x160
 [<ffffffff815b3760>] ? cpufreq_update_policy+0x160/0x160
 [<ffffffff81413813>] cpufreq_set_cur_state+0x8c/0xb5
 [<ffffffff814138df>] processor_set_cur_state+0xa3/0xcf
 [<ffffffff8158e13c>] thermal_cdev_update+0x9c/0xb0
 [<ffffffff8159046a>] step_wise_throttle+0x5a/0x90
 [<ffffffff8158e21f>] handle_thermal_trip+0x4f/0x140
 [<ffffffff8158e377>] thermal_zone_device_update+0x57/0xa0
 [<ffffffff81415b36>] acpi_thermal_check+0x2e/0x30
 [<ffffffff81415ca0>] acpi_thermal_notify+0x40/0xdc
 [<ffffffff813e7dbd>] acpi_device_notify+0x19/0x1b
 [<ffffffff813f8241>] acpi_ev_notify_dispatch+0x41/0x5c
 [<ffffffff813e3fbe>] acpi_os_execute_deferred+0x25/0x32
 [<ffffffff81081060>] process_one_work+0x170/0x4a0
 [<ffffffff81082121>] worker_thread+0x121/0x390
 [<ffffffff81082000>] ? manage_workers.isra.20+0x170/0x170
 [<ffffffff81088fe0>] kthread+0xc0/0xd0
 [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0
 [<ffffffff8173582c>] ret_from_fork+0x7c/0xb0
 [<ffffffff81088f20>] ? flush_kthread_worker+0xb0/0xb0

For this reason, revert commit 7c30ed5 along with the fix 266c13d
(cpufreq: Fix serialization of frequency transitions) on top of it
and we will revisit the serialization problem later.

Reported-by: Alessandro Bono <alessandro.bono@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10 02:54:50 +02:00
Srivatsa S. Bhat
5136fa5658 cpufreq: Use signed type for 'ret' variable, to store negative error values
There are places where the variable 'ret' is declared as unsigned int
and then used to store negative return values such as -EINVAL. Fix them
by declaring the variable as a signed quantity.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10 02:49:48 +02:00
Srivatsa S. Bhat
56d07db274 cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes
Commit "cpufreq: serialize calls to __cpufreq_governor()" had been a temporary
and partial solution to the race condition between writing to a cpufreq sysfs
file and taking a CPU offline. Now that we have a proper and complete solution
to that problem, remove the temporary fix.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10 02:49:47 +02:00
Srivatsa S. Bhat
4f750c9308 cpufreq: Synchronize the cpufreq store_*() routines with CPU hotplug
The functions that are used to write to cpufreq sysfs files (such as
store_scaling_max_freq()) are not hotplug safe. They can race with CPU
hotplug tasks and lead to problems such as trying to acquire an already
destroyed timer-mutex etc.

Eg:

    __cpufreq_remove_dev()
     __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
       policy->governor->governor(policy, CPUFREQ_GOV_STOP);
        cpufreq_governor_dbs()
         case CPUFREQ_GOV_STOP:
          mutex_destroy(&cpu_cdbs->timer_mutex)
          cpu_cdbs->cur_policy = NULL;
      <PREEMPT>
    store()
     __cpufreq_set_policy()
      __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
        policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
         case CPUFREQ_GOV_LIMITS:
          mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
           if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL

So use get_online_cpus()/put_online_cpus() in the store_*() functions, to
synchronize with CPU hotplug. However, there is an additional point to note
here: some parts of the CPU teardown in the cpufreq subsystem are done in
the CPU_POST_DEAD stage, with cpu_hotplug.lock *released*. So, using the
get/put_online_cpus() functions alone is insufficient; we should also ensure
that we don't race with those latter steps in the hotplug sequence. We can
easily achieve this by checking if the CPU is online before proceeding with
the store, since the CPU would have been marked offline by the time the
CPU_POST_DEAD notifiers are executed.

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10 02:49:47 +02:00
Srivatsa S. Bhat
1aee40ac9c cpufreq: Invoke __cpufreq_remove_dev_finish() after releasing cpu_hotplug.lock
__cpufreq_remove_dev_finish() handles the kobject cleanup for a CPU going
offline. But because we destroy the kobject towards the end of the CPU offline
phase, there are certain race windows where a task can try to write to a
cpufreq sysfs file (eg: using store_scaling_max_freq()) while we are taking
that CPU offline, and this can bump up the kobject refcount, which in turn might
hinder the CPU offline task from running to completion. (It can also cause
other more serious problems such as trying to acquire a destroyed timer-mutex
etc., depending on the exact stage of the cleanup at which the task managed to
take a new refcount).

To fix the race window, we will need to synchronize those store_*() call-sites
with CPU hotplug, using get_online_cpus()/put_online_cpus(). However, that
in turn can cause a total deadlock because it can end up waiting for the
CPU offline task to complete, with incremented refcount!

Write to sysfs                            CPU offline task
--------------                            ----------------
kobj_refcnt++

                                          Acquire cpu_hotplug.lock

get_online_cpus();

					  Wait for kobj_refcnt to drop to zero

                     **DEADLOCK**

A simple way to avoid this problem is to perform the kobject cleanup in the
CPU offline path, with the cpu_hotplug.lock *released*. That is, we can
perform the wait-for-kobj-refcnt-to-drop as well as the subsequent cleanup
in the CPU_POST_DEAD stage of CPU offline, which is run with cpu_hotplug.lock
released. Doing this helps us avoid deadlocks due to holding kobject refcounts
and waiting on each other on the cpu_hotplug.lock.

(Note: We can't move all of the cpufreq CPU offline steps to the
CPU_POST_DEAD stage, because certain things such as stopping the governors
have to be done before the outgoing CPU is marked offline. So retain those
parts in the CPU_DOWN_PREPARE stage itself).

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10 02:49:47 +02:00
Srivatsa S. Bhat
cedb70afd0 cpufreq: Split __cpufreq_remove_dev() into two parts
During CPU offline, the cpufreq core invokes __cpufreq_remove_dev()
to perform work such as stopping the cpufreq governor, clearing the
CPU from the policy structure etc, and finally cleaning up the
kobject.

There are certain subtle issues related to the kobject cleanup, and
it would be much easier to deal with them if we separate that part
from the rest of the cleanup-work in the CPU offline phase. So split
the __cpufreq_remove_dev() function into 2 parts: one that handles
the kobject cleanup, and the other that handles the rest of the work.

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10 02:49:46 +02:00
Viresh Kumar
19c763031a cpufreq: serialize calls to __cpufreq_governor()
We can't take a big lock around __cpufreq_governor() as this causes
recursive locking for some cases. But calls to this routine must be
serialized for every policy. Otherwise we can see some unpredictable
events.

For example, consider following scenario:

__cpufreq_remove_dev()
 __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
   policy->governor->governor(policy, CPUFREQ_GOV_STOP);
    cpufreq_governor_dbs()
     case CPUFREQ_GOV_STOP:
      mutex_destroy(&cpu_cdbs->timer_mutex)
      cpu_cdbs->cur_policy = NULL;
  <PREEMPT>
store()
 __cpufreq_set_policy()
  __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
    policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
     case CPUFREQ_GOV_LIMITS:
      mutex_lock(&cpu_cdbs->timer_mutex); <-- Warning (destroyed mutex)
       if (policy->max < cpu_cdbs->cur_policy->cur) <- cur_policy == NULL

And so store() will eventually result in a crash if cur_policy is
NULL at this point.

Introduce an additional variable which would guarantee serialization
here.

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10 02:49:46 +02:00
Viresh Kumar
f73d393384 cpufreq: don't allow governor limits to be changed when it is disabled
__cpufreq_governor() returns with -EBUSY when governor is already
stopped and we try to stop it again, but when it is stopped we must
not allow calls to CPUFREQ_GOV_LIMITS event as well.

This patch adds this check in __cpufreq_governor().

Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-10 02:49:45 +02:00
Li Zhong
5025d628c8 cpufreq: fix bad unlock balance on !CONFIG_SMP
This patch tries to fix lockdep complaint attached below.

It seems that we should always read acquire the cpufreq_rwsem,
whether CONFIG_SMP is enabled or not.  And CONFIG_HOTPLUG_CPU
depends on CONFIG_SMP, so it seems we don't need CONFIG_SMP for the
code enabled by CONFIG_HOTPLUG_CPU.

[    0.504191] =====================================
[    0.504627] [ BUG: bad unlock balance detected! ]
[    0.504627] 3.11.0-rc6-next-20130819 #1 Not tainted
[    0.504627] -------------------------------------
[    0.504627] swapper/1 is trying to release lock (cpufreq_rwsem) at:
[    0.504627] [<ffffffff813d927a>] cpufreq_add_dev+0x13a/0x3e0
[    0.504627] but there are no more locks to release!
[    0.504627]
[    0.504627] other info that might help us debug this:
[    0.504627] 1 lock held by swapper/1:
[    0.504627]  #0:  (subsys mutex#4){+.+.+.}, at: [<ffffffff8134a7bf>] subsys_interface_register+0x4f/0xe0
[    0.504627]
[    0.504627] stack backtrace:
[    0.504627] CPU: 0 PID: 1 Comm: swapper Not tainted 3.11.0-rc6-next-20130819 #1
[    0.504627] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[    0.504627]  ffffffff813d927a ffff88007f847c98 ffffffff814c062b ffff88007f847cc8
[    0.504627]  ffffffff81098bce ffff88007f847cf8 ffffffff81aadc30 ffffffff813d927a
[    0.504627]  00000000ffffffff ffff88007f847d68 ffffffff8109d0be 0000000000000006
[    0.504627] Call Trace:
[    0.504627]  [<ffffffff813d927a>] ? cpufreq_add_dev+0x13a/0x3e0
[    0.504627]  [<ffffffff814c062b>] dump_stack+0x19/0x1b
[    0.504627]  [<ffffffff81098bce>] print_unlock_imbalance_bug+0xfe/0x110
[    0.504627]  [<ffffffff813d927a>] ? cpufreq_add_dev+0x13a/0x3e0
[    0.504627]  [<ffffffff8109d0be>] lock_release_non_nested+0x1ee/0x310
[    0.504627]  [<ffffffff81099d0e>] ? mark_held_locks+0xae/0x120
[    0.504627]  [<ffffffff811510cb>] ? kfree+0xcb/0x1d0
[    0.504627]  [<ffffffff813d77ea>] ? cpufreq_policy_free+0x4a/0x60
[    0.504627]  [<ffffffff813d927a>] ? cpufreq_add_dev+0x13a/0x3e0
[    0.504627]  [<ffffffff8109d2a4>] lock_release+0xc4/0x250
[    0.504627]  [<ffffffff8106c9f3>] up_read+0x23/0x40
[    0.504627]  [<ffffffff813d927a>] cpufreq_add_dev+0x13a/0x3e0
[    0.504627]  [<ffffffff8134a809>] subsys_interface_register+0x99/0xe0
[    0.504627]  [<ffffffff81b19f3b>] ? cpufreq_gov_dbs_init+0x12/0x12
[    0.504627]  [<ffffffff813d7f0d>] cpufreq_register_driver+0x9d/0x1d0
[    0.504627]  [<ffffffff81b19f3b>] ? cpufreq_gov_dbs_init+0x12/0x12
[    0.504627]  [<ffffffff81b1a039>] acpi_cpufreq_init+0xfe/0x1f8
[    0.504627]  [<ffffffff810002ba>] do_one_initcall+0xda/0x180
[    0.504627]  [<ffffffff81ae301e>] kernel_init_freeable+0x12c/0x1bb
[    0.504627]  [<ffffffff81ae2841>] ? do_early_param+0x8c/0x8c
[    0.504627]  [<ffffffff814b4dd0>] ? rest_init+0x140/0x140
[    0.504627]  [<ffffffff814b4dde>] kernel_init+0xe/0xf0
[    0.504627]  [<ffffffff814d029a>] ret_from_fork+0x7a/0xb0
[    0.504627]  [<ffffffff814b4dd0>] ? rest_init+0x140/0x140

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Acked-and-tested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-21 02:04:31 +02:00
Viresh Kumar
1b27429446 cpufreq: Use cpufreq_policy_list for iterating over policies
To iterate over all policies we currently iterate over all online
CPUs and then get the policy for each of them which is suboptimal.
Use the newly created cpufreq_policy_list for this purpose instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-20 15:43:50 +02:00
Viresh Kumar
474deff744 cpufreq: remove cpufreq_policy_cpu per-cpu variable
cpufreq_policy_cpu per-cpu variables are used for storing the ID of
the CPU that manages the given CPU's policy.  However, we also store
a policy pointer for each cpu in cpufreq_cpu_data, so the
cpufreq_policy_cpu information is simply redundant.

It is better to use cpufreq_cpu_data to retrieve a policy and get
policy->cpu from there, so make that happen everywhere and drop the
cpufreq_policy_cpu per-cpu variables which aren't necessary any more.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-20 15:43:50 +02:00
Viresh Kumar
9e9fd80167 cpufreq: remove unnecessary check in __cpufreq_governor()
We don't need to check if event is CPUFREQ_GOV_POLICY_INIT and put
governor module as we are sure event can only be START/STOP here.

Remove the useless check.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-20 15:43:50 +02:00
Viresh Kumar
9515f4d69b cpufreq: remove policy from cpufreq_policy_list during suspend
cpufreq_policy_list is a list of active policies.  We do remove
policies from this list when all CPUs belonging to that policy are
removed.  But during system suspend we don't really free a policy
struct as it will be used again during resume, so we didn't remove
it from cpufreq_policy_list as well..

However, this is incorrect.  We are saying this policy isn't valid
anymore and must not be referenced (though we haven't freed it), but
it can still be used by code that iterates over cpufreq_policy_list.

Remove policy from this list during system suspend as well.
Of course, we must add it back whenever the first CPU belonging to
that policy shows up.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-20 15:43:50 +02:00
Viresh Kumar
edab2fbc21 cpufreq: Fix white space in __cpufreq_remove_dev()
Align closing brace '}' of an if block.

[rjw: Subject and changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-20 15:43:50 +02:00
Rafael J. Wysocki
878f6e074e Revert "cpufreq: Use cpufreq_policy_list for iterating over policies"
Revert commit eb60852 (cpufreq: Use cpufreq_policy_list for iterating
over policies), because it breaks system suspend/resume on multiple
machines.

It either causes resume to block indefinitely or causes the BUG_ON()
in lock_policy_rwsem_##mode() to trigger on sysfs accesses to cpufreq
attributes.

Conflicts:
	drivers/cpufreq/cpufreq.c
2013-08-18 15:35:59 +02:00
Viresh Kumar
3de9bdeb28 cpufreq: improve error checking on return values of __cpufreq_governor()
The __cpufreq_governor() function can fail in rare cases especially
if there are bugs in cpufreq drivers.  Thus we must stop processing
as soon as this routine fails, otherwise it may result in undefined
behavior.

This patch adds error checking code whenever this routine is called
from any place.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-10 03:24:48 +02:00
Viresh Kumar
6eed9404ab cpufreq: Use rwsem for protecting critical sections
Critical sections of the cpufreq core are protected with the help of
the driver module owner's refcount, which isn't the correct approach,
because it causes rmmod to return an error when some routine has
updated that refcount.

Let's use rwsem for this purpose instead.  Only
cpufreq_unregister_driver() will use write sem
and everybody else will use read sem.

[rjw: Subject & changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-10 03:24:47 +02:00
Viresh Kumar
fe492f3f03 cpufreq: Fix broken usage of governor->owner's refcount
The cpufreq governor owner refcount usage is broken.  We should only
increment that refcount when a CPUFREQ_GOV_POLICY_INIT event has come
and it should only be decremented if CPUFREQ_GOV_POLICY_EXIT has come.

Currently, there can be situations where the governor is in use, but
we have allowed it to be unloaded which may result in undefined
behavior.  Let's fix it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-10 03:24:47 +02:00
Viresh Kumar
eb608521f1 cpufreq: Use cpufreq_policy_list for iterating over policies
To iterate over all policies we currently iterate over all CPUs and
then get the policy for each of them.  Let's use the newly created
cpufreq_policy_list for this purpose.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-10 03:24:46 +02:00
Lukasz Majewski
c88a1f8b96 cpufreq: Store cpufreq policies in a list
Policies available in the cpufreq framework are now linked together.
They are accessible via cpufreq_policy_list defined in the cpufreq
core.

[rjw: Fix from Yinghai Lu folded in]
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-10 03:24:06 +02:00
Viresh Kumar
d5b73cd870 cpufreq: Use sizeof(*ptr) convetion for computing sizes
Chapter 14 of Documentation/CodingStyle says:

The preferred form for passing a size of a struct is the following:

	p = kmalloc(sizeof(*p), ...);

The alternative form where struct name is spelled out hurts
readability and introduces an opportunity for a bug when the pointer
variable type is changed but the corresponding sizeof that is passed
to a memory allocator is not.

This wasn't followed consistently in drivers/cpufreq, let's make it
more consistent by always following this rule.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:34:10 +02:00
Viresh Kumar
3a3e9e06d0 cpufreq: Give consistent names to cpufreq_policy objects
They are called policy, cur_policy, new_policy, data, etc.  Just call
them policy wherever possible.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:34:10 +02:00
Viresh Kumar
5ff0a26803 cpufreq: Clean up header files included in the core
This patch addresses the following issues in the header files in the
cpufreq core:
 - Include headers in ascending order, so that we don't add same
   many times by mistake.
 - <asm/> must be included after <linux/>, so that they override
   whatever they need to.
 - Remove unnecessary includes.
 - Don't include files already included by cpufreq.h or
   cpufreq_governor.h.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:34:09 +02:00
Rafael J. Wysocki
1133bfa6dc Merge branch 'pm-cpufreq-ondemand' into pm-cpufreq
* pm-cpufreq:
  cpufreq: Remove unused function __cpufreq_driver_getavg()
  cpufreq: Remove unused APERF/MPERF support
  cpufreq: ondemand: Change the calculation of target frequency
2013-08-07 23:11:43 +02:00
Viresh Kumar
d8d3b47112 cpufreq: Pass policy to cpufreq_add_policy_cpu()
The caller of cpufreq_add_policy_cpu() already has a pointer to the
policy structure and there is no need to look it up again in
cpufreq_add_policy_cpu().  Let's pass it directly.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:02:50 +02:00
Rafael J. Wysocki
10659ab7b5 cpufreq: Avoid double kobject_put() for the same kobject in error code path
The only case triggering a jump to the err_out_unregister label in
__cpufreq_add_dev() is when cpufreq_add_dev_interface() fails.
However, if cpufreq_add_dev_interface() fails, it calls kobject_put()
for the policy kobject in its error code path and since that causes
the kobject's refcount to become 0, the additional kobject_put() for
the same kobject under err_out_unregister and the
wait_for_completion() following it are pointless, so drop them.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2013-08-07 23:02:50 +02:00
Rafael J. Wysocki
71c3461ef7 cpufreq: Do not hold driver module references for additional policy CPUs
The cpufreq core is a little inconsistent in the way it uses the
driver module refcount.

Namely, if __cpufreq_add_dev() is called for a CPU that doesn't
share the policy object with any other CPUs, the driver module
refcount it grabs to start with will be dropped by it before
returning and will be equal to whatever it had been before that
function was invoked.

However, if the given CPU does share the policy object with other
CPUs, either cpufreq_add_policy_cpu() is called to link the new CPU
to the existing policy, or cpufreq_add_dev_symlink() is used to link
the other CPUs sharing the policy with it to the just created policy
object.  In that case, because both cpufreq_add_policy_cpu() and
cpufreq_add_dev_symlink() call cpufreq_cpu_get() for the given
policy (the latter possibly many times) without the balancing
cpufreq_cpu_put() (unless there is an error), the driver module
refcount will be left by __cpufreq_add_dev() with a nonzero value
(different from the initial one).

To remove that inconsistency make cpufreq_add_policy_cpu() execute
cpufreq_cpu_put() for the given policy before returning, which
decrements the driver module refcount so that it will be equal to its
initial value after __cpufreq_add_dev() returns.  Also remove the
cpufreq_cpu_get() call from cpufreq_add_dev_symlink(), since both the
policy refcount and the driver module refcount are nonzero when it is
called and they don't need to be bumped up by it.

Accordingly, drop the cpufreq_cpu_put() from __cpufreq_remove_dev(),
since it is only necessary to balance the cpufreq_cpu_get() called
by cpufreq_add_policy_cpu() or cpufreq_add_dev_symlink().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2013-08-07 23:02:50 +02:00
Viresh Kumar
308b60e715 cpufreq: Don't pass CPU to cpufreq_add_dev_{symlink|interface}()
Pointer to struct cpufreq_policy is already passed to these routines
and we don't need to send policy->cpu to them as well.  So, get rid
of this extra argument and use policy->cpu everywhere.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:02:49 +02:00
Viresh Kumar
e8fdde1011 cpufreq: Remove extra variables from cpufreq_add_dev_symlink()
We call cpufreq_cpu_get() in cpufreq_add_dev_symlink() to increase usage
refcount of policy, but not to get a policy for the given CPU.  So, we
don't really need to capture the return value of this routine.  We can
simply use policy passed as an argument to cpufreq_add_dev_symlink().

Moreover debug print is rewritten to make it more clear.

[rjw: Changelog]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:02:49 +02:00
Srivatsa S. Bhat
5302c3fb2e cpufreq: Perform light-weight init/teardown during suspend/resume
Now that we have the infrastructure to perform a light-weight init/tear-down,
use that in the cpufreq CPU hotplug notifier when invoked from the
suspend/resume path.

This also ensures that the file permissions of the cpufreq sysfs files are
preserved across suspend/resume, something which commit a66b2e (cpufreq:
Preserve sysfs files across suspend/resume) originally intended to do, but
had to be reverted due to other problems.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:02:49 +02:00
Srivatsa S. Bhat
8414809c6a cpufreq: Preserve policy structure across suspend/resume
To perform light-weight cpu-init and teardown in the cpufreq subsystem
during suspend/resume, we need to separate out the 2 main functionalities
of the cpufreq CPU hotplug callbacks, as outlined below:

1. Init/tear-down of core cpufreq and CPU-specific components, which are
   critical to the correct functioning of the cpufreq subsystem.

2. Init/tear-down of cpufreq sysfs files during suspend/resume.

The first part requires accurate updates to the policy structure such as
its ->cpus and ->related_cpus masks, whereas the second part requires that
the policy->kobj structure is not released or re-initialized during
suspend/resume.

To handle both these requirements, we need to allow updates to the policy
structure throughout suspend/resume, but prevent the structure from getting
freed up. Also, we must have a mechanism by which the cpu-up callbacks can
restore the policy structure, without allocating things afresh. (That also
helps avoid memory leaks).

To achieve this, we use 2 schemes:
a. Use a fallback per-cpu storage area for preserving the policy structures
   during suspend, so that they can be restored during resume appropriately.

b. Use the 'frozen' flag to determine when to free or allocate the policy
   structure vs when to restore the policy from the saved fallback storage.
   Thus we can successfully preserve the structure across suspend/resume.

Effectively, this helps us complete the separation of the 'light-weight'
and the 'full' init/tear-down sequences in the cpufreq subsystem, so that
this can be made use of in the suspend/resume scenario.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:02:49 +02:00
Srivatsa S. Bhat
a82fab2928 cpufreq: Introduce a flag ('frozen') to separate full vs temporary init/teardown
During suspend/resume we would like to do a light-weight init/teardown of
CPUs in the cpufreq subsystem and preserve certain things such as sysfs files
etc across suspend/resume transitions. Add a flag called 'frozen' to help
distinguish the full init/teardown sequence from the light-weight one.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:02:48 +02:00
Srivatsa S. Bhat
f9ba680d23 cpufreq: Extract the handover of policy cpu to a helper function
During cpu offline, when the policy->cpu is going down, some other CPU
present in the policy->cpus mask is nominated as the new policy->cpu.
Extract this functionality from __cpufreq_remove_dev() and implement
it in a helper function. This helps in upcoming code reorganization.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:02:48 +02:00
Srivatsa S. Bhat
e18f1682bc cpufreq: Extract non-interface related stuff from cpufreq_add_dev_interface
cpufreq_add_dev_interface() includes the work of exposing the interface
to the device, as well as a lot of unrelated stuff. Move the latter to
cpufreq_add_dev(), where it is more appropriate.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:02:48 +02:00
Srivatsa S. Bhat
e9698cc5d2 cpufreq: Add helper to perform alloc/free of policy structure
Separate out the allocation of the cpufreq policy structure (along with
its error handling) to a helper function. This makes the code easier to
read and also helps with some upcoming code reorganization.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:02:48 +02:00
Srivatsa S. Bhat
23d328994b cpufreq: Fix misplaced call to cpufreq_update_policy()
The call to cpufreq_update_policy() is placed in the CPU hotplug callback
of cpufreq_stats, which has a higher priority than the CPU hotplug callback
of cpufreq-core. As a result, during CPU_ONLINE/CPU_ONLINE_FROZEN, we end up
calling cpufreq_update_policy() *before* calling cpufreq_add_dev() !
And for uninitialized CPUs, it just returns silently, not doing anything.

To add to that, cpufreq_stats is not even the right place to call
cpufreq_update_policy() to begin with. The cpufreq core ought to handle
this in its own callback, from an elegance/relevance perspective.

So move the invocation of cpufreq_update_policy() to cpufreq_cpu_callback,
and place it *after* cpufreq_add_dev().

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-08-07 23:02:48 +02:00
Rafael J. Wysocki
2a99859932 cpufreq: Fix cpufreq driver module refcount balance after suspend/resume
Since cpufreq_cpu_put() called by __cpufreq_remove_dev() drops the
driver module refcount, __cpufreq_remove_dev() causes that refcount
to become negative for the cpufreq driver after a suspend/resume
cycle.

This is not the only bad thing that happens there, however, because
kobject_put() should only be called for the policy kobject at this
point if the CPU is not the last one for that policy.

Namely, if the given CPU is the last one for that policy, the
policy kobject's refcount should be 1 at this point, as set by
cpufreq_add_dev_interface(), and only needs to be dropped once for
the kobject to go away.  This actually happens under the cpu == 1
check, so it need not be done before by cpufreq_cpu_put().

On the other hand, if the given CPU is not the last one for that
policy, this means that cpufreq_add_policy_cpu() has been called
at least once for that policy and cpufreq_cpu_get() has been
called for it too.  To balance that cpufreq_cpu_get(), we need to
call cpufreq_cpu_put() in that case.

Thus, to fix the described problem and keep the reference
counters balanced in both cases, move the cpufreq_cpu_get() call
in __cpufreq_remove_dev() to the code path executed only for
CPUs that share the policy with other CPUs.

Reported-and-tested-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: 3.10+ <stable@vger.kernel.org>
2013-07-30 00:32:00 +02:00
Stratos Karafotis
cffe4e0e74 cpufreq: Remove unused function __cpufreq_driver_getavg()
The target frequency calculation method in the ondemand governor has
changed and it is now independent of the measured average frequency.
Consequently, the __cpufreq_driver_getavg() function and getavg
member of struct cpufreq_driver are not used any more, so drop them.

[rjw: Changelog]
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-07-26 01:06:44 +02:00
Linus Torvalds
b7356abb9f Power management and ACPI fixes for 3.11-rc2
- Two cpufreq commits from the 3.10 cycle introduced regressions.
   The first of them was buggy (it did way much more than it needed
   to do) and the second one attempted to fix an issue introduced by
   the first one.  Fixes from Srivatsa S Bhat revert both.
 
 - If autosleep triggers during system shutdown and the shutdown
   callbacks of some device drivers have been called already, it may
   crash the system.  Fix from Liu Shuo prevents that from happening
   by making try_to_suspend() check system_state.
 
 - The ACPI memory hotplug driver doesn't clear its driver_data on
   errors which may cause a NULL poiter dereference to happen later.
   Fix from Toshi Kani.
 
 - The ACPI namespace scanning code should not try to attach scan
   handlers to device objects that have them already, which may confuse
   things quite a bit, and it should rescan the whole namespace branch
   starting at the given node after receiving a bus check notify event
   even if the device at that particular node has been discovered
   already.  Fixes from Rafael J Wysocki.
 
 - New ACPI video blacklist entry for a system whose initial backlight
   setting from the BIOS doesn't make sense.  From Lan Tianyu.
 
 - Garbage string output avoindance for ACPI PNP from Liu Shuo.
 
 - Two Kconfig fixes for issues introduced recently in the s3c24xx
   cpufreq driver (when moving the driver to drivers/cpufreq) from
   Paul Bolle.
 
 - Trivial comment fix in pm_wakeup.h from Chanwoo Choi.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABAgAGBQJR6Sq+AAoJEKhOf7ml8uNsrGQP/0HRDW+QmTGM8znDTHXngbn9
 X3pqlpjEOiCtcmJaSJlD7GwLHMscwWcHKEezteaZ7KUI4mcnysJX6EY5YVbNriDC
 xlLcDQn9c6Xx1maCSfp+WMygvqItxZwuc8veRjrT3XtOfCNWS/FlX40Voh63BCAe
 GbfQ/HesmUg5CKplyD8/XypLWh5OFXmHzCe8IhrKGfhsZukXdSgSBjwQZMRrEMsQ
 kJjDCF8zUu0JisiWqL+xE6IFSKme9i6LBlHpzU0Y1g4RqAqkIbuS0Z3vezOYzoTD
 oZjBNa9XAgCS3x0l5g3G0ChgDAU+Mpji/imXA7JysrwbirGFbtPHtQYh2HzpAtnw
 Hkah/0ocBM7/w7VTsUQiRsFPdIJTCBLlm6J38x8yh7n84h4nJgOpK69dBLrMwCUZ
 f3kid6KIPVLBvnC3QSULrCAKUcUcVVWYtNho+sfXBMjP+cPwTmc3DvATnpru6twa
 0KjR5o585UOcciq7EWAoMrCFCfZYF5C4XGaZAxHI/SWooxeCQH84S8vfNLL2epVC
 ixmLYo4X2ANDsnfbUV+ewhB0/L2905Et6NhPUgPD/1rm15MEZbowbB2K0pzr0QL9
 /1hTL61InXx3jLxducJJFKN+HZ0zfDQdTkyafKrR9jb+GsdmnzYJ/vnfDG8MfPjp
 GZ281YBqVmUeYJh5CPU+
 =IUmn
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI fixes from Rafael Wysocki:
 "These are fixes collected over the last week, most importnatly two
  cpufreq reverts fixing regressions introduced in 3.10, an autoseelp
  fix preventing systems using it from crashing during shutdown and two
  ACPI scan fixes related to hotplug.

  Specifics:

   - Two cpufreq commits from the 3.10 cycle introduced regressions.
     The first of them was buggy (it did way much more than it needed to
     do) and the second one attempted to fix an issue introduced by the
     first one.  Fixes from Srivatsa S Bhat revert both.

   - If autosleep triggers during system shutdown and the shutdown
     callbacks of some device drivers have been called already, it may
     crash the system.  Fix from Liu Shuo prevents that from happening
     by making try_to_suspend() check system_state.

   - The ACPI memory hotplug driver doesn't clear its driver_data on
     errors which may cause a NULL poiter dereference to happen later.
     Fix from Toshi Kani.

   - The ACPI namespace scanning code should not try to attach scan
     handlers to device objects that have them already, which may
     confuse things quite a bit, and it should rescan the whole
     namespace branch starting at the given node after receiving a bus
     check notify event even if the device at that particular node has
     been discovered already.  Fixes from Rafael J Wysocki.

   - New ACPI video blacklist entry for a system whose initial backlight
     setting from the BIOS doesn't make sense.  From Lan Tianyu.

   - Garbage string output avoindance for ACPI PNP from Liu Shuo.

   - Two Kconfig fixes for issues introduced recently in the s3c24xx
     cpufreq driver (when moving the driver to drivers/cpufreq) from
     Paul Bolle.

   - Trivial comment fix in pm_wakeup.h from Chanwoo Choi"

* tag 'pm+acpi-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / video: ignore BIOS initial backlight value for Fujitsu E753
  PNP / ACPI: avoid garbage in resource name
  cpufreq: Revert commit 2f7021a8 to fix CPU hotplug regression
  cpufreq: s3c24xx: fix "depends on ARM_S3C24XX" in Kconfig
  cpufreq: s3c24xx: rename CONFIG_CPU_FREQ_S3C24XX_DEBUGFS
  PM / Sleep: Fix comment typo in pm_wakeup.h
  PM / Sleep: avoid 'autosleep' in shutdown progress
  cpufreq: Revert commit a66b2e to fix suspend/resume regression
  ACPI / memhotplug: Fix a stale pointer in error path
  ACPI / scan: Always call acpi_bus_scan() for bus check notifications
  ACPI / scan: Do not try to attach scan handlers to devices having them
2013-07-19 09:59:06 -07:00
Paul Gortmaker
2760984f65 cpufreq: delete __cpuinit usage from all cpufreq files
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications.  For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out.  Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

This removes all the drivers/cpufreq uses of the __cpuinit macros
from all C files.

[1] https://lkml.org/lkml/2013/5/20/589

[v2: leave 2nd lines of args misaligned as requested by Viresh]
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: cpufreq@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-07-14 19:36:57 -04:00
Srivatsa S. Bhat
aae760ed21 cpufreq: Revert commit a66b2e to fix suspend/resume regression
commit a66b2e (cpufreq: Preserve sysfs files across suspend/resume)
has unfortunately caused several things in the cpufreq subsystem to
break subtly after a suspend/resume cycle.

The intention of that patch was to retain the file permissions of the
cpufreq related sysfs files across suspend/resume.  To achieve that,
the commit completely removed the calls to cpufreq_add_dev() and
__cpufreq_remove_dev() during suspend/resume transitions.  But the
problem is that those functions do 2 kinds of things:
  1. Low-level initialization/tear-down that are critical to the
     correct functioning of cpufreq-core.
  2. Kobject and sysfs related initialization/teardown.

Ideally we should have reorganized the code to cleanly separate these
two responsibilities, and skipped only the sysfs related parts during
suspend/resume.  Since we skipped the entire callbacks instead (which
also included some CPU and cpufreq-specific critical components),
cpufreq subsystem started behaving erratically after suspend/resume.

So revert the commit to fix the regression.  We'll revisit and address
the original goal of that commit separately, since it involves quite a
bit of careful code reorganization and appears to be non-trivial.

(While reverting the commit, note that another commit f51e1eb
 (cpufreq: Fix cpufreq regression after suspend/resume) already
 reverted part of the original set of changes.  So revert only the
 remaining ones).

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Paul Bolle <pebolle@tiscali.nl>
Cc: 3.10+ <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-07-15 01:31:36 +02:00
Viresh Kumar
266c13d767 cpufreq: Fix serialization of frequency transitions
Commit 7c30ed ("cpufreq: make sure frequency transitions are serialized")
interacts poorly with systems that have a single core freqency for all
cores.  On such systems we have a single policy for all cores with
several CPUs.  When we do a frequency transition the governor calls the
pre and post change notifiers which causes cpufreq_notify_transition()
per CPU.  Since the policy is the same for all of them all CPUs after
the first and the warnings added are generated by checking a per-policy
flag the warnings will be triggered for all cores after the first.

Fix this by allowing notifier to be called for n times. Where n is the number of
cpus in policy->cpus.

Reported-and-tested-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-07-04 13:12:44 +02:00
Lan Tianyu
f4fd379784 acpi-cpufreq: Add new sysfs attribute freqdomain_cpus
Commits fcf8058 (cpufreq: Simplify cpufreq_add_dev()) and aa77a52
(cpufreq: acpi-cpufreq: Don't set policy->related_cpus from .init())
changed the contents of the "related_cpus" sysfs attribute on systems
where acpi-cpufreq is used and user space can't get the list of CPUs
which are in the same hardware coordination CPU domain (provided by
the ACPI AML method _PSD) via "related_cpus" any more.

To make up for that loss add a new sysfs attribute "freqdomian_cpus"
for the acpi-cpufreq driver which exposes the list of CPUs in the
same domain regardless of whether it is coordinated by hardware or
software.

[rjw: Changelog, documentation]
References: https://bugzilla.kernel.org/show_bug.cgi?id=58761
Reported-by: Jean-Philippe Halimi <jean-philippe.halimi@exascale-computing.eu>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-06-27 21:51:09 +02:00
Viresh Kumar
7c30ed532c cpufreq: make sure frequency transitions are serialized
Whenever we are changing frequency of a cpu, we are calling PRECHANGE and
POSTCHANGE notifiers. They must be serialized. i.e. PRECHANGE or POSTCHANGE
shouldn't be called twice contiguously.

This can happen due to bugs in users of __cpufreq_driver_target() or actual
cpufreq drivers who are sending these notifiers.

This patch adds some protection against this. Now, we keep track of the last
transaction and see if something went wrong.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-06-27 21:49:55 +02:00
Viresh Kumar
0956df9c84 cpufreq: make __cpufreq_notify_transition() static
__cpufreq_notify_transition() is used only in cpufreq.c,
make it static.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-06-21 01:08:16 +02:00