Commit Graph

86 Commits

Author SHA1 Message Date
Deepthi Dharwar
4202735e8a cpuidle: Split cpuidle_state structure and move per-cpu statistics fields
This is the first step towards global registration of cpuidle
states. The statistics used primarily by the governor are per-cpu
and have to be split from rest of the fields inside cpuidle_state,
which would be made global i.e. single copy. The driver_data field
is also per-cpu and moved.

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Tested-by: Jean Pihet <j-pihet@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06 21:13:49 -05:00
Deepthi Dharwar
e978aa7d7d cpuidle: Move dev->last_residency update to driver enter routine; remove dev->last_state
Cpuidle governor only suggests the state to enter using the
governor->select() interface, but allows the low level driver to
override the recommended state. The actual entered state
may be different because of software or hardware demotion. Software
demotion is done by the back-end cpuidle driver and can be accounted
correctly. Current cpuidle code uses last_state field to capture the
actual state entered and based on that updates the statistics for the
state entered.

Ideally the driver enter routine should update the counters,
and it should return the state actually entered rather than the time
spent there. The generic cpuidle code should simply handle where
the counters live in the sysfs namespace, not updating the counters.

Reference:
https://lkml.org/lkml/2011/3/25/52

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com>
Tested-by: Jean Pihet <j-pihet@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-06 21:13:30 -05:00
Paul Gortmaker
7c52d55170 x86: fix up files really needing to include module.h
These files aren't just exporting symbols -- they are also defining
a MODULE_LICENSE etc. so give them the full module.h file.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:30:36 -04:00
Thomas Renninger
15e123e5d7 intel_idle: Rename cpuidle states
Userspace apps might have to cut off parts off the
idle state name for display reasons.
Switch NHM-C1 to C1-NHM (and others) so that a cut off
name is unique and makes sense to the user.

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: lenb@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
2011-02-28 11:04:45 -05:00
Len Brown
bfb53ccf1c intel_idle: disable Atom/Lincroft HW C-state auto-demotion
Just as we had to disable auto-demotion for NHM/WSM,
we need to do the same for Atom (Lincroft version).

In particular, auto-demotion will prevent Lincroft
from entering the S0i3 idle power saving state.

https://bugzilla.kernel.org/show_bug.cgi?id=25252

Signed-off-by: Len Brown <len.brown@intel.com>
2011-02-17 17:08:48 -05:00
Len Brown
14796fca2b intel_idle: disable NHM/WSM HW C-state auto-demotion
Hardware C-state auto-demotion is a mechanism where the HW overrides
the OS C-state request, instead demoting to a shallower state,
which is less expensive, but saves less power.

Modern Linux should generally get exactly the states it requests.
In particular, when a CPU is taken off-line, it must not be demoted, else
it can prevent the entire package from reaching deep C-states.

https://bugzilla.kernel.org/show_bug.cgi?id=25252

Signed-off-by: Len Brown <len.brown@intel.com>
2011-02-17 17:08:46 -05:00
Shaohua Li
ec30f343d6 fix a shutdown regression in intel_idle
Fix a shutdown regression caused by 2a2d31c8dc ("intel_idle: open
broadcast clock event").  The clockevent framework can automatically
shutdown broadcast timers for hotremove CPUs.  And we get a shutdown
regression when we shutdown broadcast timer for hot remove CPU, so just
delete some code.

Also fix some section mismatch.

Reported-by: Ari Savolainen <ari.m.savolainen@gmail.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-25 05:57:34 +10:00
Len Brown
43952886f0 Merge branch 'cpuidle-perf-events' into idle-test 2011-01-12 18:06:19 -05:00
Len Brown
56dbed129d Merge branch 'linus' into idle-test 2011-01-12 18:06:06 -05:00
Thomas Renninger
f77cfe4ea2 cpuidle/x86/perf: fix power:cpu_idle double end events and throw cpu_idle events from the cpuidle layer
Currently intel_idle and acpi_idle driver show double cpu_idle "exit idle"
events -> this patch fixes it and makes cpu_idle events throwing less complex.

It also introduces cpu_idle events for all architectures which use
the cpuidle subsystem, namely:
  - arch/arm/mach-at91/cpuidle.c
  - arch/arm/mach-davinci/cpuidle.c
  - arch/arm/mach-kirkwood/cpuidle.c
  - arch/arm/mach-omap2/cpuidle34xx.c
  - arch/drivers/acpi/processor_idle.c (for all cases, not only mwait)
  - arch/x86/kernel/process.c (did throw events before, but was a mess)
  - drivers/idle/intel_idle.c (did throw events before)

Convention should be:
Fire cpu_idle events inside the current pm_idle function (not somewhere
down the the callee tree) to keep things easy.

Current possible pm_idle functions in X86:
c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle
-> this is really easy is now.

This affects userspace:
The type field of the cpu_idle power event can now direclty get
mapped to:
/sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
instead of throwing very CPU/mwait specific values.
This change is not visible for the intel_idle driver.
For the acpi_idle driver it should only be visible if the vendor
misses out C-states in his BIOS.
Another (perf timechart) patch reads out cpuidle info of cpu_idle
events from:
/sys/.../cpuidle/stateX/*, then the cpuidle events are mapped
to the correct C-/cpuidle state again, even if e.g. vendors miss
out C-states in their BIOS and for example only export C1 and C3.
-> everything is fine.

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Robert Schoene <robert.schoene@tu-dresden.de>
CC: Jean Pihet <j-pihet@ti.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: linux-pm@lists.linux-foundation.org
CC: linux-acpi@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-perf-users@vger.kernel.org
CC: linux-omap@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-12 18:05:16 -05:00
Shaohua Li
2a2d31c8dc intel_idle: open broadcast clock event
Intel_idle driver uses CLOCK_EVT_NOTIFY_BROADCAST_ENTER
CLOCK_EVT_NOTIFY_BROADCAST_EXIT
for broadcast clock events. The _ENTER/_EXIT doesn't really open broadcast clock
events, please see processor_idle.c for an example. In some situation, this will
cause boot hang, because some CPUs enters idle but local APIC timer stalls.

Reported-and-tested-by: Yan Zheng <zheng.z.yan@intel.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
cc: stable@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-12 12:47:34 -05:00
Len Brown
956d033fb2 cpuidle: CPUIDLE_FLAG_TLB_FLUSHED is specific to intel_idle
Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-12 12:47:33 -05:00
Thomas Renninger
d18960494f ACPI, intel_idle: Cleanup idle= internal variables
Having four variables for the same thing:
  idle_halt, idle_nomwait, force_mwait and boot_option_idle_overrides
is rather confusing and unnecessary complex.

if idle= boot param is passed, only set up one variable:
boot_option_idle_overrides

Introduces following functional changes/fixes:
  - intel_idle driver does not register if any idle=xy
    boot param is passed.
  - processor_idle.c will also not register a cpuidle driver
    and get active if idle=halt is passed.
    Before a cpuidle driver with one (C1, halt) state got registered
    Now the default_idle function will be used which finally uses
    the same idle call to enter sleep state (safe_halt()), but
    without registering a whole cpuidle driver.

That means idle= param will always avoid cpuidle drivers to register
with one exception (same behavior as before):
idle=nomwait
may still register acpi_idle cpuidle driver, but C1 will not use
mwait, but hlt. This can be a workaround for IO based deeper sleep
states where C1 mwait causes problems.

Signed-off-by: Thomas Renninger <trenn@suse.de>
cc: x86@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-12 12:47:30 -05:00
Len Brown
ddbd550d50 intel_idle: update Sandy Bridge core C-state residency targets
Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-12 12:47:29 -05:00
Thomas Renninger
25e41933b5 perf: Clean up power events by introducing new, more generic ones
Add these new power trace events:

 power:cpu_idle
 power:cpu_frequency
 power:machine_suspend

The old C-state/idle accounting events:
  power:power_start
  power:power_end

Have now a replacement (but we are still keeping the old
tracepoints for compatibility):

  power:cpu_idle

and
  power:power_frequency

is replaced with:
  power:cpu_frequency

power:machine_suspend is newly introduced.

Jean Pihet has a patch integrated into the generic layer
(kernel/power/suspend.c) which will make use of it.

the type= field got removed from both, it was never
used and the type is differed by the event type itself.

perf timechart userspace tool gets adjusted in a separate patch.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: rjw@sisk.pl
LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
2011-01-04 08:16:54 +01:00
Thomas Renninger
61a0d49c33 perf: Do not export power_frequency, but power_start event
power_frequency moved to drivers/cpufreq/cpufreq.c which has
to be compiled in, no need to export it.

intel_idle can a be module though...

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
Cc: Jean Pihet <j-pihet@ti.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: rjw@sisk.pl
LKML-Reference: <1294073445-14812-2-git-send-email-trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
2011-01-04 08:16:54 +01:00
Len Brown
56b9aea3b7 intel_idle: recognize ARAT on WSM-EX
We erroneously ignored the Always Running APIC Timer on WSM-EX.
Move the check for ARAT down so that it can apply to any/all models.

Signed-off-by: Len Brown <len.brown@intel.com>
2010-12-02 01:19:32 -05:00
Linus Torvalds
27afe58fe6 Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
  intel_idle: do not use the LAPIC timer for ATOM C2
  intel_idle: add initial Sandy Bridge support
  acpi_idle: delete bogus data from cpuidle_state.power_usage
  intel_idle: delete bogus data from cpuidle_state.power_usage
  intel_idle: simplify test for leave_mm()
2010-10-26 17:28:07 -07:00
Len Brown
c25d29952b intel_idle: do not use the LAPIC timer for ATOM C2
If we use the LAPIC timer during ATOM C2 on
some nvidia chisets, the system stalls.

https://bugzilla.kernel.org/show_bug.cgi?id=21032

Signed-off-by: Len Brown <len.brown@intel.com>
2010-10-26 15:45:17 -04:00
Len Brown
00527cc6bb Merge branch 'intel_idle+snb' into idle-release
Signed-off-by: Len Brown <len.brown@intel.com>
2010-10-23 02:35:23 -04:00
Len Brown
d13780d439 intel_idle: add initial Sandy Bridge support
Signed-off-by: Len Brown <len.brown@intel.com>
2010-10-23 02:32:08 -04:00
Linus Torvalds
2a8b67fb72 Merge branch 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, hotplug: In the MWAIT case of play_dead, CLFLUSH the cache line
  x86, hotplug: Move WBINVD back outside the play_dead loop
  x86, hotplug: Use mwait to offline a processor, fix the legacy case
  x86, mwait: Move mwait constants to a common header file
2010-10-21 13:45:38 -07:00
Len Brown
dea44c6b7d intel_idle: delete bogus data from cpuidle_state.power_usage
The mW data in this field is a total fabrication
and serves no purpose other than to mislead
those who might see it in sysfs.  Delete it.

Signed-off-by: Len Brown <len.brown@intel.com>
2010-10-15 21:23:25 -04:00
Len Brown
c8381cc3d8 intel_idle: simplify test for leave_mm()
A run-time test to invoke leave_mm() for the deepest
supported C-state is redundant, since the appropriate
C-states already have flags with CPUIDLE_FLAG_TLB_FLUSHED set.

Signed-off-by: Len Brown <len.brown@intel.com>
2010-10-15 20:43:06 -04:00
Len Brown
7fcca7d900 intel_idle: enable Atom C6
ATM-C6 was commented out, pending public documentation.

https://bugzilla.kernel.org/show_bug.cgi?id=19762

Tested-by: Dennis Jansen <Dennis.Jansen@...>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-10-08 22:16:27 -04:00
Suresh Siddha
6110a1f43c intel_idle: Voluntary leave_mm before entering deeper
Avoid TLB flush IPIs for the cores in deeper c-states by voluntary leave_mm()
before entering into that state. CPUs tend to flush TLB in those c-states
anyways.

acpi_idle does this with C3-type states, but it was not caried over
when intel_idle was introduced.  intel_idle can apply it
to C-states in addition to those that ACPI might export as C3...

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-09-30 21:19:22 -04:00
Namhyung Kim
3265eba0be intel_idle: add missing __percpu markup
intel_idle_cpuidle_devices is a percpu pointer
but was missing __percpu markup.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-09-28 23:30:39 -04:00
Thomas Weber
68f160125f intel_idle: Change mode 755 => 644
Remove execution permission from source file.

Signed-off-by: Thomas Weber <weber@corscience.de>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-09-28 23:30:39 -04:00
H. Peter Anvin
bc83cccc76 x86, mwait: Move mwait constants to a common header file
We have MWAIT constants spread across three different .c files, for no
good reason.  Move them all into a common header file.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
LKML-Reference: <tip-*@git.kernel.org>
2010-09-17 15:36:40 -07:00
Linus Torvalds
5a4179460c Merge branch 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
  intel_idle: recognize Lincroft Atom Processor
  intel_idle: no longer EXPERIMENTAL
  intel_idle: disable module support
  intel_idle: add support for Westmere-EX
  intel_idle: delete power_policy modparam, and choose substate functions
  intel_idle: delete substates DEBUG modparam
2010-08-15 11:17:52 -07:00
Arjan van de Ven
4725fd3ce9 intel_idle: recognize Lincroft Atom Processor
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-08-14 22:54:52 -04:00
Linus Torvalds
8d91530c5f Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Remove pointless printk from p4-clockmod.
  [CPUFREQ] Fix section mismatch for powernow_cpu_init in powernow-k7.c
  [CPUFREQ] Fix section mismatch for longhaul_cpu_init.
  [CPUFREQ] Fix section mismatch for longrun_cpu_init.
  [CPUFREQ] powernow-k8: Fix misleading variable naming
  [CPUFREQ] Convert pci_table entries to PCI_VDEVICE (if PCI_ANY_ID is used)
  [CPUFREQ] arch/x86/kernel/cpu/cpufreq: use for_each_pci_dev()
  [CPUFREQ] fix brace coding style issue.
  [CPUFREQ] x86 cpufreq: Make trace_power_frequency cpufreq driver independent
  [CPUFREQ] acpi-cpufreq: Fix CPU_ANY CPUFREQ_{PRE,POST}CHANGE notification
  [CPUFREQ] ondemand: don't synchronize sample rate unless multiple cpus present
  [CPUFREQ] unexport (un)lock_policy_rwsem* functions
  [CPUFREQ] ondemand: Refactor frequency increase code
  [CPUFREQ] powernow-k8: On load failure, remind the user to enable support in BIOS setup
  [CPUFREQ] powernow-k8: Limit Pstate transition latency check
  [CPUFREQ] Fix PCC driver error path
  [CPUFREQ] fix double freeing in error path of pcc-cpufreq
  [CPUFREQ] pcc driver should check for pcch method before calling _OSC
  [CPUFREQ] fix memory leak in cpufreq_add_dev
  [CPUFREQ] revert "[CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)"

Manually fix up non-data merge conflict introduced by new calling
conventions for trace_power_start() in commit 6f4f2723d0 ("x86
cpufreq: Make trace_power_frequency cpufreq driver independent"), which
didn't update the intel_idle native hardware cpuidle driver.
2010-08-04 11:13:36 -07:00
Len Brown
ec67a2ba36 intel_idle: add support for Westmere-EX
Signed-off-by: Len Brown <len.brown@intel.com>
2010-07-26 23:40:19 -04:00
Len Brown
0394c6676e intel_idle: delete power_policy modparam, and choose substate functions
The idea behind power policy was that it would start off as a modparam,
and then hook into the new "global" in-kernel power vs energy tunable.
But that tunable isn't happening, so delete the hook here.

With the policy hook gone, the sub-state choice functions
do not do anything useful, so delete them from the critical path.

To handle sub-states in the future, we will advertise them
with dedicated cpuidle_state entries.  That is necessary
because some of the sub-states will have substantially different
properties than their peer sub-states.

Signed-off-by: Len Brown <len.brown@intel.com>
2010-07-23 16:04:46 -04:00
Len Brown
c4236282e5 intel_idle: delete substates DEBUG modparam
it isn't useful anymore

Signed-off-by: Len Brown <len.brown@intel.com>
2010-07-23 16:00:33 -04:00
Len Brown
2671717265 intel_idle: native hardware cpuidle driver for latest Intel processors
This EXPERIMENTAL driver supersedes acpi_idle on
Intel Atom Processors, Intel Core i3/i5/i7 Processors
and associated Intel Xeon processors.

It does not support the Intel Core2 processor or earlier.

For kernels configured with ACPI, CONFIG_INTEL_IDLE=y
allows intel_idle to probe before the ACPI processor driver.
Booting with "intel_idle.max_cstate=0" disables intel_idle
and the system will fall back on ACPI's "acpi_idle".

Typical Linux distributions load ACPI processor module early,
making CONFIG_INTEL_IDLE=m not easily useful on ACPI platforms.

intel_idle probes all processors at module_init time.
Processors that are hot-added later will be limited
to using C1 in idle.

Signed-off-by: Len Brown <len.brown@intel.com>
2010-05-28 14:26:20 -04:00