linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 11:18:45 +07:00

Author	SHA1	Message	Date
Linus Torvalds	ac4e01093f	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull idle update from Len Brown: "Add support for new Haswell-ULT CPU idle power states" * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: intel_idle: initial C8, C9, C10 support tools/power turbostat: display C8, C9, C10 residency	2013-05-11 15:23:17 -07:00
Daniel Lezcano	4c637b2175	cpuidle: make a single register function for all The usual scheme to initialize a cpuidle driver on a SMP is: cpuidle_register_driver(drv); for_each_possible_cpu(cpu) { device = &per_cpu(cpuidle_dev, cpu); cpuidle_register_device(device); } This code is duplicated in each cpuidle driver. On UP systems, it is done this way: cpuidle_register_driver(drv); device = &per_cpu(cpuidle_dev, cpu); cpuidle_register_device(device); On UP, the macro 'for_each_cpu' does one iteration: #define for_each_cpu(cpu, mask) \ for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask) Hence, the initialization loop is the same for UP than SMP. Beside, we saw different bugs / mis-initialization / return code unchecked in the different drivers, the code is duplicated including bugs. After fixing all these ones, it appears the initialization pattern is the same for everyone. Please note, some drivers are doing dev->state_count = drv->state_count. This is not necessary because it is done by the cpuidle_enable_device function in the cpuidle framework. This is true, until you have the same states for all your devices. Otherwise, the 'low level' API should be used instead with the specific initialization for the driver. Let's add a wrapper function doing this initialization with a cpumask parameter for the coupled idle states and use it for all the drivers. That will save a lot of LOC, consolidate the code, and the modifications in the future could be done in a single place. Another benefit is the consolidation of the cpuidle_device variable which is now in the cpuidle framework and no longer spread accross the different arch specific drivers. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-23 13:45:22 +02:00
Daniel Lezcano	554c06ba3e	cpuidle: remove en_core_tk_irqen flag The en_core_tk_irqen flag is set in all the cpuidle driver which means it is not necessary to specify this flag. Remove the flag and the code related to it. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Kevin Hilman <khilman@linaro.org> # for mach-omap2/* Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-23 13:45:22 +02:00
Len Brown	86239ceb33	intel_idle: initial C8, C9, C10 support Allow intel_idle and cpuidle to utilize C8, C9, C10 when they are present on... "Fourth Generation Intel(R) Core(TM) Processors", which are based on Intel(R) microarchitecture code name Haswell. Signed-off-by: Len Brown <len.brown@intel.com>	2013-04-17 19:23:32 -04:00
Daniel Lezcano	a06df062a1	cpuidle: initialize the broadcast timer framework The commit 89878baa73f0f1c679355006bd8632e5d78f96c2 introduced the CPUIDLE_FLAG_TIMER_STOP flag where we specify a specific idle state stops the local timer. Now use this flag to check at init time if one state will need the broadcast timer and, in this case, setup the broadcast timer framework. That prevents multiple code duplication in the drivers. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-01 01:10:28 +02:00
Daniel Lezcano	b60e6a0eb0	cpuidle : handle clockevent notify from the cpuidle framework When a cpu enters a deep idle state, the local timers are stopped and the time framework falls back to the timer device used as a broadcast timer. The different cpuidle drivers are calling clockevents_notify ENTER/EXIT when the idle state stops the local timer. Add a new flag CPUIDLE_FLAG_TIMER_STOP which can be set by the cpuidle drivers. If the flag is set, the cpuidle core code takes care of the notification on behalf of the driver to avoid pointless code duplication. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-01 01:10:27 +02:00
Len Brown	57fa8273a2	cpuidle: remove vestage definition of cpuidle_state_usage.driver_data This field is no longer used. Signed-off-by: Len Brown <len.brown@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>	2013-02-11 18:49:51 -05:00
Daniel Lezcano	8aef33a7cf	cpuidle: remove the power_specified field in the driver We realized that the power usage field is never filled and when it is filled for tegra, the power_specified flag is not set causing all of these values to be reset when the driver is initialized with set_power_state(). However, the power_specified flag can be simply removed under the assumption that the states are always backward sorted, which is the case with the current code. This change allows the menu governor select function and the cpuidle_play_dead() to be simplified. Moreover, the set_power_states() function can removed as it does not make sense any more. Drop the power_specified flag from struct cpuidle_driver and make the related changes as described above. As a consequence, this also fixes the bug where on the dynamic C-states system, the power fields are not initialized. [rjw: Changelog] References: https://bugzilla.kernel.org/show_bug.cgi?id=42870 References: https://bugzilla.kernel.org/show_bug.cgi?id=43349 References: https://lkml.org/lkml/2012/10/16/518 Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-01-15 14:18:04 +01:00
Daniel Lezcano	bf4d1b5ddb	cpuidle: support multiple drivers With the tegra3 and the big.LITTLE [1] new architectures, several cpus with different characteristics (latencies and states) can co-exists on the system. The cpuidle framework has the limitation of handling only identical cpus. This patch removes this limitation by introducing the multiple driver support for cpuidle. This option is configurable at compile time and should be enabled for the architectures mentioned above. So there is no impact for the other platforms if the option is disabled. The option defaults to 'n'. Note the multiple drivers support is also compatible with the existing drivers, even if just one driver is needed, all the cpu will be tied to this driver using an extra small chunk of processor memory. The multiple driver support use a per-cpu driver pointer instead of a global variable and the accessor to this variable are done from a cpu context. In order to keep the compatibility with the existing drivers, the function 'cpuidle_register_driver' and 'cpuidle_unregister_driver' will register the specified driver for all the cpus. The semantic for the output of /sys/devices/system/cpu/cpuidle/current_driver remains the same except the driver name will be related to the current cpu. The /sys/devices/system/cpu/cpu[0-9]/cpuidle/driver/name files are added allowing to read the per cpu driver name. [1] http://lwn.net/Articles/481055/ Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2012-11-15 00:34:23 +01:00
Daniel Lezcano	42f67f2aca	cpuidle: move driver's refcount to cpuidle We want to support different cpuidle drivers co-existing together. In this case we should move the refcount to the cpuidle_driver structure to handle several drivers at a time. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2012-11-15 00:34:22 +01:00
Daniel Lezcano	349631e0e4	cpuidle / sysfs: move structure declaration into the sysfs.c file The structure cpuidle_state_kobj is not used anywhere except in the sysfs.c file. The definition of this structure is not needed in the cpuidle header file. This patch moves it to the sysfs.c file in order to encapsulate the code a bit more. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2012-11-15 00:34:21 +01:00
Arnd Bergmann	c7a9b09b1a	ARM: omap: allow building omap44xx without SMP The new omap4 cpuidle implementation currently requires ARCH_NEEDS_CPU_IDLE_COUPLED, which only works on SMP. This patch makes it possible to build a non-SMP kernel for that platform. This is not normally desired for end-users but can be useful for testing. Without this patch, building rand-0y2jSKT results in: drivers/cpuidle/coupled.c: In function 'cpuidle_coupled_poke': drivers/cpuidle/coupled.c:317:3: error: implicit declaration of function '__smp_call_function_single' [-Werror=implicit-function-declaration] It's not clear if this patch is the best solution for the problem at hand. I have made sure that we can now build the kernel in all configurations, but that does not mean it will actually work on an OMAP44xx. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Kevin Hilman <khilman@ti.com> Cc: Tony Lindgren <tony@atomide.com>	2012-08-23 17:16:42 +02:00
Linus Torvalds	476525004a	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull ACPI & power management update from Len Brown: "Re-write of the turbostat tool. lower overhead was necessary for measuring very large system when they are very idle. IVB support in intel_idle It's what I run on my IVB, others should be able to also:-) ACPICA core update We have found some bugs due to divergence between Linux and the upstream ACPICA base. Most of these patches are to reduce that divergence to reduce the risk of future bugs. Some cpuidle updates, mostly for non-Intel More will be coming, as they depend on this part. Some thermal management changes needed by non-ACPI systems. Some _OST (OS Status Indication) updates for hot ACPI hot-plug." * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (51 commits) Thermal: Documentation update Thermal: Add Hysteresis attributes Thermal: Make Thermal trip points writeable ACPI/AC: prevent OOPS on some boxes due to missing check power_supply_register() return value check tools/power: turbostat: fix large c1% issue tools/power: turbostat v2 - re-write for efficiency ACPICA: Update to version 20120711 ACPICA: AcpiSrc: Fix some translation issues for Linux conversion ACPICA: Update header files copyrights to 2012 ACPICA: Add new ACPI table load/unload external interfaces ACPICA: Split file: tbxface.c -> tbxfload.c ACPICA: Add PCC address space to space ID decode function ACPICA: Fix some comment fields ACPICA: Table manager: deploy new firmware error/warning interfaces ACPICA: Add new interfaces for BIOS(firmware) errors and warnings ACPICA: Split exception code utilities to a new file, utexcep.c ACPI: acpi_pad: tune round_robin_time ACPICA: Update to version 20120620 ACPICA: Add support for implicit notify on multiple devices ACPICA: Update comments; no functional change ...	2012-07-26 14:28:55 -07:00
Rafael J. Wysocki	7791bd230c	Merge branch 'pm-domains' * pm-domains: PM / Domains: Fix build warning for CONFIG_PM_RUNTIME unset PM / Domains: Replace plain integer with NULL pointer in domain.c file PM / Domains: Add missing static storage class specifier in domain.c file PM / Domains: Allow device callbacks to be added at any time PM / Domains: Add device domain data reference counter PM / Domains: Add preliminary support for cpuidle, v2 PM / Domains: Do not stop devices after restoring their states PM / Domains: Use subsystem runtime suspend/resume callbacks by default	2012-07-19 00:03:17 +02:00
Preeti U Murthy	8651f97bd9	PM / cpuidle: System resume hang fix with cpuidle On certain bios, resume hangs if cpus are allowed to enter idle states during suspend [1]. This was fixed in apci idle driver [2].But intel_idle driver does not have this fix. Thus instead of replicating the fix in both the idle drivers, or in more platform specific idle drivers if needed, the more general cpuidle infrastructure could handle this. A suspend callback in cpuidle_driver could handle this fix. But a cpuidle_driver provides only basic functionalities like platform idle state detection capability and mechanisms to support entry and exit into CPU idle states. All other cpuidle functions are found in the cpuidle generic infrastructure for good reason that all cpuidle drivers, irrepective of their platforms will support these functions. One option therefore would be to register a suspend callback in cpuidle which handles this fix. This could be called through a PM_SUSPEND_PREPARE notifier. But this is too generic a notfier for a driver to handle. Also, ideally the job of cpuidle is not to handle side effects of suspend. It should expose the interfaces which "handle cpuidle 'during' suspend" or any other operation, which the subsystems call during that respective operation. The fix demands that during suspend, no cpus should be allowed to enter deep C-states. The interface cpuidle_uninstall_idle_handler() in cpuidle ensures that. Not just that it also kicks all the cpus which are already in idle out of their idle states which was being done during cpu hotplug through a CPU_DYING_FROZEN callbacks. Now the question arises about when during suspend should cpuidle_uninstall_idle_handler() be called. Since we are dealing with drivers it seems best to call this function during dpm_suspend(). Delaying the call till dpm_suspend_noirq() does no harm, as long as it is before cpu_hotplug_begin() to avoid race conditions with cpu hotpulg operations. In dpm_suspend_noirq(), it would be wise to place this call before suspend_device_irqs() to avoid ugly interactions with the same. Ananlogously, during resume. References: [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/674075. [2] http://marc.info/?l=linux-pm&m=133958534231884&w=2 Reported-and-tested-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2012-07-10 21:34:49 +02:00
Daniel Lezcano	25ac77613a	ACPI: intel_idle : break dependency between modules When the system is booted with some cpus offline, the idle driver is not initialized. When a cpu is set online, the acpi code call the intel idle init function. Unfortunately this code introduce a dependency between intel_idle and acpi. This patch is intended to remove this dependency by using the notifier of intel_idle. This patch has the benefit of encapsulating the intel_idle driver and remove some exported functions. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2012-07-05 22:37:47 +02:00
Rafael J. Wysocki	cbc9ef0287	PM / Domains: Add preliminary support for cpuidle, v2 On some systems there are CPU cores located in the same power domains as I/O devices. Then, power can only be removed from the domain if all I/O devices in it are not in use and the CPU core is idle. Add preliminary support for that to the generic PM domains framework. First, the platform is expected to provide a cpuidle driver with one extra state designated for use with the generic PM domains code. This state should be initially disabled and its exit_latency value should be set to whatever time is needed to bring up the CPU core itself after restoring power to it, not including the domain's power on latency. Its .enter() callback should point to a procedure that will remove power from the domain containing the CPU core at the end of the CPU power transition. The remaining characteristics of the extra cpuidle state, referred to as the "domain" cpuidle state below, (e.g. power usage, target residency) should be populated in accordance with the properties of the hardware. Next, the platform should execute genpd_attach_cpuidle() on the PM domain containing the CPU core. That will cause the generic PM domains framework to treat that domain in a special way such that: * When all devices in the domain have been suspended and it is about to be turned off, the states of the devices will be saved, but power will not be removed from the domain. Instead, the "domain" cpuidle state will be enabled so that power can be removed from the domain when the CPU core is idle and the state has been chosen as the target by the cpuidle governor. * When the first I/O device in the domain is resumed and __pm_genpd_poweron(() is called for the first time after power has been removed from the domain, the "domain" cpuidle state will be disabled to avoid subsequent surprise power removals via cpuidle. The effective exit_latency value of the "domain" cpuidle state depends on the time needed to bring up the CPU core itself after restoring power to it as well as on the power on latency of the domain containing the CPU core. Thus the "domain" cpuidle state's exit_latency has to be recomputed every time the domain's power on latency is updated, which may happen every time power is restored to the domain, if the measured power on latency is greater than the latency stored in the corresponding generic_pm_domain structure. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reviewed-by: Kevin Hilman <khilman@ti.com>	2012-07-03 19:07:42 +02:00
Rafael J. Wysocki	6e797a0788	PM / cpuidle: Add driver reference counter Add a reference counter for the cpuidle driver, so that it can't be unregistered when it is in use. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2012-07-03 19:06:25 +02:00
ShuoX Liu	dc7fd275ae	cpuidle: move field disable from per-driver to per-cpu Andrew J.Schorr raises a question. When he changes the disable setting on a single CPU, it affects all the other CPUs. Basically, currently, the disable field is per-driver instead of per-cpu. All the C states of the same driver are shared by all CPU in the same machine. The patch changes the `disable' field to per-cpu, so we could set this separately for each cpu. Signed-off-by: ShuoX Liu <shuox.liu@intel.com> Reported-by: Andrew J.Schorr <aschorr@telemetry-investments.com> Reviewed-by: Yanmin Zhang <yanmin_zhang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2012-07-03 19:05:31 +02:00
Colin Cross	20ff51a36b	cpuidle: coupled: add parallel barrier function Adds cpuidle_coupled_parallel_barrier, which can be used by coupled cpuidle state enter functions to handle resynchronization after determining if any cpu needs to abort. The normal use case will be: static bool abort_flag; static atomic_t abort_barrier; int arch_cpuidle_enter(struct cpuidle_device dev, ...) { if (arch_turn_off_irq_controller()) { / returns an error if an irq is pending and would be lost if idle continued and turned off power / abort_flag = true; } cpuidle_coupled_parallel_barrier(dev, &abort_barrier); if (abort_flag) { / One of the cpus didn't turn off it's irq controller / arch_turn_on_irq_controller(); return -EINTR; } / continue with idle */ ... } This will cause all cpus to abort idle together if one of them needs to abort. Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Tested-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Len Brown <len.brown@intel.com>	2012-06-02 00:49:36 -04:00
Colin Cross	4126c0197b	cpuidle: add support for states that affect multiple cpus On some ARM SMP SoCs (OMAP4460, Tegra 2, and probably more), the cpus cannot be independently powered down, either due to sequencing restrictions (on Tegra 2, cpu 0 must be the last to power down), or due to HW bugs (on OMAP4460, a cpu powering up will corrupt the gic state unless the other cpu runs a work around). Each cpu has a power state that it can enter without coordinating with the other cpu (usually Wait For Interrupt, or WFI), and one or more "coupled" power states that affect blocks shared between the cpus (L2 cache, interrupt controller, and sometimes the whole SoC). Entering a coupled power state must be tightly controlled on both cpus. The easiest solution to implementing coupled cpu power states is to hotplug all but one cpu whenever possible, usually using a cpufreq governor that looks at cpu load to determine when to enable the secondary cpus. This causes problems, as hotplug is an expensive operation, so the number of hotplug transitions must be minimized, leading to very slow response to loads, often on the order of seconds. This file implements an alternative solution, where each cpu will wait in the WFI state until all cpus are ready to enter a coupled state, at which point the coupled state function will be called on all cpus at approximately the same time. Once all cpus are ready to enter idle, they are woken by an smp cross call. At this point, there is a chance that one of the cpus will find work to do, and choose not to enter idle. A final pass is needed to guarantee that all cpus will call the power state enter function at the same time. During this pass, each cpu will increment the ready counter, and continue once the ready counter matches the number of online coupled cpus. If any cpu exits idle, the other cpus will decrement their counter and retry. To use coupled cpuidle states, a cpuidle driver must: Set struct cpuidle_device.coupled_cpus to the mask of all coupled cpus, usually the same as cpu_possible_mask if all cpus are part of the same cluster. The coupled_cpus mask must be set in the struct cpuidle_device for each cpu. Set struct cpuidle_device.safe_state to a state that is not a coupled state. This is usually WFI. Set CPUIDLE_FLAG_COUPLED in struct cpuidle_state.flags for each state that affects multiple cpus. Provide a struct cpuidle_state.enter function for each state that affects multiple cpus. This function is guaranteed to be called on all cpus at approximately the same time. The driver should ensure that the cpus all abort together if any cpu tries to abort once the function is called. update1: cpuidle: coupled: fix count of online cpus online_count was never incremented on boot, and was also counting cpus that were not part of the coupled set. Fix both issues by introducting a new function that counts online coupled cpus, and call it from register as well as the hotplug notifier. update2: cpuidle: coupled: fix decrementing ready count cpuidle_coupled_set_not_ready sometimes refuses to decrement the ready count in order to prevent a race condition. This makes it unsuitable for use when finished with idle. Add a new function cpuidle_coupled_set_done that decrements both the ready count and waiting count, and call it after idle is complete. Cc: Amit Kucheria <amit.kucheria@linaro.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Trinabh Gupta <g.trinabh@gmail.com> Cc: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Reviewed-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Tested-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Colin Cross <ccross@android.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Len Brown <len.brown@intel.com>	2012-06-02 00:49:09 -04:00
Boris Ostrovsky	02401c06b7	cpuidle: power_usage should be declared signed integer power_usage is always assigned a negative value and should be declared a signed integer Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com> Signed-off-by: Len Brown <len.brown@intel.com>	2012-03-30 03:23:30 -04:00
Boris Ostrovsky	1a022e3f1b	idle, x86: Allow off-lined CPU to enter deeper C states Currently when a CPU is off-lined it enters either MWAIT-based idle or, if MWAIT is not desired or supported, HLT-based idle (which places the processor in C1 state). This patch allows processors without MWAIT support to stay in states deeper than C1. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com> Signed-off-by: Len Brown <len.brown@intel.com>	2012-03-30 03:23:01 -04:00
Daniel Lezcano	e07510585a	cpuidle: remove unused 'governor_data' field As far as I can see, this field is never used in the code. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Len Brown <len.brown@intel.com>	2012-03-30 01:55:40 -04:00
Daniel Lezcano	db70b04407	cpuidle: remove useless array definition in cpuidle_structure All the modules name are ro-data, it is never copied to the array. eg. static struct cpuidle_driver intel_idle_driver = { .name = "intel_idle", .owner = THIS_MODULE, }; It safe to assign the pointer of this ro-data to a const char *. By this way we save 12 bytes. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Tested-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: Len Brown <len.brown@intel.com>	2012-03-30 01:55:22 -04:00
ShuoX Liu	3a53396b03	cpuidle: add a sysfs entry to disable specific C state for debug purpose. Some C states of new CPU might be not good. One reason is BIOS might configure them incorrectly. To help developers root cause it quickly, the patch adds a new sysfs entry, so developers could disable specific C state manually. In addition, C state might have much impact on performance tuning, as it takes much time to enter/exit C states, which might delay interrupt processing. With the new debug option, developers could check if a deep C state could impact performance and how much impact it could cause. Also add this option in Documentation/cpuidle/sysfs.txt. [akpm@linux-foundation.org: check kstrtol return value] Signed-off-by: ShuoX Liu <shuox.liu@intel.com> Reviewed-by: Yanmin Zhang <yanmin_zhang@intel.com> Reviewed-and-Tested-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>	2012-03-30 01:52:58 -04:00
Robert Lee	e1689795a7	cpuidle: Add common time keeping and irq enabling Make necessary changes to implement time keeping and irq enabling in the core cpuidle code. This will allow the removal of these functionalities from various platform cpuidle implementations whose timekeeping and irq enabling follows the form in this common code. Signed-off-by: Robert Lee <rob.lee@linaro.org> Tested-by: Jean Pihet <j-pihet@ti.com> Tested-by: Amit Daniel <amit.kachhap@linaro.org> Tested-by: Robert Lee <rob.lee@linaro.org> Reviewed-by: Kevin Hilman <khilman@ti.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Acked-by: Jean Pihet <j-pihet@ti.com> Signed-off-by: Len Brown <len.brown@intel.com>	2012-03-21 01:59:40 -04:00
Linus Torvalds	507a03c1cb	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux This includes initial support for the recently published ACPI 5.0 spec. In particular, support for the "hardware-reduced" bit that eliminates the dependency on legacy hardware. APEI has patches resulting from testing on real hardware. Plus other random fixes. * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (52 commits) acpi/apei/einj: Add extensions to EINJ from rev 5.0 of acpi spec intel_idle: Split up and provide per CPU initialization func ACPI processor: Remove unneeded variable passed by acpi_processor_hotadd_init V2 ACPI processor: Remove unneeded cpuidle_unregister_driver call intel idle: Make idle driver more robust intel_idle: Fix a cast to pointer from integer of different size warning in intel_idle ACPI: kernel-parameters.txt : Add intel_idle.max_cstate intel_idle: remove redundant local_irq_disable() call ACPI processor: Fix error path, also remove sysdev link ACPI: processor: fix acpi_get_cpuid for UP processor intel_idle: fix API misuse ACPI APEI: Convert atomicio routines ACPI: Export interfaces for ioremapping/iounmapping ACPI registers ACPI: Fix possible alignment issues with GAS 'address' references ACPI, ia64: Use SRAT table rev to use 8bit or 16/32bit PXM fields (ia64) ACPI, x86: Use SRAT table rev to use 8bit or 32bit PXM fields (x86/x86-64) ACPI: Store SRAT table revision ACPI, APEI, Resolve false conflict between ACPI NVS and APEI ACPI, Record ACPI NVS regions ACPI, APEI, EINJ, Refine the fix of resource conflict ...	2012-01-18 15:51:48 -08:00
Thomas Renninger	65b7f839ce	intel_idle: Split up and provide per CPU initialization func Function split up, should have no functional change. Provides entry point for physically hotplugged CPUs to initialize and activate cpuidle. Signed-off-by: Thomas Renninger <trenn@suse.de> CC: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> CC: Shaohua Li <shaohua.li@intel.com> CC: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com>	2012-01-17 23:59:12 -05:00
Deepthi Dharwar	e179816ce6	powerpc/cpuidle: Enable cpuidle and directly call cpuidle_idle_call() for pSeries This patch enables cpuidle for pSeries and pSeries_idle is directly called from the idle loop. As a result of pSeries_idle, cpuidle driver registered with cpuidle subsystem comes into action. On failure of loading of the driver or cpuidle framework default idle is executed as part of the function. This patch also removes the routines pseries_shared_idle_sleep and pseries_dedicated_idle_sleep as they are now implemented as part of pseries_idle cpuidle driver. Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com> Signed-off-by: Arun R Bharadwaj <arun.r.bharadwaj@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2011-12-08 13:57:20 +11:00
Linus Torvalds	3c00303206	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: cpuidle: Single/Global registration of idle states cpuidle: Split cpuidle_state structure and move per-cpu statistics fields cpuidle: Remove CPUIDLE_FLAG_IGNORE and dev->prepare() cpuidle: Move dev->last_residency update to driver enter routine; remove dev->last_state ACPI: Fix CONFIG_ACPI_DOCK=n compiler warning ACPI: Export FADT pm_profile integer value to userspace thermal: Prevent polling from happening during system suspend ACPI: Drop ACPI_NO_HARDWARE_INIT ACPI atomicio: Convert width in bits to bytes in __acpi_ioremap_fast() PNPACPI: Simplify disabled resource registration ACPI: Fix possible recursive locking in hwregs.c ACPI: use kstrdup() mrst pmu: update comment tools/power turbostat: less verbose debugging	2011-11-07 10:13:52 -08:00
Deepthi Dharwar	46bcfad7a8	cpuidle: Single/Global registration of idle states This patch makes the cpuidle_states structure global (single copy) instead of per-cpu. The statistics needed on per-cpu basis by the governor are kept per-cpu. This simplifies the cpuidle subsystem as state registration is done by single cpu only. Having single copy of cpuidle_states saves memory. Rare case of asymmetric C-states can be handled within the cpuidle driver and architectures such as POWER do not have asymmetric C-states. Having single/global registration of all the idle states, dynamic C-state transitions on x86 are handled by the boot cpu. Here, the boot cpu would disable all the devices, re-populate the states and later enable all the devices, irrespective of the cpu that would receive the notification first. Reference: https://lkml.org/lkml/2011/4/25/83 Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com> Tested-by: Jean Pihet <j-pihet@ti.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Len Brown <len.brown@intel.com>	2011-11-06 21:13:58 -05:00
Deepthi Dharwar	4202735e8a	cpuidle: Split cpuidle_state structure and move per-cpu statistics fields This is the first step towards global registration of cpuidle states. The statistics used primarily by the governor are per-cpu and have to be split from rest of the fields inside cpuidle_state, which would be made global i.e. single copy. The driver_data field is also per-cpu and moved. Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com> Tested-by: Jean Pihet <j-pihet@ti.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Len Brown <len.brown@intel.com>	2011-11-06 21:13:49 -05:00
Deepthi Dharwar	b25edc42bf	cpuidle: Remove CPUIDLE_FLAG_IGNORE and dev->prepare() The cpuidle_device->prepare() mechanism causes updates to the cpuidle_state[].flags, setting and clearing CPUIDLE_FLAG_IGNORE to tell the governor not to chose a state on a per-cpu basis at run-time. State demotion is now handled by the driver and it returns the actual state entered. Hence, this mechanism is not required. Also this removes per-cpu flags from cpuidle_state enabling it to be made global. Reference: https://lkml.org/lkml/2011/3/25/52 Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm> Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com> Tested-by: Jean Pihet <j-pihet@ti.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Len Brown <len.brown@intel.com>	2011-11-06 21:13:43 -05:00
Deepthi Dharwar	e978aa7d7d	cpuidle: Move dev->last_residency update to driver enter routine; remove dev->last_state Cpuidle governor only suggests the state to enter using the governor->select() interface, but allows the low level driver to override the recommended state. The actual entered state may be different because of software or hardware demotion. Software demotion is done by the back-end cpuidle driver and can be accounted correctly. Current cpuidle code uses last_state field to capture the actual state entered and based on that updates the statistics for the state entered. Ideally the driver enter routine should update the counters, and it should return the state actually entered rather than the time spent there. The generic cpuidle code should simply handle where the counters live in the sysfs namespace, not updating the counters. Reference: https://lkml.org/lkml/2011/3/25/52 Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: Trinabh Gupta <g.trinabh@gmail.com> Tested-by: Jean Pihet <j-pihet@ti.com> Reviewed-by: Kevin Hilman <khilman@ti.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Kevin Hilman <khilman@ti.com> Signed-off-by: Len Brown <len.brown@intel.com>	2011-11-06 21:13:30 -05:00
Paul Gortmaker	de47725421	include: replace linux/module.h with "struct module" wherever possible The <linux/module.h> pretty much brings in the kitchen sink along with it, so it should be avoided wherever reasonably possible in terms of being included from other commonly used <linux/something.h> files, as it results in a measureable increase on compile times. The worst culprit was probably device.h since it is used everywhere. This file also had an implicit dependency/usage of mutex.h which was masked by module.h, and is also fixed here at the same time. There are over a dozen other headers that simply declare the struct instead of pulling in the whole file, so follow their lead and simply make it a few more. Most of the implicit dependencies on module.h being present by these headers pulling it in have been now weeded out, so we can finally make this change with hopefully minimal breakage. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-10-31 19:32:32 -04:00
Len Brown	a0bfa13738	cpuidle: stop depending on pm_idle cpuidle users should call cpuidle_call_idle() directly rather than via (pm_idle)() function pointer. Architecture may choose to continue using (pm_idle)(), but cpuidle need not depend on it: my_arch_cpu_idle() ... if(cpuidle_call_idle()) pm_idle(); cc: Kevin Hilman <khilman@deeprootsystems.com> cc: Paul Mundt <lethal@linux-sh.org> cc: x86@kernel.org Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2011-08-03 19:06:37 -04:00
Len Brown	d91ee5863b	cpuidle: replace xen access to x86 pm_idle and default_idle When a Xen Dom0 kernel boots on a hypervisor, it gets access to the raw-hardware ACPI tables. While it parses the idle tables for the hypervisor's beneift, it uses HLT for its own idle. Rather than have xen scribble on pm_idle and access default_idle, have it simply disable_cpuidle() so acpi_idle will not load and architecture default HLT will be used. cc: xen-devel@lists.xensource.com Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2011-08-03 19:06:36 -04:00
Len Brown	5392083748	cpuidle: CPUIDLE_FLAG_CHECK_BM is omap3_idle specific Signed-off-by: Len Brown <len.brown@intel.com>	2011-01-12 12:47:34 -05:00
Len Brown	956d033fb2	cpuidle: CPUIDLE_FLAG_TLB_FLUSHED is specific to intel_idle Signed-off-by: Len Brown <len.brown@intel.com>	2011-01-12 12:47:33 -05:00
Len Brown	642f11c53f	cpuidle: delete unused CPUIDLE_FLAG_SHALLOW, BALANCED, DEEP definitions Signed-off-by: Len Brown <len.brown@intel.com>	2011-01-12 12:47:32 -05:00
Len Brown	d247632c08	cpuidle: delete NOP CPUIDLE_FLAG_POLL it serves no purpose Signed-off-by: Len Brown <len.brown@intel.com>	2011-01-12 12:47:31 -05:00
Suresh Siddha	6110a1f43c	intel_idle: Voluntary leave_mm before entering deeper Avoid TLB flush IPIs for the cores in deeper c-states by voluntary leave_mm() before entering into that state. CPUs tend to flush TLB in those c-states anyways. acpi_idle does this with C3-type states, but it was not caried over when intel_idle was introduced. intel_idle can apply it to C-states in addition to those that ACPI might export as C3... Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2010-09-30 21:19:22 -04:00
Ai Li	71abbbf856	cpuidle: extend cpuidle and menu governor to handle dynamic states On some SoC chips, HW resources may be in use during any particular idle period. As a consequence, the cpuidle states that the SoC is safe to enter can change from idle period to idle period. In addition, the latency and threshold of each cpuidle state can vary, depending on the operating condition when the CPU becomes idle, e.g. the current cpu frequency, the current state of the HW blocks, etc. cpuidle core and the menu governor, in the current form, are geared towards cpuidle states that are static, i.e. the availabiltiy of the states, their latencies, their thresholds are non-changing during run time. cpuidle does not provide any hook that cpuidle drivers can use to adjust those values on the fly for the current idle period before the menu governor selects the target cpuidle state. This patch extends cpuidle core and the menu governor to handle states that are dynamic. There are three additions in the patch and the patch maintains backwards-compatibility with existing cpuidle drivers. 1) add prepare() to struct cpuidle_device. A cpuidle driver can hook into the callback and cpuidle will call prepare() before calling the governor's select function. The callback gives the cpuidle driver a chance to update the dynamic information of the cpuidle states for the current idle period, e.g. state availability, latencies, thresholds, power values, etc. 2) add CPUIDLE_FLAG_IGNORE as one of the state flags. In the prepare() function, a cpuidle driver can set/clear the flag to indicate to the menu governor whether a cpuidle state should be ignored, i.e. not available, during the current idle period. 3) add power_specified bit to struct cpuidle_device. The menu governor currently assumes that the cpuidle states are arranged in the order of increasing latency, threshold, and power savings. This is true or can be made true for static states. Once the state parameters are dynamic, the latencies, thresholds, and power savings for the cpuidle states can increase or decrease by different amounts from idle period to idle period. So the assumption of increasing latency, threshold, and power savings from Cn to C(n+1) can no longer be guaranteed. It can be straightforward to calculate the power consumption of each available state and to specify it in power_usage for the idle period. Using the power_usage fields, the menu governor then selects the state that has the lowest power consumption and that still satisfies all other critieria. The power_specified bit defaults to 0. For existing cpuidle drivers, cpuidle detects that power_specified is 0 and fills in a dummy set of power_usage values. Signed-off-by: Ai Li <aili@codeaurora.org> Cc: Len Brown <len.brown@intel.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-09 20:45:04 -07:00
Len Brown	752138df0d	cpuidle: make cpuidle_curr_driver static cpuidle_register_driver() sets cpuidle_curr_driver cpuidle_unregister_driver() clears cpuidle_curr_driver We should't expose cpuidle_curr_driver to potential modification except via these interfaces. So make it static and create cpuidle_get_driver() to observe it. Signed-off-by: Len Brown <len.brown@intel.com>	2010-05-27 21:06:58 -04:00
Len Brown	6b2c676bf3	cpuidle: fail to register if !CONFIG_CPU_IDLE Signed-off-by: Len Brown <len.brown@intel.com>	2010-05-27 01:56:24 -04:00
Venkatesh Pallipadi	dcb84f335b	cpuidle acpi driver: fix oops on AC<->DC cpuidle and acpi driver interaction bug with the way cpuidle_register_driver() is called. Due to this bug, there will be oops on AC<->DC on some systems, where they support C-states in one DC and not in AC. The current code does ON BOOT: Look at CST and other C-state info to see whether more than C1 is supported. If it is, then acpi processor_idle does a cpuidle_register_driver() call, which internally enables the device. ON CST change notification (AC<->DC) and on suspend-resume: acpi driver temporarily disables device, updates the device with any new C-states, and reenables the device. The problem is is on boot, there are no C2, C3 states supported and we skip the register. Later on AC<->DC, we may get a CST notification and we try to reevaluate CST and enabled the device, without actually registering it. This causes breakage as we try to create /sys fs sub directory, without the parent directory which is created at register time. Thanks to Sanjeev for reporting the problem here. http://bugzilla.kernel.org/show_bug.cgi?id=10394 Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2008-06-11 19:13:45 -04:00
Yi Yang	8b78cf602f	cpuidle: fix cpuidle time and usage overflow cpuidle C-state sysfs node time and usage are very easy to overflow because they are all of unsigned int type, time will overflow within about two hours, usage will take longer time to overflow, but they are increasing for ever. This patch will convert them to unsigned long long. Signed-off-by: Yi Yang <yi.y.yang@intel.com> Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2008-03-26 00:45:26 -04:00
Venkatesh Pallipadi	4fcb2fcd4d	ACPI, cpuidle: Clarify C-state description in sysfs Add a new sysfs entry under cpuidle states. desc - can be used by driver to communicate to userspace any specific information about the state. This helps in identifying the exact hardware C-states behind the ACPI C-state definition. Idea is to export this through powertop, which will help to map the C-state reported by powertop to actual hardware C-state. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2008-02-14 00:09:55 -05:00
Len Brown	9b71315421	Revert "cpuidle: build fix for non-x86" This reverts commit `f757397097`. which ironically broke the ia64 build	2008-02-07 04:16:34 -05:00

1 2

56 Commits