linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-04 13:26:47 +07:00

Author	SHA1	Message	Date
Linus Torvalds	4d25ec1966	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal management updates from Zhang Rui: - Improve thermal cpu_cooling interaction with cpufreq core. The cpu_cooling driver is designed to use CPU frequency scaling to avoid high thermal states for a platform. But it wasn't glued really well with cpufreq core. For example clipped-cpus is copied from the policy structure and its much better to use the policy->cpus (or related_cpus) fields directly as they may have got updated. Not that things were broken before this series, but they can be optimized a bit more. This series tries to improve interactions between cpufreq core and cpu_cooling driver and does some fixes/cleanups to the cpu_cooling driver. (Viresh Kumar) - A couple of fixes and cleanups in thermal core and imx, hisilicon, bcm_2835, int340x thermal drivers. (Arvind Yadav, Dan Carpenter, Sumeet Pawnikar, Srinivas Pandruvada, Willy WOLFF) * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (24 commits) thermal: bcm2835: fix an error code in probe() thermal: hisilicon: Handle return value of clk_prepare_enable thermal: imx: Handle return value of clk_prepare_enable thermal: int340x: check for sensor when PTYP is missing Thermal/int340x: Fix few typos and kernel-doc style thermal: fix source code documentation for parameters thermal: cpu_cooling: Replace kmalloc with kmalloc_array thermal: cpu_cooling: Rearrange struct cpufreq_cooling_device thermal: cpu_cooling: 'freq' can't be zero in cpufreq_state2power() thermal: cpu_cooling: don't store cpu_dev in cpufreq_cdev thermal: cpu_cooling: get_level() can't fail thermal: cpu_cooling: create structure for idle time stats thermal: cpu_cooling: merge frequency and power tables thermal: cpu_cooling: get rid of 'allowed_cpus' thermal: cpu_cooling: OPPs are registered for all CPUs thermal: cpu_cooling: store cpufreq policy cpufreq: create cpufreq_table_count_valid_entries() thermal: cpu_cooling: use cpufreq_policy to register cooling device thermal: cpu_cooling: get rid of a variable in cpufreq_set_cur_state() thermal: cpu_cooling: remove cpufreq_cooling_get_level() ...	2017-07-14 13:12:32 -07:00
Len Brown	f8475cef90	x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF The goal of this change is to give users a uniform and meaningful result when they read /sys/...cpufreq/scaling_cur_freq on modern x86 hardware, as compared to what they get today. Modern x86 processors include the hardware needed to accurately calculate frequency over an interval -- APERF, MPERF, and the TSC. Here we provide an x86 routine to make this calculation on supported hardware, and use it in preference to any driver driver-specific cpufreq_driver.get() routine. MHz is computed like so: MHz = base_MHz * delta_APERF / delta_MPERF MHz is the average frequency of the busy processor over a measurement interval. The interval is defined to be the time between successive invocations of aperfmperf_khz_on_cpu(), which are expected to to happen on-demand when users read sysfs attribute cpufreq/scaling_cur_freq. As with previous methods of calculating MHz, idle time is excluded. base_MHz above is from TSC calibration global "cpu_khz". This x86 native method to calculate MHz returns a meaningful result no matter if P-states are controlled by hardware or firmware and/or if the Linux cpufreq sub-system is or is-not installed. When this routine is invoked more frequently, the measurement interval becomes shorter. However, the code limits re-computation to 10ms intervals so that average frequency remains meaningful. Discerning users are encouraged to take advantage of the turbostat(8) utility, which can gracefully handle concurrent measurement intervals of arbitrary length. Signed-off-by: Len Brown <len.brown@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-06-27 01:47:32 +02:00
Viresh Kumar	55d8529313	cpufreq: create cpufreq_table_count_valid_entries() We need such a routine at two places already, lets create one. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Tested-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>	2017-05-27 17:32:28 -07:00
Rafael J. Wysocki	1b72e7fd30	cpufreq: schedutil: Use policy-dependent transition delays Make the schedutil governor take the initial (default) value of the rate_limit_us sysfs attribute from the (new) transition_delay_us policy parameter (to be set by the scaling driver). That will allow scaling drivers to make schedutil use smaller default values of rate_limit_us and reduce the default average time interval between consecutive frequency changes. Make intel_pstate set transition_delay_us to 500. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2017-04-17 18:37:27 +02:00
Viresh Kumar	565ebe8073	cpufreq: Fix typos in comments - s/freqnency/frequency/ - s/accomodating/accommodating/ Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-02-04 00:47:59 +01:00
Viresh Kumar	052f573f5c	cpufreq: Remove CPUFREQ_START notifier event Its not used anymore, remove it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-02-04 00:05:30 +01:00
Viresh Kumar	f9f41e3ef9	cpufreq: Remove policy create/remove notifiers Those were added by: commit `fcd7af917a` ("cpufreq: stats: handle cpufreq_unregister_driver() and suspend/resume properly") but aren't used anymore since: commit `1aefc75b24` ("cpufreq: stats: Make the stats code non-modular"). Remove them. Also remove the redundant parameter to the respective routines. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-02-03 23:59:38 +01:00
Rafael J. Wysocki	30248feff5	cpufreq: Make cpufreq_update_policy() void The return value of cpufreq_update_policy() is never used, so make it void. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-11-21 14:35:43 +01:00
Markus Mayer	ee7930ee27	cpufreq: stats: New sysfs attribute for clearing statistics Allow CPUfreq statistics to be cleared by writing anything to /sys/.../cpufreq/stats/reset. Signed-off-by: Markus Mayer <mmayer@broadcom.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-11-11 01:51:11 +01:00
Sergey Senozhatsky	c6fe46a79e	cpufreq: fix overflow in cpufreq_table_find_index_dl() 'best' is always less or equals to 'pos', so `best - pos' returns a negative value which is then getting casted to `unsigned int' and passed to __cpufreq_driver_target()->acpi_cpufreq_target() for policy->freq_table selection. This results in BUG: unable to handle kernel paging request at ffff881019b469f8 IP: [<ffffffffa00356c1>] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq] PGD 267f067 PUD 0 Oops: 0000 [#1] PREEMPT SMP CPU: 6 PID: 70 Comm: kworker/6:1 Not tainted 4.9.0-rc1-next-20161017-dbg-dirty Workqueue: events dbs_work_handler task: ffff88041b808000 task.stack: ffff88041b810000 RIP: 0010:[<ffffffffa00356c1>] [<ffffffffa00356c1>] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq] RSP: 0018:ffff88041b813c60 EFLAGS: 00010282 RAX: ffff880419b46a00 RBX: ffff88041b848400 RCX: ffff880419b20f80 RDX: 00000000001dff38 RSI: 00000000ffffffff RDI: ffff88041b848400 RBP: ffff88041b813cb0 R08: 0000000000000006 R09: 0000000000000040 R10: ffffffff8207f9e0 R11: ffffffff8173595b R12: 0000000000000000 R13: ffff88041f1dff38 R14: 0000000000262900 R15: 0000000bfffffff4 FS: 0000000000000000(0000) GS:ffff88041f000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff881019b469f8 CR3: 000000041a2d3000 CR4: 00000000001406e0 Stack: ffff88041b813cb0 ffffffff813347f9 ffff88041b813ca0 ffffffff81334663 ffff88041f1d4bc0 ffff88041b848400 0000000000000000 0000000000000000 0000000000262900 0000000000000000 ffff88041b813d00 ffffffff813355dc Call Trace: [<ffffffff813347f9>] ? cpufreq_freq_transition_begin+0xf1/0xfc [<ffffffff81334663>] ? get_cpu_idle_time+0x97/0xa6 [<ffffffff813355dc>] __cpufreq_driver_target+0x3b6/0x44e [<ffffffff81336ca3>] cs_dbs_timer+0x11a/0x135 [<ffffffff81336fda>] dbs_work_handler+0x39/0x62 [<ffffffff81057823>] process_one_work+0x280/0x4a5 [<ffffffff81058719>] worker_thread+0x24f/0x397 [<ffffffff810584ca>] ? rescuer_thread+0x30b/0x30b [<ffffffff81418380>] ? nl80211_get_key+0x29/0x36a [<ffffffff8105d2b7>] kthread+0xfc/0x104 [<ffffffff8107ceea>] ? put_lock_stats.isra.9+0xe/0x20 [<ffffffff8105d1bb>] ? kthread_create_on_node+0x3f/0x3f [<ffffffff814b2092>] ret_from_fork+0x22/0x30 Code: 56 4d 6b ff 0c 41 55 41 54 53 48 83 ec 28 48 8b 15 ad 1e 00 00 44 8b 41 08 48 8b 87 c8 00 00 00 49 89 d5 4e 03 2c c5 80 b2 78 81 <46> 8b 74 38 04 45 3b 75 00 75 11 31 c0 83 39 00 0f 84 1c 01 00 RIP [<ffffffffa00356c1>] acpi_cpufreq_target+0x4f/0x190 [acpi_cpufreq] RSP <ffff88041b813c60> CR2: ffff881019b469f8 ---[ end trace 16d9fc7a17897d37 ]--- [ rjw: In some cases this bug may also cause incorrect frequencies to be selected by cpufreq governors. ] Fixes: `899bb6642f` (cpufreq: skip invalid entries when searching the frequency) Link: http://marc.info/?l=linux-kernel&m=147672030714331&w=2 Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com> Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.8+ <stable@vger.kernel.org> # 4.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-10-20 16:35:50 +02:00
Aaro Koskinen	899bb6642f	cpufreq: skip invalid entries when searching the frequency Skip invalid entries when searching the frequency. This fixes cpufreq at least on loongson2 MIPS board. Fixes: `da0c6dc00c` (cpufreq: Handle sorted frequency tables more efficiently) Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.8+ <stable@vger.kernel.org> # 4.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-10-12 21:01:18 +02:00
Steve Muckle	e3c0623608	cpufreq: add cpufreq_driver_resolve_freq() Cpufreq governors may need to know what a particular target frequency maps to in the driver without necessarily wanting to set the frequency. Support this operation via a new cpufreq API, cpufreq_driver_resolve_freq(). This API returns the lowest driver frequency equal or greater than the target frequency (CPUFREQ_RELATION_L), subject to any policy (min/max) or driver limitations. The mapping is also cached in the policy so that a subsequent fast_switch operation can avoid repeating the same lookup. The API will call a new cpufreq driver callback, resolve_freq(), if it has been registered by the driver. Otherwise the frequency is resolved via cpufreq_frequency_table_target(). Rather than require ->target() style drivers to provide a resolve_freq() callback it is left to the caller to ensure that the driver implements this callback if necessary to use cpufreq_driver_resolve_freq(). Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Steve Muckle <smuckle@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-07-21 14:46:08 +02:00
Viresh Kumar	da0c6dc00c	cpufreq: Handle sorted frequency tables more efficiently cpufreq drivers aren't required to provide a sorted frequency table today, and even the ones which provide a sorted table aren't handled efficiently by cpufreq core. This patch adds infrastructure to verify if the freq-table provided by the drivers is sorted or not, and use efficient helpers if they are sorted. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-07-07 00:13:20 +02:00
Viresh Kumar	d218ed7739	cpufreq: Return index from cpufreq_frequency_table_target() This routine can't fail unless the frequency table is invalid and doesn't contain any valid entries. Make it return the index and WARN() in case it is used for an invalid table. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-09 00:58:06 +02:00
Viresh Kumar	7ab4aabbaa	cpufreq: Drop freq-table param to cpufreq_frequency_table_target() The policy already has this pointer set, use it instead. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-09 00:58:06 +02:00
Viresh Kumar	f8bfc116ca	cpufreq: Remove cpufreq_frequency_get_table() Most of the callers of cpufreq_frequency_get_table() already have the pointer to a valid 'policy' structure and they don't really need to go through the per-cpu variable first and then a check to validate the frequency, in order to find the freq-table for the policy. Directly use the policy->freq_table field instead for them. Only one user of that API is left after above changes, cpu_cooling.c and it accesses the freq_table in a racy way as the policy can get freed in between. Fix it by using cpufreq_cpu_get() properly. Since there are no more users of cpufreq_frequency_get_table() left, get rid of it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Javi Merino <javi.merino@arm.com> (cpu_cooling.c) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-09 00:58:05 +02:00
Rafael J. Wysocki	1aefc75b24	cpufreq: stats: Make the stats code non-modular The modularity of cpufreq_stats is quite problematic. First off, the usage of policy notifiers for the initialization and cleanup in the cpufreq_stats module is inherently racy with respect to CPU offline/online and the initialization and cleanup of the cpufreq driver. Second, fast frequency switching (used by the schedutil governor) cannot be enabled if any transition notifiers are registered, so if the cpufreq_stats module (that registers a transition notifier for updating transition statistics) is loaded, the schedutil governor cannot use fast frequency switching. On the other hand, allowing cpufreq_stats to be built as a module doesn't really add much value. Arguably, there's not much reason for that code to be modular at all. For the above reasons, make the cpufreq stats code non-modular, modify the core to invoke functions provided by that code directly and drop the notifiers from it. Make the stats sysfs attributes appear empty if fast frequency switching is enabled as the statistics will not be updated in that case anyway (and returning -EBUSY from those attributes breaks powertop). While at it, clean up Kconfig help for the CPU_FREQ_STAT and CPU_FREQ_STAT_DETAILS options. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-06-02 23:24:41 +02:00
Rafael J. Wysocki	9a15fb2c79	cpufreq: Drop the 'initialized' field from struct cpufreq_governor The 'initialized' field in struct cpufreq_governor is only used by the conservative governor (as a usage counter) and the way that happens is far from straightforward and arguably incorrect. Namely, the value of 'initialized' is checked by cpufreq_dbs_governor_init() and cpufreq_dbs_governor_exit() and the results of those checks are passed (as the second argument) to the ->init() and ->exit() callbacks in struct dbs_governor. Those callbacks are only implemented by the ondemand and conservative governors and ondemand doesn't use their second argument at all. In turn, the conservative governor uses it to decide whether or not to either register or unregister a transition notifier. That whole mechanism is not only unnecessarily convoluted, but also racy, because the 'initialized' field of struct cpufreq_governor is updated in cpufreq_init_governor() and cpufreq_exit_governor() under policy->rwsem which doesn't help if one of these functions is run twice in parallel for different policies (which isn't impossible in principle), for example. Instead of it, add a proper usage counter to the conservative governor and update it from cs_init() and cs_exit() which is guaranteed to be non-racy, as those functions are only called under gov_dbs_data_mutex which is global. With that in place, drop the 'initialized' field from struct cpufreq_governor as it is not used any more. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-06-02 23:24:39 +02:00
Viresh Kumar	bf2be2de84	cpufreq: governor: Create cpufreq_policy_apply_limits() Create a new helper to avoid code duplication across governors. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-02 23:24:39 +02:00
Rafael J. Wysocki	e788892ba3	cpufreq: governor: Get rid of governor events The design of the cpufreq governor API is not very straightforward, as struct cpufreq_governor provides only one callback to be invoked from different code paths for different purposes. The purpose it is invoked for is determined by its second "event" argument, causing it to act as a "callback multiplexer" of sorts. Unfortunately, that leads to extra complexity in governors, some of which implement the ->governor() callback as a switch statement that simply checks the event argument and invokes a separate function to handle that specific event. That extra complexity can be eliminated by replacing the all-purpose ->governor() callback with a family of callbacks to carry out specific governor operations: initialization and exit, start and stop and policy limits updates. That also turns out to reduce the code size too, so do it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-06-02 23:24:15 +02:00
Rafael J. Wysocki	6c9d9c8192	cpufreq: Call cpufreq_disable_fast_switch() in sugov_exit() Due to differences in the cpufreq core's handling of runtime CPU offline and nonboot CPUs disabling during system suspend-to-RAM, fast frequency switching gets disabled after a suspend-to-RAM and resume cycle on all of the nonboot CPUs. To prevent that from happening, move the invocation of cpufreq_disable_fast_switch() from cpufreq_exit_governor() to sugov_exit(), as the schedutil governor is the only user of fast frequency switching today anyway. That simply prevents cpufreq_disable_fast_switch() from being called without invoking the ->governor callback for the CPUFREQ_GOV_POLICY_EXIT event (which happens during system suspend now). Fixes: `b7898fda5b` (cpufreq: Support for fast frequency switching) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-08 22:41:36 +02:00
Rafael J. Wysocki	b7898fda5b	cpufreq: Support for fast frequency switching Modify the ACPI cpufreq driver to provide a method for switching CPU frequencies from interrupt context and update the cpufreq core to support that method if available. Introduce a new cpufreq driver callback, ->fast_switch, to be invoked for frequency switching from interrupt context by (future) governors supporting that feature via (new) helper function cpufreq_driver_fast_switch(). Add two new policy flags, fast_switch_possible, to be set by the cpufreq driver if fast frequency switching can be used for the given policy and fast_switch_enabled, to be set by the governor if it is going to use fast frequency switching for the given policy. Also add a helper for setting the latter. Since fast frequency switching is inherently incompatible with cpufreq transition notifiers, make it possible to set the fast_switch_enabled only if there are no transition notifiers already registered and make the registration of new transition notifiers fail if fast_switch_enabled is set for at least one policy. Implement the ->fast_switch callback in the ACPI cpufreq driver and make it set fast_switch_possible during policy initialization as appropriate. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-02 01:09:03 +02:00
Rafael J. Wysocki	379480d825	cpufreq: Move governor symbols to cpufreq.h Move definitions of symbols related to transition latency and sampling rate to include/linux/cpufreq.h so they can be used by (future) goverernors located outside of drivers/cpufreq/. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-02 01:09:02 +02:00
Rafael J. Wysocki	66893b6ac9	cpufreq: Move governor attribute set headers to cpufreq.h Move definitions and function headers related to struct gov_attr_set to include/linux/cpufreq.h so they can be used by (future) goverernors located outside of drivers/cpufreq/. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-02 01:09:02 +02:00
Rafael J. Wysocki	a5acbfbd70	Merge branch 'pm-cpufreq-governor' into pm-cpufreq	2016-03-10 20:46:03 +01:00
Rafael J. Wysocki	adaf9fcd13	cpufreq: Move scheduler-related code to the sched directory Create cpufreq.c under kernel/sched/ and move the cpufreq code related to the scheduler to that file and to sched.h. Redefine cpufreq_update_util() as a static inline function to avoid function calls at its call sites in the scheduler code (as suggested by Peter Zijlstra). Also move the definition of struct update_util_data and declaration of cpufreq_set_update_util_data() from include/linux/cpufreq.h to include/linux/sched.h. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>	2016-03-10 20:44:47 +01:00
Viresh Kumar	242aa883a6	cpufreq: Remove 'policy->governor_enabled' The entire sequence of events (like INIT/START or STOP/EXIT) for which cpufreq_governor() is called, is guaranteed to be protected by policy->rwsem now. The additional checks that were added earlier (as we were forced to drop policy->rwsem before calling cpufreq_governor() for EXIT event), aren't required anymore. Over that, they weren't sufficient really. They just take care of START/STOP events, but not INIT/EXIT and the state machine was never maintained properly by them. Kill the unnecessary checks and policy->governor_enabled field. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:41:12 +01:00
Viresh Kumar	68e80dae09	Revert "cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT" Earlier, when the struct freq-attr was used to represent governor attributes, the standard cpufreq show/store sysfs attribute callbacks were applied to the governor tunable attributes and they always acquire the policy->rwsem lock before carrying out the operation. That could have resulted in an ABBA deadlock if governor tunable attributes are removed under policy->rwsem while one of them is being accessed concurrently (if sysfs attributes removal wins the race, it will wait for the access to complete with policy->rwsem held while the attribute callback will block on policy->rwsem indefinitely). We attempted to address this issue by dropping policy->rwsem around governor tunable attributes removal (that is, around invocations of the ->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT) in cpufreq_set_policy(), but that opened up race conditions that had not been possible with policy->rwsem held all the time. The previous commit, "cpufreq: governor: New sysfs show/store callbacks for governor tunables", fixed the original ABBA deadlock by adding new governor specific show/store callbacks. We don't have to drop rwsem around invocations of governor event CPUFREQ_GOV_POLICY_EXIT anymore, and original fix can be reverted now. Fixes: `955ef48335` (cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT) Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reported-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:40:59 +01:00
Rafael J. Wysocki	34e2c555f3	cpufreq: Add mechanism for registering utilization update callbacks Introduce a mechanism by which parts of the cpufreq subsystem ("setpolicy" drivers or the core) can register callbacks to be executed from cpufreq_update_util() which is invoked by the scheduler's update_load_avg() on CPU utilization changes. This allows the "setpolicy" drivers to dispense with their timers and do all of the computations they need and frequency/voltage adjustments in the update_load_avg() code path, among other things. The update_load_avg() changes were suggested by Peter Zijlstra. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Ingo Molnar <mingo@kernel.org>	2016-03-09 14:39:19 +01:00
Rafael J. Wysocki	34b0870515	cpufreq: Simplify the cpufreq_for_each_valid_entry() That macro uses an internal static inline function that is first totally unnecessary and second hard to read, so simplify it and get rid of that monster. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-02-26 22:11:56 +01:00
Rafael J. Wysocki	de1df26b7c	cpufreq: Clean up default and fallback governor setup The preprocessor magic used for setting the default cpufreq governor (and for using the performance governor as a fallback one for that matter) is really nasty, so replace it with __weak functions and overrides. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-02-05 02:37:42 +01:00
Rafael J. Wysocki	7a6c79f2fe	cpufreq: Simplify core code related to boost support Notice that the boost_supported field in struct cpufreq_driver is redundant, because the driver's ->set_boost callback may be left unset if "boost" is not supported. Moreover, the only driver populating the ->set_boost callback is acpi_cpufreq, so make it avoid populating that callback if "boost" is not supported, rework the core to check ->set_boost instead of boost_supported to verify "boost" support and drop boost_supported which isn't used any more. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-01-01 03:49:51 +01:00
Rafael J. Wysocki	41669da030	cpufreq: Make cpufreq_boost_supported() static cpufreq_boost_supported() is not used outside of cpufreq.c, so make it static. While at it, refactor it as a one-liner (which it really is). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-01-01 03:49:51 +01:00
Srinivas Pandruvada	69030dd1c3	cpufreq: use last policy after online for drivers with ->setpolicy For cpufreq drivers which use setpolicy interface, after offline->online the policy is set to default. This can be reproduced by setting the default policy of intel_pstate or longrun to ondemand and then change to "performance". After offline and online, the setpolicy will be called with the policy=ondemand. For drivers using governors this condition is handled by storing last_governor, during offline and restoring during online. The same should be done for drivers using setpolicy interface. Storing last_policy during offline and restoring during online. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-12-02 23:50:33 +01:00
Viresh Kumar	96bdda61f5	cpufreq: create cpu/cpufreq/policyX directories The cpufreq sysfs interface had been a bit inconsistent as one of the CPUs for a policy had a real directory within its sysfs 'cpuX' directory and all other CPUs had links to it. That also made the code a bit complex as we need to take care of moving the sysfs directory if the CPU containing the real directory is getting physically hot-unplugged. Solve this by creating 'policyX' directories (per-policy) in /sys/devices/system/cpu/cpufreq/ directory, where X is the CPU for which the policy was first created. This also removes the need of keeping kobj_cpu and we can remove it now. Suggested-by: Saravana Kannan <skannan@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: is more of a general agreement from the person that he is Reviewed-by: is a more strict tag and implies that the reviewer has Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-10-28 09:21:12 +01:00
Viresh Kumar	c82bd44437	cpufreq: remove cpufreq_sysfs_{create\|remove}_file() They don't do anything special now, remove the unnecessary wrapper. Reviewed-by: Saravana Kannan <skannan@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-10-28 09:21:12 +01:00
Viresh Kumar	8eec1020f0	cpufreq: create cpu/cpufreq at boot time Later patches will need to create policy specific directories in /sys/devices/system/cpu/cpufreq/ directory and so the cpufreq directory wouldn't be ever empty. And so no fun creating/destroying it on need basis anymore. Create it once on system boot. Reviewed-by: Saravana Kannan <skannan@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-10-28 09:21:12 +01:00
Rafael J. Wysocki	1f0bd44e93	cpufreq: acpi-cpufreq: Use cpufreq_cpu_get_raw() in ->get() cpufreq_cpu_get() called by get_cur_freq_on_cpu() is overkill, because the ->get() callback is always invoked in a context in which all of the conditions checked by cpufreq_cpu_get() are guaranteed to be satisfied. Use cpufreq_cpu_get_raw() instead of it and drop the corresponding cpufreq_cpu_put() from get_cur_freq_on_cpu(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2015-09-16 02:17:49 +02:00
Rafael J. Wysocki	ac2a29c8a4	Merge branch 'pm-opp' * pm-opp: PM / OPP: Drop unlikely before IS_ERR(_OR_NULL) PM / OPP: Fix static checker warning (broken 64bit big endian systems) PM / OPP: Free resources and properly return error on failure cpufreq-dt: make scaling_boost_freqs sysfs attr available when boost is enabled cpufreq: dt: Add support for turbo/boost mode cpufreq: dt: Add support for operating-points-v2 bindings cpufreq: Allow drivers to enable boost support after registering driver cpufreq: Update boost flag while initializing freq table from OPPs PM / OPP: add dev_pm_opp_is_turbo() helper PM / OPP: Add helpers for initializing CPU OPPs PM / OPP: Add support for opp-suspend PM / OPP: Add OPP sharing information to OPP library PM / OPP: Add clock-latency-ns support PM / OPP: Add support to parse "operating-points-v2" bindings PM / OPP: Break _opp_add_dynamic() into smaller functions PM / OPP: Allocate dev_opp from _add_device_opp() PM / OPP: Create _remove_device_opp() for freeing dev_opp PM / OPP: Relocate few routines PM / OPP: Create a directory for opp bindings PM / OPP: Update bindings to make opp-hz a 64 bit value	2015-09-01 15:52:41 +02:00
Viresh Kumar	c7a7b418dd	cpufreq: rename cpufreq_real_policy as cpufreq_user_policy Its all about caching min/max freq requested by userspace, and the name 'cpufreq_real_policy' doesn't fit that well. Rename it to cpufreq_user_policy. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-09-01 15:50:39 +02:00
Viresh Kumar	88dc438495	cpufreq: remove redundant 'policy' field from user_policy Its always same as policy->policy, and there is no need to keep another copy of it. Remove it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-09-01 15:50:39 +02:00
Viresh Kumar	e27f8bd248	cpufreq: remove redundant 'governor' field from user_policy Its always same as policy->governor, and there is no need to keep another copy of it. Remove it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-09-01 15:50:39 +02:00
Viresh Kumar	6bfb7c7434	cpufreq: remove redundant CPUFREQ_INCOMPATIBLE notifier event What's being done from CPUFREQ_INCOMPATIBLE, can also be done with CPUFREQ_ADJUST. There is nothing special with CPUFREQ_INCOMPATIBLE notifier. Kill CPUFREQ_INCOMPATIBLE and fix its usage sites. This also updates the numbering of notifier events to remove holes. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-09-01 15:50:38 +02:00
Bartlomiej Zolnierkiewicz	21c36d3571	cpufreq-dt: make scaling_boost_freqs sysfs attr available when boost is enabled Make scaling_boost_freqs sysfs attribute is available when cpufreq-dt driver is used and boost support is enabled. Suggested-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-08-08 04:35:10 +02:00
Viresh Kumar	44139ed494	cpufreq: Allow drivers to enable boost support after registering driver In some cases it wouldn't be known at time of driver registration, if the driver needs to support boost frequencies. For example, while getting boost information from DT with opp-v2 bindings, we need to parse the bindings for all the CPUs to know if turbo/boost OPPs are supported or not. One way out to do that efficiently is to delay supporting boost mode (i.e. creating /sys/devices/system/cpu/cpufreq/boost file), until the time OPP bindings are parsed. At that point, the driver can enable boost support. This can be done at ->init(), where the frequency table is created. To do that, the driver requires few APIs from cpufreq core that let him do this. This patch provides these APIs. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-08-07 03:25:23 +02:00
Rafael J. Wysocki	559ed40752	cpufreq: Avoid attempts to create duplicate symbolic links After commit `87549141d5` (cpufreq: Stop migrating sysfs files on hotplug) there is a problem with CPUs that share cpufreq policy objects with other CPUs and are initially offline. Say CPU1 shares a policy with CPU0 which is online and is registered first. As part of the registration process, cpufreq_add_dev() is called for it. It creates the policy object and a symbolic link to it from the CPU1's sysfs directory. If CPU1 is registered subsequently and it is offline at that time, cpufreq_add_dev() will attempt to create a symbolic link to the policy object for it, but that link is present already, so a warning about that will be triggered. To avoid that warning, make cpufreq use an additional CPU mask containing related CPUs that are actually present for each policy object. That mask is initialized when the policy object is populated after its creation (for the first online CPU using it) and it includes CPUs from the "policy CPUs" mask returned by the cpufreq driver's ->init() callback that are physically present at that time. Symbolic links to the policy are created only for the CPUs in that mask. If cpufreq_add_dev() is invoked for an offline CPU, it checks the new mask and only creates the symlink if the CPU was not in it (the CPU is added to the mask at the same time). In turn, cpufreq_remove_dev() drops the given CPU from the new mask, removes its symlink to the policy object and returns, unless it is the CPU owning the policy object. In that case, the policy object is moved to a new CPU's sysfs directory or deleted if the CPU being removed was the last user of the policy. While at it, notice that cpufreq_remove_dev() can't fail, because its return value is ignored, so make it ignore return values from __cpufreq_remove_dev_prepare() and __cpufreq_remove_dev_finish() and prevent these functions from aborting on errors returned by __cpufreq_governor(). Also drop the now unused sif argument from them. Fixes: `87549141d5` (cpufreq: Stop migrating sysfs files on hotplug) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reported-and-tested-by: Russell King <linux@arm.linux.org.uk> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2015-07-28 17:19:26 +02:00
Saravana Kannan	9d16f20711	cpufreq: Track cpu managing sysfs kobjects separately In order to prepare for the next few commits, that will stop migrating sysfs files on cpu hotplug, this patch starts managing sysfs-cpu separately. The behavior is still the same as we are still migrating sysfs files on hotplug, later commits would change that. Signed-off-by: Saravana Kannan <skannan@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-05-23 00:49:04 +02:00
Viresh Kumar	4573237b01	cpufreq: Manage governor usage history with 'policy->last_governor' History of which governor was used last is common to all CPUs within a policy and maintaining it per-cpu isn't the best approach for sure. Apart from wasting memory, this also increases the complexity of managing this data structure as it has to be updated for all CPUs. To make that somewhat simpler, lets store this information in a new field 'last_governor' in struct cpufreq_policy and update it on removal of last cpu of a policy. As a side-effect it also solves an old problem, consider a system with two clusters 0 & 1. And there is one policy per cluster. Cluster 0: CPU0 and 1. Cluster 1: CPU2 and 3. - CPU2 is first brought online, and governor is set to performance (default as cpufreq_cpu_governor wasn't set). - Governor is changed to ondemand. - CPU2 is taken offline and cpufreq_cpu_governor is updated for CPU2. - CPU3 is brought online. - Because cpufreq_cpu_governor wasn't set for CPU3, the default governor performance is picked for CPU3. This patch fixes the bug as we now have a single variable to update for policy. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-05-15 02:44:17 +02:00
Viresh Kumar	d9f354460d	cpufreq: remove CPUFREQ_UPDATE_POLICY_CPU notifications CPUFREQ_UPDATE_POLICY_CPU notifications were used only from cpufreq-stats which doesn't use it anymore. Remove them. This also decrements values of other notification macros defined after CPUFREQ_UPDATE_POLICY_CPU by 1 to remove gaps. Hopefully all users are using macro's instead of direct numbers and so they wouldn't break as macro values are changed now. Reviewed-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-01-23 23:06:44 +01:00
Viresh Kumar	7c418ff099	cpufreq: Remove (now) unused 'last_cpu' from struct cpufreq_policy 'last_cpu' was used only from cpufreq-stats and isn't used anymore. Get rid of it. Reviewed-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-01-23 23:06:44 +01:00

1 2 3 4

167 Commits