linux_dsm_epyc7002/drivers/thermal/thermal_core.c

1688 lines
44 KiB
C
Raw Normal View History

// SPDX-License-Identifier: GPL-2.0
/*
* thermal.c - Generic Thermal Management Sysfs support.
*
* Copyright (C) 2008 Intel Corp
* Copyright (C) 2008 Zhang Rui <rui.zhang@intel.com>
* Copyright (C) 2008 Sujith Thomas <sujith.thomas@intel.com>
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/device.h>
#include <linux/err.h>
#include <linux/export.h>
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-24 15:04:11 +07:00
#include <linux/slab.h>
#include <linux/kdev_t.h>
#include <linux/idr.h>
#include <linux/thermal.h>
#include <linux/reboot.h>
#include <linux/string.h>
#include <linux/of.h>
Thermal: handle thermal zone device properly during system sleep Current thermal code does not handle system sleep well because 1. the cooling device cooling state may be changed during suspend 2. the previous temperature reading becomes invalid after resumed because it is got before system sleep 3. updating thermal zone device during suspending/resuming is wrong because some devices may have already been suspended or may have not been resumed. Thus, the proper way to do this is to cancel all thermal zone device update requirements during suspend/resume, and after all the devices have been resumed, reset and update every registered thermal zone devices. This also fixes a regression introduced by: Commit 19593a1fb1f6 ("ACPI / fan: convert to platform driver") Because, with above commit applied, all the fan devices are attached to the acpi_general_pm_domain, and they are turned on by the pm_domain automatically after resume, without the awareness of thermal core. CC: <stable@vger.kernel.org> #3.18+ Reference: https://bugzilla.kernel.org/show_bug.cgi?id=78201 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=91411 Tested-by: Manuel Krause <manuelkrause@netscape.net> Tested-by: szegad <szegadlo@poczta.onet.pl> Tested-by: prash <prash.n.rao@gmail.com> Tested-by: amish <ammdispose-arch@yahoo.com> Tested-by: Matthias <morpheusxyz123@yahoo.de> Reviewed-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
2015-10-30 15:31:58 +07:00
#include <linux/suspend.h>
#define CREATE_TRACE_POINTS
#include <trace/events/thermal.h>
#include "thermal_core.h"
#include "thermal_hwmon.h"
static DEFINE_IDA(thermal_tz_ida);
static DEFINE_IDA(thermal_cdev_ida);
static LIST_HEAD(thermal_tz_list);
static LIST_HEAD(thermal_cdev_list);
static LIST_HEAD(thermal_governor_list);
static DEFINE_MUTEX(thermal_list_lock);
static DEFINE_MUTEX(thermal_governor_lock);
static DEFINE_MUTEX(poweroff_lock);
Thermal: handle thermal zone device properly during system sleep Current thermal code does not handle system sleep well because 1. the cooling device cooling state may be changed during suspend 2. the previous temperature reading becomes invalid after resumed because it is got before system sleep 3. updating thermal zone device during suspending/resuming is wrong because some devices may have already been suspended or may have not been resumed. Thus, the proper way to do this is to cancel all thermal zone device update requirements during suspend/resume, and after all the devices have been resumed, reset and update every registered thermal zone devices. This also fixes a regression introduced by: Commit 19593a1fb1f6 ("ACPI / fan: convert to platform driver") Because, with above commit applied, all the fan devices are attached to the acpi_general_pm_domain, and they are turned on by the pm_domain automatically after resume, without the awareness of thermal core. CC: <stable@vger.kernel.org> #3.18+ Reference: https://bugzilla.kernel.org/show_bug.cgi?id=78201 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=91411 Tested-by: Manuel Krause <manuelkrause@netscape.net> Tested-by: szegad <szegadlo@poczta.onet.pl> Tested-by: prash <prash.n.rao@gmail.com> Tested-by: amish <ammdispose-arch@yahoo.com> Tested-by: Matthias <morpheusxyz123@yahoo.de> Reviewed-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
2015-10-30 15:31:58 +07:00
static atomic_t in_suspend;
static bool power_off_triggered;
Thermal: handle thermal zone device properly during system sleep Current thermal code does not handle system sleep well because 1. the cooling device cooling state may be changed during suspend 2. the previous temperature reading becomes invalid after resumed because it is got before system sleep 3. updating thermal zone device during suspending/resuming is wrong because some devices may have already been suspended or may have not been resumed. Thus, the proper way to do this is to cancel all thermal zone device update requirements during suspend/resume, and after all the devices have been resumed, reset and update every registered thermal zone devices. This also fixes a regression introduced by: Commit 19593a1fb1f6 ("ACPI / fan: convert to platform driver") Because, with above commit applied, all the fan devices are attached to the acpi_general_pm_domain, and they are turned on by the pm_domain automatically after resume, without the awareness of thermal core. CC: <stable@vger.kernel.org> #3.18+ Reference: https://bugzilla.kernel.org/show_bug.cgi?id=78201 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=91411 Tested-by: Manuel Krause <manuelkrause@netscape.net> Tested-by: szegad <szegadlo@poczta.onet.pl> Tested-by: prash <prash.n.rao@gmail.com> Tested-by: amish <ammdispose-arch@yahoo.com> Tested-by: Matthias <morpheusxyz123@yahoo.de> Reviewed-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
2015-10-30 15:31:58 +07:00
static struct thermal_governor *def_governor;
/*
* Governor section: set of functions to handle thermal governors
*
* Functions to help in the life cycle of thermal governors within
* the thermal core and by the thermal governor code.
*/
static struct thermal_governor *__find_governor(const char *name)
{
struct thermal_governor *pos;
if (!name || !name[0])
return def_governor;
list_for_each_entry(pos, &thermal_governor_list, governor_list)
if (!strncasecmp(name, pos->name, THERMAL_NAME_LENGTH))
return pos;
return NULL;
}
/**
* bind_previous_governor() - bind the previous governor of the thermal zone
* @tz: a valid pointer to a struct thermal_zone_device
* @failed_gov_name: the name of the governor that failed to register
*
* Register the previous governor of the thermal zone after a new
* governor has failed to be bound.
*/
static void bind_previous_governor(struct thermal_zone_device *tz,
const char *failed_gov_name)
{
if (tz->governor && tz->governor->bind_to_tz) {
if (tz->governor->bind_to_tz(tz)) {
dev_err(&tz->device,
"governor %s failed to bind and the previous one (%s) failed to bind again, thermal zone %s has no governor\n",
failed_gov_name, tz->governor->name, tz->type);
tz->governor = NULL;
}
}
}
/**
* thermal_set_governor() - Switch to another governor
* @tz: a valid pointer to a struct thermal_zone_device
* @new_gov: pointer to the new governor
*
* Change the governor of thermal zone @tz.
*
* Return: 0 on success, an error if the new governor's bind_to_tz() failed.
*/
static int thermal_set_governor(struct thermal_zone_device *tz,
struct thermal_governor *new_gov)
{
int ret = 0;
if (tz->governor && tz->governor->unbind_from_tz)
tz->governor->unbind_from_tz(tz);
if (new_gov && new_gov->bind_to_tz) {
ret = new_gov->bind_to_tz(tz);
if (ret) {
bind_previous_governor(tz, new_gov->name);
return ret;
}
}
tz->governor = new_gov;
return ret;
}
int thermal_register_governor(struct thermal_governor *governor)
{
int err;
const char *name;
struct thermal_zone_device *pos;
if (!governor)
return -EINVAL;
mutex_lock(&thermal_governor_lock);
err = -EBUSY;
if (!__find_governor(governor->name)) {
bool match_default;
err = 0;
list_add(&governor->governor_list, &thermal_governor_list);
match_default = !strncmp(governor->name,
DEFAULT_THERMAL_GOVERNOR,
THERMAL_NAME_LENGTH);
if (!def_governor && match_default)
def_governor = governor;
}
mutex_lock(&thermal_list_lock);
list_for_each_entry(pos, &thermal_tz_list, node) {
/*
* only thermal zones with specified tz->tzp->governor_name
* may run with tz->govenor unset
*/
if (pos->governor)
continue;
name = pos->tzp->governor_name;
if (!strncasecmp(name, governor->name, THERMAL_NAME_LENGTH)) {
int ret;
ret = thermal_set_governor(pos, governor);
if (ret)
dev_err(&pos->device,
"Failed to set governor %s for thermal zone %s: %d\n",
governor->name, pos->type, ret);
}
}
mutex_unlock(&thermal_list_lock);
mutex_unlock(&thermal_governor_lock);
return err;
}
void thermal_unregister_governor(struct thermal_governor *governor)
{
struct thermal_zone_device *pos;
if (!governor)
return;
mutex_lock(&thermal_governor_lock);
if (!__find_governor(governor->name))
goto exit;
mutex_lock(&thermal_list_lock);
list_for_each_entry(pos, &thermal_tz_list, node) {
if (!strncasecmp(pos->governor->name, governor->name,
THERMAL_NAME_LENGTH))
thermal_set_governor(pos, NULL);
}
mutex_unlock(&thermal_list_lock);
list_del(&governor->governor_list);
exit:
mutex_unlock(&thermal_governor_lock);
}
int thermal_zone_device_set_policy(struct thermal_zone_device *tz,
char *policy)
{
struct thermal_governor *gov;
int ret = -EINVAL;
mutex_lock(&thermal_governor_lock);
mutex_lock(&tz->lock);
gov = __find_governor(strim(policy));
if (!gov)
goto exit;
ret = thermal_set_governor(tz, gov);
exit:
mutex_unlock(&tz->lock);
mutex_unlock(&thermal_governor_lock);
thermal_notify_tz_gov_change(tz->id, policy);
return ret;
}
int thermal_build_list_of_policies(char *buf)
{
struct thermal_governor *pos;
ssize_t count = 0;
ssize_t size = PAGE_SIZE;
mutex_lock(&thermal_governor_lock);
list_for_each_entry(pos, &thermal_governor_list, governor_list) {
size = PAGE_SIZE - count;
count += scnprintf(buf + count, size, "%s ", pos->name);
}
count += scnprintf(buf + count, size, "\n");
mutex_unlock(&thermal_governor_lock);
return count;
}
static void __init thermal_unregister_governors(void)
{
struct thermal_governor **governor;
for_each_governor_table(governor)
thermal_unregister_governor(*governor);
}
static int __init thermal_register_governors(void)
{
int ret = 0;
struct thermal_governor **governor;
for_each_governor_table(governor) {
ret = thermal_register_governor(*governor);
if (ret) {
pr_err("Failed to register governor: '%s'",
(*governor)->name);
break;
}
pr_info("Registered thermal governor '%s'",
(*governor)->name);
}
if (ret) {
struct thermal_governor **gov;
for_each_governor_table(gov) {
if (gov == governor)
break;
thermal_unregister_governor(*gov);
}
}
return ret;
}
/*
* Zone update section: main control loop applied to each zone while monitoring
*
* in polling mode. The monitoring is done using a workqueue.
* Same update may be done on a zone by calling thermal_zone_device_update().
*
* An update means:
* - Non-critical trips will invoke the governor responsible for that zone;
* - Hot trips will produce a notification to userspace;
* - Critical trip point will cause a system shutdown.
*/
static void thermal_zone_device_set_polling(struct thermal_zone_device *tz,
int delay)
{
if (delay > 1000)
mod_delayed_work(system_freezable_power_efficient_wq,
&tz->poll_queue,
round_jiffies(msecs_to_jiffies(delay)));
else if (delay)
mod_delayed_work(system_freezable_power_efficient_wq,
&tz->poll_queue,
msecs_to_jiffies(delay));
else
thermal: Fix deadlock in thermal thermal_zone_device_check 1851799e1d29 ("thermal: Fix use-after-free when unregistering thermal zone device") changed cancel_delayed_work to cancel_delayed_work_sync to avoid a use-after-free issue. However, cancel_delayed_work_sync could be called insides the WQ causing deadlock. [54109.642398] c0 1162 kworker/u17:1 D 0 11030 2 0x00000000 [54109.642437] c0 1162 Workqueue: thermal_passive_wq thermal_zone_device_check [54109.642447] c0 1162 Call trace: [54109.642456] c0 1162 __switch_to+0x138/0x158 [54109.642467] c0 1162 __schedule+0xba4/0x1434 [54109.642480] c0 1162 schedule_timeout+0xa0/0xb28 [54109.642492] c0 1162 wait_for_common+0x138/0x2e8 [54109.642511] c0 1162 flush_work+0x348/0x40c [54109.642522] c0 1162 __cancel_work_timer+0x180/0x218 [54109.642544] c0 1162 handle_thermal_trip+0x2c4/0x5a4 [54109.642553] c0 1162 thermal_zone_device_update+0x1b4/0x25c [54109.642563] c0 1162 thermal_zone_device_check+0x18/0x24 [54109.642574] c0 1162 process_one_work+0x3cc/0x69c [54109.642583] c0 1162 worker_thread+0x49c/0x7c0 [54109.642593] c0 1162 kthread+0x17c/0x1b0 [54109.642602] c0 1162 ret_from_fork+0x10/0x18 [54109.643051] c0 1162 kworker/u17:2 D 0 16245 2 0x00000000 [54109.643067] c0 1162 Workqueue: thermal_passive_wq thermal_zone_device_check [54109.643077] c0 1162 Call trace: [54109.643085] c0 1162 __switch_to+0x138/0x158 [54109.643095] c0 1162 __schedule+0xba4/0x1434 [54109.643104] c0 1162 schedule_timeout+0xa0/0xb28 [54109.643114] c0 1162 wait_for_common+0x138/0x2e8 [54109.643122] c0 1162 flush_work+0x348/0x40c [54109.643131] c0 1162 __cancel_work_timer+0x180/0x218 [54109.643141] c0 1162 handle_thermal_trip+0x2c4/0x5a4 [54109.643150] c0 1162 thermal_zone_device_update+0x1b4/0x25c [54109.643159] c0 1162 thermal_zone_device_check+0x18/0x24 [54109.643167] c0 1162 process_one_work+0x3cc/0x69c [54109.643177] c0 1162 worker_thread+0x49c/0x7c0 [54109.643186] c0 1162 kthread+0x17c/0x1b0 [54109.643195] c0 1162 ret_from_fork+0x10/0x18 [54109.644500] c0 1162 cat D 0 7766 1 0x00000001 [54109.644515] c0 1162 Call trace: [54109.644524] c0 1162 __switch_to+0x138/0x158 [54109.644536] c0 1162 __schedule+0xba4/0x1434 [54109.644546] c0 1162 schedule_preempt_disabled+0x80/0xb0 [54109.644555] c0 1162 __mutex_lock+0x3a8/0x7f0 [54109.644563] c0 1162 __mutex_lock_slowpath+0x14/0x20 [54109.644575] c0 1162 thermal_zone_get_temp+0x84/0x360 [54109.644586] c0 1162 temp_show+0x30/0x78 [54109.644609] c0 1162 dev_attr_show+0x5c/0xf0 [54109.644628] c0 1162 sysfs_kf_seq_show+0xcc/0x1a4 [54109.644636] c0 1162 kernfs_seq_show+0x48/0x88 [54109.644656] c0 1162 seq_read+0x1f4/0x73c [54109.644664] c0 1162 kernfs_fop_read+0x84/0x318 [54109.644683] c0 1162 __vfs_read+0x50/0x1bc [54109.644692] c0 1162 vfs_read+0xa4/0x140 [54109.644701] c0 1162 SyS_read+0xbc/0x144 [54109.644708] c0 1162 el0_svc_naked+0x34/0x38 [54109.845800] c0 1162 D 720.000s 1->7766->7766 cat [panic] Fixes: 1851799e1d29 ("thermal: Fix use-after-free when unregistering thermal zone device") Cc: stable@vger.kernel.org Signed-off-by: Wei Wang <wvw@google.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2019-11-13 03:42:23 +07:00
cancel_delayed_work(&tz->poll_queue);
}
static inline bool should_stop_polling(struct thermal_zone_device *tz)
{
return !thermal_zone_device_is_enabled(tz);
}
static void monitor_thermal_zone(struct thermal_zone_device *tz)
{
bool stop;
stop = should_stop_polling(tz);
mutex_lock(&tz->lock);
if (!stop && tz->passive)
thermal_zone_device_set_polling(tz, tz->passive_delay);
else if (!stop && tz->polling_delay)
thermal_zone_device_set_polling(tz, tz->polling_delay);
else
thermal_zone_device_set_polling(tz, 0);
mutex_unlock(&tz->lock);
}
static void handle_non_critical_trips(struct thermal_zone_device *tz, int trip)
{
tz->governor ? tz->governor->throttle(tz, trip) :
def_governor->throttle(tz, trip);
}
/**
* thermal_emergency_poweroff_func - emergency poweroff work after a known delay
* @work: work_struct associated with the emergency poweroff function
*
* This function is called in very critical situations to force
* a kernel poweroff after a configurable timeout value.
*/
static void thermal_emergency_poweroff_func(struct work_struct *work)
{
/*
* We have reached here after the emergency thermal shutdown
* Waiting period has expired. This means orderly_poweroff has
* not been able to shut off the system for some reason.
* Try to shut down the system immediately using kernel_power_off
* if populated
*/
WARN(1, "Attempting kernel_power_off: Temperature too high\n");
kernel_power_off();
/*
* Worst of the worst case trigger emergency restart
*/
WARN(1, "Attempting emergency_restart: Temperature too high\n");
emergency_restart();
}
static DECLARE_DELAYED_WORK(thermal_emergency_poweroff_work,
thermal_emergency_poweroff_func);
/**
* thermal_emergency_poweroff - Trigger an emergency system poweroff
*
* This may be called from any critical situation to trigger a system shutdown
* after a known period of time. By default this is not scheduled.
*/
static void thermal_emergency_poweroff(void)
{
int poweroff_delay_ms = CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS;
/*
* poweroff_delay_ms must be a carefully profiled positive value.
* Its a must for thermal_emergency_poweroff_work to be scheduled
*/
if (poweroff_delay_ms <= 0)
return;
schedule_delayed_work(&thermal_emergency_poweroff_work,
msecs_to_jiffies(poweroff_delay_ms));
}
static void handle_critical_trips(struct thermal_zone_device *tz,
int trip, enum thermal_trip_type trip_type)
{
thermal: consistently use int for temperatures The thermal code uses int, long and unsigned long for temperatures in different places. Using an unsigned type limits the thermal framework to positive temperatures without need. Also several drivers currently will report temperatures near UINT_MAX for temperatures below 0°C. This will probably immediately shut the machine down due to overtemperature if started below 0°C. 'long' is 64bit on several architectures. This is not needed since INT_MAX °mC is above the melting point of all known materials. Consistently use a plain 'int' for temperatures throughout the thermal code and the drivers. This only changes the places in the drivers where the temperature is passed around as pointer, when drivers internally use another type this is not changed. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Jean Delvare <jdelvare@suse.de> Reviewed-by: Lukasz Majewski <l.majewski@samsung.com> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Peter Feuerer <peter@piie.net> Cc: Punit Agrawal <punit.agrawal@arm.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Eduardo Valentin <edubezval@gmail.com> Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Jean Delvare <jdelvare@suse.de> Cc: Peter Feuerer <peter@piie.net> Cc: Heiko Stuebner <heiko@sntech.de> Cc: Lukasz Majewski <l.majewski@samsung.com> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: linux-acpi@vger.kernel.org Cc: platform-driver-x86@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-omap@vger.kernel.org Cc: linux-samsung-soc@vger.kernel.org Cc: Guenter Roeck <linux@roeck-us.net> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Maxime Ripard <maxime.ripard@free-electrons.com> Cc: Darren Hart <dvhart@infradead.org> Cc: lm-sensors@lm-sensors.org Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2015-07-24 13:12:54 +07:00
int trip_temp;
tz->ops->get_trip_temp(tz, trip, &trip_temp);
/* If we have not crossed the trip_temp, we do not care. */
if (trip_temp <= 0 || tz->temperature < trip_temp)
return;
trace_thermal_zone_trip(tz, trip, trip_type);
if (tz->ops->notify)
tz->ops->notify(tz, trip, trip_type);
if (trip_type == THERMAL_TRIP_CRITICAL) {
dev_emerg(&tz->device,
"critical temperature reached (%d C), shutting down\n",
tz->temperature / 1000);
mutex_lock(&poweroff_lock);
if (!power_off_triggered) {
/*
* Queue a backup emergency shutdown in the event of
* orderly_poweroff failure
*/
thermal_emergency_poweroff();
orderly_poweroff(true);
power_off_triggered = true;
}
mutex_unlock(&poweroff_lock);
}
}
static void handle_thermal_trip(struct thermal_zone_device *tz, int trip)
{
enum thermal_trip_type type;
int trip_temp, hyst = 0;
2016-03-18 09:03:24 +07:00
/* Ignore disabled trip points */
if (test_bit(trip, &tz->trips_disabled))
return;
tz->ops->get_trip_temp(tz, trip, &trip_temp);
tz->ops->get_trip_type(tz, trip, &type);
if (tz->ops->get_trip_hyst)
tz->ops->get_trip_hyst(tz, trip, &hyst);
if (tz->last_temperature != THERMAL_TEMP_INVALID) {
if (tz->last_temperature < trip_temp &&
tz->temperature >= trip_temp)
thermal_notify_tz_trip_up(tz->id, trip);
if (tz->last_temperature >= trip_temp &&
tz->temperature < (trip_temp - hyst))
thermal_notify_tz_trip_down(tz->id, trip);
}
if (type == THERMAL_TRIP_CRITICAL || type == THERMAL_TRIP_HOT)
handle_critical_trips(tz, trip, type);
else
handle_non_critical_trips(tz, trip);
/*
* Alright, we handled this trip successfully.
* So, start monitoring again.
*/
monitor_thermal_zone(tz);
}
static void update_temperature(struct thermal_zone_device *tz)
{
thermal: consistently use int for temperatures The thermal code uses int, long and unsigned long for temperatures in different places. Using an unsigned type limits the thermal framework to positive temperatures without need. Also several drivers currently will report temperatures near UINT_MAX for temperatures below 0°C. This will probably immediately shut the machine down due to overtemperature if started below 0°C. 'long' is 64bit on several architectures. This is not needed since INT_MAX °mC is above the melting point of all known materials. Consistently use a plain 'int' for temperatures throughout the thermal code and the drivers. This only changes the places in the drivers where the temperature is passed around as pointer, when drivers internally use another type this is not changed. Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Jean Delvare <jdelvare@suse.de> Reviewed-by: Lukasz Majewski <l.majewski@samsung.com> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Peter Feuerer <peter@piie.net> Cc: Punit Agrawal <punit.agrawal@arm.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Eduardo Valentin <edubezval@gmail.com> Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Jean Delvare <jdelvare@suse.de> Cc: Peter Feuerer <peter@piie.net> Cc: Heiko Stuebner <heiko@sntech.de> Cc: Lukasz Majewski <l.majewski@samsung.com> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Thierry Reding <thierry.reding@gmail.com> Cc: linux-acpi@vger.kernel.org Cc: platform-driver-x86@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-omap@vger.kernel.org Cc: linux-samsung-soc@vger.kernel.org Cc: Guenter Roeck <linux@roeck-us.net> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Maxime Ripard <maxime.ripard@free-electrons.com> Cc: Darren Hart <dvhart@infradead.org> Cc: lm-sensors@lm-sensors.org Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2015-07-24 13:12:54 +07:00
int temp, ret;
ret = thermal_zone_get_temp(tz, &temp);
if (ret) {
if (ret != -EAGAIN)
dev_warn(&tz->device,
"failed to read out thermal zone (%d)\n",
ret);
return;
}
mutex_lock(&tz->lock);
tz->last_temperature = tz->temperature;
tz->temperature = temp;
mutex_unlock(&tz->lock);
thermal: debug: add debug statement for core and step_wise To ease debugging thermal problem, add these dynamic debug statements so that user do not need rebuild kernel to see these info. Based on a patch from Zhang Rui for debugging on bugzilla: https://bugzilla.kernel.org/attachment.cgi?id=98671 A sample output after we turn on dynamic debug with the following cmd: # echo 'module thermal_sys +fp' > /sys/kernel/debug/dynamic_debug/control is like: [ 355.147627] update_temperature: thermal thermal_zone0: last_temperature=52000, current_temperature=55000 [ 355.147636] thermal_zone_trip_update: thermal thermal_zone0: Trip1[type=1,temp=79000]:trend=2,throttle=0 [ 355.147644] get_target_state: thermal cooling_device8: cur_state=0 [ 355.147647] thermal_zone_trip_update: thermal cooling_device8: old_target=-1, target=-1 [ 355.147652] get_target_state: thermal cooling_device7: cur_state=0 [ 355.147655] thermal_zone_trip_update: thermal cooling_device7: old_target=-1, target=-1 [ 355.147660] get_target_state: thermal cooling_device6: cur_state=0 [ 355.147663] thermal_zone_trip_update: thermal cooling_device6: old_target=-1, target=-1 [ 355.147668] get_target_state: thermal cooling_device5: cur_state=0 [ 355.147671] thermal_zone_trip_update: thermal cooling_device5: old_target=-1, target=-1 [ 355.147678] thermal_zone_trip_update: thermal thermal_zone0: Trip2[type=0,temp=90000]:trend=1,throttle=0 [ 355.147776] get_target_state: thermal cooling_device0: cur_state=0 [ 355.147783] thermal_zone_trip_update: thermal cooling_device0: old_target=-1, target=-1 [ 355.147792] thermal_zone_trip_update: thermal thermal_zone0: Trip3[type=0,temp=80000]:trend=1,throttle=0 [ 355.147845] get_target_state: thermal cooling_device1: cur_state=0 [ 355.147849] thermal_zone_trip_update: thermal cooling_device1: old_target=-1, target=-1 [ 355.147856] thermal_zone_trip_update: thermal thermal_zone0: Trip4[type=0,temp=70000]:trend=1,throttle=0 [ 355.147904] get_target_state: thermal cooling_device2: cur_state=0 [ 355.147908] thermal_zone_trip_update: thermal cooling_device2: old_target=-1, target=-1 [ 355.147915] thermal_zone_trip_update: thermal thermal_zone0: Trip5[type=0,temp=60000]:trend=1,throttle=0 [ 355.147963] get_target_state: thermal cooling_device3: cur_state=0 [ 355.147967] thermal_zone_trip_update: thermal cooling_device3: old_target=-1, target=-1 [ 355.147973] thermal_zone_trip_update: thermal thermal_zone0: Trip6[type=0,temp=55000]:trend=1,throttle=1 [ 355.148022] get_target_state: thermal cooling_device4: cur_state=0 [ 355.148025] thermal_zone_trip_update: thermal cooling_device4: old_target=-1, target=1 [ 355.148036] thermal_cdev_update: thermal cooling_device4: zone0->target=1 [ 355.169279] thermal_cdev_update: thermal cooling_device4: set to state 1 Signed-off-by: Aaron Lu <aaron.lu@intel.com> Acked-by: Eduardo Valentin <eduardo.valentin@ti.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2013-12-02 12:54:26 +07:00
trace_thermal_temperature(tz);
thermal_genl_sampling_temp(tz->id, temp);
}
static void thermal_zone_device_init(struct thermal_zone_device *tz)
{
struct thermal_instance *pos;
tz->temperature = THERMAL_TEMP_INVALID;
list_for_each_entry(pos, &tz->thermal_instances, tz_node)
pos->initialized = false;
}
static void thermal_zone_device_reset(struct thermal_zone_device *tz)
{
tz->passive = 0;
thermal_zone_device_init(tz);
}
static int thermal_zone_device_set_mode(struct thermal_zone_device *tz,
enum thermal_device_mode mode)
{
int ret = 0;
mutex_lock(&tz->lock);
/* do nothing if mode isn't changing */
if (mode == tz->mode) {
mutex_unlock(&tz->lock);
return ret;
}
if (tz->ops->change_mode)
ret = tz->ops->change_mode(tz, mode);
if (!ret)
tz->mode = mode;
mutex_unlock(&tz->lock);
thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
if (mode == THERMAL_DEVICE_ENABLED)
thermal_notify_tz_enable(tz->id);
else
thermal_notify_tz_disable(tz->id);
return ret;
}
int thermal_zone_device_enable(struct thermal_zone_device *tz)
{
return thermal_zone_device_set_mode(tz, THERMAL_DEVICE_ENABLED);
}
EXPORT_SYMBOL_GPL(thermal_zone_device_enable);
int thermal_zone_device_disable(struct thermal_zone_device *tz)
{
return thermal_zone_device_set_mode(tz, THERMAL_DEVICE_DISABLED);
}
EXPORT_SYMBOL_GPL(thermal_zone_device_disable);
int thermal_zone_device_is_enabled(struct thermal_zone_device *tz)
{
enum thermal_device_mode mode;
mutex_lock(&tz->lock);
mode = tz->mode;
mutex_unlock(&tz->lock);
return mode == THERMAL_DEVICE_ENABLED;
}
void thermal_zone_device_update(struct thermal_zone_device *tz,
enum thermal_notify_event event)
{
int count;
if (should_stop_polling(tz))
return;
Thermal: handle thermal zone device properly during system sleep Current thermal code does not handle system sleep well because 1. the cooling device cooling state may be changed during suspend 2. the previous temperature reading becomes invalid after resumed because it is got before system sleep 3. updating thermal zone device during suspending/resuming is wrong because some devices may have already been suspended or may have not been resumed. Thus, the proper way to do this is to cancel all thermal zone device update requirements during suspend/resume, and after all the devices have been resumed, reset and update every registered thermal zone devices. This also fixes a regression introduced by: Commit 19593a1fb1f6 ("ACPI / fan: convert to platform driver") Because, with above commit applied, all the fan devices are attached to the acpi_general_pm_domain, and they are turned on by the pm_domain automatically after resume, without the awareness of thermal core. CC: <stable@vger.kernel.org> #3.18+ Reference: https://bugzilla.kernel.org/show_bug.cgi?id=78201 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=91411 Tested-by: Manuel Krause <manuelkrause@netscape.net> Tested-by: szegad <szegadlo@poczta.onet.pl> Tested-by: prash <prash.n.rao@gmail.com> Tested-by: amish <ammdispose-arch@yahoo.com> Tested-by: Matthias <morpheusxyz123@yahoo.de> Reviewed-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
2015-10-30 15:31:58 +07:00
if (atomic_read(&in_suspend))
return;
if (!tz->ops->get_temp)
return;
update_temperature(tz);
thermal_zone_set_trips(tz);
tz->notify_event = event;
for (count = 0; count < tz->trips; count++)
handle_thermal_trip(tz, count);
}
EXPORT_SYMBOL_GPL(thermal_zone_device_update);
/**
* thermal_notify_framework - Sensor drivers use this API to notify framework
* @tz: thermal zone device
* @trip: indicates which trip point has been crossed
*
* This function handles the trip events from sensor drivers. It starts
* throttling the cooling devices according to the policy configured.
* For CRITICAL and HOT trip points, this notifies the respective drivers,
* and does actual throttling for other trip points i.e ACTIVE and PASSIVE.
* The throttling policy is based on the configured platform data; if no
* platform data is provided, this uses the step_wise throttling policy.
*/
void thermal_notify_framework(struct thermal_zone_device *tz, int trip)
{
handle_thermal_trip(tz, trip);
}
EXPORT_SYMBOL_GPL(thermal_notify_framework);
static void thermal_zone_device_check(struct work_struct *work)
{
struct thermal_zone_device *tz = container_of(work, struct
thermal_zone_device,
poll_queue.work);
thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
}
/*
* Power actor section: interface to power actors to estimate power
*
* Set of functions used to interact to cooling devices that know
* how to estimate their devices power consumption.
*/
/**
* power_actor_get_max_power() - get the maximum power that a cdev can consume
* @cdev: pointer to &thermal_cooling_device
* @tz: a valid thermal zone device pointer
* @max_power: pointer in which to store the maximum power
*
* Calculate the maximum power consumption in milliwats that the
* cooling device can currently consume and store it in @max_power.
*
* Return: 0 on success, -EINVAL if @cdev doesn't support the
* power_actor API or -E* on other error.
*/
int power_actor_get_max_power(struct thermal_cooling_device *cdev,
struct thermal_zone_device *tz, u32 *max_power)
{
if (!cdev_is_power_actor(cdev))
return -EINVAL;
return cdev->ops->state2power(cdev, tz, 0, max_power);
}
/**
* power_actor_get_min_power() - get the mainimum power that a cdev can consume
* @cdev: pointer to &thermal_cooling_device
* @tz: a valid thermal zone device pointer
* @min_power: pointer in which to store the minimum power
*
* Calculate the minimum power consumption in milliwatts that the
* cooling device can currently consume and store it in @min_power.
*
* Return: 0 on success, -EINVAL if @cdev doesn't support the
* power_actor API or -E* on other error.
*/
int power_actor_get_min_power(struct thermal_cooling_device *cdev,
struct thermal_zone_device *tz, u32 *min_power)
{
unsigned long max_state;
int ret;
if (!cdev_is_power_actor(cdev))
return -EINVAL;
ret = cdev->ops->get_max_state(cdev, &max_state);
if (ret)
return ret;
return cdev->ops->state2power(cdev, tz, max_state, min_power);
}
/**
* power_actor_set_power() - limit the maximum power a cooling device consumes
* @cdev: pointer to &thermal_cooling_device
* @instance: thermal instance to update
* @power: the power in milliwatts
*
* Set the cooling device to consume at most @power milliwatts. The limit is
* expected to be a cap at the maximum power consumption.
*
* Return: 0 on success, -EINVAL if the cooling device does not
* implement the power actor API or -E* for other failures.
*/
int power_actor_set_power(struct thermal_cooling_device *cdev,
struct thermal_instance *instance, u32 power)
{
unsigned long state;
int ret;
if (!cdev_is_power_actor(cdev))
return -EINVAL;
ret = cdev->ops->power2state(cdev, instance->tz, power, &state);
if (ret)
return ret;
instance->target = state;
thermal: fix race condition when updating cooling device When multiple thermal zones are bound to the same cooling device, multiple kernel threads may want to update the cooling device state by calling thermal_cdev_update(). Having cdev not protected by a mutex can lead to a race condition. Consider the following situation with two kernel threads k1 and k2: Thread k1 Thread k2 || || call thermal_cdev_update() || ... || set_cur_state(cdev, target); call power_actor_set_power() || ... || instance->target = state; || cdev->updated = false; || || cdev->updated = true; || // completes execution call thermal_cdev_update() || // cdev->updated == true || return; || \/ time k2 has already looped through the thermal instances looking for the deepest cooling device state and is preempted right before setting cdev->updated to true. Now, k1 runs, modifies the thermal instance state and sets cdev->updated to false. Then, k1 is preempted and k2 continues the execution by setting cdev->updated to true, therefore preventing k1 from performing the update. Notice that this is not an issue if k2 looks at the instance->target modified by k1 "after" it is assigned by k1. In fact, in this case the update will happen anyway and k1 can safely return immediately from thermal_cdev_update(). This may lead to a situation where a thermal governor never updates the cooling device. For example, this is the case for the step_wise governor: when calling the function thermal_zone_trip_update(), the governor may always get a new state equal to the old one (which, however, wasn't notified to the cooling device) and will therefore skip the update. CC: Zhang Rui <rui.zhang@intel.com> CC: Eduardo Valentin <edubezval@gmail.com> CC: Peter Feuerer <peter@piie.net> Reported-by: Toby Huang <toby.huang@arm.com> Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2016-06-02 21:25:31 +07:00
mutex_lock(&cdev->lock);
cdev->updated = false;
thermal: fix race condition when updating cooling device When multiple thermal zones are bound to the same cooling device, multiple kernel threads may want to update the cooling device state by calling thermal_cdev_update(). Having cdev not protected by a mutex can lead to a race condition. Consider the following situation with two kernel threads k1 and k2: Thread k1 Thread k2 || || call thermal_cdev_update() || ... || set_cur_state(cdev, target); call power_actor_set_power() || ... || instance->target = state; || cdev->updated = false; || || cdev->updated = true; || // completes execution call thermal_cdev_update() || // cdev->updated == true || return; || \/ time k2 has already looped through the thermal instances looking for the deepest cooling device state and is preempted right before setting cdev->updated to true. Now, k1 runs, modifies the thermal instance state and sets cdev->updated to false. Then, k1 is preempted and k2 continues the execution by setting cdev->updated to true, therefore preventing k1 from performing the update. Notice that this is not an issue if k2 looks at the instance->target modified by k1 "after" it is assigned by k1. In fact, in this case the update will happen anyway and k1 can safely return immediately from thermal_cdev_update(). This may lead to a situation where a thermal governor never updates the cooling device. For example, this is the case for the step_wise governor: when calling the function thermal_zone_trip_update(), the governor may always get a new state equal to the old one (which, however, wasn't notified to the cooling device) and will therefore skip the update. CC: Zhang Rui <rui.zhang@intel.com> CC: Eduardo Valentin <edubezval@gmail.com> CC: Peter Feuerer <peter@piie.net> Reported-by: Toby Huang <toby.huang@arm.com> Signed-off-by: Michele Di Giorgio <michele.digiorgio@arm.com> Reviewed-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2016-06-02 21:25:31 +07:00
mutex_unlock(&cdev->lock);
thermal_cdev_update(cdev);
return 0;
}
void thermal_zone_device_rebind_exception(struct thermal_zone_device *tz,
const char *cdev_type, size_t size)
{
struct thermal_cooling_device *cdev = NULL;
mutex_lock(&thermal_list_lock);
list_for_each_entry(cdev, &thermal_cdev_list, node) {
/* skip non matching cdevs */
if (strncmp(cdev_type, cdev->type, size))
continue;
/* re binding the exception matching the type pattern */
thermal_zone_bind_cooling_device(tz, THERMAL_TRIPS_NONE, cdev,
THERMAL_NO_LIMIT,
THERMAL_NO_LIMIT,
THERMAL_WEIGHT_DEFAULT);
}
mutex_unlock(&thermal_list_lock);
}
int for_each_thermal_governor(int (*cb)(struct thermal_governor *, void *),
void *data)
{
struct thermal_governor *gov;
int ret = 0;
mutex_lock(&thermal_governor_lock);
list_for_each_entry(gov, &thermal_governor_list, governor_list) {
ret = cb(gov, data);
if (ret)
break;
}
mutex_unlock(&thermal_governor_lock);
return ret;
}
int for_each_thermal_cooling_device(int (*cb)(struct thermal_cooling_device *,
void *), void *data)
{
struct thermal_cooling_device *cdev;
int ret = 0;
mutex_lock(&thermal_list_lock);
list_for_each_entry(cdev, &thermal_cdev_list, node) {
ret = cb(cdev, data);
if (ret)
break;
}
mutex_unlock(&thermal_list_lock);
return ret;
}
int for_each_thermal_zone(int (*cb)(struct thermal_zone_device *, void *),
void *data)
{
struct thermal_zone_device *tz;
int ret = 0;
mutex_lock(&thermal_list_lock);
list_for_each_entry(tz, &thermal_tz_list, node) {
ret = cb(tz, data);
if (ret)
break;
}
mutex_unlock(&thermal_list_lock);
return ret;
}
struct thermal_zone_device *thermal_zone_get_by_id(int id)
{
struct thermal_zone_device *tz, *match = NULL;
mutex_lock(&thermal_list_lock);
list_for_each_entry(tz, &thermal_tz_list, node) {
if (tz->id == id) {
match = tz;
break;
}
}
mutex_unlock(&thermal_list_lock);
return match;
}
void thermal_zone_device_unbind_exception(struct thermal_zone_device *tz,
const char *cdev_type, size_t size)
{
struct thermal_cooling_device *cdev = NULL;
mutex_lock(&thermal_list_lock);
list_for_each_entry(cdev, &thermal_cdev_list, node) {
/* skip non matching cdevs */
if (strncmp(cdev_type, cdev->type, size))
continue;
/* unbinding the exception matching the type pattern */
thermal_zone_unbind_cooling_device(tz, THERMAL_TRIPS_NONE,
cdev);
}
mutex_unlock(&thermal_list_lock);
}
/*
* Device management section: cooling devices, zones devices, and binding
*
* Set of functions provided by the thermal core for:
* - cooling devices lifecycle: registration, unregistration,
* binding, and unbinding.
* - thermal zone devices lifecycle: registration, unregistration,
* binding, and unbinding.
*/
/**
* thermal_zone_bind_cooling_device() - bind a cooling device to a thermal zone
* @tz: pointer to struct thermal_zone_device
* @trip: indicates which trip point the cooling devices is
* associated with in this thermal zone.
* @cdev: pointer to struct thermal_cooling_device
* @upper: the Maximum cooling state for this trip point.
* THERMAL_NO_LIMIT means no upper limit,
* and the cooling device can be in max_state.
* @lower: the Minimum cooling state can be used for this trip point.
* THERMAL_NO_LIMIT means no lower limit,
* and the cooling device can be in cooling state 0.
* @weight: The weight of the cooling device to be bound to the
* thermal zone. Use THERMAL_WEIGHT_DEFAULT for the
* default value
*
* This interface function bind a thermal cooling device to the certain trip
* point of a thermal zone device.
* This function is usually called in the thermal zone device .bind callback.
*
* Return: 0 on success, the proper error value otherwise.
*/
int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
int trip,
struct thermal_cooling_device *cdev,
unsigned long upper, unsigned long lower,
unsigned int weight)
{
struct thermal_instance *dev;
struct thermal_instance *pos;
struct thermal_zone_device *pos1;
struct thermal_cooling_device *pos2;
unsigned long max_state;
int result, ret;
if (trip >= tz->trips || (trip < 0 && trip != THERMAL_TRIPS_NONE))
return -EINVAL;
list_for_each_entry(pos1, &thermal_tz_list, node) {
if (pos1 == tz)
break;
}
list_for_each_entry(pos2, &thermal_cdev_list, node) {
if (pos2 == cdev)
break;
}
if (tz != pos1 || cdev != pos2)
return -EINVAL;
ret = cdev->ops->get_max_state(cdev, &max_state);
if (ret)
return ret;
/* lower default 0, upper default max_state */
lower = lower == THERMAL_NO_LIMIT ? 0 : lower;
upper = upper == THERMAL_NO_LIMIT ? max_state : upper;
if (lower > upper || upper > max_state)
return -EINVAL;
dev = kzalloc(sizeof(*dev), GFP_KERNEL);
if (!dev)
return -ENOMEM;
dev->tz = tz;
dev->cdev = cdev;
dev->trip = trip;
dev->upper = upper;
dev->lower = lower;
dev->target = THERMAL_NO_TARGET;
dev->weight = weight;
result = ida_simple_get(&tz->ida, 0, 0, GFP_KERNEL);
if (result < 0)
goto free_mem;
dev->id = result;
sprintf(dev->name, "cdev%d", dev->id);
result =
sysfs_create_link(&tz->device.kobj, &cdev->device.kobj, dev->name);
if (result)
goto release_ida;
sprintf(dev->attr_name, "cdev%d_trip_point", dev->id);
sysfs_attr_init(&dev->attr.attr);
dev->attr.attr.name = dev->attr_name;
dev->attr.attr.mode = 0444;
dev->attr.show = trip_point_show;
result = device_create_file(&tz->device, &dev->attr);
if (result)
goto remove_symbol_link;
sprintf(dev->weight_attr_name, "cdev%d_weight", dev->id);
sysfs_attr_init(&dev->weight_attr.attr);
dev->weight_attr.attr.name = dev->weight_attr_name;
dev->weight_attr.attr.mode = S_IWUSR | S_IRUGO;
dev->weight_attr.show = weight_show;
dev->weight_attr.store = weight_store;
result = device_create_file(&tz->device, &dev->weight_attr);
if (result)
goto remove_trip_file;
mutex_lock(&tz->lock);
mutex_lock(&cdev->lock);
list_for_each_entry(pos, &tz->thermal_instances, tz_node)
if (pos->tz == tz && pos->trip == trip && pos->cdev == cdev) {
result = -EEXIST;
break;
}
if (!result) {
list_add_tail(&dev->tz_node, &tz->thermal_instances);
list_add_tail(&dev->cdev_node, &cdev->thermal_instances);
atomic_set(&tz->need_update, 1);
}
mutex_unlock(&cdev->lock);
mutex_unlock(&tz->lock);
if (!result)
return 0;
device_remove_file(&tz->device, &dev->weight_attr);
remove_trip_file:
device_remove_file(&tz->device, &dev->attr);
remove_symbol_link:
sysfs_remove_link(&tz->device.kobj, dev->name);
release_ida:
ida_simple_remove(&tz->ida, dev->id);
free_mem:
kfree(dev);
return result;
}
EXPORT_SYMBOL_GPL(thermal_zone_bind_cooling_device);
/**
* thermal_zone_unbind_cooling_device() - unbind a cooling device from a
* thermal zone.
* @tz: pointer to a struct thermal_zone_device.
* @trip: indicates which trip point the cooling devices is
* associated with in this thermal zone.
* @cdev: pointer to a struct thermal_cooling_device.
*
* This interface function unbind a thermal cooling device from the certain
* trip point of a thermal zone device.
* This function is usually called in the thermal zone device .unbind callback.
*
* Return: 0 on success, the proper error value otherwise.
*/
int thermal_zone_unbind_cooling_device(struct thermal_zone_device *tz,
int trip,
struct thermal_cooling_device *cdev)
{
struct thermal_instance *pos, *next;
mutex_lock(&tz->lock);
mutex_lock(&cdev->lock);
list_for_each_entry_safe(pos, next, &tz->thermal_instances, tz_node) {
if (pos->tz == tz && pos->trip == trip && pos->cdev == cdev) {
list_del(&pos->tz_node);
list_del(&pos->cdev_node);
mutex_unlock(&cdev->lock);
mutex_unlock(&tz->lock);
goto unbind;
}
}
mutex_unlock(&cdev->lock);
mutex_unlock(&tz->lock);
return -ENODEV;
unbind:
thermal: remove dangling 'weight_attr' device file This file isn't getting removed while we unbind a device from thermal zone. And this causes following messages when the device is registered again: WARNING: CPU: 0 PID: 2228 at /home/viresh/linux/fs/sysfs/dir.c:31 sysfs_warn_dup+0x60/0x70() sysfs: cannot create duplicate filename '/devices/virtual/thermal/thermal_zone0/cdev0_weight' Modules linked in: cpufreq_dt(+) [last unloaded: cpufreq_dt] CPU: 0 PID: 2228 Comm: insmod Not tainted 4.2.0-rc3-00059-g44fffd9473eb #272 Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [<c00153e8>] (unwind_backtrace) from [<c0012368>] (show_stack+0x10/0x14) [<c0012368>] (show_stack) from [<c053a684>] (dump_stack+0x84/0xc4) [<c053a684>] (dump_stack) from [<c002284c>] (warn_slowpath_common+0x80/0xb0) [<c002284c>] (warn_slowpath_common) from [<c00228ac>] (warn_slowpath_fmt+0x30/0x40) [<c00228ac>] (warn_slowpath_fmt) from [<c012d524>] (sysfs_warn_dup+0x60/0x70) [<c012d524>] (sysfs_warn_dup) from [<c012d244>] (sysfs_add_file_mode_ns+0x13c/0x190) [<c012d244>] (sysfs_add_file_mode_ns) from [<c012d2d4>] (sysfs_create_file_ns+0x3c/0x48) [<c012d2d4>] (sysfs_create_file_ns) from [<c03c04a8>] (thermal_zone_bind_cooling_device+0x260/0x358) [<c03c04a8>] (thermal_zone_bind_cooling_device) from [<c03c2e70>] (of_thermal_bind+0x88/0xb4) [<c03c2e70>] (of_thermal_bind) from [<c03c10d0>] (__thermal_cooling_device_register+0x17c/0x2e0) [<c03c10d0>] (__thermal_cooling_device_register) from [<c03c3f50>] (__cpufreq_cooling_register+0x3a0/0x51c) [<c03c3f50>] (__cpufreq_cooling_register) from [<bf00505c>] (cpufreq_ready+0x44/0x88 [cpufreq_dt]) [<bf00505c>] (cpufreq_ready [cpufreq_dt]) from [<c03d6c30>] (cpufreq_add_dev+0x4a0/0x7dc) [<c03d6c30>] (cpufreq_add_dev) from [<c02cd3ec>] (subsys_interface_register+0x94/0xd8) [<c02cd3ec>] (subsys_interface_register) from [<c03d785c>] (cpufreq_register_driver+0x10c/0x1f0) [<c03d785c>] (cpufreq_register_driver) from [<bf0057d4>] (dt_cpufreq_probe+0x60/0x8c [cpufreq_dt]) [<bf0057d4>] (dt_cpufreq_probe [cpufreq_dt]) from [<c02d03e4>] (platform_drv_probe+0x44/0xa4) [<c02d03e4>] (platform_drv_probe) from [<c02cead8>] (driver_probe_device+0x174/0x2b4) [<c02cead8>] (driver_probe_device) from [<c02ceca4>] (__driver_attach+0x8c/0x90) [<c02ceca4>] (__driver_attach) from [<c02cd078>] (bus_for_each_dev+0x68/0x9c) [<c02cd078>] (bus_for_each_dev) from [<c02ce2f0>] (bus_add_driver+0x19c/0x214) [<c02ce2f0>] (bus_add_driver) from [<c02cf490>] (driver_register+0x78/0xf8) [<c02cf490>] (driver_register) from [<c0009710>] (do_one_initcall+0x8c/0x1d4) [<c0009710>] (do_one_initcall) from [<c05396b0>] (do_init_module+0x5c/0x1b8) [<c05396b0>] (do_init_module) from [<c0086490>] (load_module+0xd34/0xed8) [<c0086490>] (load_module) from [<c0086704>] (SyS_init_module+0xd0/0x120) [<c0086704>] (SyS_init_module) from [<c000f480>] (ret_fast_syscall+0x0/0x3c) ---[ end trace 3be0e7b7dc6e3c4f ]--- Fixes: db91651311c8 ("thermal: export weight to sysfs") Acked-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2015-07-23 16:02:32 +07:00
device_remove_file(&tz->device, &pos->weight_attr);
device_remove_file(&tz->device, &pos->attr);
sysfs_remove_link(&tz->device.kobj, pos->name);
ida_simple_remove(&tz->ida, pos->id);
kfree(pos);
return 0;
}
EXPORT_SYMBOL_GPL(thermal_zone_unbind_cooling_device);
static void thermal_release(struct device *dev)
{
struct thermal_zone_device *tz;
struct thermal_cooling_device *cdev;
if (!strncmp(dev_name(dev), "thermal_zone",
sizeof("thermal_zone") - 1)) {
tz = to_thermal_zone(dev);
thermal_zone_destroy_device_groups(tz);
kfree(tz);
} else if (!strncmp(dev_name(dev), "cooling_device",
sizeof("cooling_device") - 1)) {
cdev = to_cooling_device(dev);
kfree(cdev);
}
}
static struct class thermal_class = {
.name = "thermal",
.dev_release = thermal_release,
};
static inline
void print_bind_err_msg(struct thermal_zone_device *tz,
struct thermal_cooling_device *cdev, int ret)
{
dev_err(&tz->device, "binding zone %s with cdev %s failed:%d\n",
tz->type, cdev->type, ret);
}
static void __bind(struct thermal_zone_device *tz, int mask,
struct thermal_cooling_device *cdev,
unsigned long *limits,
unsigned int weight)
{
int i, ret;
for (i = 0; i < tz->trips; i++) {
if (mask & (1 << i)) {
unsigned long upper, lower;
upper = THERMAL_NO_LIMIT;
lower = THERMAL_NO_LIMIT;
if (limits) {
lower = limits[i * 2];
upper = limits[i * 2 + 1];
}
ret = thermal_zone_bind_cooling_device(tz, i, cdev,
upper, lower,
weight);
if (ret)
print_bind_err_msg(tz, cdev, ret);
}
}
}
static void bind_cdev(struct thermal_cooling_device *cdev)
{
int i, ret;
const struct thermal_zone_params *tzp;
struct thermal_zone_device *pos = NULL;
mutex_lock(&thermal_list_lock);
list_for_each_entry(pos, &thermal_tz_list, node) {
if (!pos->tzp && !pos->ops->bind)
continue;
if (pos->ops->bind) {
ret = pos->ops->bind(pos, cdev);
if (ret)
print_bind_err_msg(pos, cdev, ret);
continue;
}
tzp = pos->tzp;
if (!tzp || !tzp->tbp)
continue;
for (i = 0; i < tzp->num_tbps; i++) {
if (tzp->tbp[i].cdev || !tzp->tbp[i].match)
continue;
if (tzp->tbp[i].match(pos, cdev))
continue;
tzp->tbp[i].cdev = cdev;
__bind(pos, tzp->tbp[i].trip_mask, cdev,
tzp->tbp[i].binding_limits,
tzp->tbp[i].weight);
}
}
mutex_unlock(&thermal_list_lock);
}
/**
* __thermal_cooling_device_register() - register a new thermal cooling device
* @np: a pointer to a device tree node.
* @type: the thermal cooling device type.
* @devdata: device private data.
* @ops: standard thermal cooling devices callbacks.
*
* This interface function adds a new thermal cooling device (fan/processor/...)
* to /sys/class/thermal/ folder as cooling_device[0-*]. It tries to bind itself
* to all the thermal zone devices registered at the same time.
* It also gives the opportunity to link the cooling device to a device tree
* node, so that it can be bound to a thermal zone created out of device tree.
*
* Return: a pointer to the created struct thermal_cooling_device or an
* ERR_PTR. Caller must check return value with IS_ERR*() helpers.
*/
static struct thermal_cooling_device *
__thermal_cooling_device_register(struct device_node *np,
const char *type, void *devdata,
const struct thermal_cooling_device_ops *ops)
{
struct thermal_cooling_device *cdev;
struct thermal_zone_device *pos = NULL;
int result;
if (type && strlen(type) >= THERMAL_NAME_LENGTH)
return ERR_PTR(-EINVAL);
if (!ops || !ops->get_max_state || !ops->get_cur_state ||
!ops->set_cur_state)
return ERR_PTR(-EINVAL);
cdev = kzalloc(sizeof(*cdev), GFP_KERNEL);
if (!cdev)
return ERR_PTR(-ENOMEM);
result = ida_simple_get(&thermal_cdev_ida, 0, 0, GFP_KERNEL);
if (result < 0) {
kfree(cdev);
return ERR_PTR(result);
}
cdev->id = result;
strlcpy(cdev->type, type ? : "", sizeof(cdev->type));
mutex_init(&cdev->lock);
INIT_LIST_HEAD(&cdev->thermal_instances);
cdev->np = np;
cdev->ops = ops;
cdev->updated = false;
cdev->device.class = &thermal_class;
cdev->devdata = devdata;
thermal: Add cooling device's statistics in sysfs This extends the sysfs interface for thermal cooling devices and exposes some pretty useful statistics. These statistics have proven to be quite useful specially while doing benchmarks related to the task scheduler, where we want to make sure that nothing has disrupted the test, specially the cooling device which may have put constraints on the CPUs. The information exposed here tells us to what extent the CPUs were constrained by the thermal framework. The write-only "reset" file is used to reset the statistics. The read-only "time_in_state_ms" file shows the time (in msec) spent by the device in the respective cooling states, and it prints one line per cooling state. The read-only "total_trans" file shows single positive integer value showing the total number of cooling state transitions the device has gone through since the time the cooling device is registered or the time when statistics were reset last. The read-only "trans_table" file shows a two dimensional matrix, where an entry <i,j> (row i, column j) represents the number of transitions from State_i to State_j. This is how the directory structure looks like for a single cooling device: $ ls -R /sys/class/thermal/cooling_device0/ /sys/class/thermal/cooling_device0/: cur_state max_state power stats subsystem type uevent /sys/class/thermal/cooling_device0/power: autosuspend_delay_ms runtime_active_time runtime_suspended_time control runtime_status /sys/class/thermal/cooling_device0/stats: reset time_in_state_ms total_trans trans_table This is tested on ARM 64-bit Hisilicon hikey620 board running Ubuntu and ARM 64-bit Hisilicon hikey960 board running Android. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2018-04-02 17:56:25 +07:00
thermal_cooling_device_setup_sysfs(cdev);
dev_set_name(&cdev->device, "cooling_device%d", cdev->id);
result = device_register(&cdev->device);
if (result) {
ida_simple_remove(&thermal_cdev_ida, cdev->id);
put_device(&cdev->device);
return ERR_PTR(result);
}
/* Add 'this' new cdev to the global cdev list */
mutex_lock(&thermal_list_lock);
list_add(&cdev->node, &thermal_cdev_list);
mutex_unlock(&thermal_list_lock);
/* Update binding information for 'this' new cdev */
bind_cdev(cdev);
mutex_lock(&thermal_list_lock);
list_for_each_entry(pos, &thermal_tz_list, node)
if (atomic_cmpxchg(&pos->need_update, 1, 0))
thermal_zone_device_update(pos,
THERMAL_EVENT_UNSPECIFIED);
mutex_unlock(&thermal_list_lock);
return cdev;
}
/**
* thermal_cooling_device_register() - register a new thermal cooling device
* @type: the thermal cooling device type.
* @devdata: device private data.
* @ops: standard thermal cooling devices callbacks.
*
* This interface function adds a new thermal cooling device (fan/processor/...)
* to /sys/class/thermal/ folder as cooling_device[0-*]. It tries to bind itself
* to all the thermal zone devices registered at the same time.
*
* Return: a pointer to the created struct thermal_cooling_device or an
* ERR_PTR. Caller must check return value with IS_ERR*() helpers.
*/
struct thermal_cooling_device *
thermal_cooling_device_register(const char *type, void *devdata,
const struct thermal_cooling_device_ops *ops)
{
return __thermal_cooling_device_register(NULL, type, devdata, ops);
}
EXPORT_SYMBOL_GPL(thermal_cooling_device_register);
/**
* thermal_of_cooling_device_register() - register an OF thermal cooling device
* @np: a pointer to a device tree node.
* @type: the thermal cooling device type.
* @devdata: device private data.
* @ops: standard thermal cooling devices callbacks.
*
* This function will register a cooling device with device tree node reference.
* This interface function adds a new thermal cooling device (fan/processor/...)
* to /sys/class/thermal/ folder as cooling_device[0-*]. It tries to bind itself
* to all the thermal zone devices registered at the same time.
*
* Return: a pointer to the created struct thermal_cooling_device or an
* ERR_PTR. Caller must check return value with IS_ERR*() helpers.
*/
struct thermal_cooling_device *
thermal_of_cooling_device_register(struct device_node *np,
const char *type, void *devdata,
const struct thermal_cooling_device_ops *ops)
{
return __thermal_cooling_device_register(np, type, devdata, ops);
}
EXPORT_SYMBOL_GPL(thermal_of_cooling_device_register);
static void thermal_cooling_device_release(struct device *dev, void *res)
{
thermal_cooling_device_unregister(
*(struct thermal_cooling_device **)res);
}
/**
* devm_thermal_of_cooling_device_register() - register an OF thermal cooling
* device
* @dev: a valid struct device pointer of a sensor device.
* @np: a pointer to a device tree node.
* @type: the thermal cooling device type.
* @devdata: device private data.
* @ops: standard thermal cooling devices callbacks.
*
* This function will register a cooling device with device tree node reference.
* This interface function adds a new thermal cooling device (fan/processor/...)
* to /sys/class/thermal/ folder as cooling_device[0-*]. It tries to bind itself
* to all the thermal zone devices registered at the same time.
*
* Return: a pointer to the created struct thermal_cooling_device or an
* ERR_PTR. Caller must check return value with IS_ERR*() helpers.
*/
struct thermal_cooling_device *
devm_thermal_of_cooling_device_register(struct device *dev,
struct device_node *np,
char *type, void *devdata,
const struct thermal_cooling_device_ops *ops)
{
struct thermal_cooling_device **ptr, *tcd;
ptr = devres_alloc(thermal_cooling_device_release, sizeof(*ptr),
GFP_KERNEL);
if (!ptr)
return ERR_PTR(-ENOMEM);
tcd = __thermal_cooling_device_register(np, type, devdata, ops);
if (IS_ERR(tcd)) {
devres_free(ptr);
return tcd;
}
*ptr = tcd;
devres_add(dev, ptr);
return tcd;
}
EXPORT_SYMBOL_GPL(devm_thermal_of_cooling_device_register);
static void __unbind(struct thermal_zone_device *tz, int mask,
struct thermal_cooling_device *cdev)
{
int i;
for (i = 0; i < tz->trips; i++)
if (mask & (1 << i))
thermal_zone_unbind_cooling_device(tz, i, cdev);
}
/**
* thermal_cooling_device_unregister - removes a thermal cooling device
* @cdev: the thermal cooling device to remove.
*
* thermal_cooling_device_unregister() must be called when a registered
* thermal cooling device is no longer needed.
*/
void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
{
int i;
const struct thermal_zone_params *tzp;
struct thermal_zone_device *tz;
struct thermal_cooling_device *pos = NULL;
if (!cdev)
return;
mutex_lock(&thermal_list_lock);
list_for_each_entry(pos, &thermal_cdev_list, node)
if (pos == cdev)
break;
if (pos != cdev) {
/* thermal cooling device not found */
mutex_unlock(&thermal_list_lock);
return;
}
list_del(&cdev->node);
/* Unbind all thermal zones associated with 'this' cdev */
list_for_each_entry(tz, &thermal_tz_list, node) {
if (tz->ops->unbind) {
tz->ops->unbind(tz, cdev);
continue;
}
if (!tz->tzp || !tz->tzp->tbp)
continue;
tzp = tz->tzp;
for (i = 0; i < tzp->num_tbps; i++) {
if (tzp->tbp[i].cdev == cdev) {
__unbind(tz, tzp->tbp[i].trip_mask, cdev);
tzp->tbp[i].cdev = NULL;
}
}
}
mutex_unlock(&thermal_list_lock);
ida_simple_remove(&thermal_cdev_ida, cdev->id);
device_del(&cdev->device);
thermal: Add cooling device's statistics in sysfs This extends the sysfs interface for thermal cooling devices and exposes some pretty useful statistics. These statistics have proven to be quite useful specially while doing benchmarks related to the task scheduler, where we want to make sure that nothing has disrupted the test, specially the cooling device which may have put constraints on the CPUs. The information exposed here tells us to what extent the CPUs were constrained by the thermal framework. The write-only "reset" file is used to reset the statistics. The read-only "time_in_state_ms" file shows the time (in msec) spent by the device in the respective cooling states, and it prints one line per cooling state. The read-only "total_trans" file shows single positive integer value showing the total number of cooling state transitions the device has gone through since the time the cooling device is registered or the time when statistics were reset last. The read-only "trans_table" file shows a two dimensional matrix, where an entry <i,j> (row i, column j) represents the number of transitions from State_i to State_j. This is how the directory structure looks like for a single cooling device: $ ls -R /sys/class/thermal/cooling_device0/ /sys/class/thermal/cooling_device0/: cur_state max_state power stats subsystem type uevent /sys/class/thermal/cooling_device0/power: autosuspend_delay_ms runtime_active_time runtime_suspended_time control runtime_status /sys/class/thermal/cooling_device0/stats: reset time_in_state_ms total_trans trans_table This is tested on ARM 64-bit Hisilicon hikey620 board running Ubuntu and ARM 64-bit Hisilicon hikey960 board running Android. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2018-04-02 17:56:25 +07:00
thermal_cooling_device_destroy_sysfs(cdev);
put_device(&cdev->device);
}
EXPORT_SYMBOL_GPL(thermal_cooling_device_unregister);
static void bind_tz(struct thermal_zone_device *tz)
{
int i, ret;
struct thermal_cooling_device *pos = NULL;
const struct thermal_zone_params *tzp = tz->tzp;
if (!tzp && !tz->ops->bind)
return;
mutex_lock(&thermal_list_lock);
/* If there is ops->bind, try to use ops->bind */
if (tz->ops->bind) {
list_for_each_entry(pos, &thermal_cdev_list, node) {
ret = tz->ops->bind(tz, pos);
if (ret)
print_bind_err_msg(tz, pos, ret);
}
goto exit;
}
if (!tzp || !tzp->tbp)
goto exit;
list_for_each_entry(pos, &thermal_cdev_list, node) {
for (i = 0; i < tzp->num_tbps; i++) {
if (tzp->tbp[i].cdev || !tzp->tbp[i].match)
continue;
if (tzp->tbp[i].match(tz, pos))
continue;
tzp->tbp[i].cdev = pos;
__bind(tz, tzp->tbp[i].trip_mask, pos,
tzp->tbp[i].binding_limits,
tzp->tbp[i].weight);
}
}
exit:
mutex_unlock(&thermal_list_lock);
}
/**
* thermal_zone_device_register() - register a new thermal zone device
* @type: the thermal zone device type
* @trips: the number of trip points the thermal zone support
* @mask: a bit string indicating the writeablility of trip points
* @devdata: private device data
* @ops: standard thermal zone device callbacks
* @tzp: thermal zone platform parameters
* @passive_delay: number of milliseconds to wait between polls when
* performing passive cooling
* @polling_delay: number of milliseconds to wait between polls when checking
* whether trip points have been crossed (0 for interrupt
* driven systems)
*
* This interface function adds a new thermal zone device (sensor) to
* /sys/class/thermal folder as thermal_zone[0-*]. It tries to bind all the
* thermal cooling devices registered at the same time.
* thermal_zone_device_unregister() must be called when the device is no
* longer needed. The passive cooling depends on the .get_trend() return value.
*
* Return: a pointer to the created struct thermal_zone_device or an
* in case of error, an ERR_PTR. Caller must check return value with
* IS_ERR*() helpers.
*/
struct thermal_zone_device *
thermal_zone_device_register(const char *type, int trips, int mask,
void *devdata, struct thermal_zone_device_ops *ops,
struct thermal_zone_params *tzp, int passive_delay,
int polling_delay)
{
struct thermal_zone_device *tz;
enum thermal_trip_type trip_type;
2016-03-18 09:03:24 +07:00
int trip_temp;
int id;
int result;
int count;
struct thermal_governor *governor;
if (!type || strlen(type) == 0) {
pr_err("Error: No thermal zone type defined\n");
return ERR_PTR(-EINVAL);
}
if (type && strlen(type) >= THERMAL_NAME_LENGTH) {
pr_err("Error: Thermal zone name (%s) too long, should be under %d chars\n",
type, THERMAL_NAME_LENGTH);
return ERR_PTR(-EINVAL);
}
if (trips > THERMAL_MAX_TRIPS || trips < 0 || mask >> trips) {
pr_err("Error: Incorrect number of thermal trips\n");
return ERR_PTR(-EINVAL);
}
if (!ops) {
pr_err("Error: Thermal zone device ops not defined\n");
return ERR_PTR(-EINVAL);
}
if (trips > 0 && (!ops->get_trip_type || !ops->get_trip_temp))
return ERR_PTR(-EINVAL);
tz = kzalloc(sizeof(*tz), GFP_KERNEL);
if (!tz)
return ERR_PTR(-ENOMEM);
INIT_LIST_HEAD(&tz->thermal_instances);
ida_init(&tz->ida);
mutex_init(&tz->lock);
id = ida_simple_get(&thermal_tz_ida, 0, 0, GFP_KERNEL);
if (id < 0) {
result = id;
goto free_tz;
}
tz->id = id;
strlcpy(tz->type, type, sizeof(tz->type));
tz->ops = ops;
tz->tzp = tzp;
tz->device.class = &thermal_class;
tz->devdata = devdata;
tz->trips = trips;
tz->passive_delay = passive_delay;
tz->polling_delay = polling_delay;
/* sys I/F */
/* Add nodes that are always present via .groups */
result = thermal_zone_create_device_groups(tz, mask);
if (result)
goto remove_id;
/* A new thermal zone needs to be updated anyway. */
atomic_set(&tz->need_update, 1);
dev_set_name(&tz->device, "thermal_zone%d", tz->id);
result = device_register(&tz->device);
if (result)
goto release_device;
for (count = 0; count < trips; count++) {
2016-03-18 09:03:24 +07:00
if (tz->ops->get_trip_type(tz, count, &trip_type))
set_bit(count, &tz->trips_disabled);
if (tz->ops->get_trip_temp(tz, count, &trip_temp))
set_bit(count, &tz->trips_disabled);
/* Check for bogus trip points */
if (trip_temp == 0)
set_bit(count, &tz->trips_disabled);
}
/* Update 'this' zone's governor information */
mutex_lock(&thermal_governor_lock);
if (tz->tzp)
governor = __find_governor(tz->tzp->governor_name);
else
governor = def_governor;
result = thermal_set_governor(tz, governor);
if (result) {
mutex_unlock(&thermal_governor_lock);
goto unregister;
}
mutex_unlock(&thermal_governor_lock);
if (!tz->tzp || !tz->tzp->no_hwmon) {
result = thermal_add_hwmon_sysfs(tz);
if (result)
goto unregister;
}
mutex_lock(&thermal_list_lock);
list_add_tail(&tz->node, &thermal_tz_list);
mutex_unlock(&thermal_list_lock);
/* Bind cooling devices for this zone */
bind_tz(tz);
INIT_DELAYED_WORK(&tz->poll_queue, thermal_zone_device_check);
thermal_zone_device_reset(tz);
/* Update the new thermal zone and mark it as already updated. */
if (atomic_cmpxchg(&tz->need_update, 1, 0))
thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED);
thermal_notify_tz_create(tz->id, tz->type);
return tz;
unregister:
device_del(&tz->device);
release_device:
put_device(&tz->device);
tz = NULL;
remove_id:
ida_simple_remove(&thermal_tz_ida, id);
free_tz:
kfree(tz);
return ERR_PTR(result);
}
EXPORT_SYMBOL_GPL(thermal_zone_device_register);
/**
* thermal_device_unregister - removes the registered thermal zone device
* @tz: the thermal zone device to remove
*/
void thermal_zone_device_unregister(struct thermal_zone_device *tz)
{
int i, tz_id;
const struct thermal_zone_params *tzp;
struct thermal_cooling_device *cdev;
struct thermal_zone_device *pos = NULL;
if (!tz)
return;
tzp = tz->tzp;
tz_id = tz->id;
mutex_lock(&thermal_list_lock);
list_for_each_entry(pos, &thermal_tz_list, node)
if (pos == tz)
break;
if (pos != tz) {
/* thermal zone device not found */
mutex_unlock(&thermal_list_lock);
return;
}
list_del(&tz->node);
/* Unbind all cdevs associated with 'this' thermal zone */
list_for_each_entry(cdev, &thermal_cdev_list, node) {
if (tz->ops->unbind) {
tz->ops->unbind(tz, cdev);
continue;
}
if (!tzp || !tzp->tbp)
break;
for (i = 0; i < tzp->num_tbps; i++) {
if (tzp->tbp[i].cdev == cdev) {
__unbind(tz, tzp->tbp[i].trip_mask, cdev);
tzp->tbp[i].cdev = NULL;
}
}
}
mutex_unlock(&thermal_list_lock);
thermal: Fix deadlock in thermal thermal_zone_device_check 1851799e1d29 ("thermal: Fix use-after-free when unregistering thermal zone device") changed cancel_delayed_work to cancel_delayed_work_sync to avoid a use-after-free issue. However, cancel_delayed_work_sync could be called insides the WQ causing deadlock. [54109.642398] c0 1162 kworker/u17:1 D 0 11030 2 0x00000000 [54109.642437] c0 1162 Workqueue: thermal_passive_wq thermal_zone_device_check [54109.642447] c0 1162 Call trace: [54109.642456] c0 1162 __switch_to+0x138/0x158 [54109.642467] c0 1162 __schedule+0xba4/0x1434 [54109.642480] c0 1162 schedule_timeout+0xa0/0xb28 [54109.642492] c0 1162 wait_for_common+0x138/0x2e8 [54109.642511] c0 1162 flush_work+0x348/0x40c [54109.642522] c0 1162 __cancel_work_timer+0x180/0x218 [54109.642544] c0 1162 handle_thermal_trip+0x2c4/0x5a4 [54109.642553] c0 1162 thermal_zone_device_update+0x1b4/0x25c [54109.642563] c0 1162 thermal_zone_device_check+0x18/0x24 [54109.642574] c0 1162 process_one_work+0x3cc/0x69c [54109.642583] c0 1162 worker_thread+0x49c/0x7c0 [54109.642593] c0 1162 kthread+0x17c/0x1b0 [54109.642602] c0 1162 ret_from_fork+0x10/0x18 [54109.643051] c0 1162 kworker/u17:2 D 0 16245 2 0x00000000 [54109.643067] c0 1162 Workqueue: thermal_passive_wq thermal_zone_device_check [54109.643077] c0 1162 Call trace: [54109.643085] c0 1162 __switch_to+0x138/0x158 [54109.643095] c0 1162 __schedule+0xba4/0x1434 [54109.643104] c0 1162 schedule_timeout+0xa0/0xb28 [54109.643114] c0 1162 wait_for_common+0x138/0x2e8 [54109.643122] c0 1162 flush_work+0x348/0x40c [54109.643131] c0 1162 __cancel_work_timer+0x180/0x218 [54109.643141] c0 1162 handle_thermal_trip+0x2c4/0x5a4 [54109.643150] c0 1162 thermal_zone_device_update+0x1b4/0x25c [54109.643159] c0 1162 thermal_zone_device_check+0x18/0x24 [54109.643167] c0 1162 process_one_work+0x3cc/0x69c [54109.643177] c0 1162 worker_thread+0x49c/0x7c0 [54109.643186] c0 1162 kthread+0x17c/0x1b0 [54109.643195] c0 1162 ret_from_fork+0x10/0x18 [54109.644500] c0 1162 cat D 0 7766 1 0x00000001 [54109.644515] c0 1162 Call trace: [54109.644524] c0 1162 __switch_to+0x138/0x158 [54109.644536] c0 1162 __schedule+0xba4/0x1434 [54109.644546] c0 1162 schedule_preempt_disabled+0x80/0xb0 [54109.644555] c0 1162 __mutex_lock+0x3a8/0x7f0 [54109.644563] c0 1162 __mutex_lock_slowpath+0x14/0x20 [54109.644575] c0 1162 thermal_zone_get_temp+0x84/0x360 [54109.644586] c0 1162 temp_show+0x30/0x78 [54109.644609] c0 1162 dev_attr_show+0x5c/0xf0 [54109.644628] c0 1162 sysfs_kf_seq_show+0xcc/0x1a4 [54109.644636] c0 1162 kernfs_seq_show+0x48/0x88 [54109.644656] c0 1162 seq_read+0x1f4/0x73c [54109.644664] c0 1162 kernfs_fop_read+0x84/0x318 [54109.644683] c0 1162 __vfs_read+0x50/0x1bc [54109.644692] c0 1162 vfs_read+0xa4/0x140 [54109.644701] c0 1162 SyS_read+0xbc/0x144 [54109.644708] c0 1162 el0_svc_naked+0x34/0x38 [54109.845800] c0 1162 D 720.000s 1->7766->7766 cat [panic] Fixes: 1851799e1d29 ("thermal: Fix use-after-free when unregistering thermal zone device") Cc: stable@vger.kernel.org Signed-off-by: Wei Wang <wvw@google.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2019-11-13 03:42:23 +07:00
cancel_delayed_work_sync(&tz->poll_queue);
thermal_set_governor(tz, NULL);
thermal_remove_hwmon_sysfs(tz);
ida_simple_remove(&thermal_tz_ida, tz->id);
ida_destroy(&tz->ida);
mutex_destroy(&tz->lock);
device_unregister(&tz->device);
thermal_notify_tz_delete(tz_id);
}
EXPORT_SYMBOL_GPL(thermal_zone_device_unregister);
/**
* thermal_zone_get_zone_by_name() - search for a zone and returns its ref
* @name: thermal zone name to fetch the temperature
*
* When only one zone is found with the passed name, returns a reference to it.
*
* Return: On success returns a reference to an unique thermal zone with
* matching name equals to @name, an ERR_PTR otherwise (-EINVAL for invalid
* paramenters, -ENODEV for not found and -EEXIST for multiple matches).
*/
struct thermal_zone_device *thermal_zone_get_zone_by_name(const char *name)
{
struct thermal_zone_device *pos = NULL, *ref = ERR_PTR(-EINVAL);
unsigned int found = 0;
if (!name)
goto exit;
mutex_lock(&thermal_list_lock);
list_for_each_entry(pos, &thermal_tz_list, node)
if (!strncasecmp(name, pos->type, THERMAL_NAME_LENGTH)) {
found++;
ref = pos;
}
mutex_unlock(&thermal_list_lock);
/* nothing has been found, thus an error code for it */
if (found == 0)
ref = ERR_PTR(-ENODEV);
else if (found > 1)
/* Success only when an unique zone is found */
ref = ERR_PTR(-EEXIST);
exit:
return ref;
}
EXPORT_SYMBOL_GPL(thermal_zone_get_zone_by_name);
Thermal: handle thermal zone device properly during system sleep Current thermal code does not handle system sleep well because 1. the cooling device cooling state may be changed during suspend 2. the previous temperature reading becomes invalid after resumed because it is got before system sleep 3. updating thermal zone device during suspending/resuming is wrong because some devices may have already been suspended or may have not been resumed. Thus, the proper way to do this is to cancel all thermal zone device update requirements during suspend/resume, and after all the devices have been resumed, reset and update every registered thermal zone devices. This also fixes a regression introduced by: Commit 19593a1fb1f6 ("ACPI / fan: convert to platform driver") Because, with above commit applied, all the fan devices are attached to the acpi_general_pm_domain, and they are turned on by the pm_domain automatically after resume, without the awareness of thermal core. CC: <stable@vger.kernel.org> #3.18+ Reference: https://bugzilla.kernel.org/show_bug.cgi?id=78201 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=91411 Tested-by: Manuel Krause <manuelkrause@netscape.net> Tested-by: szegad <szegadlo@poczta.onet.pl> Tested-by: prash <prash.n.rao@gmail.com> Tested-by: amish <ammdispose-arch@yahoo.com> Tested-by: Matthias <morpheusxyz123@yahoo.de> Reviewed-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
2015-10-30 15:31:58 +07:00
static int thermal_pm_notify(struct notifier_block *nb,
unsigned long mode, void *_unused)
Thermal: handle thermal zone device properly during system sleep Current thermal code does not handle system sleep well because 1. the cooling device cooling state may be changed during suspend 2. the previous temperature reading becomes invalid after resumed because it is got before system sleep 3. updating thermal zone device during suspending/resuming is wrong because some devices may have already been suspended or may have not been resumed. Thus, the proper way to do this is to cancel all thermal zone device update requirements during suspend/resume, and after all the devices have been resumed, reset and update every registered thermal zone devices. This also fixes a regression introduced by: Commit 19593a1fb1f6 ("ACPI / fan: convert to platform driver") Because, with above commit applied, all the fan devices are attached to the acpi_general_pm_domain, and they are turned on by the pm_domain automatically after resume, without the awareness of thermal core. CC: <stable@vger.kernel.org> #3.18+ Reference: https://bugzilla.kernel.org/show_bug.cgi?id=78201 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=91411 Tested-by: Manuel Krause <manuelkrause@netscape.net> Tested-by: szegad <szegadlo@poczta.onet.pl> Tested-by: prash <prash.n.rao@gmail.com> Tested-by: amish <ammdispose-arch@yahoo.com> Tested-by: Matthias <morpheusxyz123@yahoo.de> Reviewed-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
2015-10-30 15:31:58 +07:00
{
struct thermal_zone_device *tz;
switch (mode) {
case PM_HIBERNATION_PREPARE:
case PM_RESTORE_PREPARE:
case PM_SUSPEND_PREPARE:
atomic_set(&in_suspend, 1);
break;
case PM_POST_HIBERNATION:
case PM_POST_RESTORE:
case PM_POST_SUSPEND:
atomic_set(&in_suspend, 0);
list_for_each_entry(tz, &thermal_tz_list, node) {
thermal: Use mode helpers in drivers Use thermal_zone_device_{en|dis}able() and thermal_zone_device_is_enabled(). Consequently, all set_mode() implementations in drivers: - can stop modifying tzd's "mode" member, - shall stop taking tzd's lock, as it is taken in the helpers - shall stop calling thermal_zone_device_update() as it is called in the helpers - can assume they are called when the mode truly changes, so checks to verify that can be dropped Not providing set_mode() by a driver no longer prevents the core from being able to set tzd's mode, so the relevant check in mode_store() is removed. Other comments: - acpi/thermal.c: tz->thermal_zone->mode will be updated only after we return from set_mode(), so use function parameter in thermal_set_mode() instead, no need to call acpi_thermal_check() in set_mode() - thermal/imx_thermal.c: regmap writes and mode assignment are done in thermal_zone_device_{en|dis}able() and set_mode() callback - thermal/intel/intel_quark_dts_thermal.c: soc_dts_{en|dis}able() are a part of set_mode() callback, so they don't need to modify tzd->mode, and don't need to fall back to the opposite mode if unsuccessful, as the return value will be propagated to thermal_zone_device_{en|dis}able() and ultimately tzd's member will not be changed in thermal_zone_device_set_mode(). - thermal/of-thermal.c: no need to set zone->mode to DISABLED in of_parse_thermal_zones() as a tzd is kzalloc'ed so mode is DISABLED anyway Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@collabora.com> [for acerhdf] Acked-by: Peter Kaestle <peter@piie.net> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20200629122925.21729-8-andrzej.p@collabora.com
2020-06-29 19:29:21 +07:00
if (!thermal_zone_device_is_enabled(tz))
continue;
thermal_zone_device_init(tz);
thermal_zone_device_update(tz,
THERMAL_EVENT_UNSPECIFIED);
Thermal: handle thermal zone device properly during system sleep Current thermal code does not handle system sleep well because 1. the cooling device cooling state may be changed during suspend 2. the previous temperature reading becomes invalid after resumed because it is got before system sleep 3. updating thermal zone device during suspending/resuming is wrong because some devices may have already been suspended or may have not been resumed. Thus, the proper way to do this is to cancel all thermal zone device update requirements during suspend/resume, and after all the devices have been resumed, reset and update every registered thermal zone devices. This also fixes a regression introduced by: Commit 19593a1fb1f6 ("ACPI / fan: convert to platform driver") Because, with above commit applied, all the fan devices are attached to the acpi_general_pm_domain, and they are turned on by the pm_domain automatically after resume, without the awareness of thermal core. CC: <stable@vger.kernel.org> #3.18+ Reference: https://bugzilla.kernel.org/show_bug.cgi?id=78201 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=91411 Tested-by: Manuel Krause <manuelkrause@netscape.net> Tested-by: szegad <szegadlo@poczta.onet.pl> Tested-by: prash <prash.n.rao@gmail.com> Tested-by: amish <ammdispose-arch@yahoo.com> Tested-by: Matthias <morpheusxyz123@yahoo.de> Reviewed-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
2015-10-30 15:31:58 +07:00
}
break;
default:
break;
}
return 0;
}
static struct notifier_block thermal_pm_nb = {
.notifier_call = thermal_pm_notify,
};
static int __init thermal_init(void)
{
int result;
result = thermal_netlink_init();
if (result)
goto error;
mutex_init(&poweroff_lock);
result = thermal_register_governors();
if (result)
goto error;
result = class_register(&thermal_class);
if (result)
goto unregister_governors;
result = of_parse_thermal_zones();
if (result)
goto unregister_class;
Thermal: handle thermal zone device properly during system sleep Current thermal code does not handle system sleep well because 1. the cooling device cooling state may be changed during suspend 2. the previous temperature reading becomes invalid after resumed because it is got before system sleep 3. updating thermal zone device during suspending/resuming is wrong because some devices may have already been suspended or may have not been resumed. Thus, the proper way to do this is to cancel all thermal zone device update requirements during suspend/resume, and after all the devices have been resumed, reset and update every registered thermal zone devices. This also fixes a regression introduced by: Commit 19593a1fb1f6 ("ACPI / fan: convert to platform driver") Because, with above commit applied, all the fan devices are attached to the acpi_general_pm_domain, and they are turned on by the pm_domain automatically after resume, without the awareness of thermal core. CC: <stable@vger.kernel.org> #3.18+ Reference: https://bugzilla.kernel.org/show_bug.cgi?id=78201 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=91411 Tested-by: Manuel Krause <manuelkrause@netscape.net> Tested-by: szegad <szegadlo@poczta.onet.pl> Tested-by: prash <prash.n.rao@gmail.com> Tested-by: amish <ammdispose-arch@yahoo.com> Tested-by: Matthias <morpheusxyz123@yahoo.de> Reviewed-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
2015-10-30 15:31:58 +07:00
result = register_pm_notifier(&thermal_pm_nb);
if (result)
pr_warn("Thermal: Can not register suspend notifier, return %d\n",
result);
return 0;
unregister_class:
class_unregister(&thermal_class);
unregister_governors:
thermal_unregister_governors();
error:
ida_destroy(&thermal_tz_ida);
ida_destroy(&thermal_cdev_ida);
mutex_destroy(&thermal_list_lock);
mutex_destroy(&thermal_governor_lock);
mutex_destroy(&poweroff_lock);
return result;
}
postcore_initcall(thermal_init);