Commit Graph

908 Commits

Author SHA1 Message Date
Rafael J. Wysocki
8a537ece3d PM / wakeup: Integrate mechanism to abort transitions in progress
The system wakeup framework is not very consistent with respect to
the way it handles suspend-to-idle and generally wakeup events
occurring during transitions to system low-power states.

First off, system transitions in progress are aborted by the event
reporting helpers like pm_wakeup_event() only if the wakeup_count
sysfs attribute is in use (as documented), but there are cases in
which system-wide transitions should be aborted even if that is
not the case.  For example, a wakeup signal from a designated
wakeup device during system-wide PM transition, it should cause
the transition to be aborted right away.

Moreover, there is a freeze_wake() call in wakeup_source_activate(),
but that really is only effective after suspend_freeze_state has
been set to FREEZE_STATE_ENTER by freeze_enter().  However, it
is very unlikely that wakeup_source_activate() will ever be called
at that time, as it could only be triggered by a IRQF_NO_SUSPEND
interrupt handler, so wakeups from suspend-to-idle don't really
occur in wakeup_source_activate().

At the same time there is a way to abort a system suspend in
progress (or wake up the system from suspend-to-idle), which is by
calling pm_system_wakeup(), but in turn that doesn't cause any
wakeup source objects to be activated, so it will not be covered
by wakeup source statistics and will not prevent the system from
suspending again immediately (in case autosleep is used, for
example).  Consequently, if anyone wants to abort system transitions
in progress and allow the wakeup_count mechanism to work, they need
to use both pm_system_wakeup() and pm_wakeup_event(), say, at the
same time which is awkward.

For the above reasons, make it possible to trigger
pm_system_wakeup() from within wakeup_source_activate() and
provide a new pm_wakeup_hard_event() helper to do so within the
wakeup framework.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-05-05 22:54:28 +02:00
Linus Torvalds
1827adb11a Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull sched.h split-up from Ingo Molnar:
 "The point of these changes is to significantly reduce the
  <linux/sched.h> header footprint, to speed up the kernel build and to
  have a cleaner header structure.

  After these changes the new <linux/sched.h>'s typical preprocessed
  size goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K
  lines), which is around 40% faster to build on typical configs.

  Not much changed from the last version (-v2) posted three weeks ago: I
  eliminated quirks, backmerged fixes plus I rebased it to an upstream
  SHA1 from yesterday that includes most changes queued up in -next plus
  all sched.h changes that were pending from Andrew.

  I've re-tested the series both on x86 and on cross-arch defconfigs,
  and did a bisectability test at a number of random points.

  I tried to test as many build configurations as possible, but some
  build breakage is probably still left - but it should be mostly
  limited to architectures that have no cross-compiler binaries
  available on kernel.org, and non-default configurations"

* 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits)
  sched/headers: Clean up <linux/sched.h>
  sched/headers: Remove #ifdefs from <linux/sched.h>
  sched/headers: Remove the <linux/topology.h> include from <linux/sched.h>
  sched/headers, hrtimer: Remove the <linux/wait.h> include from <linux/hrtimer.h>
  sched/headers, x86/apic: Remove the <linux/pm.h> header inclusion from <asm/apic.h>
  sched/headers, timers: Remove the <linux/sysctl.h> include from <linux/timer.h>
  sched/headers: Remove <linux/magic.h> from <linux/sched/task_stack.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/init.h>
  sched/core: Remove unused prefetch_stack()
  sched/headers: Remove <linux/rculist.h> from <linux/sched.h>
  sched/headers: Remove the 'init_pid_ns' prototype from <linux/sched.h>
  sched/headers: Remove <linux/signal.h> from <linux/sched.h>
  sched/headers: Remove <linux/rwsem.h> from <linux/sched.h>
  sched/headers: Remove the runqueue_is_locked() prototype
  sched/headers: Remove <linux/sched.h> from <linux/sched/hotplug.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/debug.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/nohz.h>
  sched/headers: Remove <linux/sched.h> from <linux/sched/stat.h>
  sched/headers: Remove the <linux/gfp.h> include from <linux/sched.h>
  sched/headers: Remove <linux/rtmutex.h> from <linux/sched.h>
  ...
2017-03-03 10:16:38 -08:00
Rafael J. Wysocki
9b5e9cb164 Merge branches 'pm-cpuidle', 'pm-cpufreq' and 'pm-sleep'
* pm-cpuidle:
  intel_idle: stop exposing platform acronyms in sysfs
  cpuidle: menu: Avoid taking spinlock for accessing QoS values

* pm-cpufreq:
  cpufreq: intel_pstate: Fix limits issue with operation mode switching
  cpufreq: qoriq: clean up unused code

* pm-sleep:
  PM / hibernate: Define pr_fmt() and use pr_*() instead of printk()
  PM / hibernate: Untangle power_down()
2017-03-03 00:43:11 +01:00
Rafael J. Wysocki
21ff03c484 Merge branches 'pm-core', 'pm-qos', 'pm-domains' and 'pm-opp'
* pm-core:
  PM / runtime: Fix some typos

* pm-qos:
  PM / QoS: Remove global notifiers

* pm-domains:
  PM / Domains: Power off masters immediately in the power off sequence
  PM / Domains: Rename is_async to one_dev_on for genpd_power_off()
  PM / Domains: Move genpd_power_off() above genpd_power_on()

* pm-opp:
  PM / OPP: Documentation: Fix opp-microvolt in examples
  PM / OPP: fix off-by-one bug in dev_pm_opp_get_max_volt_latency loop
2017-03-03 00:34:44 +01:00
Ingo Molnar
b17b01533b sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h>
We are going to split <linux/sched/debug.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/debug.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:34 +01:00
Ingo Molnar
5b3cc15aff sched/headers: Prepare to move the memalloc_noio_*() APIs to <linux/sched/mm.h>
Update the .c files that depend on these APIs.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:33 +01:00
Ingo Molnar
174cd4b1e5 sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>
Fix up affected files that include this signal functionality via sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:32 +01:00
Rafael J. Wysocki
6dbf5cea05 cpuidle: menu: Avoid taking spinlock for accessing QoS values
After commit 9908859aca (cpuidle/menu: add per CPU PM QoS resume
latency consideration) the cpuidle menu governor calls
dev_pm_qos_read_value() on CPU devices to read the current resume
latency QoS constraint values for them.  That function takes a spinlock
to prevent the device's power.qos pointer from becoming NULL during
the access which is a problem for the RT patchset where spinlocks are
converted into mutexes and the idle loop stops working.

However, it is not even necessary for the menu governor to take
that spinlock, because the power.qos pointer accessed under it
cannot be modified during the access anyway.

For this reason, introduce a "raw" routine for accessing device
QoS resume latency constraints without locking and use it in the
menu governor.

Fixes: 9908859aca (cpuidle/menu: add per CPU PM QoS resume latency consideration)
Acked-by: Alex Shi <alex.shi@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-27 15:07:38 +01:00
Viresh Kumar
d08d1b27fe PM / QoS: Remove global notifiers
They were never used in the kernel, so get rid of them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-23 23:05:58 +01:00
Andrzej Hajda
8cc311167c PM / OPP: fix off-by-one bug in dev_pm_opp_get_max_volt_latency loop
Reading array at given index before checking if index is valid results in
illegal memory access.

The bug was detected using KASAN framework.

Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-23 23:00:31 +01:00
Ulf Hansson
2da835452a PM / Domains: Power off masters immediately in the power off sequence
Once a subdomain is powered off, genpd queues a power off work for each of
the subdomain's corresponding masters, thus postponing the masters to be
powered off to a later point.

When genpd used intermediate power off states, which was removed in
commit ba2bbfbf63 ("PM / Domains: Remove intermediate states from the
power off sequence"), this behaviour made sense, but now it simply doesn't.

Genpd can easily try to power off the masters in the same context as the
subdomain, of course by acquiring/releasing the lock. Then, let's convert
to this behaviour, as it avoids unnecessary works from being queued.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-23 22:25:46 +01:00
Ulf Hansson
3c64649d1c PM / Domains: Rename is_async to one_dev_on for genpd_power_off()
The parameter name is_async, for genpd_power_off() gives a poor description
of its purpose. To clarify, let's rename it to one_dev_on and update the
documentation of it in the function header.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-23 22:25:46 +01:00
Ulf Hansson
1f8728b7ad PM / Domains: Move genpd_power_off() above genpd_power_on()
Following changes in genpd_power_on() makes it invoke genpd_power_off().
To enable these changes and avoiding to declare genpd_power_off(), let's
move its implementation above genpd_power_on(). In this way, following
changes should become easier to review.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-23 22:25:45 +01:00
Rafael J. Wysocki
58339feae7 Merge branches 'pm-core', 'pm-qos' and 'pm-domains'
* pm-core:
  PM / wakeirq: report a wakeup_event on dedicated wekup irq
  PM / wakeirq: Fix spurious wake-up events for dedicated wakeirqs
  PM / wakeirq: Enable dedicated wakeirq for suspend

* pm-qos:
  PM / QoS: Fix memory leak on resume_latency.notifiers
  PM / QoS: Remove unneeded linux/miscdevice.h include

* pm-domains:
  PM / Domains: Provide dummy governors if CONFIG_PM_GENERIC_DOMAINS=n
  PM / Domains: Fix asynchronous execution of *noirq() callbacks
  PM / Domains: Correct comment in irq_safe_dev_in_no_sleep_domain()
  PM / Domains: Rename functions in genpd for power on/off
2017-02-20 14:26:02 +01:00
Rafael J. Wysocki
64f758a07a Merge branch 'pm-opp'
* pm-opp: (24 commits)
  PM / OPP: Expose _of_get_opp_desc_node as dev_pm_opp API
  PM / OPP: Make _find_opp_table_unlocked() static
  PM / OPP: Update Documentation to remove RCU specific bits
  PM / OPP: Simplify dev_pm_opp_get_max_volt_latency()
  PM / OPP: Simplify _opp_set_availability()
  PM / OPP: Move away from RCU locking
  PM / OPP: Take kref from _find_opp_table()
  PM / OPP: Update OPP users to put reference
  PM / OPP: Add 'struct kref' to struct dev_pm_opp
  PM / OPP: Use dev_pm_opp_get_opp_table() instead of _add_opp_table()
  PM / OPP: Take reference of the OPP table while adding/removing OPPs
  PM / OPP: Return opp_table from dev_pm_opp_set_*() routines
  PM / OPP: Add 'struct kref' to OPP table
  PM / OPP: Add per OPP table mutex
  PM / OPP: Split out part of _add_opp_table() and _remove_opp_table()
  PM / OPP: Don't expose srcu_head to register notifiers
  PM / OPP: Rename dev_pm_opp_get_suspend_opp() and return OPP rate
  PM / OPP: Don't allocate OPP table from _opp_allocate()
  PM / OPP: Rename and split _dev_pm_opp_remove_table()
  PM / OPP: Add light weight _opp_free() routine
  ...
2017-02-20 14:22:50 +01:00
John Keeping
e84b4a84e5 PM / QoS: Fix memory leak on resume_latency.notifiers
Since commit 2d984ad132 (PM / QoS: Introcuce latency tolerance device
PM QoS type) we reassign "c" to point at qos->latency_tolerance before
freeing c->notifiers, but the notifiers field of latency_tolerance is
never used.

Restore the original behaviour of freeing the notifiers pointer on
qos->resume_latency, which is used, and fix the following kmemleak
warning.

unreferenced object 0xed9dba00 (size 64):
  comm "kworker/0:1", pid 36, jiffies 4294670128 (age 15202.983s)
  hex dump (first 32 bytes):
    00 00 00 00 04 ba 9d ed 04 ba 9d ed 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<c06f6084>] kmemleak_alloc+0x74/0xb8
    [<c011c964>] kmem_cache_alloc_trace+0x170/0x25c
    [<c035f448>] dev_pm_qos_constraints_allocate+0x3c/0xe4
    [<c035f574>] __dev_pm_qos_add_request+0x84/0x1a0
    [<c035f6cc>] dev_pm_qos_add_request+0x3c/0x54
    [<c03c3fc4>] usb_hub_create_port_device+0x110/0x2b8
    [<c03b2a60>] hub_probe+0xadc/0xc80
    [<c03bb050>] usb_probe_interface+0x1b4/0x260
    [<c035773c>] driver_probe_device+0x198/0x40c
    [<c0357b14>] __device_attach_driver+0x8c/0x98
    [<c0355bbc>] bus_for_each_drv+0x8c/0x9c
    [<c0357494>] __device_attach+0x98/0x138
    [<c0357c64>] device_initial_probe+0x14/0x18
    [<c03569dc>] bus_probe_device+0x30/0x88
    [<c0354c54>] device_add+0x430/0x554
    [<c03b92d8>] usb_set_configuration+0x660/0x6fc

Fixes: 2d984ad132 (PM / QoS: Introcuce latency tolerance device PM QoS type)
Signed-off-by: John Keeping <john@metanate.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-18 02:20:53 +01:00
Grygorii Strashko
09bb6e9395 PM / wakeirq: report a wakeup_event on dedicated wekup irq
There are two reasons for reporting wakeup event when dedicated wakeup
IRQ is triggered:

- wakeup events accounting, so proper statistical data will be
  displayed in sysfs and debugfs;

- there are small window when System is entering suspend during which
  dedicated wakeup IRQ can be lost:

dpm_suspend_noirq()
  |- device_wakeup_arm_wake_irqs()
      |- dev_pm_arm_wake_irq(X)
         |- IRQ is enabled and marked as wakeup source
[1]...
  |- suspend_device_irqs()
     |- suspend_device_irq(X)
	|- irqd_set(X, IRQD_WAKEUP_ARMED);
	   |- wakup IRQ armed

The wakeup IRQ can be lost if it's triggered at point [1]
and not armed yet.

Hence, fix above cases by adding simple pm_wakeup_event() call in
handle_threaded_wake_irq().

Fixes: 4990d4fe32 (PM / Wakeirq: Add automated device wake IRQ handling)
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Tested-by: Keerthy <j-keerthy@ti.com>
[ tony@atomide.com: added missing return to avoid warnings ]
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-13 22:29:30 +01:00
Grygorii Strashko
0bf0ee8ef9 PM / wakeirq: Fix spurious wake-up events for dedicated wakeirqs
Dedicated wakeirq is a one time event to wake-up the system from
low-power state and then call pm_runtime_resume() on the device wired
with the dedicated wakeirq.

Sometimes dedicated wakeirqs can get deferred if they trigger after we
call disable_irq_nosync() in dev_pm_disable_wake_irq(). This can happen
if pm_runtime_get() is called around the same time a wakeirq fires.

If an interrupt fires after disable_irq_nosync(), by default it will get
tagged with IRQS_PENDING and will run later on when the interrupt is
enabled again.

Deferred wakeirqs usually just produce pointless wake-up events. But they
can also cause suspend to fail if the deferred wakeirq fires during
dpm_suspend_noirq() for example. So we really don't want to see the
deferred wakeirqs triggering after the device has resumed.

Let's fix the issue by setting IRQ_DISABLE_UNLAZY flag for the dedicated
wakeirqs. The other option would be to implement irq_disable() in the
dedicated wakeirq controller, but that's not a generic solution.

For reference below is what happens with a IRQ_TYPE_EDGE_BOTH IRQ
type wakeirq:

- resume by dedicated IRQ (EDGE_FALLING)
 - suspend_enter()
  ....
 - arch_suspend_enable_irqs()
   |- dedicated IRQ armed and fired
   |- irq_pm_check_wakeup()
      |- disarm, disable IRQ and mark as IRQS_PENDING
  ....
 - dpm_resume_noirq()
   |- resume_device_irqs()
      |- __enable_irq()
         |- check_irq_resend()
            |- handle_threaded_wake_irq()
 	       |- dedicated IRQ processed
   |- device_wakeup_disarm_wake_irqs()
      |- disable_irq_wake()
  ....
 !-> dedicated IRQ (EDGE_RISING)
     -| handle_edge_irq()
        |- IRQ disabled: mask_ack_irq and mark as IRQS_PENDING
  ....
- subsequent suspend
  ....
  |- dpm_suspend_noirq()
     |- device_wakeup_arm_wake_irqs()
        |- __enable_irq()
           |- check_irq_resend()
(a)           |- handle_threaded_wake_irq()
                 |- pm_wakeup_event() --> abort suspend
  ....
     |- suspend_device_irqs()
        |- suspend_device_irq()
           |-  dedicated IRQ armed
  ....
(b)  |- resend_irqs
        |- irq_pm_check_wakeup()
           |- IRQ armed -> abort suspend

because of pending IRQ System suspend can be aborted at points
(a)-not armed or (b)-armed.

Fixes: 4990d4fe32 (PM / Wakeirq: Add automated device wake IRQ handling)
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
[ tony@atomide.com: added a comment, updated the description ]
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-13 22:29:30 +01:00
Grygorii Strashko
c843455975 PM / wakeirq: Enable dedicated wakeirq for suspend
We currently rely on runtime PM to enable dedicated wakeirq for suspend.
This assumption fails in the following two cases:

1. If the consumer driver does not have runtime PM implemented, the
   dedicated wakeirq never gets enabled for suspend

2. If the consumer driver has runtime PM implemented, but does not idle
   in suspend

Let's fix the issue by always enabling the dedicated wakeirq during
suspend.

Depends-on: bed570307e (PM / wakeirq: Fix dedicated wakeirq for drivers not using autosuspend)
Fixes: 4990d4fe32 (PM / Wakeirq: Add automated device wake IRQ handling)
Reported-by: Keerthy <j-keerthy@ti.com>
Tested-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
[ tony@atomide.com: updated based on bed570307e, added description ]
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-13 22:29:29 +01:00
Dave Gerlach
0764c604c8 PM / OPP: Expose _of_get_opp_desc_node as dev_pm_opp API
Rename _of_get_opp_desc_node to dev_pm_opp_of_get_opp_desc_node and add it
to include/linux/pm_opp.h to allow other drivers, such as platform OPP
and cpufreq drivers, to make use of it.

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-09 22:52:17 +01:00
Ulf Hansson
0883ac038b PM / Domains: Fix asynchronous execution of *noirq() callbacks
As the PM core may invoke the *noirq() callbacks asynchronously, the
current lock-less approach in genpd doesn't work. The consequence is that
we may find concurrent operations racing to power on/off the PM domain.

As of now, no immediate errors has been reported, but it's probably only a
matter time. Therefor let's fix the problem now before this becomes a real
issue, by deploying the locking scheme to the relevant functions.

Reported-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-09 01:01:26 +01:00
Ulf Hansson
f3c826ac26 PM / Domains: Correct comment in irq_safe_dev_in_no_sleep_domain()
The earlier comment stated that the dev_warn_once() was going to be printed
once per device. Let's fix that, as dev_warn_once() is printed only once,
no matter of the device.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Lina Iyer <lina.iyer@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-08 13:31:49 +01:00
Wei Yongjun
6ac4239733 PM / OPP: Make _find_opp_table_unlocked() static
Fixes the following sparse warning:

drivers/base/power/opp/core.c:49:18: warning:
 symbol '_find_opp_table_unlocked' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-07 13:39:25 +01:00
Rafael J. Wysocki
a9306a6363 PM / runtime: Avoid false-positive warnings from might_sleep_if()
The might_sleep_if() assertions in __pm_runtime_idle(),
__pm_runtime_suspend() and __pm_runtime_resume() may generate
false-positive warnings in some situations.  For example, that
happens if a nested pm_runtime_get_sync()/pm_runtime_put() pair
is executed with disabled interrupts within an outer
pm_runtime_get_sync()/pm_runtime_put() section for the same device.
[Generally, pm_runtime_get_sync() may sleep, so it should not be
called with disabled interrupts, but in this particular case the
previous pm_runtime_get_sync() guarantees that the device will not
be suspended, so the inner pm_runtime_get_sync() will return
immediately after incrementing the device's usage counter.]

That started to happen in the i915 driver in 4.10-rc, leading to
the following splat:

 BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1032
 in_atomic(): 1, irqs_disabled(): 0, pid: 1500, name: Xorg
 1 lock held by Xorg/1500:
  #0:  (&dev->struct_mutex){+.+.+.}, at:
  [<ffffffffa0680c13>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
 CPU: 0 PID: 1500 Comm: Xorg Not tainted
 Call Trace:
  dump_stack+0x85/0xc2
  ___might_sleep+0x196/0x260
  __might_sleep+0x53/0xb0
  __pm_runtime_resume+0x7a/0x90
  intel_runtime_pm_get+0x25/0x90 [i915]
  aliasing_gtt_bind_vma+0xaa/0xf0 [i915]
  i915_vma_bind+0xaf/0x1e0 [i915]
  i915_gem_execbuffer_relocate_entry+0x513/0x6f0 [i915]
  i915_gem_execbuffer_relocate_vma.isra.34+0x188/0x250 [i915]
  ? trace_hardirqs_on+0xd/0x10
  ? i915_gem_execbuffer_reserve_vma.isra.31+0x152/0x1f0 [i915]
  ? i915_gem_execbuffer_reserve.isra.32+0x372/0x3a0 [i915]
  i915_gem_do_execbuffer.isra.38+0xa70/0x1a40 [i915]
  ? __might_fault+0x4e/0xb0
  i915_gem_execbuffer2+0xc5/0x260 [i915]
  ? __might_fault+0x4e/0xb0
  drm_ioctl+0x206/0x450 [drm]
  ? i915_gem_execbuffer+0x340/0x340 [i915]
  ? __fget+0x5/0x200
  do_vfs_ioctl+0x91/0x6f0
  ? __fget+0x111/0x200
  ? __fget+0x5/0x200
  SyS_ioctl+0x79/0x90
  entry_SYSCALL_64_fastpath+0x23/0xc6

even though the code triggering it is correct.

Unfortunately, the might_sleep_if() assertions in question are
too coarse-grained to cover such cases correctly, so make them
a bit less sensitive in order to avoid the false-positives.

Reported-and-tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-02-04 00:44:36 +01:00
Viresh Kumar
cdd3e614cf PM / OPP: Simplify dev_pm_opp_get_max_volt_latency()
dev_pm_opp_get_max_volt_latency() calls _find_opp_table() two times
effectively.

Merge _get_regulator_count() into dev_pm_opp_get_max_volt_latency() to
avoid that.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-30 09:22:22 +01:00
Viresh Kumar
a7f3987ea1 PM / OPP: Simplify _opp_set_availability()
As we don't use RCU locking anymore, there is no need to replace an
earlier OPP node with a new one. Just update the existing one.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-30 09:22:22 +01:00
Viresh Kumar
052c6f1914 PM / OPP: Move away from RCU locking
The RCU locking isn't well suited for the OPP core. The RCU locking fits
better for reader heavy stuff, while the OPP core have at max one or two
readers only at a time.

Over that, it was getting very confusing the way RCU locking was used
with the OPP core. The individual OPPs are mostly well handled, i.e. for
an update a new structure was created and then that replaced the older
one. But the OPP tables were updated directly all the time from various
parts of the core. Though they were mostly used from within RCU locked
region, they didn't had much to do with RCU and were governed by the
mutex instead.

And that mixed with the 'opp_table_lock' has made the core even more
confusing.

Now that we are already managing the OPPs and the OPP tables with kernel
reference infrastructure, we can get rid of RCU locking completely and
simplify the code a lot.

Remove all RCU references from code and comments.

Acquire opp_table->lock while parsing the list of OPPs though.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-30 09:22:22 +01:00
Viresh Kumar
5b650b3888 PM / OPP: Take kref from _find_opp_table()
Take reference of the OPP table from within _find_opp_table(). Also
update the callers of _find_opp_table() to call
dev_pm_opp_put_opp_table() after they have used the OPP table.

Note that _find_opp_table() increments the reference under the
opp_table_lock.

Now that the OPP table wouldn't get freed until the callers of
_find_opp_table() call dev_pm_opp_put_opp_table(), there is no need to
take the opp_table_lock or rcu_read_lock() around it. Drop them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-30 09:22:22 +01:00
Viresh Kumar
8a31d9d942 PM / OPP: Update OPP users to put reference
This patch updates dev_pm_opp_find_freq_*() routines to get a reference
to the OPPs returned by them.

Also updates the users of dev_pm_opp_find_freq_*() routines to call
dev_pm_opp_put() after they are done using the OPPs.

As it is guaranteed the that OPPs wouldn't get freed while being used,
the RCU read side locking present with the users isn't required anymore.
Drop it as well.

This patch also updates all users of devfreq_recommended_opp() which was
returning an OPP received from the OPP core.

Note that some of the OPP core routines have gained
rcu_read_{lock|unlock}() calls, as those still use RCU specific APIs
within them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com> [Devfreq]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-30 09:22:21 +01:00
Viresh Kumar
7034764a1e PM / OPP: Add 'struct kref' to struct dev_pm_opp
Add kref to struct dev_pm_opp for easier accounting of the OPPs.

Note that the OPPs are freed under the opp_table->lock mutex only.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-30 09:22:21 +01:00
Viresh Kumar
b83c1899a0 PM / OPP: Use dev_pm_opp_get_opp_table() instead of _add_opp_table()
Migrate all users of _add_opp_table() to use dev_pm_opp_get_opp_table()
to guarantee that the OPP table doesn't get freed while being used.

Also update _managed_opp() to get the reference to the OPP table.

Now that the OPP table wouldn't get freed while these routines are
executing after dev_pm_opp_get_opp_table() is called, there is no need
to take opp_table_lock. Drop them as well.

Now that _add_opp_table(), _remove_opp_table() and the unlocked release
routines aren't used anymore, remove them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-30 09:22:21 +01:00
Viresh Kumar
31641cda53 PM / OPP: Take reference of the OPP table while adding/removing OPPs
Take reference of the OPP table while adding and removing OPPs, that
helps us remove special checks in _remove_opp_table().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-30 09:22:21 +01:00
Viresh Kumar
fa30184d19 PM / OPP: Return opp_table from dev_pm_opp_set_*() routines
Now that we have proper kernel reference infrastructure in place for OPP
tables, use it to guarantee that the OPP table isn't freed while being
used by the callers of dev_pm_opp_set_*() APIs.

Make them all return the pointer to the OPP table after taking its
reference and put the reference back with dev_pm_opp_put_*() APIs.

Now that the OPP table wouldn't get freed while these routines are
executing after dev_pm_opp_get_opp_table() is called, there is no need
to take opp_table_lock. Drop them as well.

Remove the rcu specific comments from these routines as they aren't
relevant anymore.

Note that prototypes of dev_pm_opp_{set|put}_regulators() were already
updated by another patch.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-30 09:22:21 +01:00
Viresh Kumar
f067a982ce PM / OPP: Add 'struct kref' to OPP table
Add kref to struct opp_table for easier accounting of the OPP table.

Note that the new routine dev_pm_opp_get_opp_table() takes the reference
from under the opp_table_lock, which guarantees that the OPP table
doesn't get freed unless dev_pm_opp_put_opp_table() is called for the
OPP table.

Two separate release mechanisms are added: locked and unlocked. In
unlocked version the routines aren't required to take/drop
opp_table_lock as the callers have already done that. This is required
to avoid breaking git bisect, otherwise we may get lockdeps between
commits. Once all the users of OPP table are updated the unlocked
version shall be removed.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-30 09:22:20 +01:00
Viresh Kumar
37a73ec0c9 PM / OPP: Add per OPP table mutex
Add per OPP table lock to protect opp_table->opp_list.

Note that at few places opp_list is used under the rcu_read_lock() and
so a mutex can't be added there for now. This will be fixed by a later
patch.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-30 09:22:20 +01:00
Viresh Kumar
b6160e2693 PM / OPP: Split out part of _add_opp_table() and _remove_opp_table()
Split out parts of _add_opp_table() and _remove_opp_table() into
separate routines. This improves readability as well.

Should result in no functional changes.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-27 11:49:10 +01:00
Viresh Kumar
dc2c9ad52a PM / OPP: Don't expose srcu_head to register notifiers
Let the OPP core provide helpers to register notifiers for any device,
instead of exposing srcu_head outside of the core.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-27 11:49:09 +01:00
Viresh Kumar
3aa26a3b2e PM / OPP: Rename dev_pm_opp_get_suspend_opp() and return OPP rate
There is only one user of dev_pm_opp_get_suspend_opp() and that uses it
to get the OPP rate for the suspend_opp.

Rename dev_pm_opp_get_suspend_opp() as dev_pm_opp_get_suspend_opp_freq()
and return the rate directly from it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-27 11:49:09 +01:00
Viresh Kumar
8cd2f6e8f3 PM / OPP: Don't allocate OPP table from _opp_allocate()
There is no point in trying to find/allocate the table for every OPP
that is added for a device. It would be far more efficient to allocate
the table only once and pass its pointer to the routines that add the
OPP entry.

Locking is removed from _opp_add_static_v2() and _opp_add_v1() now as
the callers call them with that lock already held.

Call to _remove_opp_table() routine is also removed from _opp_free()
now, as opp_table isn't allocated from within _opp_allocate(). This is
handled by the routines which created the OPP table in the first place.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-27 11:49:09 +01:00
Viresh Kumar
9274c89243 PM / OPP: Rename and split _dev_pm_opp_remove_table()
Later patches would want to remove OPP table (and its OPPs) using the
opp_table pointer instead of 'dev'.

In order to prepare for that, rename _dev_pm_opp_remove_table() as
_dev_pm_opp_find_and_remove_table() split out part of it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-27 11:49:09 +01:00
Viresh Kumar
969fceb3c7 PM / OPP: Add light weight _opp_free() routine
The OPPs which are never successfully added using _opp_add() are not
required to be freed with the _opp_remove() routine, as a simple kfree()
is enough for them.

Introduce a new light weight routine _opp_free(), which will do that.

That also helps us removing the 'notify' parameter to _opp_remove(),
which isn't required anymore.

Note that _opp_free() contains a call to _remove_opp_table() as the OPP
table might have been added for this very OPP only. The
_remove_opp_table() routine returns quickly if there are more OPPs in
the table. This will be simplified in later patches though.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-27 11:49:09 +01:00
Viresh Kumar
04a86a84c4 PM / OPP: Error out on failing to add static OPPs for v1 bindings
The code adding static OPPs for V2 bindings already does so. Make the V1
bindings specific code behave the same.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-27 11:49:09 +01:00
Viresh Kumar
63a69ea4b8 PM / OPP: Rename _allocate_opp() to _opp_allocate()
Make the naming consistent with how other routines are named.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-27 11:49:08 +01:00
Viresh Kumar
1715371ddd PM / OPP: Remove useless TODO
This TODO doesn't make sense anymore as we have all the information in a
single OPP table. Remove it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-27 11:49:08 +01:00
Viresh Kumar
7f8538ebae PM / OPP: Fix memory leak while adding duplicate OPPs
There are two types of duplicate OPPs that get different behavior from
the core:
A) An earlier OPP is marked 'available' and has same freq/voltages as
   the new one.
B) An earlier OPP with same frequency, but is marked 'unavailable' OR
   doesn't have same voltages as the new one.

The OPP core returns 0 for the first one, but -EEXIST for the second.

While the OPP core returns 0 for the first case, its callers don't free
the newly allocated OPP structure which isn't used anymore. Fix that by
returning -EBUSY instead of 0, but make the callers return 0 eventually.

As this isn't a critical fix, its not getting marked for stable kernel.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-27 11:49:08 +01:00
Ulf Hansson
86e12eac1f PM / Domains: Rename functions in genpd for power on/off
Currently the mix of genpd_poweron(), genpd_power_on(),
genpd_sync_poweron() and the ->power_on() callback, makes
a bit difficult to follow the path of execution. The similar
applies to the functions dealing with power off.

In a way to improve this understanding, let's do the following renaming:

genpd_power_on() ->  _genpd_power_on()
genpd_poweron() -> genpd_power_on()
genpd_sync_poweron() -> genpd_sync_power_on()

genpd_power_off() -> _genpd_power_off()
genpd_poweroff() -> genpd_power_off()
genpd_sync_poweroff() -> genpd_sync_power_off()
genpd_poweroff_unused() -> genpd_power_off_unused()

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-27 01:30:31 +01:00
Augusto Mecking Caringi
ab51e6ba00 PM / domains: Fix 'may be used uninitialized' build warning
This patch fixes the following gcc warning:

drivers/base/power/domain.c: In function ‘genpd_runtime_resume’:
drivers/base/power/domain.c:642:14: warning: ‘time_start’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
   elapsed_ns = ktime_to_ns(ktime_sub(ktime_get(), time_start)

The same problem (in another function in this same file) was fixed in
commit d33d5a6c88 (avoid spurious "may be used uninitialized" warning)

Signed-off-by: Augusto Mecking Caringi <augustocaringi@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-12-31 21:52:07 +01:00
Linus Torvalds
d33d5a6c88 avoid spurious "may be used uninitialized" warning
The timer type simplifications caused a new gcc warning:

  drivers/base/power/domain.c: In function ‘genpd_runtime_suspend’:
  drivers/base/power/domain.c:562:14: warning: ‘time_start’ may be used uninitialized in this function [-Wmaybe-uninitialized]
     elapsed_ns = ktime_to_ns(ktime_sub(ktime_get(), time_start));

despite the actual use of "time_start" not having changed in any way.
It appears that simply changing the type of ktime_t from a union to a
plain scalar type made gcc check the use.

The variable wasn't actually used uninitialized, but gcc apparently
failed to notice that the conditional around the use was exactly the
same as the conditional around the initialization of that variable.

Add an unnecessary initialization just to shut up the compiler.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-25 14:56:58 -08:00
Thomas Gleixner
8b0e195314 ktime: Cleanup ktime_set() usage
ktime_set(S,N) was required for the timespec storage type and is still
useful for situations where a Seconds and Nanoseconds part of a time value
needs to be converted. For anything where the Seconds argument is 0, this
is pointless and can be replaced with a simple assignment.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25 17:21:22 +01:00
Thomas Gleixner
2456e85535 ktime: Get rid of the union
ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.

Get rid of the union and just keep ktime_t as simple typedef of type s64.

The conversion was done with coccinelle and some manual mopping up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25 17:21:22 +01:00