Pull powerpc updates from Ben Herrenschmidt:
"This is the powerpc new goodies for 3.17. The short story:
The biggest bit is Michael removing all of pre-POWER4 processor
support from the 64-bit kernel. POWER3 and rs64. This gets rid of a
ton of old cruft that has been bitrotting in a long while. It was
broken for quite a few versions already and nobody noticed. Nobody
uses those machines anymore. While at it, he cleaned up a bunch of
old dusty cabinets, getting rid of a skeletton or two.
Then, we have some base VFIO support for KVM, which allows assigning
of PCI devices to KVM guests, support for large 64-bit BARs on
"powernv" platforms, support for HMI (Hardware Management Interrupts)
on those same platforms, some sparse-vmemmap improvements (for memory
hotplug),
There is the usual batch of Freescale embedded updates (summary in the
merge commit) and fixes here or there, I think that's it for the
highlights"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (102 commits)
powerpc/eeh: Export eeh_iommu_group_to_pe()
powerpc/eeh: Add missing #ifdef CONFIG_IOMMU_API
powerpc: Reduce scariness of interrupt frames in stack traces
powerpc: start loop at section start of start in vmemmap_populated()
powerpc: implement vmemmap_free()
powerpc: implement vmemmap_remove_mapping() for BOOK3S
powerpc: implement vmemmap_list_free()
powerpc: Fail remap_4k_pfn() if PFN doesn't fit inside PTE
powerpc/book3s: Fix endianess issue for HMI handling on napping cpus.
powerpc/book3s: handle HMIs for cpus in nap mode.
powerpc/powernv: Invoke opal call to handle hmi.
powerpc/book3s: Add basic infrastructure to handle HMI in Linux.
powerpc/iommu: Fix comments with it_page_shift
powerpc/powernv: Handle compound PE in config accessors
powerpc/powernv: Handle compound PE for EEH
powerpc/powernv: Handle compound PE
powerpc/powernv: Split ioda_eeh_get_state()
powerpc/powernv: Allow to freeze PE
powerpc/powernv: Enable M64 aperatus for PHB3
powerpc/eeh: Aux PE data for error log
...
Pull MIPS updates from Ralf Baechle:
"This is the main pull request for 3.17. It contains:
- misc Cavium Octeon, BCM47xx, BCM63xx and Alchemy updates
- MIPS ptrace updates and cleanups
- various fixes that will also go to -stable
- a number of cleanups and small non-critical fixes.
- NUMA support for the Loongson 3.
- more support for MSA
- support for MAAR
- various FP enhancements and fixes"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (139 commits)
MIPS: jz4740: remove unnecessary null test before debugfs_remove
MIPS: Octeon: remove unnecessary null test before debugfs_remove_recursive
MIPS: ZBOOT: implement stack protector in compressed boot phase
MIPS: mipsreg: remove duplicate MIPS_CONF4_FTLBSETS_SHIFT
MIPS: Bonito64: remove a duplicate define
MIPS: Malta: initialise MAARs
MIPS: Initialise MAARs
MIPS: detect presence of MAARs
MIPS: define MAAR register accessors & bits
MIPS: mark MSA experimental
MIPS: Don't build MSA support unless it can be used
MIPS: consistently clear MSA flags when starting & copying threads
MIPS: 16 byte align MSA vector context
MIPS: disable preemption whilst initialising MSA
MIPS: ensure MSA gets disabled during boot
MIPS: fix read_msa_* & write_msa_* functions on non-MSA toolchains
MIPS: fix MSA context for tasks which don't use FP first
MIPS: init upper 64b of vector registers when MSA is first used
MIPS: save/disable MSA in lose_fpu
MIPS: preserve scalar FP CSR when switching vector context
...
Cpufreq depends on platform firmware to implement PStates. In case of
platform firmware failure, cpufreq should not panic host kernel with
BUG_ON(). Less severe pr_warn() will suffice.
Add firmware_has_feature(FW_FEATURE_OPALv3) check to
skip probing for device-tree on non-powernv platforms.
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch is prepared for Multi-chip interconnection. Since each chip
has a ChipConfig register, LOONGSON_CHIPCFG should be an array.
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Cc: John Crispin <john@phrozen.org>
Cc: Steven J. Hill <Steven.Hill@imgtec.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: linux-mips@linux-mips.org
Cc: Fuxin Zhang <zhangfx@lemote.com>
Cc: Zhangjin Wu <wuzhangjin@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/7185/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
We are calling kobject_move() from two separate places currently and both these
places share another routine update_policy_cpu() which is handling everything
around updating policy->cpu. Moving ownership of policy->kobj also lies under
the role of update_policy_cpu() routine and must be handled from there.
So, Lets move kobject_move() to update_policy_cpu() and get rid of
cpufreq_nominate_new_policy_cpu() as it doesn't have anything significant left.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We are returning -EINVAL instead of the error returned from kobject_move() when
it fails. Propagate the actual error number.
Also add a meaningful print when sysfs_create_link() fails.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
While hot-unplugging policy->cpu, we call cpufreq_nominate_new_policy_cpu() to
nominate next owner of policy, i.e. policy->cpu. If we fail to move policy
kobject under the new policy->cpu, we try to update policy->cpus with the old
policy->cpu.
This would have been required in case old-CPU is removed from policy->cpus in
the first place. But its not done before calling
cpufreq_nominate_new_policy_cpu(), but during the POST_DEAD notification which
happens quite late in the hot-unplugging path.
So, this is just some useless code hanging around, get rid of it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There exists 350MHz K6-2E+ CPU, so add it to the usual frequency table.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, ondemand calculates the target frequency proportional to load
using the formula:
Target frequency = C * load
where C = policy->cpuinfo.max_freq / 100
Though, in many cases, the minimum available frequency is pretty high and
the above calculation introduces a dead band from load 0 to
100 * policy->cpuinfo.min_freq / policy->cpuinfo.max_freq where the target
frequency is always calculated to less than policy->cpuinfo.min_freq and
the minimum frequency is selected.
For example: on Intel i7-3770 @ 3.4GHz the policy->cpuinfo.min_freq = 1600000
and the policy->cpuinfo.max_freq = 3400000 (without turbo). Thus, the CPU
starts to scale up at a load above 47.
On quad core 1500MHz Krait the policy->cpuinfo.min_freq = 384000
and the policy->cpuinfo.max_freq = 1512000. Thus, the CPU starts to scale
at load above 25.
Change the calculation of target frequency to eliminate the above effect using
the formula:
Target frequency = A + B * load
where A = policy->cpuinfo.min_freq and
B = (policy->cpuinfo.max_freq - policy->cpuinfo->min_freq) / 100
This will map load values 0 to 100 linearly to cpuinfo.min_freq to
cpuinfo.max_freq.
Also, use the CPUFREQ_RELATION_C in __cpufreq_driver_target to select the
closest frequency in frequency_table. This is necessary to avoid selection
of minimum frequency only when load equals to 0. It will also help for selection
of frequencies using a more 'fair' criterion.
Tables below show the difference in selected frequency for specific values
of load without and with this patch. On Intel i7-3770 @ 3.40GHz:
Without With
Load Target Selected Target Selected
0 0 1600000 1600000 1600000
5 170050 1600000 1690050 1700000
10 340100 1600000 1780100 1700000
15 510150 1600000 1870150 1900000
20 680200 1600000 1960200 2000000
25 850250 1600000 2050250 2100000
30 1020300 1600000 2140300 2100000
35 1190350 1600000 2230350 2200000
40 1360400 1600000 2320400 2400000
45 1530450 1600000 2410450 2400000
50 1700500 1900000 2500500 2500000
55 1870550 1900000 2590550 2600000
60 2040600 2100000 2680600 2600000
65 2210650 2400000 2770650 2800000
70 2380700 2400000 2860700 2800000
75 2550750 2600000 2950750 3000000
80 2720800 2800000 3040800 3000000
85 2890850 2900000 3130850 3100000
90 3060900 3100000 3220900 3300000
95 3230950 3300000 3310950 3300000
100 3401000 3401000 3401000 3401000
On ARM quad core 1500MHz Krait:
Without With
Load Target Selected Target Selected
0 0 384000 384000 384000
5 75600 384000 440400 486000
10 151200 384000 496800 486000
15 226800 384000 553200 594000
20 302400 384000 609600 594000
25 378000 384000 666000 702000
30 453600 486000 722400 702000
35 529200 594000 778800 810000
40 604800 702000 835200 810000
45 680400 702000 891600 918000
50 756000 810000 948000 918000
55 831600 918000 1004400 1026000
60 907200 918000 1060800 1026000
65 982800 1026000 1117200 1134000
70 1058400 1134000 1173600 1134000
75 1134000 1134000 1230000 1242000
80 1209600 1242000 1286400 1242000
85 1285200 1350000 1342800 1350000
90 1360800 1458000 1399200 1350000
95 1436400 1458000 1455600 1458000
100 1512000 1512000 1512000 1512000
Tested on Intel i7-3770 CPU @ 3.40GHz and on ARM quad core 1500MHz Krait
(Android smartphone).
Benchmarks on Intel i7 shows a performance improvement on low and medium
work loads with lower power consumption. Specifics:
Phoronix Linux Kernel Compilation 3.1:
Time: -0.40%, energy: -0.07%
Phoronix Apache:
Time: -4.98%, energy: -2.35%
Phoronix FFMPEG:
Time: -6.29%, energy: -4.02%
Also, running mp3 decoding (very low load) shows no differences with and
without this patch.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Introduce CPUFREQ_RELATION_C for frequency selection.
It selects the frequency with the minimum euclidean distance to target.
In case of equal distance between 2 frequencies, it will select the
greater frequency.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
PU regulator is not a necessary regulator for cpufreq, not all
i.MX6 SoCs have PU regulator, only if SOC has PU regulator, then its
voltage must be equal to SOC regulator, so remove the dependency
to support i.MX6SX which has no PU regulator.
Signed-off-by: Anson Huang <b20788@freescale.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Shawn Guo <shawn.guo@freescale.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The specific rounding adds conditionally only 1/256 to fractional
part of core_pct.
We can safely remove it without any noticeable impact in
calculations.
Use div64_u64 instead of div_u64 to avoid possible overflow of
sample->mperf as divisor
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Simplify the code by removing the inline functions pstate_increase and
pstate_decrease and use directly the intel_pstate_set_pstate.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently we shift right aperf and mperf variables by FRAC_BITS
to prevent overflow when we convert them to fix point numbers
(shift left by FRAC_BITS).
But this is not necessary, because we actually use delta aperf and mperf
which are much less than APERF and MPERF values.
So, use the unmodified APERF and MPERF values in calculation.
This also adds 8 bits in precision, although the gain is insignificant.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
According to Intel 64 and IA-32 Architectures SDM, Volume 3,
Chapter 14.2, "Software needs to exercise care to avoid delays
between the two RDMSRs (for example interrupts)".
So, disable interrupts during reading MSRs IA32_APERF and IA32_MPERF.
This should increase the accuracy of the calculations.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Suppress checkpatch.pl --strict warnings:
CHECK: Alignment should match open parenthesis
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Remove the unnecessary intermediate assignment and use directly the
pid_params.sample_rate_ms variable.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Remove unnecessary parentheses.
Also, add parentheses in one case for better readability.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We can fit these lines in a single one, under the 80 characters
limit.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
div_s64() accepts the divisor parameter as s32. Helper div_fp()
also accepts divisor as int32_t.
So, remove the unnecessary int64_t type casting.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since we never remove sysfs entry and debugfs files, we can make
the intel_pstate_kobject and debugfs_parent locals.
Also, annotate with __init intel_pstate_sysfs_expose_params()
and intel_pstate_debug_expose_params() in order to be freed
after bootstrap.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This is only relevant to implementations with multiple clusters, where clusters
have separate clock lines but all CPUs within a cluster share it.
Consider a dual cluster platform with 2 cores per cluster. During suspend we
start hot unplugging CPUs in order 1 to 3. When CPU2 is removed, policy->kobj
would be moved to CPU3 and when CPU3 goes down we wouldn't free policy or its
kobj as we want to retain permissions/values/etc.
Now on resume, we will get CPU2 before CPU3 and will call __cpufreq_add_dev().
We will recover the old policy and update policy->cpu from 3 to 2 from
update_policy_cpu().
But the kobj is still tied to CPU3 and isn't moved to CPU2. We wouldn't create a
link for CPU2, but would try that for CPU3 while bringing it online. Which will
report errors as CPU3 already has kobj assigned to it.
This bug got introduced with commit 42f921a, which overlooked this scenario.
To fix this, lets move kobj to the new policy->cpu while bringing first CPU of a
cluster back. Also do a WARN_ON() if kobject_move failed, as we would reach here
only for the first CPU of a non-boot cluster. And we can't recover from this
situation, if kobject_move() fails.
Fixes: 42f921a6f1 (cpufreq: remove sysfs files for CPUs which failed to come back after resume)
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Reported-and-tested-by: Bu Yitian <ybu@qti.qualcomm.com>
Reported-by: Saravana Kannan <skannan@codeaurora.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa@mit.edu>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
OPPs can be populated statically, via DT, or added at run time with
dev_pm_opp_add().
While this driver handles the first case correctly, it would fail to populate
OPPs added at runtime. Because call to of_init_opp_table() would fail as there
are no OPPs in DT and probe will return early.
To fix this, remove error checking and call dev_pm_opp_init_cpufreq_table()
unconditionally.
Update bindings as well.
Suggested-by: Stephen Boyd <sboyd@codeaurora.org>
Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit ff1f0018cf ("drivers: Enable
building of Kirkwood drivers for mach-mvebu") added Kirkwood into
mach-mvebu, adding MACH_KIRKWOOD to ARCH_KIRKWOOD in the KConfig files.
The change for ARM_KIRKWOOD_CPUFREQ replaced ARCH_KIRKWOOD with
MACH_KIRKWOOD, whereas all the other changes were ARCH_KIRKWOOD ||
MACH_KIRKWOOD.
As a consequence of this change, the cpufreq driver is no longer enabled
for ARCH_KIRKWOOD. This patch reinstates ARM_KIRKWOOD_CPUFREQ for
ARCH_KIRKWOOD.
Signed-off-by: Quentin Armitage <quentin@armitage.org.uk>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
PM_OPP is a library used by several of the existing cpufreq drivers.
ARM IMX6Q cpufreq driver uses this library for its functionality.
Thus, it should be selected in Kconfig.
Reported-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Signed-off-by: Nicolas Del Piano <ndel314@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The Compaq iPAQ h3600 also has the K4S281632b-1H memory type.
Verified by prying apart a broken board.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since commtit 8a7b1227e3 (cpufreq: davinci: move cpufreq driver to
drivers/cpufreq) this added dependancy only for CONFIG_ARCH_DAVINCI_DA850
where as davinci_cpufreq_init() call is used by all davinci platform.
This patch fixes following build error:
arch/arm/mach-davinci/built-in.o: In function `davinci_init_late':
:(.init.text+0x928): undefined reference to `davinci_cpufreq_init'
make: *** [vmlinux] Error 1
Fixes: 8a7b1227e3 (cpufreq: davinci: move cpufreq driver to drivers/cpufreq)
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Ensure that cpu->cpu is set before writing MSR_IA32_PERF_CTL during CPU
initialization. Otherwise only cpu0 has its P-state set and all other
cores are left with their values unchanged.
In most cases, this is not too serious because the P-states will be set
correctly when the timer function is run. But when the default governor
is set to performance, the per-CPU current_pstate stays the same forever
and no attempts are made to write the MSRs again.
Signed-off-by: Vincent Minet <vincent@vincent-minet.net>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If turbo is disabled in the BIOS bit 38 should be set in
MSR_IA32_MISC_ENABLE register per section 14.3.2.1 of the SDM Vol 3
document 325384-050US Feb 2014. If this bit is set do *not* attempt
to disable trubo via the MSR_IA32_PERF_CTL register. On some systems
trying to disable turbo via MSR_IA32_PERF_CTL will cause subsequent
writes to MSR_IA32_PERF_CTL not take affect, in fact reading
MSR_IA32_PERF_CTL will not show the IDA/Turbo DISENGAGE bit(32) as
set. A write of bit 32 to zero returns to normal operation.
Also deal with the case where the processor does not support
turbo and the BIOS does not report the fact in MSR_IA32_MISC_ENABLE
but does report the max and turbo P states as the same value.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=64251
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 21855ff5 (intel_pstate: Set turbo VID for BayTrail) introduced
setting the turbo VID which is required to prevent a machine check on
some Baytrail SKUs under heavy graphics based workloads. The
docmumentation update that brought the requirement to light also
changed the bit mask used for enumerating P state and VID values from
0x7f to 0x3f.
This change returns the mask value to 0x7f.
Tested with the Intel NUC DN2820FYK,
BIOS version FYBYT10H.86A.0034.2014.0513.1413 with v3.16-rc1 and
v3.14.8 kernel versions.
Fixes: 21855ff5 (intel_pstate: Set turbo VID for BayTrail)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=77951
Reported-and-tested-by: Rune Reterson <rune@megahurts.dk>
Reported-and-tested-by: Eric Eickmeyer <erich@ericheickmeyer.com>
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit bd0fa9bb45 introduced a failure path to cpufreq_update_policy() if
cpufreq_driver->get(cpu) returns NULL. However, it jumps to the 'no_policy'
label, which exits without unlocking any of the locks the function acquired
earlier. This causes later calls into cpufreq to hang.
Fix this by creating a new 'unlock' label and jumping to that instead.
Fixes: bd0fa9bb45 ("cpufreq: Return error if ->get() failed in cpufreq_update_policy()")
Link: https://devtalk.nvidia.com/default/topic/751903/kernel-3-15-and-nv-drivers-337-340-failed-to-initialize-the-nvidia-kernel-module-gtx-550-ti-/
Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There was a mistake in the actual rounding portion this previous patch:
f0fe3cd7e1 (intel_pstate: Correct rounding in busy calculation) such that
the rounding was asymetric and incorrect.
Severity: Not very serious, but can increase target pstate by one extra value.
For real world work flows the issue should self correct (but I have no proof).
It is the equivalent of different PID gains for positive and negative numbers.
Examples:
-3.000000 used to round to -4, rounds to -3 with this patch.
-3.503906 used to round to -5, rounds to -4 with this patch.
Fixes: f0fe3cd7e1 (intel_pstate: Correct rounding in busy calculation)
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
5fbfbcd3e8 ("cpufreq: cpufreq-cpu0: remove dependency on THERMAL and
REGULATOR") was a little too quick in completely removing the dependency
on the THERMAL driver.
The problem is that while there are inline wrappers to turn the thermal
API calls into empty functions, those do not help if the cpu-thermal
driver is a loadable module and cpufreq-cpu0 is builtin.
Since CONFIG_CPU_THERMAL is a bool option that decides whether the cpu
code is built into the thermal module or not, we have to use a dependency
on the thermal driver itself. However, if CPU_THERMAL is disabled, we
don't need the dependency, hence the strange '!CPU_THERMAL || THERMAL'
construct.
Fixes: 5fbfbcd3e8 ("cpufreq: cpufreq-cpu0: remove dependency on THERMAL and REGULATOR")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* pm-cpufreq:
cpufreq: cpufreq-cpu0: remove dependency on THERMAL and REGULATOR
cpufreq: tegra: update comment for clarity
cpufreq: intel_pstate: Remove duplicate CPU ID check
cpufreq: Mark CPU0 driver with CPUFREQ_NEED_INITIAL_FREQ_CHECK flag
cpufreq: governor: remove copy_prev_load from 'struct cpu_dbs_common_info'
cpufreq: governor: Be friendly towards latency-sensitive bursty workloads
cpufreq: ppc-corenet-cpu-freq: do_div use quotient
Revert "cpufreq: Enable big.LITTLE cpufreq driver on arm64"
cpufreq: Tegra: implement intermediate frequency callbacks
cpufreq: add support for intermediate (stable) frequencies
cpufreq-cpu0 uses thermal framework to register a cooling device, but doesn't
depend on it as there are dummy calls provided by thermal layer when
CONFIG_THERMAL=n. And when these calls fail, the driver is still usable.
Similar explanation is valid for regulators as well. We do have dummy calls
available for regulator APIs and the driver can work even when those calls
fail.
So, we don't really need to mention thermal and regulators as a dependency for
cpufreq-cpu0 in Kconfig as platforms without support for thermal/regulator can
also use this driver. Remove this dependency.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tegra's driver got updated a bit (00917dd cpufreq: Tegra: implement intermediate
frequency callbacks) and implements new 'intermediate freq' infrastructure of
core. Above commit updated comments about when to call
clk_prepare_enable(pll_x_clk) and Doug wasn't satisfied with those comments and
said this:
> The "Though when target-freq is intermediate freq, we don't need to
> take this reference." makes me think that this function is actually
> called when target-freq is intermediate freq. I don't think it is,
> right?
For better clarity just make that comment more explicit about when we call
tegra_target_intermediate().
Reviewed-by: Stephen Warren <swarren@nvidia.com>
Reported-and-reviewed-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We check the CPU ID during driver init. There is no need
to do it again per logical CPU initialization.
So, remove the duplicate check.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Sometimes boot loaders set CPU frequency to a value outside of frequency table
present with cpufreq core. In such cases CPU might be unstable if it has to run
on that frequency for long duration of time and so its better to set it to a
frequency which is specified in frequency table.
Sachin recently found this problem with cpufreq-cpu0 driver when he was testing
it for Exynos.
Set this flag for cpufreq-cpu0 driver.
Reported-and-tested-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
'copy_prev_load' was recently added by commit: 18b46ab (cpufreq: governor: Be
friendly towards latency-sensitive bursty workloads).
It actually is a bit redundant as we also have 'prev_load' which can store any
integer value and can be used instead of 'copy_prev_load' by setting it zero.
True load can also turn out to be zero during long idle intervals (and hence the
actual value of 'prev_load' and the overloaded value can clash). However this is
not a problem because, if the true load was really zero in the previous
interval, it makes sense to evaluate the load afresh for the current interval
rather than copying the previous load.
So, drop 'copy_prev_load' and use 'prev_load' instead.
Update comments as well to make it more clear.
There is another change here which was probably missed by Srivatsa during the
last version of updates he made. The unlikely in the 'if' statement was covering
only half of the condition and the whole line should actually come under it.
Also checkpatch is made more silent as it was reporting this (--strict option):
CHECK: Alignment should match open parenthesis
+ if (unlikely(wall_time > (2 * sampling_rate) &&
+ j_cdbs->prev_load)) {
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cpufreq governors like the ondemand governor calculate the load on the CPU
periodically by employing deferrable timers. A deferrable timer won't fire
if the CPU is completely idle (and there are no other timers to be run), in
order to avoid unnecessary wakeups and thus save CPU power.
However, the load calculation logic is agnostic to all this, and this can
lead to the problem described below.
Time (ms) CPU 1
100 Task-A running
110 Governor's timer fires, finds load as 100% in the last
10ms interval and increases the CPU frequency.
110.5 Task-A running
120 Governor's timer fires, finds load as 100% in the last
10ms interval and increases the CPU frequency.
125 Task-A went to sleep. With nothing else to do, CPU 1
went completely idle.
200 Task-A woke up and started running again.
200.5 Governor's deferred timer (which was originally programmed
to fire at time 130) fires now. It calculates load for the
time period 120 to 200.5, and finds the load is almost zero.
Hence it decreases the CPU frequency to the minimum.
210 Governor's timer fires, finds load as 100% in the last
10ms interval and increases the CPU frequency.
So, after the workload woke up and started running, the frequency was suddenly
dropped to absolute minimum, and after that, there was an unnecessary delay of
10ms (sampling period) to increase the CPU frequency back to a reasonable value.
And this pattern repeats for every wake-up-from-cpu-idle for that workload.
This can be quite undesirable for latency- or response-time sensitive bursty
workloads. So we need to fix the governor's logic to detect such wake-up-from-
cpu-idle scenarios and start the workload at a reasonably high CPU frequency.
One extreme solution would be to fake a load of 100% in such scenarios. But
that might lead to undesirable side-effects such as frequency spikes (which
might also need voltage changes) especially if the previous frequency happened
to be very low.
We just want to avoid the stupidity of dropping down the frequency to a minimum
and then enduring a needless (and long) delay before ramping it up back again.
So, let us simply carry forward the previous load - that is, let us just pretend
that the 'load' for the current time-window is the same as the load for the
previous window. That way, the frequency and voltage will continue to be set
to whatever values they were set at previously. This means that bursty workloads
will get a chance to influence the CPU frequency at which they wake up from
cpu-idle, based on their past execution history. Thus, they might be able to
avoid suffering from slow wakeups and long response-times.
However, we should take care not to over-do this. For example, such a "copy
previous load" logic will benefit cases like this: (where # represents busy
and . represents idle)
##########.........#########.........###########...........##########........
but it will be detrimental in cases like the one shown below, because it will
retain the high frequency (copied from the previous interval) even in a mostly
idle system:
##########.........#.................#.....................#...............
(i.e., the workload finished and the remaining tasks are such that their busy
periods are smaller than the sampling interval, which causes the timer to
always get deferred. So, this will make the copy-previous-load logic copy
the initial high load to subsequent idle periods over and over again, thus
keeping the frequency high unnecessarily).
So, we modify this copy-previous-load logic such that it is used only once
upon every wakeup-from-idle. Thus if we have 2 consecutive idle periods, the
previous load won't get blindly copied over; cpufreq will freshly evaluate the
load in the second idle interval, thus ensuring that the system comes back to
its normal state.
[ The right way to solve this whole problem is to teach the CPU frequency
governors to also track load on a per-task basis, not just a per-CPU basis,
and then use both the data sources intelligently to set the appropriate
frequency on the CPUs. But that involves redesigning the cpufreq subsystem,
so this patch should make the situation bearable until then. ]
Experimental results:
+-------------------+
I ran a modified version of ebizzy (called 'sleeping-ebizzy') that sleeps in
between its execution such that its total utilization can be a user-defined
value, say 10% or 20% (higher the utilization specified, lesser the amount of
sleeps injected). This ebizzy was run with a single-thread, tied to CPU 8.
Behavior observed with tracing (sample taken from 40% utilization runs):
------------------------------------------------------------------------
Without patch:
~~~~~~~~~~~~~~
kworker/8:2-12137 416.335742: cpu_frequency: state=2061000 cpu_id=8
kworker/8:2-12137 416.335744: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
<...>-40753 416.345741: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
kworker/8:2-12137 416.345744: cpu_frequency: state=4123000 cpu_id=8
kworker/8:2-12137 416.345746: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
<...>-40753 416.355738: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
<snip> --------------------------------------------------------------------- <snip>
<...>-40753 416.402202: sched_switch: prev_comm=ebizzy ==> next_comm=swapper/8
<idle>-0 416.502130: sched_switch: prev_comm=swapper/8 ==> next_comm=ebizzy
<...>-40753 416.505738: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
kworker/8:2-12137 416.505739: cpu_frequency: state=2061000 cpu_id=8
kworker/8:2-12137 416.505741: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
<...>-40753 416.515739: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
kworker/8:2-12137 416.515742: cpu_frequency: state=4123000 cpu_id=8
kworker/8:2-12137 416.515744: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
Observation: Ebizzy went idle at 416.402202, and started running again at
416.502130. But cpufreq noticed the long idle period, and dropped the frequency
at 416.505739, only to increase it back again at 416.515742, realizing that the
workload is in-fact CPU bound. Thus ebizzy needlessly ran at the lowest frequency
for almost 13 milliseconds (almost 1 full sample period), and this pattern
repeats on every sleep-wakeup. This could hurt latency-sensitive workloads quite
a lot.
With patch:
~~~~~~~~~~~
kworker/8:2-29802 464.832535: cpu_frequency: state=2061000 cpu_id=8
<snip> --------------------------------------------------------------------- <snip>
kworker/8:2-29802 464.962538: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
<...>-40738 464.972533: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
kworker/8:2-29802 464.972536: cpu_frequency: state=4123000 cpu_id=8
kworker/8:2-29802 464.972538: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
<...>-40738 464.982531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
<snip> --------------------------------------------------------------------- <snip>
kworker/8:2-29802 465.022533: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
<...>-40738 465.032531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
kworker/8:2-29802 465.032532: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
<...>-40738 465.035797: sched_switch: prev_comm=ebizzy ==> next_comm=swapper/8
<idle>-0 465.240178: sched_switch: prev_comm=swapper/8 ==> next_comm=ebizzy
<...>-40738 465.242533: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
kworker/8:2-29802 465.242535: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
<...>-40738 465.252531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
Observation: Ebizzy went idle at 465.035797, and started running again at
465.240178. Since ebizzy was the only real workload running on this CPU,
cpufreq retained the frequency at 4.1Ghz throughout the run of ebizzy, no
matter how many times ebizzy slept and woke-up in-between. Thus, ebizzy
got the 10ms worth of 4.1 Ghz benefit during every sleep-wakeup (as compared
to the run without the patch) and this boost gave a modest improvement in total
throughput, as shown below.
Sleeping-ebizzy records-per-second:
-----------------------------------
Utilization Without patch With patch Difference (Absolute and % values)
10% 274767 277046 + 2279 (+0.829%)
20% 543429 553484 + 10055 (+1.850%)
40% 1090744 1107959 + 17215 (+1.578%)
60% 1634908 1662018 + 27110 (+1.658%)
A rudimentary and somewhat approximately latency-sensitive workload such as
sleeping-ebizzy itself showed a consistent, noticeable performance improvement
with this patch. Hence, workloads that are truly latency-sensitive will benefit
quite a bit from this change. Moreover, this is an overall win-win since this
patch does not hurt power-savings at all (because, this patch does not reduce
the idle time or idle residency; and the high frequency of the CPU when it goes
to cpu-idle does not affect/hurt the power-savings of deep idle states).
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 6712d29319 (cpufreq: ppc-corenet-cpufreq: Fix __udivdi3 modpost
error) used the remainder from do_div instead of the quotient. Fix that
and add one to ensure minimum is met.
Fixes: 6712d29319 (cpufreq: ppc-corenet-cpufreq: Fix __udivdi3 modpost error)
Signed-off-by: Ed Swarthout <Ed.Swarthout@freescale.com>
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This reverts commit 4920ab8497 (cpufreq: Enable big.LITTLE cpufreq
driver on arm64) that breaks build on arm64.
Fixes: 4920ab8497 (cpufreq: Enable big.LITTLE cpufreq driver on arm64)
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tegra has been switching to intermediate frequency (pll_p_clk) forever.
CPUFreq core has better support for handling notifications for these
frequencies and so we can adapt Tegra's driver to it.
Also do a WARN() if clk_set_parent() fails while moving back to pll_x
as we should have atleast restored to earlier frequency on error.
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Douglas Anderson, recently pointed out an interesting problem due to which
udelay() was expiring earlier than it should.
While transitioning between frequencies few platforms may temporarily switch to
a stable frequency, waiting for the main PLL to stabilize.
For example: When we transition between very low frequencies on exynos, like
between 200MHz and 300MHz, we may temporarily switch to a PLL running at 800MHz.
No CPUFREQ notification is sent for that. That means there's a period of time
when we're running at 800MHz but loops_per_jiffy is calibrated at between 200MHz
and 300MHz. And so udelay behaves badly.
To get this fixed in a generic way, introduce another set of callbacks
get_intermediate() and target_intermediate(), only for drivers with
target_index() and CPUFREQ_ASYNC_NOTIFICATION unset.
get_intermediate() should return a stable intermediate frequency platform wants
to switch to, and target_intermediate() should set CPU to that frequency,
before jumping to the frequency corresponding to 'index'. Core will take care of
sending notifications and driver doesn't have to handle them in
target_intermediate() or target_index().
NOTE: ->target_index() should restore to policy->restore_freq in case of
failures as core would send notifications for that.
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- ACPICA update to upstream version 20140424. That includes a
number of fixes and improvements related to things like GPE
handling, table loading, headers, memory mapping and unmapping,
DSDT/SSDT overriding, and the Unload() operator. The acpidump
utility from upstream ACPICA is included too. From Bob Moore,
Lv Zheng, David Box, David Binderman, and Colin Ian King.
- Fixes and cleanups related to ACPI video and backlight interfaces
from Hans de Goede. That includes blacklist entries for some new
machines and using native backlight by default.
- ACPI device enumeration changes to create platform devices
rather than PNP devices for ACPI device objects with _HID by
default. PNP devices will still be created for the ACPI device
object with device IDs corresponding to real PNP devices, so
that change should not break things left and right, and we're
expecting to see more and more ACPI-enumerated platform devices
in the future. From Zhang Rui and Rafael J Wysocki.
- Updates for the ACPI LPSS (Low-Power Subsystem) driver allowing
it to handle system suspend/resume on Asus T100 correctly.
From Heikki Krogerus and Rafael J Wysocki.
- PM core update introducing a mechanism to allow runtime-suspended
devices to stay suspended over system suspend/resume transitions
if certain additional conditions related to coordination within
device hierarchy are met. Related PM documentation update and
ACPI PM domain support for the new feature. From Rafael J Wysocki.
- Fixes and improvements related to the "freeze" sleep state. They
affect several places including cpuidle, PM core, ACPI core, and
the ACPI battery driver. From Rafael J Wysocki and Zhang Rui.
- Miscellaneous fixes and updates of the ACPI core from Aaron Lu,
Bjørn Mork, Hanjun Guo, Lan Tianyu, and Rafael J Wysocki.
- Fixes and cleanups for the ACPI processor and ACPI PAD (Processor
Aggregator Device) drivers from Baoquan He, Manuel Schölling,
Tony Camuso, and Toshi Kani.
- System suspend/resume optimization in the ACPI battery driver from
Lan Tianyu.
- OPP (Operating Performance Points) subsystem updates from
Chander Kashyap, Mark Brown, and Nishanth Menon.
- cpufreq core fixes, updates and cleanups from Srivatsa S Bhat,
Stratos Karafotis, and Viresh Kumar.
- Updates, fixes and cleanups for the Tegra, powernow-k8, imx6q,
s5pv210, nforce2, and powernv cpufreq drivers from Brian Norris,
Jingoo Han, Paul Bolle, Philipp Zabel, Stratos Karafotis, and
Viresh Kumar.
- intel_pstate driver fixes and cleanups from Dirk Brandewie,
Doug Smythies, and Stratos Karafotis.
- Enabling the big.LITTLE cpufreq driver on arm64 from Mark Brown.
- Fix for the cpuidle menu governor from Chander Kashyap.
- New ARM clps711x cpuidle driver from Alexander Shiyan.
- Hibernate core fixes and cleanups from Chen Gang, Dan Carpenter,
Fabian Frederick, Pali Rohár, and Sebastian Capella.
- Intel RAPL (Running Average Power Limit) driver updates from
Jacob Pan.
- PNP subsystem updates from Bjorn Helgaas and Fabian Frederick.
- devfreq core updates from Chanwoo Choi and Paul Bolle.
- devfreq updates for exynos4 and exynos5 from Chanwoo Choi and
Bartlomiej Zolnierkiewicz.
- turbostat tool fix from Jean Delvare.
- cpupower tool updates from Prarit Bhargava, Ramkumar Ramachandra
and Thomas Renninger.
- New ACPI ec_access.c tool for poking at the EC in a safe way
from Thomas Renninger.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJTjl16AAoJEILEb/54YlRxeKgP/RRQSV7lFtf582Dw/5M/iWOg
qYeNtuYFLArEmJ7SpxHdKsU1ZRm3CahAS1j7grvQMQasUxTzoavMcSBNZefeaoNK
d01LVNqcyKCZs3+izRezk5N1IY+AjdrOcqCdIk8rfgFnc6kOttYUrVcIzKuIKAvJ
MsJ5s/uqP8G69FsAA3Ttdtr0HKiQhN4skSt424wntQRDeJNZPBs74mPKBGh8bxlO
Zr/VCDibKQ2Z8jS7x+TzwZrOxgE1/9x0Cub6GAdTvAfS8A+utPwSkneUyopNqpQ+
tJ5rz5R+HpmPMerizBuU+5s+tvjDPtH4/OZvOPSpYraQSFLOwx3hAm+a5k7fOGmc
XWjXnXWT0i0V3iQkwrspTNjX1RgywbsHbmXrcWn192HResvMQ9zk2gH2ch6m8JhN
yTV5V51dOZicpPuaTCvIkJpsV33p6vRz+EdPBiXoEdua5KKtOg8EnQ470dNaMR92
3ZtWmIvSgGlyPyHlSHLfGXbPUwTYvDNV3aheIoXp9E6WY3WJN9J3WXm4EHKBNVaI
H83kwuk1s92cgqh22H5Pcb0CmDcrbkUdP6hhsPS/aL80/EJMljRP2AYW1Y+l1LAf
pzMLmekHFqQEDjFQltwGvFV/EjFeMHnqOgQONx9ygMaayCGGTYSDx3FbRDesf8t9
qhoFcTPSxoo0XjrGrR6b
=tpdF
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm into next
Pull ACPI and power management updates from Rafael Wysocki:
"ACPICA is the leader this time (63 commits), followed by cpufreq (28
commits), devfreq (15 commits), system suspend/hibernation (12
commits), ACPI video and ACPI device enumeration (10 commits each).
We have no major new features this time, but there are a few
significant changes of how things work. The most visible one will
probably be that we are now going to create platform devices rather
than PNP devices by default for ACPI device objects with _HID. That
was long overdue and will be really necessary to be able to use the
same drivers for the same hardware blocks on ACPI and DT-based systems
going forward. We're not expecting fallout from this one (as usual),
but it's something to watch nevertheless.
The second change having a chance to be visible is that ACPI video
will now default to using native backlight rather than the ACPI
backlight interface which should generally help systems with broken
Win8 BIOSes. We're hoping that all problems with the native backlight
handling that we had previously have been addressed and we are in a
good enough shape to flip the default, but this change should be easy
enough to revert if need be.
In addition to that, the system suspend core has a new mechanism to
allow runtime-suspended devices to stay suspended throughout system
suspend/resume transitions if some extra conditions are met
(generally, they are related to coordination within device hierarchy).
However, enabling this feature requires cooperation from the bus type
layer and for now it has only been implemented for the ACPI PM domain
(used by ACPI-enumerated platform devices mostly today).
Also, the acpidump utility that was previously shipped as a separate
tool will now be provided by the upstream ACPICA along with the rest
of ACPICA code, which will allow it to be more up to date and better
supported, and we have one new cpuidle driver (ARM clps711x).
The rest is improvements related to certain specific use cases,
cleanups and fixes all over the place.
Specifics:
- ACPICA update to upstream version 20140424. That includes a number
of fixes and improvements related to things like GPE handling,
table loading, headers, memory mapping and unmapping, DSDT/SSDT
overriding, and the Unload() operator. The acpidump utility from
upstream ACPICA is included too. From Bob Moore, Lv Zheng, David
Box, David Binderman, and Colin Ian King.
- Fixes and cleanups related to ACPI video and backlight interfaces
from Hans de Goede. That includes blacklist entries for some new
machines and using native backlight by default.
- ACPI device enumeration changes to create platform devices rather
than PNP devices for ACPI device objects with _HID by default. PNP
devices will still be created for the ACPI device object with
device IDs corresponding to real PNP devices, so that change should
not break things left and right, and we're expecting to see more
and more ACPI-enumerated platform devices in the future. From
Zhang Rui and Rafael J Wysocki.
- Updates for the ACPI LPSS (Low-Power Subsystem) driver allowing it
to handle system suspend/resume on Asus T100 correctly. From
Heikki Krogerus and Rafael J Wysocki.
- PM core update introducing a mechanism to allow runtime-suspended
devices to stay suspended over system suspend/resume transitions if
certain additional conditions related to coordination within device
hierarchy are met. Related PM documentation update and ACPI PM
domain support for the new feature. From Rafael J Wysocki.
- Fixes and improvements related to the "freeze" sleep state. They
affect several places including cpuidle, PM core, ACPI core, and
the ACPI battery driver. From Rafael J Wysocki and Zhang Rui.
- Miscellaneous fixes and updates of the ACPI core from Aaron Lu,
Bjørn Mork, Hanjun Guo, Lan Tianyu, and Rafael J Wysocki.
- Fixes and cleanups for the ACPI processor and ACPI PAD (Processor
Aggregator Device) drivers from Baoquan He, Manuel Schölling, Tony
Camuso, and Toshi Kani.
- System suspend/resume optimization in the ACPI battery driver from
Lan Tianyu.
- OPP (Operating Performance Points) subsystem updates from Chander
Kashyap, Mark Brown, and Nishanth Menon.
- cpufreq core fixes, updates and cleanups from Srivatsa S Bhat,
Stratos Karafotis, and Viresh Kumar.
- Updates, fixes and cleanups for the Tegra, powernow-k8, imx6q,
s5pv210, nforce2, and powernv cpufreq drivers from Brian Norris,
Jingoo Han, Paul Bolle, Philipp Zabel, Stratos Karafotis, and
Viresh Kumar.
- intel_pstate driver fixes and cleanups from Dirk Brandewie, Doug
Smythies, and Stratos Karafotis.
- Enabling the big.LITTLE cpufreq driver on arm64 from Mark Brown.
- Fix for the cpuidle menu governor from Chander Kashyap.
- New ARM clps711x cpuidle driver from Alexander Shiyan.
- Hibernate core fixes and cleanups from Chen Gang, Dan Carpenter,
Fabian Frederick, Pali Rohár, and Sebastian Capella.
- Intel RAPL (Running Average Power Limit) driver updates from Jacob
Pan.
- PNP subsystem updates from Bjorn Helgaas and Fabian Frederick.
- devfreq core updates from Chanwoo Choi and Paul Bolle.
- devfreq updates for exynos4 and exynos5 from Chanwoo Choi and
Bartlomiej Zolnierkiewicz.
- turbostat tool fix from Jean Delvare.
- cpupower tool updates from Prarit Bhargava, Ramkumar Ramachandra
and Thomas Renninger.
- New ACPI ec_access.c tool for poking at the EC in a safe way from
Thomas Renninger"
* tag 'pm+acpi-3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (187 commits)
ACPICA: Namespace: Remove _PRP method support.
intel_pstate: Improve initial busy calculation
intel_pstate: add sample time scaling
intel_pstate: Correct rounding in busy calculation
intel_pstate: Remove C0 tracking
PM / hibernate: fixed typo in comment
ACPI: Fix x86 regression related to early mapping size limitation
ACPICA: Tables: Add mechanism to control early table checksum verification.
ACPI / scan: use platform bus type by default for _HID enumeration
ACPI / scan: always register ACPI LPSS scan handler
ACPI / scan: always register memory hotplug scan handler
ACPI / scan: always register container scan handler
ACPI / scan: Change the meaning of missing .attach() in scan handlers
ACPI / scan: introduce platform_id device PNP type flag
ACPI / scan: drop unsupported serial IDs from PNP ACPI scan handler ID list
ACPI / scan: drop IDs that do not comply with the ACPI PNP ID rule
ACPI / PNP: use device ID list for PNPACPI device enumeration
ACPI / scan: .match() callback for ACPI scan handlers
ACPI / battery: wakeup the system only when necessary
power_supply: allow power supply devices registered w/o wakeup source
...
SoC-near driver changes that we're merging through our tree. Mostly
because they depend on other changes we have staged, but in some cases
because the driver maintainers preferred that we did it this way.
This contains a largeish cleanup series of the omap_l3_noc bus driver,
cpuidle rework for Exynos, some reset driver conversions and a long
branch of TI EDMA fixes and cleanups, with more to come next release.
The TI EDMA cleanups is a shared branch with the dmaengine tree, with
a handful of Davinci-specific fixes on top.
After discussion at last year's KS (and some more on the mailing lists),
we are here adding a drivers/soc directory. The purpose of this is
to keep per-vendor shared code that's needed by different drivers but
that doesn't fit into the MFD (nor drivers/platform) model. We expect
to keep merging contents for this hierarchy through arm-soc so we can
keep an eye on what the vendors keep adding here and not making it a
free-for-all to shove in crazy stuff.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJTjOFiAAoJEIwa5zzehBx30RYP/0UE+R8ccdsodunmIDrmQ7QP
qFWe1YTWlyXtGBDaPCNfdcU09UYatPKuCv5dJ2ToQCyyFI26PIIhFtnCNXmMuYz+
XPCuqAlJ9hZWx7+j2hXRlyhoZMAaJ5EVVxaK5tnVYXDIfy1Y3xG7i069HD/qGrQp
xrV+XofFmpU2VAds6S+SpecFFfYD7n/pJ1bTSgzPfaUsEUyV882dJ3skgs1VpTzQ
PnL/0Z2t4ePoP3+6p+F7EnJxemLF5IXrlL0c7hODxQKuMqlzoUluywh6SwOHfCQL
u2cc5SFUbbKhExwlGOVibdQMiC0HUOXyRvyYFOIdbv+xNH+Zc/tcoQQ22PWm4Yy1
08qOm3Fr6yw5nH5IT+1wCIFCzJEC/ZHM5B2t+RISFybAMk6Bg1TDYJLmd570zkEL
aTLtS5hdmy4h8Ad5FBtwKNyL//6FJJxhbHUu/m0qaE0phq94+78B2M6vbx6757xC
kCFlpJsHoN0Tn5c9Q1hpTqI/BHxb4UR7Nf+b8Ox8Veuc9JrS35lzi/rWnGxB5WB0
+1KCA8eih9KXTtksxAte1TmSbMciqW559RUR7dNAPXAMPksY2mJV1I+rg0cRsY3i
F90Lnc6LWUM5PYpc4VwiC0sUCLKzTFnpZUELqMOiws3PUblbb0StXuoNo6owbtsK
mp1Juxi1n7VhoN9AFVpL
=SC+e
-----END PGP SIGNATURE-----
Merge tag 'drivers-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc into next
Pull ARM SoC driver changes from Olof Johansson:
"SoC-near driver changes that we're merging through our tree. Mostly
because they depend on other changes we have staged, but in some cases
because the driver maintainers preferred that we did it this way.
This contains a largeish cleanup series of the omap_l3_noc bus driver,
cpuidle rework for Exynos, some reset driver conversions and a long
branch of TI EDMA fixes and cleanups, with more to come next release.
The TI EDMA cleanups is a shared branch with the dmaengine tree, with
a handful of Davinci-specific fixes on top.
After discussion at last year's KS (and some more on the mailing
lists), we are here adding a drivers/soc directory. The purpose of
this is to keep per-vendor shared code that's needed by different
drivers but that doesn't fit into the MFD (nor drivers/platform)
model. We expect to keep merging contents for this hierarchy through
arm-soc so we can keep an eye on what the vendors keep adding here and
not making it a free-for-all to shove in crazy stuff"
* tag 'drivers-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (101 commits)
cpufreq: exynos: Fix driver compilation with ARCH_MULTIPLATFORM
tty: serial: msm: Remove direct access to GSBI
power: reset: keystone-reset: introduce keystone reset driver
Documentation: dt: add bindings for keystone pll control controller
Documentation: dt: add bindings for keystone reset driver
soc: qcom: fix of_device_id table
ARM: EXYNOS: Fix kernel panic when unplugging CPU1 on exynos
ARM: EXYNOS: Move the driver to drivers/cpuidle directory
ARM: EXYNOS: Cleanup all unneeded headers from cpuidle.c
ARM: EXYNOS: Pass the AFTR callback to the platform_data
ARM: EXYNOS: Move S5P_CHECK_SLEEP into pm.c
ARM: EXYNOS: Move the power sequence call in the cpu_pm notifier
ARM: EXYNOS: Move the AFTR state function into pm.c
ARM: EXYNOS: Encapsulate the AFTR code into a function
ARM: EXYNOS: Disable cpuidle for exynos5440
ARM: EXYNOS: Encapsulate boot vector code into a function for cpuidle
ARM: EXYNOS: Pass wakeup mask parameter to function for cpuidle
ARM: EXYNOS: Remove ifdef for scu_enable in pm
ARM: EXYNOS: Move scu_enable in the cpu_pm notifier
ARM: EXYNOS: Use the cpu_pm notifier for pm
...
Cleanups for 3.16. Among these are:
- A bunch of misc cleanups for Broadcom platforms, mostly housekeeping
- Enabling Common Clock Framework on the older s3c24xx Samsung chipsets
- Cleanup of the Versatile Express system controller code, moving it to syscon
- Power management cleanups for OMAP platforms
+ a handful of other cleanups across the place
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJTjMwHAAoJEIwa5zzehBx3MjMP/iELgDsqbNE2wxF9Fb5EEnoe
S11q1QIvVrMVdMcKFN5HfW7f+xNso6+4SwXW0cRrJokGvaqRE758WZWuZq0QBUeS
RYMhfpqmI6pTTJUyy6i6OyXhuRqu8rQ1NPEAatYrKzmtwFX1H4t25f1YtZWhBcK8
ONi45FHeH1OKGGpjpT63uhWEzLk+LZI2MtgxmWoFcemf7guX6vEPJVuVRi8eqLoS
9vl1cAkweYgGhjvQFcSXENaguV50dZlLc9C41dJk9KVvJfRt7o+/cRbG5YpGvnp5
Liu+OWM72w0BkgNk6wDN4kaPX5UGLF8QX11JlvDRCJ2FcPtM4NBG/C9TqLMfkKDR
Ze+ITiXh6NjefdTZWJaM4vzsd6vFws8EYAP24IWFlZ451bNLVN1lzlgqluPNoKmj
CAsFPZhY/x5X9a8VLZ72ohx3N17T/iMsOlbiWtnlfqDcL6N0IoLG1YkFFeQIKEAH
mpobWus8Myq1miWqSaeXh5wOqUVQmYR0I8jNoTfte1nBYSaIGhtMixoQhM6Zw50C
dgSh4p7qhrZUOnYmkPqFXr7NCJ9n3RD10Xu8d/3IIp0u9RJ5Kx6NCEg9adq22jZQ
XGrr/vH0sM8MzpKmfTMi5t2Cx5kP2G+O3enq0hQi4x3Cb4o8vwWQlMgydTd+xBjj
aLo3WTTw0h6nTuKkZL2p
=wuX4
-----END PGP SIGNATURE-----
Merge tag 'cleanup-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc into next
Pull ARM SoC cleanups from Olof Johansson:
"Cleanups for 3.16. Among these are:
- a bunch of misc cleanups for Broadcom platforms, mostly
housekeeping
- enabling Common Clock Framework on the older s3c24xx Samsung
chipsets
- cleanup of the Versatile Express system controller code, moving it
to syscon
- power management cleanups for OMAP platforms
plus a handful of other cleanups across the place"
* tag 'cleanup-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (87 commits)
ARM: kconfig: allow PCI support to be selected with ARCH_MULTIPLATFORM
clk: samsung: fix build error
ARM: vexpress: refine dependencies for new code
clk: samsung: clk-s3c2410-dlck: do not use PNAME macro as it declares __initdata
cpufreq: exynos: Fix the compile error
ARM: S3C24XX: move debug-macro.S into the common space
ARM: S3C24XX: use generic DEBUG_UART_PHY/_VIRT in debug macro
ARM: S3C24XX: trim down debug uart handling
ARM: compressed/head.S: remove s3c24xx special case
ARM: EXYNOS: Remove unnecessary inclusion of cpu.h
ARM: EXYNOS: Migrate Exynos specific macros from plat to mach
ARM: EXYNOS: Remove exynos_subsys registration
ARM: EXYNOS: Remove duplicate lines in Makefile
ARM: EXYNOS: use v7_exit_coherency_flush macro for cache disabling
ARM: OMAP4: PRCM: remove references to cm-regbits-44xx.h from PRCM core files
ARM: OMAP3/4: PRM: add support of late_init call to prm_ll_ops
ARM: OMAP3/OMAP4: PRM: add prm_features flags and add IO wakeup under it
ARM: OMAP3/4: PRM: provide io chain reconfig function through irq setup
ARM: OMAP2+: PRM: remove unnecessary cpu_is_XXX calls from prm_init / exit
ARM: OMAP2+: PRCM: cleanup some header includes
...
This change makes the busy calculation using 64 bit math which prevents
overflow for large values of aperf/mperf.
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>