linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 11:18:45 +07:00

Author	SHA1	Message	Date
Rafael J. Wysocki	f3db6de55e	Merge branch 'pm-cpufreq' * pm-cpufreq: cpufreq: intel_pstate: Implement passive mode with HWP enabled	2020-08-14 19:39:35 +02:00
Rafael J. Wysocki	f6ebbcf08f	cpufreq: intel_pstate: Implement passive mode with HWP enabled Allow intel_pstate to work in the passive mode with HWP enabled and make it set the HWP minimum performance limit (HWP floor) to the P-state value given by the target frequency supplied by the cpufreq governor, so as to prevent the HWP algorithm and the CPU scheduler from working against each other, at least when the schedutil governor is in use, and update the intel_pstate documentation accordingly. Among other things, this allows utilization clamps to be taken into account, at least to a certain extent, when intel_pstate is in use and makes it more likely that sufficient capacity for deadline tasks will be provided. After this change, the resulting behavior of an HWP system with intel_pstate in the passive mode should be close to the behavior of the analogous non-HWP system with intel_pstate in the passive mode, except that the HWP algorithm is generally allowed to make the CPU run at a frequency above the floor P-state set by intel_pstate in the entire available range of P-states, while without HWP a CPU can run in a P-state above the requested one if the latter falls into the range of turbo P-states (referred to as the turbo range) or if the P-states of all CPUs in one package are coordinated with each other at the hardware level. [Note that in principle the HWP floor may not be taken into account by the processor if it falls into the turbo range, in which case the processor has a license to choose any P-state, either below or above the HWP floor, just like a non-HWP processor in the case when the target P-state falls into the turbo range.] With this change applied, intel_pstate in the passive mode assumes complete control over the HWP request MSR and concurrent changes of that MSR (eg. via the direct MSR access interface) are overridden by it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>	2020-08-11 17:29:45 +02:00
Linus Torvalds	2324d50d05	It's been a busy cycle for documentation - hopefully the busiest for a while to come. Changes include: - Some new Chinese translations - Progress on the battle against double words words and non-HTTPS URLs - Some block-mq documentation - More RST conversions from Mauro. At this point, that task is essentially complete, so we shouldn't see this kind of churn again for a while. Unless we decide to switch to asciidoc or something...:) - Lots of typo fixes, warning fixes, and more. -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl8oVkwPHGNvcmJldEBs d24ubmV0AAoJEBdDWhNsDH5YoW8H/jJ/xnXFn7tkgVPQAlL3k5HCnK7A5nDP9RVR cg1pTx1cEFdjzxPlJyExU6/v+AImOvtweHXC+JDK7YcJ6XFUNYXJI3LxL5KwUXbY BL/xRFszDSXH2C7SJF5GECcFYp01e/FWSLN3yWAh+g+XwsKiTJ8q9+CoIDkHfPGO 7oQsHKFu6s36Af0LfSgxk4sVB7EJbo8e4psuPsP5SUrl+oXRO43Put0rXkR4yJoH 9oOaB51Do5fZp8I4JVAqGXvpXoExyLMO4yw0mASm6YSZ3KyjR8Fae+HD9Cq4ZuwY 0uzb9K+9NEhqbfwtyBsi99S64/6Zo/MonwKwevZuhtsDTK4l4iU= =JQLZ -----END PGP SIGNATURE----- Merge tag 'docs-5.9' of git://git.lwn.net/linux Pull documentation updates from Jonathan Corbet: "It's been a busy cycle for documentation - hopefully the busiest for a while to come. Changes include: - Some new Chinese translations - Progress on the battle against double words words and non-HTTPS URLs - Some block-mq documentation - More RST conversions from Mauro. At this point, that task is essentially complete, so we shouldn't see this kind of churn again for a while. Unless we decide to switch to asciidoc or something...:) - Lots of typo fixes, warning fixes, and more" * tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits) scripts/kernel-doc: optionally treat warnings as errors docs: ia64: correct typo mailmap: add entry for <alobakin@marvell.com> doc/zh_CN: add cpu-load Chinese version Documentation/admin-guide: tainted-kernels: fix spelling mistake MAINTAINERS: adjust kprobes.rst entry to new location devices.txt: document rfkill allocation PCI: correct flag name docs: filesystems: vfs: correct flag name docs: filesystems: vfs: correct sync_mode flag names docs: path-lookup: markup fixes for emphasis docs: path-lookup: more markup fixes docs: path-lookup: fix HTML entity mojibake CREDITS: Replace HTTP links with HTTPS ones docs: process: Add an example for creating a fixes tag doc/zh_CN: add Chinese translation prefer section doc/zh_CN: add clearing-warn-once Chinese version doc/zh_CN: add admin-guide index doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label futex: MAINTAINERS: Re-add selftests directory ...	2020-08-04 22:47:54 -07:00
Randy Dunlap	b45225b41a	Documentation/admin-guide: intel-speed-select: drop doubled words Drop the doubled words "that" and "and". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: platform-driver-x86@vger.kernel.org Link: https://lore.kernel.org/r/20200704032020.21923-11-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2020-07-05 14:01:49 -06:00
Randy Dunlap	4d7e204f7d	Documentation/admin-guide: intel_pstate: drop doubled word Drop the doubled word "to". Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Len Brown <lenb@kernel.org> Cc: linux-pm@vger.kernel.org Link: https://lore.kernel.org/r/20200704032020.21923-10-rdunlap@infradead.org Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2020-07-05 14:01:49 -06:00
Quentin Perret	8412b4563e	cpufreq: Specify default governor on command line Currently, the only way to specify the default CPUfreq governor is via Kconfig options, which suits users who can build the kernel themselves perfectly. However, for those who use a distro-like kernel (such as Android, with the Generic Kernel Image project), the only way to use a non-default governor is to boot to userspace, and to then switch using the sysfs interface. Being able to specify the default governor on the command line, like is the case for cpuidle, would allow those users to specify their governor of choice earlier on, and to simplify the userspace boot procedure slighlty. To support this use-case, add a kernel command line parameter allowing the default governor for CPUfreq to be specified, which takes precedence over the built-in default. This implementation has one notable limitation: the default governor must be registered before the driver. This is solved for builtin governors and drivers using appropriate *_initcall() functions. And in the modular case, this must be reflected as a constraint on the module loading order. Signed-off-by: Quentin Perret <qperret@google.com> [ Viresh: Converted 'default_governor' to a string and parsing it only at initcall level, and several updates to cpufreq_init_policy(). ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-07-02 13:03:30 +02:00
Srinivas Pandruvada	f473bf398b	cpufreq: intel_pstate: Allow raw energy performance preference value Currently using attribute "energy_performance_preference", user space can write one of the four per-defined preference string. These preference strings gets mapped to a hard-coded Energy-Performance Preference (EPP) or Energy-Performance Bias (EPB) knob. These four values are supposed to cover broad spectrum of use cases, but are not uniformly distributed in the range. There are number of cases, where this is not enough. For example: Suppose user wants more performance when connected to AC. Instead of using default "balance performance", the "performance" setting can be used. This changes EPP value from 0x80 to 0x00. But setting EPP to 0, results in electrical and thermal issues on some platforms. This results in aggressive throttling, which causes a drop in performance. But some value between 0x80 and 0x00 results in better performance. But that value can't be fixed as the power curve is not linear. In some cases just changing EPP from 0x80 to 0x75 is enough to get significant performance gain. Similarly on battery the default "balance_performance" mode can be aggressive in power consumption. But picking up the next choice "balance power" results in too much loss of performance, which results in bad user experience in use cases like "Google Hangout". It was observed that some value between these two EPP is optimal. This change allows fine grain EPP tuning for platform like Chromebook or for users who wants to fine tune power and performance. Here based on the product and use cases, different EPP values can be set. This change is similar to the change done for: /sys/devices/system/cpu/cpu*/power/energy_perf_bias where user has choice to write a predefined string or raw value. The change itself is trivial. When user preference doesn't match predefined string preferences and value is an unsigned integer and in range, use that value for EPP. When the EPP feature is not present writing raw value is not supported. Suggested-by: Len Brown <lenb@kernel.org> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-07-02 13:02:46 +02:00
Srinivas Pandruvada	ed7bde7a6d	cpufreq: intel_pstate: Allow enable/disable energy efficiency By default intel_pstate the driver disables energy efficiency by setting MSR_IA32_POWER_CTL bit 19 for Kaby Lake desktop CPU model in HWP mode. This CPU model is also shared by Coffee Lake desktop CPUs. This allows these systems to reach maximum possible frequency. But this adds power penalty, which some customers don't want. They want some way to enable/ disable dynamically. So, add an additional attribute "energy_efficiency" under /sys/devices/system/cpu/intel_pstate/ for these CPU models. This allows to read and write bit 19 ("Disable Energy Efficiency Optimization") in the MSR IA32_POWER_CTL. This attribute is present in both HWP and non-HWP mode as this has an effect in both modes. Refer to Intel Software Developer's manual for details. The scope of this bit is package wide. Also these systems are single package systems. So read/write MSR on the current CPU is enough. The energy efficiency (EE) bit setting needs to be preserved during suspend/resume and CPU offline/online operation. To do this: - Restoring the EE setting from the cpufreq resume() callback, if there is change from the system default. - By default, don't disable EE from cpufreq init() callback for matching CPU models. Since the scope is package wide and is a single package system, move the disable EE calls from init() callback to intel_pstate_init() function, which is called only once. Suggested-by: Len Brown <lenb@kernel.org> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-07-02 13:02:46 +02:00
Rafael J. Wysocki	a34024d98e	Merge branches 'pm-devfreq', 'powercap', 'pm-docs' and 'pm-tools' * pm-devfreq: PM / devfreq: Use lockdep asserts instead of manual checks for locked mutex PM / devfreq: imx-bus: Fix inconsistent IS_ERR and PTR_ERR PM / devfreq: Replace strncpy with strscpy PM / devfreq: imx: Register interconnect device PM / devfreq: Add generic imx bus scaling driver PM / devfreq: tegra30: Delete an error message in tegra_devfreq_probe() PM / devfreq: tegra30: Make CPUFreq notifier to take into account boosting * powercap: powercap: RAPL: remove unused local MSR define powercap/intel_rapl: add support for ElkhartLake * pm-docs: Documentation: admin-guide: pm: Document intel-speed-select * pm-tools: cpupower: Remove unneeded semicolon	2020-06-01 15:20:45 +02:00
Rafael J. Wysocki	ac7ccfc75f	Merge branch 'pm-cpufreq' * pm-cpufreq: cpufreq: Fix up cpufreq_boost_set_sw() cpufreq: fix minor typo in struct cpufreq_driver doc comment cpufreq: qoriq: Add platform dependencies clk: qoriq: add cpufreq platform device cpufreq: qoriq: convert to a platform driver cpufreq: qcom: fix wrong compatible binding cpufreq: imx-cpufreq-dt: support i.MX7ULP cpufreq: dt: Add support for r8a7742 cpufreq: Add i.MX7ULP to cpufreq-dt-platdev blacklist cpufreq: omap: Build driver by default for ARCH_OMAP2PLUS cpufreq: intel_pstate: Use passive mode by default without HWP	2020-06-01 15:19:39 +02:00
Srinivas Pandruvada	213081dadd	Documentation: admin-guide: pm: Document intel-speed-select Added documentation to configure servers to use Intel(R) Speed Select Technology using intel-speed-select tool. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Andriy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-05-19 19:59:10 +02:00
Hanjun Guo	7395683a24	Documentation: cpuidle: update the document Update the document after the remove of cpuidle_sysfs_switch. Signed-off-by: Hanjun Guo <guohanjun@huawei.com> Reviewed-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-05-19 17:41:17 +02:00
Rafael J. Wysocki	33aa46f252	cpufreq: intel_pstate: Use passive mode by default without HWP After recent changes allowing scale-invariant utilization to be used on x86, the schedutil governor on top of intel_pstate in the passive mode should be on par with (or better than) the active mode "powersave" algorithm of intel_pstate on systems in which hardware-managed P-states (HWP) are not used, so it should not be necessary to use the internal scaling algorithm in those cases. Accordingly, modify intel_pstate to start in the passive mode by default if the processor at hand does not support HWP of if the driver is requested to avoid using HWP through the kernel command line. Among other things, that will allow utilization clamps and the support for RT/DL tasks in the schedutil governor to be utilized on systems in which intel_pstate is used. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-04-17 17:04:38 +02:00
Rafael J. Wysocki	4506c531f1	Documentation: PM: sleep: Document system-wide suspend code flows Add a document describing high-level system-wide suspend code flows in Linux. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-04-03 11:41:01 +02:00
Rafael J. Wysocki	2409000a0c	Merge branches 'pm-devfreq', 'powercap' and 'pm-docs' * pm-devfreq: PM / devfreq: Get rid of some doc warnings PM / devfreq: Fix handling dev_pm_qos_remove_request result PM / devfreq: Fix a typo in a comment PM / devfreq: Change to DEVFREQ_GOV_UPDATE_INTERVAL event name PM / devfreq: Remove unneeded extern keyword PM / devfreq: Use constant name of userspace governor * powercap: powercap: idle_inject: Replace zero-length array with flexible-array member * pm-docs: docs: cpu-freq: convert cpufreq-stats.txt to ReST docs: cpu-freq: convert cpu-drivers.txt to ReST docs: cpu-freq: convert core.txt to ReST docs: cpu-freq: convert index.txt to ReST docs: cpufreq: fix a broken reference Documentation: cpufreq: Move legacy driver documentation	2020-03-30 14:48:31 +02:00
Rafael J. Wysocki	0411f0d10e	Merge branch 'pm-cpufreq' * pm-cpufreq: cpufreq: intel_pstate: Simplify intel_pstate_cpu_init() cpufreq: qcom: Add support for krait based socs cpufreq: imx6q-cpufreq: Improve the logic of -EPROBE_DEFER handling cpufreq: Use scnprintf() for avoiding potential buffer overflow Documentation: intel_pstate: update links for references cpufreq: intel_pstate: Consolidate policy verification cpufreq: dt: Allow platform specific intermediate callbacks cpufreq: imx-cpufreq-dt: Correct i.MX8MP's market segment fuse location cpufreq: imx6q: read OCOTP through nvmem for imx6q cpufreq: imx6q: fix error handling cpufreq: imx-cpufreq-dt: Add "cpu-supply" property check cpufreq: ti-cpufreq: Add support for OPP_PLUS cpufreq: imx6q: Fixes unwanted cpu overclocking on i.MX6ULL	2020-03-30 14:46:27 +02:00
Alex Hung	c1f59a3782	Documentation: intel_pstate: update links for references URLs for presentation and Intel Software Developer’s Manual are updated as they were using "http" which are gradually replaced by "https". Signed-off-by: Alex Hung <alex.hung@canonical.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-03-14 11:46:02 +01:00
Rafael J. Wysocki	03b2249650	Documentation: cpufreq: Move legacy driver documentation There are three legacy driver documents in Documentation/cpu-freq/ that were added years ago and converting them each to the .rst format is rather pointless, even though there is some value in preserving them. However, if they are preserved, they need to go into the admin-guide part of cpufreq documentation where they belong (at least to a certain extent). To preserve them with minimum amount of changes and put them into the right place, and make it possible to process them into HTML (and other formats) along with the rest of the documentation, move them each as a "literal text" block into a separate section of a single .rst "wrapper" file under Documentation/admin-guide/pm/. While at it, repair the PCC specification URL in one of them. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-03-02 11:31:06 +01:00
Rafael J. Wysocki	b8e6e27c62	Documentation: PM: QoS: Update to reflect previous code changes Update the PM QoS documentation to reflect the previous code changes regarding the removal of PM QoS classes and the CPU latency QoS API rework. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org> Tested-by: Amit Kucheria <amit.kucheria@linaro.org>	2020-02-14 10:37:26 +01:00
Rafael J. Wysocki	332008256f	Merge branches 'pm-avs' and 'pm-cpuidle' * pm-avs: power: avs: qcom-cpr: Avoid clang -Wsometimes-uninitialized in cpr_scale power: avs: qcom-cpr: add unspecified HAS_IOMEM dependency PM / AVS: rockchip-io: fix the supply naming for the emmc supply on px30 power: avs: qcom-cpr: add a printout after the driver has been initialized * pm-cpuidle: cpuidle: Documentation: Clean up PM QoS description intel_idle: Introduce 'states_off' module parameter intel_idle: Introduce 'use_acpi' module parameter	2020-02-07 11:01:40 +01:00
Rafael J. Wysocki	f06572ef47	cpuidle: Documentation: Clean up PM QoS description Clean up the language in one paragraph in the PM QoS description in Documentation/admin-guide/pm/cpuidle.rst. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-02-05 02:08:31 +01:00
Rafael J. Wysocki	c21502efda	Documentation: admin-guide: PM: Update sleep states documentation There is some information in Documentation/power/interface.rst that is still missing from Documentation/admin-guide/pm/sleep-states.rst and really should be present in there, so update the latter by adding that information to it and delete the former (as it becomes redundant after that and it is somewhat outdated). While at it, clean up some assorted pieces of sleep-states.rst a bit. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-02-03 11:58:26 +01:00
Rafael J. Wysocki	4dcb78ee57	intel_idle: Introduce 'states_off' module parameter In certain system configurations it may not be desirable to use some C-states assumed to be available by intel_idle and the driver needs to be prevented from using them even before the cpuidle sysfs interface becomes accessible to user space. Currently, the only way to achieve that is by setting the 'max_cstate' module parameter to a value lower than the index of the shallowest of the C-states in question, but that may be overly intrusive, because it effectively makes all of the idle states deeper than the 'max_cstate' one go away (and the C-state to avoid may be in the middle of the range normally regarded as available). To allow that limitation to be overcome, introduce a new module parameter called 'states_off' to represent a list of idle states to be disabled by default in the form of a bitmask and update the documentation to cover it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-02-03 11:57:18 +01:00
Rafael J. Wysocki	3a5be9b8f4	intel_idle: Introduce 'use_acpi' module parameter For diagnostics, it is generally useful to be able to make intel_idle take the system's ACPI tables into consideration even if that is not required for the processor model in there, so introduce a new module parameter, 'use_acpi', to make that happen and update the documentation to cover it. While at it, fix the 'no_acpi' module parameter name in the documentation. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-02-03 11:57:08 +01:00
Rafael J. Wysocki	a329918221	Documentation: admin-guide: PM: Add intel_idle document Add an admin-guide document for the intel_idle driver to describe how it works: how it enumerates idle states, what happens during the initialization of it, how it can be controlled via the kernel command line and so on. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org>	2020-01-15 10:54:58 +01:00
Rafael J. Wysocki	75a8026741	cpuidle: Allow idle states to be disabled by default In certain situations it may be useful to prevent some idle states from being used by default while allowing user space to enable them later on. For this purpose, introduce a new state flag, CPUIDLE_FLAG_OFF, to mark idle states that should be disabled by default, make the core set CPUIDLE_STATE_DISABLED_BY_USER for those states at the initialization time and add a new state attribute in sysfs, "default_status", to inform user space of the initial status of the given idle state ("disabled" if CPUIDLE_FLAG_OFF is set for it, "enabled" otherwise). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2019-12-27 11:02:08 +01:00
Rafael J. Wysocki	7afc53951a	Merge branches 'pm-docs' and 'pm-misc' * pm-docs: Documentation: PM: Unify copyright notices Documentation: PM: Add SPDX license tags to multiple files cpufreq: intel_pstate: Documentation: Add references sections * pm-misc: firmware/psci: add support for SYSTEM_RESET2 drivers: firmware: psci: Announce support for OS initiated suspend mode drivers: firmware: psci: Simplify error path of psci_dt_init() drivers: firmware: psci: Split psci_dt_cpu_init_idle() MAINTAINERS: Update files for PSCI drivers: firmware: psci: Move psci to separate directory	2019-05-06 10:55:19 +02:00
Rafael J. Wysocki	7973b799db	admin-guide: pm: intel_epb: Add SPDX license tag and copyright notice Add an SPDX license tag and a copyright notice to the intel_epb.rst file under Documentation/admin-quide/pm. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2019-04-08 12:59:09 +02:00
Rafael J. Wysocki	fc1860d6b1	Documentation: PM: Unify copyright notices Unify copyright notices in the .rst files under Documentation/driver-api/pm and Documentation/admin-quide/pm. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2019-04-08 12:57:47 +02:00
Rafael J. Wysocki	fc7db767b1	Documentation: PM: Add SPDX license tags to multiple files Add SPDX license tags to .rst files under Documentation/driver-api/pm and Documentation/admin-quide/pm. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2019-04-08 12:57:47 +02:00
Rafael J. Wysocki	1120b0f985	cpufreq: intel_pstate: Documentation: Add references sections Add separate refereces sections to the cpufreq.rst and intel_pstate.rst documents under admin-quide/pm and list the references to external documentation in there. Update the ACPI specification URL while at it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2019-04-08 12:57:47 +02:00
Rafael J. Wysocki	b9c273babc	PM / arch: x86: MSR_IA32_ENERGY_PERF_BIAS sysfs interface The Performance and Energy Bias Hint (EPB) is expected to be set by user space through the generic MSR interface, but that interface is not particularly nice and there are security concerns regarding it, so it is not always available. For this reason, add a sysfs interface for reading and updating the EPB, in the form of a new attribute, energy_perf_bias, located under /sys/devices/system/cpu/cpu#/power/ for online CPUs that support the EPB feature. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Borislav Petkov <bp@suse.de>	2019-04-07 22:33:42 +02:00
Rafael J. Wysocki	5861381d48	PM / arch: x86: Rework the MSR_IA32_ENERGY_PERF_BIAS handling The current handling of MSR_IA32_ENERGY_PERF_BIAS in the kernel is problematic, because it may cause changes made by user space to that MSR (with the help of the x86_energy_perf_policy tool, for example) to be lost every time a CPU goes offline and then back online as well as during system-wide power management transitions into sleep states and back into the working state. The first problem is that if the current EPB value for a CPU going online is 0 ('performance'), the kernel will change it to 6 ('normal') regardless of whether or not this is the first bring-up of that CPU. That also happens during system-wide resume from sleep states (including, but not limited to, hibernation). However, the EPB may have been adjusted by user space this way and the kernel should not blindly override that setting. The second problem is that if the platform firmware resets the EPB values for any CPUs during system-wide resume from a sleep state, the kernel will not restore their previous EPB values that may have been set by user space before the preceding system-wide suspend transition. Again, that behavior may at least be confusing from the user space perspective. In order to address these issues, rework the handling of MSR_IA32_ENERGY_PERF_BIAS so that the EPB value is saved on CPU offline and restored on CPU online as well as (for the boot CPU) during the syscore stages of system-wide suspend and resume transitions, respectively. However, retain the policy by which the EPB is set to 6 ('normal') on the first bring-up of each CPU if its initial value is 0, based on the observation that 0 may mean 'not initialized' just as well as 'performance' in that case. While at it, move the MSR_IA32_ENERGY_PERF_BIAS handling code into a separate file and document it in Documentation/admin-guide. Fixes: `abe48b1082` (x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS) Fixes: `b51ef52df7` (x86/cpu: Restore MSR_IA32_ENERGY_PERF_BIAS after resume) Reported-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Thomas Gleixner <tglx@linutronix.de>	2019-04-07 22:33:19 +02:00
Rafael J. Wysocki	b26bf6ab71	cpuidle: New timer events oriented governor for tickless systems The venerable menu governor does some things that are quite questionable in my view. First, it includes timer wakeups in the pattern detection data and mixes them up with wakeups from other sources which in some cases causes it to expect what essentially would be a timer wakeup in a time frame in which no timer wakeups are possible (because it knows the time until the next timer event and that is later than the expected wakeup time). Second, it uses the extra exit latency limit based on the predicted idle duration and depending on the number of tasks waiting on I/O, even though those tasks may run on a different CPU when they are woken up. Moreover, the time ranges used by it for the sleep length correction factors depend on whether or not there are tasks waiting on I/O, which again doesn't imply anything in particular, and they are not correlated to the list of available idle states in any way whatever. Also, the pattern detection code in menu may end up considering values that are too large to matter at all, in which cases running it is a waste of time. A major rework of the menu governor would be required to address these issues and the performance of at least some workloads (tuned specifically to the current behavior of the menu governor) is likely to suffer from that. It is thus better to introduce an entirely new governor without them and let everybody use the governor that works better with their actual workloads. The new governor introduced here, the timer events oriented (TEO) governor, uses the same basic strategy as menu: it always tries to find the deepest idle state that can be used in the given conditions. However, it applies a different approach to that problem. First, it doesn't use "correction factors" for the time till the closest timer, but instead it tries to correlate the measured idle duration values with the available idle states and use that information to pick up the idle state that is most likely to "match" the upcoming CPU idle interval. Second, it doesn't take the number of "I/O waiters" into account at all and the pattern detection code in it avoids taking timer wakeups into account. It also only uses idle duration values less than the current time till the closest timer (with the tick excluded) for that purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>	2019-01-16 23:07:30 +01:00
Rafael J. Wysocki	3a56fe685d	Merge branches 'pm-cpuidle', 'pm-cpufreq' and 'pm-cpufreq-sched' * pm-cpuidle: cpuidle: Add 'above' and 'below' idle state metrics cpuidle: big.LITTLE: fix refcount leak cpuidle: Add cpuidle.governor= command line parameter cpuidle: poll_state: Disregard disable idle states Documentation: admin-guide: PM: Add cpuidle document * pm-cpufreq: cpufreq: qcom-hw: Add support for QCOM cpufreq HW driver dt-bindings: cpufreq: Introduce QCOM cpufreq firmware bindings cpufreq: nforce2: Remove meaningless return cpufreq: ia64: Remove unused header files cpufreq: imx6q: save one condition block for normal case of nvmem read cpufreq: imx6q: remove unused code cpufreq: pmac64: add of_node_put() cpufreq: powernv: add of_node_put() Documentation: intel_pstate: Clarify coordination of P-State limits cpufreq: intel_pstate: Force HWP min perf before offline cpufreq: s3c24xx: Change to use DEFINE_SHOW_ATTRIBUTE macro * pm-cpufreq-sched: sched/cpufreq: Add the SPDX tags	2018-12-21 10:06:06 +01:00
Rafael J. Wysocki	04dab58a39	cpuidle: Add 'above' and 'below' idle state metrics Add two new metrics for CPU idle states, "above" and "below", to count the number of times the given state had been asked for (or entered from the kernel's perspective), but the observed idle duration turned out to be too short or too long for it (respectively). These metrics help to estimate the quality of the CPU idle governor in use. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-12-12 23:22:18 +01:00
Rafael J. Wysocki	61cb5758d3	cpuidle: Add cpuidle.governor= command line parameter Add cpuidle.governor= command line parameter to allow the default cpuidle governor to be replaced. That is useful, for example, if someone running a tickful kernel wants to use the menu governor on it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-12-11 12:08:44 +01:00
Rafael J. Wysocki	aa5eee355b	Documentation: admin-guide: PM: Add cpuidle document Important information is missing from user/admin cpuidle documentation available today, so add a new user/admin document for cpuidle containing current and comprehensive information to admin-guide and drop the old .txt documents it is replacing. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>	2018-12-03 10:03:36 +01:00
Srinivas Pandruvada	60935c17e2	Documentation: intel_pstate: Clarify coordination of P-State limits Explain influence of per-core P-states and hyper threading on the effective performance. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-11-29 22:31:58 +01:00
Zhao Wei Liew	e531efa1e9	Documentation: cpufreq: Correct a typo Fix a typo in the admin-guide documentation for cpufreq. Signed-off-by: Zhao Wei Liew <zhaoweiliew@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-11-07 13:32:34 +01:00
Srinivas Pandruvada	4b73d334c5	Documentation: intel_pstate: Add base_frequency information Updated documentation to explain base_frequency attribute. Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-10-16 10:33:39 +02:00
Rafael J. Wysocki	649f53a3e4	Documentation: intel_pstate: Describe hwp_dynamic_boost sysfs knob Document the recently introduced hwp_dynamic_boost sysfs knob allowing user space to tell intel_pstate to use iowait boosting in the active mode with HWP enabled (to improve performance). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>	2018-06-27 13:02:06 +02:00
Rafael J. Wysocki	9e421b8fff	Documentation: admin-guide: intel_pstate: Fix sysfs path Fix an incorrect sysfs path in the intel_pstate admin-guide documentation. Fixes: `33fc30b470` (cpufreq: intel_pstate: Document the current behavior and user interface) Reported-by: Pawit Pornkitprasan <p.pawit@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-06-27 13:02:06 +02:00
Rafael J. Wysocki	7a0f9d1eb5	Documentation: intel_pstate: Fix typo Fix a typo in the intel_pstate admin-guide documentation. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-06-21 00:35:19 +02:00
Juri Lelli	13610c9348	PM: docs: intel_pstate: fix Active Mode w/o HWP paragraph P-state selection algorithm (powersave or performance) is selected by echoing the desired choice to scaling_governor sysfs attribute and not to scaling_cur_freq (as currently stated). Fix it. Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-05-09 12:16:44 +02:00
Jonathan Neuschäfer	c72a0ded8d	PM: docs: sleep-states: Fix a typo ("includig") Fix a typo in admin-guide/pm/sleep-states.rst. Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-05-09 12:15:13 +02:00
Rafael J. Wysocki	4afbce7b39	Merge branch 'pm-docs' * pm-docs: PM: docs: Delete the obsolete states.txt document PM: docs: Describe high-level PM strategies and sleep states	2017-09-04 00:07:27 +02:00
Rafael J. Wysocki	ab271bc95b	Merge branch 'intel_pstate' * intel_pstate: cpufreq: intel_pstate: Shorten a couple of long names cpufreq: intel_pstate: Simplify intel_pstate_adjust_pstate() cpufreq: intel_pstate: Improve IO performance with per-core P-states cpufreq: intel_pstate: Drop INTEL_PSTATE_HWP_SAMPLING_INTERVAL cpufreq: intel_pstate: Drop ->update_util from pstate_funcs cpufreq: intel_pstate: Do not use PID-based P-state selection	2017-09-04 00:05:42 +02:00
Rafael J. Wysocki	bd87c8fb9d	Merge branch 'pm-cpufreq' * pm-cpufreq: (33 commits) cpufreq: imx6q: Fix imx6sx low frequency support cpufreq: speedstep-lib: make several arrays static, makes code smaller cpufreq: ti: Fix 'of_node_put' being called twice in error handling path cpufreq: dt-platdev: Drop few entries from whitelist cpufreq: dt-platdev: Automatically create cpufreq device with OPP v2 ARM: ux500: don't select CPUFREQ_DT cpufreq: Convert to using %pOF instead of full_name cpufreq: Cap the default transition delay value to 10 ms cpufreq: dbx500: Delete obsolete driver mfd: db8500-prcmu: Get rid of cpufreq dependency cpufreq: enable the DT cpufreq driver on the Ux500 cpufreq: Loongson2: constify platform_device_id cpufreq: dt: Add r8a7796 support to to use generic cpufreq driver cpufreq: remove setting of policy->cpu in policy->cpus during init cpufreq: mediatek: add support of cpufreq to MT7622 SoC cpufreq: mediatek: add cleanups with the more generic naming cpufreq: rcar: Add support for R8A7795 SoC cpufreq: dt: Add rk3328 compatible to use generic cpufreq driver cpufreq: s5pv210: add missing of_node_put() cpufreq: Allow dynamic switching with CPUFREQ_ETERNAL latency ...	2017-09-04 00:05:13 +02:00
Rafael J. Wysocki	0c0b6b7bc4	PM: docs: Describe high-level PM strategies and sleep states Reorganize the power management part of admin-guide by adding a description of major power management strategies supported by the kernel (system-wide and working-state power management) to it and dividing the rest of the material into the system-wide PM and working-state PM chapters. On top of that, add a description of system sleep states to the system-wide PM chapter. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukas Wunner <lukas@wunner.de>	2017-08-29 00:15:32 +02:00

1 2

58 Commits