Commit Graph

280 Commits

Author SHA1 Message Date
Len Brown
8a5bdf41d2 tools/power turbostat: calculate TSC frequency from CPUID(0x15) on SKL
turbostat --debug
...
CPUID(0x15): eax_crystal: 2 ebx_tsc: 100 ecx_crystal_hz: 0
TSC: 1200 MHz (24000000 Hz * 100 / 2 / 1000000)

Signed-off-by: Len Brown <len.brown@intel.com>
2015-04-18 14:20:52 -04:00
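
The arithmetic in the debug output of commit 8a5bdf41d2 can be sketched as a small C helper. This is a minimal illustration, not turbostat's own code: the function name `tsc_hz_from_cpuid15` is hypothetical. The crystal frequency is passed in by the caller because the sample output reports `ecx_crystal_hz: 0`; the 24000000 Hz figure is taken from that output.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch only: derive the TSC frequency from the CPUID(0x15) ratio.
 *   eax_crystal : denominator of the TSC/crystal ratio
 *   ebx_tsc     : numerator of the TSC/crystal ratio
 *   crystal_hz  : crystal clock in Hz, supplied by the caller
 * Returns 0 when the ratio is not enumerated.
 */
uint64_t tsc_hz_from_cpuid15(uint32_t eax_crystal, uint32_t ebx_tsc,
                             uint64_t crystal_hz)
{
    assert(crystal_hz != 0);    /* caller must know the crystal clock */

    if (eax_crystal == 0 || ebx_tsc == 0)
        return 0;

    return crystal_hz * ebx_tsc / eax_crystal;
}
```

With the values from the commit message (eax_crystal 2, ebx_tsc 100, 24000000 Hz) this yields 1200000000 Hz, i.e. the 1200 MHz shown above.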
Andrey Semin
40ee8e3b9d tools/power turbostat: correct DRAM RAPL units on recent Xeon processors
While not yet documented in the Software Developer's Manual,
the data-sheet for modern Xeon states that DRAM RAPL ENERGY units
are fixed at 15.3 uJ, rather than being discovered via MSR.

Before this patch, DRAM energy on these products is over-stated by turbostat
because the RAPL units are 4x larger.

ref: "Xeon E5-2600 v3/E5-1600 v3 Datasheet Volume 2"
http://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf

Signed-off-by: Andrey Semin <andrey.semin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2015-04-18 14:20:52 -04:00
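
As a rough illustration of commit 40ee8e3b9d, the sketch below converts a raw DRAM energy-counter delta into Joules and Watts using a fixed unit instead of one read from an MSR. The helper names are hypothetical, and the constant is simply the 15.3 uJ figure quoted in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed DRAM energy unit quoted in the commit message: 15.3 microjoules. */
#define DRAM_ENERGY_UNIT_JOULES 15.3e-6

/* Convert a raw DRAM energy-counter delta into Joules. */
double dram_energy_joules(uint32_t raw_delta)
{
    return raw_delta * DRAM_ENERGY_UNIT_JOULES;
}

/* Average DRAM power over one measurement interval, in Watts. */
double dram_watts(uint32_t raw_delta, double interval_sec)
{
    assert(interval_sec > 0.0);
    return dram_energy_joules(raw_delta) / interval_sec;
}
```

A unit four times larger than this would inflate both results by the same factor, which is the over-statement the commit describes.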
Len Brown
0b2bb6925e tools/power turbostat: Initial Skylake support
Skylake adds some additional residency counters.

Skylake supports a different mix of RAPL registers
from any previous product.

In most other ways, Skylake is like Broadwell.

Signed-off-by: Len Brown <len.brown@intel.com>
2015-04-18 14:20:51 -04:00
Thomas D
f82263c698 tools/power turbostat: Use $(CURDIR) instead of $(PWD) and add support for O= option in Makefile
Since commit ee0778a301
("tools/power: turbostat: make Makefile a bit more capable")
turbostat's Makefile is using

  [...]
  BUILD_OUTPUT    := $(PWD)
  [...]

which obviously causes trouble when building "turbostat" with

  make -C /usr/src/linux/tools/power/x86/turbostat ARCH=x86 turbostat

because GNU make does not update nor guarantee that $PWD is set.

This patch changes the Makefile to use $CURDIR instead, which GNU make
guarantees to set and update (i.e. when using "make -C ...") and also
adds support for the O= option (see "make help" in the root of your
kernel source tree for more details).

Link: https://bugs.gentoo.org/show_bug.cgi?id=533918
Fixes: ee0778a301 ("tools/power: turbostat: make Makefile a bit more capable")
Signed-off-by: Thomas D. <whissi@whissi.de>
Cc: Mark Asselstine <mark.asselstine@windriver.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2015-04-18 14:20:51 -04:00
Len Brown
a21d38c846 tools/power turbostat: modprobe msr, if needed
Some distros (Ubuntu) ship the msr driver as a module.
If turbostat is run as root on those systems, and discovers
that there is no /dev/cpu/0/msr, it will now "modprobe msr"
for the user.

If not root, the modprobe attempt will fail, and turbostat will exit as before:

turbostat: no /dev/cpu/0/msr, Try "# modprobe msr" : No such file or directory

Signed-off-by: Len Brown <len.brown@intel.com>
2015-04-18 14:20:51 -04:00
Len Brown
fcd17211bd tools/power turbostat: dump MSR_TURBO_RATIO_LIMIT2
and up to 18 cores of turbo ratio limit
when using the turbostat --debug option.

Signed-off-by: Len Brown <len.brown@intel.com>
2015-04-18 14:20:50 -04:00
Len Brown
12bb43c615 tools/power turbostat: use new MSR_TURBO_RATIO_LIMIT names
s/MSR_NHM_TURBO_RATIO_LIMIT/MSR_TURBO_RATIO_LIMIT/
s/MSR_IVT_TURBO_RATIO_LIMIT/MSR_TURBO_RATIO_LIMIT1/

syntax only -- use the documented strings describing these registers.

Signed-off-by: Len Brown <len.brown@intel.com>
2015-04-18 14:20:50 -04:00
Len Brown
8f61f3598d tools/power turbostat: label base frequency
syntax only.

The cool kids are now using the phrase "base frequency",
where in the past we used "max non-turbo frequency" or "TSC frequency".

This distinction becomes important when a processor has a TSC
that runs at a different speed than the "base frequency".

Signed-off-by: Len Brown <len.brown@intel.com>
2015-04-13 15:52:54 -04:00
Len Brown
e33cbe852d tools/power turbostat: update PERF_LIMIT_REASONS decoding
cosmetic only.

order the decoding of MSR_PERF_LIMIT_REASONS bits
from MSB to LSB -- which you notice when more than 1 bit is set
and you are, say, comparing the output to the documentation...

Signed-off-by: Len Brown <len.brown@intel.com>
2015-04-13 15:52:54 -04:00
Len Brown
1cc21f7b6b tools/power turbostat: simplify default output
Casual turbostat users generally just want to know MHz.
So by default, just print enough information to make sense of MHz.

All the other configuration data and columns for C-states and temperature etc,
are printed with the --debug option.

Signed-off-by: Len Brown <len.brown@intel.com>
2015-04-13 15:52:54 -04:00
Len Brown
48a0631c89 tools/power turbostat: support additional Broadwell model
Signed-off-by: Len Brown <len.brown@intel.com>
2015-02-10 15:59:53 -05:00
Len Brown
d8af6f5f0f tools/power turbostat: update parameters, documentation
Long format options added, though the short ones should still work.
eg. the new "--Counter 0x10" is the same as the old "-C 0x10"

Note this Incompatibility:
Old:
-v displayed verbose debug output

New:
-v and --version simply display version

Additional parameters:
-d and --debug display verbose debug output
-h and --help display a help message

Updated turbostat.8 man page accordingly.

Signed-off-by: Len Brown <len.brown@intel.com>
2015-02-10 01:56:38 -05:00
Len Brown
ee7e38e3d8 tools/power turbostat: Skip printing disabled package C-states
Replaced previously open-coded Package C-state Limit decoding
with table-driven decoding.  In doing so, updated to match January 2015
"Intel(R) 64 and IA-32 Architectures Software Developer's Manual".

In the past, turbostat would print package C-state residency columns
for all package states supported by the model's architecture, even though
a particular SKU may not support them, or they may be disabled by the BIOS.
Now turbostat will skip printing columns if MSRs indicate that they are not enabled.
eg. many SKUs don't support PC7, and so that column will no longer be printed.

Signed-off-by: Len Brown <len.brown@intel.com>
2015-02-09 23:39:45 -05:00
Len Brown
a729617c58 tools/power turbostat: relax dependency on APERF_MSR
While turbostat is significantly less useful on systems
with no APERF_MSR, it seems more friendly
to run on such systems and report what we can,
rather than refusing to run.

Update man page to reflect recent changes.

Signed-off-by: Len Brown <len.brown@intel.com>
2015-02-09 18:28:18 -05:00
Len Brown
d789944753 tools/power turbostat: relax dependency on invariant TSC
Turbostat can be useful on systems that do not support invariant TSC,
so allow it to run on those systems.

All arithmetic in turbostat using the TSC value is per-processor,
so it does not depend on the TSC values being in sync across processors.

Turbostat uses gettimeofday() for the measurement interval
rather than using the TSC directly, so that key metric
is also immune from variable TSC.

Turbostat prints a TSC sanity check column:

TSC_MHz = TSC_delta/interval

If this column is constant and is close to the processor
base frequency, then the TSC is behaving properly.

The other key turbostat columns are calculated this way:

Avg_MHz = APERF_delta/interval

%Busy = MPERF_delta/TSC_delta

Bzy_MHz = TSC_delta*APERF_delta/MPERF_delta/interval

Tested on Core2 and Core2-Xeon, and so this patch includes
a few other changes to remove the assumption that target
systems are Nehalem and newer.

Signed-off-by: Len Brown <len.brown@intel.com>
2015-02-09 18:28:08 -05:00
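
The column arithmetic described in commit d789944753 can be sketched in C as follows. The struct and function names are hypothetical, the deltas are assumed to be raw cycle counts, and the interval is assumed to be in seconds, so each rate is divided by 1e6 to give MHz. Bzy_MHz is computed here as the TSC rate scaled by the APERF/MPERF ratio.

```c
#include <assert.h>
#include <stdint.h>

/* Per-CPU results for one measurement interval. */
struct freq_stats {
    double tsc_mhz;   /* TSC sanity-check column */
    double avg_mhz;   /* cycles executed over the whole interval */
    double busy_pct;  /* share of the interval spent non-idle */
    double bzy_mhz;   /* average speed while non-idle */
};

struct freq_stats compute_freq_stats(uint64_t tsc_delta, uint64_t aperf_delta,
                                     uint64_t mperf_delta, double interval_sec)
{
    struct freq_stats s;

    assert(interval_sec > 0.0 && tsc_delta != 0);

    s.tsc_mhz = tsc_delta / interval_sec / 1e6;
    s.avg_mhz = aperf_delta / interval_sec / 1e6;
    s.busy_pct = 100.0 * mperf_delta / tsc_delta;
    /* Guard against a fully idle interval, where MPERF did not advance. */
    s.bzy_mhz = mperf_delta
        ? (double)tsc_delta * aperf_delta / mperf_delta / interval_sec / 1e6
        : 0.0;
    return s;
}
```

For example, a one-second interval with 2400000000 TSC cycles, 1200000000 APERF cycles and 600000000 MPERF cycles gives a TSC_MHz of 2400, an Avg_MHz of 1200, 25 %Busy and a Bzy_MHz of 4800.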
Len Brown
3a9a941d0b tools/power turbostat: decode MSR_*_PERF_LIMIT_REASONS
The Processor generation code-named Haswell
added MSR_{CORE | GFX | RING}_PERF_LIMIT_REASONS
to explain when and how the processor limits frequency.

turbostat -v
will now decode these bits.

Each MSR has an "Active" set of bits which describe
current conditions, and a "Logged" set of bits,
which describe what has happened since last cleared.

Turbostat currently doesn't clear the log bits.

Signed-off-by: Len Brown <len.brown@intel.com>
2015-02-09 16:44:24 -05:00
Len Brown
98481e79b6 tools/power turbostat: relax dependency on root permission
For turbostat to run as non-root, it needs two permissions:

1. read access to /dev/cpu/*/msr
	via standard user/group/world file permissions

2. CAP_SYS_RAWIO
	eg.  # setcap cap_sys_rawio=ep turbostat

Yes, running as root still works.

Signed-off-by: Len Brown <len.brown@intel.com>
2015-02-09 16:41:16 -05:00
Len Brown
e7c95ff32d tools/power turbostat: tweak whitespace in output format
turbostat -S
output was off by 1 space before this patch.

Signed-off-by: Len Brown <len.brown@intel.com>
2014-08-15 17:34:44 -04:00
Jean Delvare
3482124a6a tools / power: turbostat: Drop temperature checks
The Intel 64 and IA-32 Architectures Software Developer's Manual says
that TjMax is stored in bits 23:16 of MSR_TEMPERATURE_TARGET (0x1a2).
That's 8 bits, not 7, so it must be masked with 0xFF rather than 0x7F.

The manual has no mention of which values should be considered valid,
which kind of implies that they all are. Arbitrarily discarding values
outside a specific range is wrong. The upper range check had to be
fixed recently (commit 144b44b1) and the lower range check is just as
wrong. See bug #75071:

https://bugzilla.kernel.org/show_bug.cgi?id=75071

There are many Xeon processor series with TjMax of 70, 71 or 80
degrees Celsius, way below the arbitrary 85 degrees Celsius limit.
There may be other (past or future) models with even lower limits.

So drop this arbitrary check. The only value that would be clearly
invalid is 0. Everything else should be accepted.

After these changes, turbostat is aligned with what the coretemp
driver does.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Cc: Len Brown <len.brown@intel.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-07 00:14:46 +02:00
Len Brown
4e8e863fed tools/power turbostat: Run on Broadwell
Signed-off-by: Len Brown <len.brown@intel.com>
2014-03-05 22:20:02 -05:00
Len Brown
fc04cc67ea tools/power turbostat: simplify output, add Avg_MHz
Use 8 columns for each number output.
We don't fit into 80 columns on most machines,
so keep the format simple.

Print frequency in MHz instead of GHz.
We've got 8 columns now, so use them to
show low frequency in a more natural unit.

Many users didn't understand what %c0 meant,
so re-name it to be %Busy.

Add Avg_MHz column, which is the frequency that many
users expect to see -- the total number of cycles executed
over the measurement interval.

People found the previous GHz to be confusing, since
it was the speed only over the non-idle interval.
That measurement has been re-named Bzy_MHz.

Suggested-by: Dirk J. Brandewie
Signed-off-by: Len Brown <len.brown@intel.com>
2014-03-05 22:19:55 -05:00
Andy Shevchenko
3b4d5c7fec tools/power turbostat: introduce -s to dump counters
The new option lets you simply run turbostat and get a dump of the counter
values. It is useful when there is more than one program to test.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2014-02-01 15:24:28 -05:00
Andy Shevchenko
f591c38b91 tools/power turbostat: remove unused command line option
The -s is not used, let's remove it, and update quick help accordingly.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2014-02-01 15:22:31 -05:00
Dirk Brandewie
5c56be9a25 turbostat: Add option to report joules consumed per sample
Add "-J" option to report energy consumed in joules per sample.  This option
also adds the sample time to the reported values.

Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2014-01-18 22:34:32 -05:00
Len Brown
e6f9bb3cc6 turbostat: run on HSX
Haswell Xeon has slightly different RAPL support than client HSW,
which prevented the previous version of turbostat from running on HSX.

Signed-off-by: Len Brown <len.brown@intel.com>
2014-01-18 22:34:22 -05:00
Josh Triplett
7ade7f48b1 turbostat: Add a .gitignore to ignore the compiled turbostat binary
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2014-01-18 22:34:10 -05:00
Josh Triplett
b2c95d90a7 turbostat: Clean up error handling; disambiguate error messages; use err and errx
Most of turbostat's error handling consists of printing an error (often
including an errno) and exiting.  Since perror doesn't support a format
string, those error messages are often ambiguous, such as just showing a
file path, which doesn't uniquely identify which call failed.

turbostat already uses _GNU_SOURCE, so switch to the err and errx
functions from err.h, which take a format string.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2014-01-18 22:34:10 -05:00
Josh Triplett
57a42a34d1 turbostat: Factor out common function to open file and exit on failure
Several different functions in turbostat contain the same pattern of
opening a file and exiting on failure.  Factor out a common fopen_or_die
function for that.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2014-01-18 22:34:09 -05:00
Josh Triplett
95aebc44e7 turbostat: Add a helper to parse a single int out of a file
Many different chunks of code in turbostat open a file, parse a single
int out of it, and close it.  Factor that out into a common function.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2014-01-18 22:34:09 -05:00
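
The three clean-up commits above (b2c95d90a7, 57a42a34d1, 95aebc44e7) fit together roughly as sketched below. `fopen_or_die` is the name given in the commit message, but its body here is a guess. `read_int_or_die` is a hypothetical name for the single-int helper, and the error strings are illustrative. `err` and `errx` come from `<err.h>`.

```c
#include <assert.h>
#include <err.h>
#include <stdio.h>

/* Open a file or exit with a message that names the failing path. */
FILE *fopen_or_die(const char *path, const char *mode)
{
    FILE *fp;

    assert(path != NULL && mode != NULL);

    fp = fopen(path, mode);
    if (!fp)
        err(1, "%s: open failed", path);   /* err() appends strerror(errno) */
    return fp;
}

/* Open a file, parse one int out of it, close it, and return the int. */
int read_int_or_die(const char *path)
{
    FILE *fp = fopen_or_die(path, "r");
    int value;

    if (fscanf(fp, "%d", &value) != 1)
        errx(1, "%s: failed to parse an integer", path);  /* no errno text */
    fclose(fp);
    return value;
}
```

Checking the `fscanf` return value here also avoids the `warn_unused_result` warnings addressed by commit 7482341976 below.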
Josh Triplett
7482341976 turbostat: Check return value of fscanf
Some systems declare fscanf with the warn_unused_result attribute.  On
such systems, turbostat generates the following warnings:

turbostat.c: In function 'get_core_id':
turbostat.c:1203:8: warning: ignoring return value of 'fscanf', declared with attribute warn_unused_result [-Wunused-result]
turbostat.c: In function 'get_physical_package_id':
turbostat.c:1186:8: warning: ignoring return value of 'fscanf', declared with attribute warn_unused_result [-Wunused-result]
turbostat.c: In function 'cpu_is_first_core_in_package':
turbostat.c:1169:8: warning: ignoring return value of 'fscanf', declared with attribute warn_unused_result [-Wunused-result]
turbostat.c: In function 'cpu_is_first_sibling_in_core':
turbostat.c:1148:8: warning: ignoring return value of 'fscanf', declared with attribute warn_unused_result [-Wunused-result]

Fix these by checking the return value of those four calls to fscanf.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2014-01-18 22:34:09 -05:00
Josh Triplett
2b92865e64 turbostat: Use GCC's CPUID functions to support PIC
turbostat uses inline assembly to call cpuid.  On 32-bit x86, on systems
that have certain security features enabled by default that make -fPIC
the default, this causes a build error:

turbostat.c: In function ‘check_cpuid’:
turbostat.c:1906:2: error: PIC register clobbered by ‘ebx’ in ‘asm’
  asm("cpuid" : "=a" (fms), "=c" (ecx), "=d" (edx) : "a" (1) : "ebx");
  ^

GCC provides a header cpuid.h, containing a __get_cpuid function that
works with both PIC and non-PIC.  (On PIC, it saves and restores ebx
around the cpuid instruction.)  Use that instead.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Cc: stable@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
2014-01-18 22:34:08 -05:00
Josh Triplett
2e9c6bc7fb turbostat: Don't attempt to printf an off_t with %zx
turbostat uses the format %zx to print an off_t.  However, %zx wants a
size_t, not an off_t.  On 32-bit targets, those refer to different
types, potentially even with different sizes.  Use %llx and a cast
instead, since printf does not have a length modifier for off_t.

Without this patch, when compiling for a 32-bit target:

turbostat.c: In function 'get_msr':
turbostat.c:231:3: warning: format '%zx' expects argument of type 'size_t', but argument 4 has type 'off_t' [-Wformat]

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2014-01-18 22:34:08 -05:00
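
The approach taken in commit 2e9c6bc7fb (cast to a type printf does have a length modifier for) can be sketched as follows. The function name `format_msr_offset` and the "0x" prefix are illustrative assumptions, not turbostat's actual message format.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Sketch only: print an off_t portably. printf has no length modifier
 * for off_t, so cast to unsigned long long and use %llx.
 * Returns the snprintf() result.
 */
int format_msr_offset(char *buf, size_t len, off_t offset)
{
    assert(buf != NULL);
    return snprintf(buf, len, "0x%llx", (unsigned long long)offset);
}
```

The cast makes the argument type match the format on both 32-bit and 64-bit targets, whatever the width of `off_t`.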
Josh Triplett
b731f3119d turbostat: Don't put unprocessed uapi headers in the include path
turbostat's Makefile puts arch/x86/include/uapi/ in the include path, so
that it can include <asm/msr.h> from it.  It isn't in general safe to
include even uapi headers directly from the kernel tree without
processing them through scripts/headers_install.sh, but asm/msr.h
happens to work.

However, that include path can break with some versions of system
headers, by overriding some system headers with the unprocessed versions
directly from the kernel source.  For instance:

In file included from /build/x86-generic/usr/include/bits/sigcontext.h:28:0,
                 from /build/x86-generic/usr/include/signal.h:339,
                 from /build/x86-generic/usr/include/sys/wait.h:31,
                 from turbostat.c:27:
../../../../arch/x86/include/uapi/asm/sigcontext.h:4:28: fatal error: linux/compiler.h: No such file or directory

This occurs because the system bits/sigcontext.h on that build system
includes <asm/sigcontext.h>, and asm/sigcontext.h in the kernel source
includes <linux/compiler.h>, which scripts/headers_install.sh would have
filtered out.

Since turbostat really only wants a single header, just include that one
header rather than putting an entire directory of kernel headers on the
include path.

In the process, switch from msr.h to msr-index.h, since turbostat just
wants the MSR numbers.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Cc: stable@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
2014-01-18 22:34:07 -05:00
Len Brown
144b44b135 tools / power turbostat: Support Silvermont
Support the next generation Intel Atom processor
micro-architecture, formerly called Silvermont.

The server version, formerly called "Avoton",
is named the "Intel(R) Atom(TM) Processor C2000 Product Family".

The client version, formerly called "Bay Trail",
is named the "Intel Atom Processor Z3000 Series",
as well as various "Intel Pentium Processor"
and "Intel Celeron Processor" brands, depending
on form-factor.

Silvermont has a set of MSRs not far off from NHM,
but the RAPL register set is a sub-set of those previously supported.

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-11-12 23:16:02 +01:00
Josh Triplett
b844db3187 turbostat: Increase output buffer size to accommodate C8-C10
On platforms with C8-C10 support, the additional C-states cause
turbostat to overrun its output buffer of 128 bytes per CPU.  Increase
this to 256 bytes per CPU.

[ As a bugfix, this should go into 3.10; however, since the C8-C10
  support didn't go in until after 3.9, this need not go into any stable
  kernel. ]

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-06-13 09:55:56 -07:00
Kristen Carlson Accardi
ca58710f3a tools/power turbostat: display C8, C9, C10 residency
Display residency in the new C-states, C8, C9, C10.

C8, C9, C10 are present on some:
"Fourth Generation Intel(R) Core(TM) Processors",
which are based on Intel(R) microarchitecture code name Haswell.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2013-04-17 19:23:26 -04:00
Len Brown
149c2319c6 tools/power turbostat: additional Haswell CPU-id
There is an additional HSW CPU-id, 0x46,
which has C-states exactly like CPU-id 0x45.

Signed-off-by: Len Brown <len.brown@intel.com>
2013-03-15 11:05:26 -04:00
Len Brown
1ed51011af tools/power turbostat: display SMI count by default
The SMI counter is popular -- so display it by default
rather than requiring an option.  What the heck,
we've blown the 80 column budget on many systems already...

Note that the value displayed is the delta
during the measurement interval.
The absolute value of the counter can still be seen with
the generic 32-bit MSR option, ie.  -m 0x34

Signed-off-by: Len Brown <len.brown@intel.com>
2013-02-13 18:22:12 -05:00
Len Brown
6792041834 tools/power turbostat: decode MSR_IA32_POWER_CTL
When verbose is enabled, print the C1E-Enable
bit in MSR_IA32_POWER_CTL.

also delete some redundant tests on the verbose variable.

Signed-off-by: Len Brown <len.brown@intel.com>
2013-02-08 19:26:16 -05:00
Len Brown
70b43400bc tools/power turbostat: support Haswell
This patch enables turbostat to run properly on the
next-generation Intel(R) Microarchitecture, code named "Haswell" (HSW).

HSW supports the BCLK and counters found in SNB.

Signed-off-by: Len Brown <len.brown@intel.com>
2013-02-08 19:25:57 -05:00
Linus Torvalds
6842d98de7 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull powertool update from Len Brown:
 "This updates the tree w/ the latest version of turbostat, which
  reports temperature and - on SNB and later - Watts."

Fix up semantic merge conflict as per Len.

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools: Allow tools to be installed in a user specified location
  tools/power: turbostat: make Makefile a bit more capable
  tools/power x86_energy_perf_policy: close /proc/stat in for_every_cpu()
  tools/power turbostat: v3.0: monitor Watts and Temperature
  tools/power turbostat: fix output buffering issue
  tools/power turbostat: prevent infinite loop on migration error path
  x86 power: define RAPL MSRs
  tools/power/x86/turbostat: share kernel MSR #defines
2012-12-18 12:34:29 -08:00
Josh Boyer
55f1f545f7 tools: Allow tools to be installed in a user specified location
When building x86_energy_perf_policy or turbostat within the confines of
a packaging system such as RPM, we need to be able to have it install to
the buildroot and not the root filesystem of the build machine.  This
adds a DESTDIR variable that when set will act as a prefix for the
install location of these tools.

Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-11-30 01:09:45 -05:00
Mark Asselstine
ee0778a301 tools/power: turbostat: make Makefile a bit more capable
The turbostat Makefile is pretty simple, its output is placed in the
same directory as the source, the install rule has no concept of a
prefix or sysroot, and you can set CC to use a specific compiler but
not use the more familiar CROSS_COMPILE. By making a few minor changes
these limitations are removed while leaving the default behavior
matching what it used to be.

Example build with these changes:
make CROSS_COMPILE=i686-wrs-linux-gnu- DESTDIR=/tmp install

or from the tools directory
make CROSS_COMPILE=i686-wrs-linux-gnu- DESTDIR=/tmp turbostat_install

Signed-off-by: Mark Asselstine <mark.asselstine@windriver.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-11-30 01:09:45 -05:00
Colin Ian King
84764a415c tools/power x86_energy_perf_policy: close /proc/stat in for_every_cpu()
Instead of returning out of for_every_cpu() we should break out of the loop,
which will then tidy up correctly by closing the file /proc/stat.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-11-30 01:09:44 -05:00
Len Brown
889facbee3 tools/power turbostat: v3.0: monitor Watts and Temperature
Show power in Watts and temperature in Celsius
when hardware support is present.

Intel's Sandy Bridge and Ivy Bridge processor generations support RAPL
(Run-Time-Average-Power-Limiting).  Per the Intel SDM
(Intel® 64 and IA-32 Architectures Software Developer Manual)
RAPL provides hardware energy counters and power control MSRs
(Model Specific Registers).  RAPL MSRs are designed primarily
as a method to implement power capping.  However, they are useful
for monitoring system power whether or not power capping is used.

In addition, Turbostat now shows temperature from DTS
(Digital Thermal Sensor) and PTM (Package Thermal Monitor) hardware,
if present.

As before, turbostat reads MSRs, and never writes MSRs.

New columns are present in turbostat output:

The Pkg_W column shows Watts for each package (socket) in the system.
On multi-socket systems, the system summary on the 1st row shows the sum
for all sockets together.

The Cor_W column shows Watts due to processor cores.
Note that Cor_W is included in Pkg_W.

The optional GFX_W column shows Watts due to the graphics "un-core".
Note that GFX_W is included in Pkg_W.

The optional RAM_W column on server processors shows Watts due to DRAM DIMMS.
As DRAM DIMMs are outside the processor package, RAM_W is not included in Pkg_W.

The optional PKG_% and RAM_% columns on server processors show the % of time
in the measurement interval that RAPL power limiting is in effect on the
package and on DRAM.

Note that the RAPL energy counters have some limitations.

First, hardware updates the counters about once every milli-second.
This is fine for typical turbostat measurement intervals > 1 sec.
However, when turbostat is used to measure events that approach
1ms, the counters are less useful.

Second, the 32-bit energy counters are subject to wrapping.
For example, a counter incrementing 15 micro-Joule units
on a 130 Watt TDP server processor could (in theory)
roll over in about 9 minutes.  Turbostat detects and handles
up to 1 counter overflow per measurement interval.
But when the measurement interval exceeds the guaranteed
counter range, we can't detect if more than 1 overflow occurred.
So in this case turbostat indicates that the results are
in question by replacing the fractional part of the Watts
in the output with "**":

Pkg_W  Cor_W GFX_W
  3**    0**   0**

Third, the RAPL counters are energy (Joule) counters -- they sum up
weighted events in the package to estimate energy consumed.  They are
not analog power (Watt) meters.  In practice, they tend to under-count
because they don't cover every possible use of energy in the package.
The accuracy of the RAPL counters will vary between product generations,
and between SKU's in the same product generation, and with temperature.

turbostat's -v (verbose) option now displays more power and thermal configuration
information -- as shown on the turbostat.8 manual page.
For example, it now displays the Package and DRAM Thermal Design Power (TDP):

cpu0: MSR_PKG_POWER_INFO: 0x2f064001980410 (130 W TDP, RAPL 51 - 200 W, 0.045898 sec.)
cpu0: MSR_DRAM_POWER_INFO,: 0x28025800780118 (35 W TDP, RAPL 15 - 75 W, 0.039062 sec.)
cpu8: MSR_PKG_POWER_INFO: 0x2f064001980410 (130 W TDP, RAPL 51 - 200 W, 0.045898 sec.)
cpu8: MSR_DRAM_POWER_INFO,: 0x28025800780118 (35 W TDP, RAPL 15 - 75 W, 0.039062 sec.)

Signed-off-by: Len Brown <len.brown@intel.com>
2012-11-30 01:09:44 -05:00
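
The counter-wrap discussion in commit 889facbee3 can be illustrated with a short C sketch. The function names are hypothetical and this is not claimed to be turbostat's own implementation. It relies on unsigned 32-bit subtraction to absorb a single wrap, and takes the energy unit as a parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Delta of a 32-bit energy counter; modulo-2^32 math absorbs one wrap. */
uint32_t rapl_counter_delta(uint32_t before, uint32_t after)
{
    return (uint32_t)(after - before);
}

/* Average power in Watts over the interval between two counter reads. */
double rapl_watts(uint32_t before, uint32_t after,
                  double joules_per_unit, double interval_sec)
{
    assert(interval_sec > 0.0);
    return rapl_counter_delta(before, after) * joules_per_unit / interval_sec;
}

/* Seconds until a 32-bit counter wraps at a constant power draw. */
double rapl_wrap_seconds(double joules_per_unit, double watts)
{
    assert(watts > 0.0);
    return 4294967296.0 * joules_per_unit / watts;
}
```

With the commit's example figures (15 micro-Joule units at 130 W), `rapl_wrap_seconds` gives a little over eight minutes, in line with the rough estimate above. A second wrap within one interval is indistinguishable from none, which is why longer intervals are flagged with "**".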
Len Brown
ddac0d6872 tools/power turbostat: fix output buffering issue
In periodic mode, turbostat writes to stdout,
but users were un-able to re-direct stdout, eg.

turbostat > outputfile

would result in an empty outputfile.

Signed-off-by: Len Brown <len.brown@intel.com>
2012-11-30 01:09:43 -05:00
Len Brown
e52966c084 tools/power turbostat: prevent infinite loop on migration error path
Turbostat assumed if it can't migrate to a CPU, then the CPU
must have gone off-line and turbostat should re-initialize
with the new topology.

But if turbostat can not migrate because it is restricted by
a cpuset, then it will fail to migrate even after re-initialization,
resulting in an infinite loop.

Spit out a warning when we can't migrate
and endure only 2 re-initialize cycles in a row
before giving up and exiting.

Signed-off-by: Len Brown <len.brown@intel.com>
2012-11-27 00:03:06 -05:00
Len Brown
9c63a650bb tools/power/x86/turbostat: share kernel MSR #defines
Now that turbostat is built in the kernel tree,
it can share MSR #defines with the kernel.

Signed-off-by: Len Brown <len.brown@intel.com>
Cc: x86@kernel.org
2012-11-23 21:40:04 -05:00
Len Brown
d91bb17c2a tools/power turbostat: graceful fail on garbage input
When invalid MSRs are specified on the command line,
turbostat should simply print an error and exit.

Signed-off-by: Len Brown <len.brown@intel.com>
2012-11-01 00:22:00 -04:00
Len Brown
39300ffb9b tools/power turbostat: Repair Segmentation fault when using -i option
Fix regression caused by commit 8e180f3cb6
(tools/power turbostat: add [-d MSR#][-D MSR#] options to print counter
deltas)

Signed-off-by: Len Brown <len.brown@intel.com>
2012-11-01 00:21:43 -04:00
Len Brown
f9240813e6 tools/power/turbostat: add option to count SMIs, re-name some options
Counting SMIs is popular, so add a dedicated "-s" option to do it,
and juggle some of the other option letters.

-S is now system summary (was -s)
-c is 32 bit counter (was -d)
-C is 64-bit counter (was -D)
-p is 1st thread in core (was -c)
-P is 1st thread in package (was -p)

bump the minor version number

Signed-off-by: Len Brown <len.brown@intel.com>
2012-10-06 15:26:31 -04:00
Len Brown
8e180f3cb6 tools/power turbostat: add [-d MSR#][-D MSR#] options to print counter deltas
# turbostat -d 0x34
is useful for printing the number of SMI's within an interval
on Nehalem and newer processors.

where
 # turbostat -m 0x34
will simply print out the total SMI count since reset.

Suggested-by: Andi Kleen
Signed-off-by: Len Brown <len.brown@intel.com>
2012-09-27 22:04:56 -04:00
Len Brown
2f32edf12c tools/power turbostat: add [-m MSR#] option
-m MSR# prints the specified MSR in 32-bit format
-M MSR# prints the specified MSR in 64-bit format

Signed-off-by: Len Brown <len.brown@intel.com>
2012-09-26 18:17:21 -04:00
Len Brown
130ff304f6 tools/power turbostat: make -M output pretty
The -M option dumps the specified 64-bit MSR with every sample.

Previously it was output at the end of each line.
However, with the v2 style of printing, the lines are now staggered,
making MSR output hard to read.

So move the MSR output column to the left where things are aligned.

Signed-off-by: Len Brown <len.brown@intel.com>
2012-09-26 18:17:21 -04:00
Len Brown
6574a5d505 tools/power turbostat: print more turbo-limit information
The "turbo-limit" is the maximum opportunistic processor
speed, assuming no electrical or thermal constraints.
For a given processor, the turbo-limit varies, depending
on the number of active cores.  Generally, there is more
opportunity when fewer cores are active.

Under the "-v" verbose option, turbostat would
print the turbo-limits for the four cases
of 1 to 4 cores active.

Expand that capability to cover the cases of turbo
opportunities with up to 16 cores active.

Note that not all hardware platforms supply this information,
and that sometimes a valid limit may be specified for
a core which is not actually present.

Signed-off-by: Len Brown <len.brown@intel.com>
2012-09-26 18:15:48 -04:00
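
A sketch of how per-core-count turbo limits might be decoded for commit 6574a5d505 follows. The byte layout is an assumption not stated in the commit message: the ratio for N active cores is taken to sit in byte N-1 of a 64-bit turbo-ratio-limit MSR value, and some models may differ. The function names and the idea of passing bclk as a parameter are also illustrative.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch only, under an assumed layout: byte (active_cores - 1) of the
 * MSR value holds the maximum ratio with that many cores active.
 * One 64-bit register covers 1..8 active cores.
 */
unsigned int turbo_ratio_for_cores(uint64_t msr_value, unsigned int active_cores)
{
    assert(active_cores >= 1 && active_cores <= 8);
    return (unsigned int)((msr_value >> (8 * (active_cores - 1))) & 0xFF);
}

/* Turbo limit in MHz is the ratio multiplied by the bus clock (bclk). */
double turbo_limit_mhz(unsigned int ratio, double bclk_mhz)
{
    return ratio * bclk_mhz;
}
```

A ratio of 0 would indicate that no limit is specified for that core count. As the commit notes, a non-zero value may still appear for a core that is not actually present.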
Len Brown
d7db690165 tools/power turbostat: delete unused line
MSR_TSC is no longer needed because
we now use RDTSC directly.

Signed-off-by: Len Brown <len.brown@intel.com>
2012-09-26 18:11:48 -04:00
Len Brown
1300651b40 tools/power turbostat: run on IVB Xeon
This fix is required to run on IVB Xeon,
which previously had an incorrect cpuid model number listed.

Signed-off-by: Len Brown <len.brown@intel.com>
2012-09-26 18:11:31 -04:00
Len Brown
c3ae331d1c tools/power: turbostat: fix large c1% issue
Under some conditions, c1% was displayed as very large number,
much higher than 100%.

c1% is not measured, it is derived as "that, which is left over"
from other counters.  However, the other counters are not collected
atomically, and so it is possible for c1% to be calculated as
a small negative number -- displayed as very large positive.

There was a check for mperf vs tsc for this already,
but it needed to also include the other counters
that are used to calculate c1.

Signed-off-by: Len Brown <len.brown@intel.com>
2012-07-19 22:26:33 -04:00
Len Brown
c98d5d9444 tools/power: turbostat v2 - re-write for efficiency
Measuring large profoundly-idle configurations
requires turbostat to be more lightweight.
Otherwise, the operation of turbostat itself
can interfere with the measurements.

This re-write makes turbostat topology aware.
Hardware is accessed in "topology order".
Redundant hardware accesses are deleted.
Redundant output is deleted.
Also, output is buffered and
local RDTSC use replaces remote MSR access for TSC.

From a feature point of view, the output
looks different since redundant figures are absent.
Also, there are now -c and -p options -- to restrict
output to the 1st thread in each core, and the 1st
thread in each package, respectively.  This is helpful
to reduce output on big systems, where more detail
than the "-s" system summary is desired.
Finally, periodic mode output is now on stdout, not stderr.

Turbostat v2 is also slightly more robust in
handling run-time CPU online/offline events,
as it now checks the actual map of on-line cpus rather
than just the total number of on-line cpus.
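
Why the map matters more than the count can be sketched as follows. This is an illustration only: it assumes the online set is available as a Linux cpulist string such as the one in /sys/devices/system/cpu/online, which may not be the interface turbostat itself reads, and the helper names are hypothetical.

```python
def parse_cpu_list(text):
    """Parse a cpulist string such as "0-3,8" into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def topology_changed(old_cpus, new_cpus):
    """Compare the sets themselves, not just their sizes."""
    return old_cpus != new_cpus


before = parse_cpu_list("0-3")
after = parse_cpu_list("0-2,4")   # cpu3 went offline, cpu4 came online
print(len(before) == len(after), topology_changed(before, after))
```

A count-only check sees four CPUs both times and misses the change, while the set comparison catches it.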

Signed-off-by: Len Brown <len.brown@intel.com>
2012-07-19 22:26:14 -04:00
Len Brown
650a37f32d tools/power turbostat: fix IVB support
Initial IVB support went into turbostat in Linux-3.1:
553575f1ae
(tools turbostat: recognize and run properly on IVB)

However, when running on IVB, turbostat would fail
to report the new counters added with SNB: c7, pc2 and pc7.
So in scenarios where these counters are non-zero on IVB,
turbostat would report erroneous residency results.

In particular c7 time would be added to c1 time,
since c1 time is calculated as "that which is left over".

Also, turbostat reports MHz capabilities when passed
the "-v" option, and it would incorrectly report 133MHz
bclk instead of 100MHz bclk for IVB, which would inflate
GHz reported with that option.
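
The size of the bclk error is simple arithmetic. The sketch below assumes a reported frequency is computed as ratio times bclk; the ratio of 35 is a made-up example, not a value from this commit.

```python
def ghz_from_ratio(ratio, bclk_mhz):
    """Frequency in GHz for a given multiplier and bus clock in MHz."""
    return ratio * bclk_mhz / 1000.0


ratio = 35  # hypothetical multiplier
print(ghz_from_ratio(ratio, 100.0))    # with the correct 100MHz bclk
print(ghz_from_ratio(ratio, 133.33))   # with the wrong 133MHz bclk
```

With the wrong bclk, every reported frequency is about a third too high.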

This patch is a backport of a fix already included in turbostat v2.

Signed-off-by: Len Brown <len.brown@intel.com>
2012-06-03 23:47:49 -04:00
Len Brown
d15cf7c129 tools/power turbostat: fix un-intended affinity of forked program
Linux 3.4 included a modification to turbostat to
lower cross-call overhead by using scheduler affinity:

15aaa34654
(tools turbostat: reduce measurement overhead due to IPIs)

In the use-case where turbostat forks a child program,
that change had the un-intended side-effect of binding
the child to the last cpu in the system.

This change removed the binding before forking the child.

This is a back-port of a fix already included in turbostat v2.

Signed-off-by: Len Brown <len.brown@intel.com>
2012-06-03 23:24:00 -04:00
Len Brown
15aaa34654 tools turbostat: harden against cpu online/offline
Sometimes users have turbostat running in interval mode
when they take processors offline/online.

Previously, turbostat would survive, but not gracefully.

Tighten up the error checking so turbostat notices
changes sooner, and print just 1 line on change:

turbostat: re-initialized with num_cpus %d

Signed-off-by: Len Brown <len.brown@intel.com>
2012-03-29 22:27:19 -04:00
Len Brown
88c3281f7b tools turbostat: reduce measurement overhead due to IPIs
turbostat uses /dev/cpu/*/msr interface to read MSRs.
For modern systems, it reads 10 MSR/CPU.  This can
be observed as 10 "Function Call Interrupts"
per CPU per sample added to /proc/interrupts.

This overhead is measurable on large idle systems,
and as Youquan Song pointed out, it can even trick
cpuidle into thinking the system is busy.

Here turbostat re-schedules itself in-turn to each
CPU so that its MSR reads will always be local.
This replaces the 10 "Function Call Interrupts"
with a single "Rescheduling interrupt" per sample
per CPU.
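
The idea can be sketched as below. This is not the turbostat implementation: it assumes a Linux system with the msr driver loaded and root privileges for the real readers, and the function names are hypothetical. The pin and read steps are passed in as parameters so the ordering can be exercised without hardware access.

```python
import os
import struct


def read_msr(cpu, offset):
    """Read one 64-bit MSR through /dev/cpu/<cpu>/msr (needs root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, offset))[0]
    finally:
        os.close(fd)


def pin_to(cpu):
    """Move the calling process onto a single CPU (Linux only)."""
    os.sched_setaffinity(0, {cpu})


def sample(cpus, offsets, pin=pin_to, read=read_msr):
    """Visit each CPU once, then do all of its MSR reads locally."""
    results = {}
    for cpu in cpus:
        pin(cpu)  # one migration per CPU per sample
        results[cpu] = [read(cpu, off) for off in offsets]
    return results
```

Calling `sample(range(4), [0xE7, 0xE8])` as root would read MPERF and APERF on CPUs 0 to 3. Because the process is already on the target CPU for each read, no cross-CPU call should be needed to service it.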

On an idle 32-CPU system, this shifts some residency from
the shallow c1 state to the deeper c7 state:

 # ./turbostat.old -s
   %c0  GHz  TSC    %c1    %c3    %c6    %c7   %pc2   %pc3   %pc6   %pc7
  0.27 1.29 2.29   0.95   0.02   0.00  98.77  20.23   0.00  77.41   0.00
  0.25 1.24 2.29   0.98   0.02   0.00  98.75  20.34   0.03  77.74   0.00
  0.27 1.22 2.29   0.54   0.00   0.00  99.18  20.64   0.00  77.70   0.00
  0.26 1.22 2.29   1.22   0.00   0.00  98.52  20.22   0.00  77.74   0.00
  0.26 1.38 2.29   0.78   0.02   0.00  98.95  20.51   0.05  77.56   0.00
^C
 # ./turbostat.new -s
   %c0  GHz  TSC    %c1    %c3    %c6    %c7   %pc2   %pc3   %pc6   %pc7
  0.27 1.20 2.29   0.24   0.01   0.00  99.49  20.58   0.00  78.20   0.00
  0.27 1.22 2.29   0.25   0.00   0.00  99.48  20.79   0.00  77.85   0.00
  0.27 1.20 2.29   0.25   0.02   0.00  99.46  20.71   0.03  77.89   0.00
  0.28 1.26 2.29   0.25   0.01   0.00  99.46  20.89   0.02  77.67   0.00
  0.27 1.20 2.29   0.24   0.01   0.00  99.48  20.65   0.00  78.04   0.00

cc: Youquan Song <youquan.song@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2012-03-29 22:04:58 -04:00
Len Brown
e23da0370f tools turbostat: add summary option
turbostat -s
cuts down on the amount of output, per user request.

Also tweak some output whitespace and the man page.

Signed-off-by: Len Brown <len.brown@intel.com>
2012-03-29 13:22:06 -04:00
Linus Torvalds
507a03c1cb Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
This includes initial support for the recently published ACPI 5.0 spec.
In particular, support for the "hardware-reduced" bit that eliminates
the dependency on legacy hardware.

APEI has patches resulting from testing on real hardware.

Plus other random fixes.

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (52 commits)
  acpi/apei/einj: Add extensions to EINJ from rev 5.0 of acpi spec
  intel_idle: Split up and provide per CPU initialization func
  ACPI processor: Remove unneeded variable passed by acpi_processor_hotadd_init V2
  ACPI processor: Remove unneeded cpuidle_unregister_driver call
  intel idle: Make idle driver more robust
  intel_idle: Fix a cast to pointer from integer of different size warning in intel_idle
  ACPI: kernel-parameters.txt : Add intel_idle.max_cstate
  intel_idle: remove redundant local_irq_disable() call
  ACPI processor: Fix error path, also remove sysdev link
  ACPI: processor: fix acpi_get_cpuid for UP processor
  intel_idle: fix API misuse
  ACPI APEI: Convert atomicio routines
  ACPI: Export interfaces for ioremapping/iounmapping ACPI registers
  ACPI: Fix possible alignment issues with GAS 'address' references
  ACPI, ia64: Use SRAT table rev to use 8bit or 16/32bit PXM fields (ia64)
  ACPI, x86: Use SRAT table rev to use 8bit or 32bit PXM fields (x86/x86-64)
  ACPI: Store SRAT table revision
  ACPI, APEI, Resolve false conflict between ACPI NVS and APEI
  ACPI, Record ACPI NVS regions
  ACPI, APEI, EINJ, Refine the fix of resource conflict
  ...
2012-01-18 15:51:48 -08:00
Len Brown
79ba0db69c Merge branches 'einj', 'intel_idle', 'misc', 'srat' and 'turbostat-ivb' into release 2012-01-18 01:15:54 -05:00
Arun Thomas
9b6cf1a012 tools/power turbostat: update fields in manpage
Field names were shortened: "pkg" is now "pk", "core" is now "cr"

Signed-off-by: Arun Thomas <arun.thomas@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-12-15 16:34:20 +01:00
Len Brown
553575f1ae tools turbostat: recognize and run properly on IVB
Signed-off-by: Len Brown <len.brown@intel.com>
2011-11-18 03:32:01 -05:00
Len Brown
efb90582c5 Merge branches 'acpi', 'idle', 'mrst-pmu' and 'pm-tools' into next 2011-11-06 22:14:50 -05:00
Len Brown
d30c4b7a87 tools/power turbostat: fit output into 80 columns on snb-ep
Reduce columns for package number to 1.
If you can afford more than 9 packages,
you can also afford a terminal with more than 80 columns:-)

Also shave a column off the package C-states.

Signed-off-by: Len Brown <len.brown@intel.com>
2011-08-02 18:33:31 -04:00
Len Brown
e4c0d0e22c tools/power x86_energy_perf_policy: fix print of uninitialized string
Looks like I was going to stick the brand string
in the verbose output, but didn't get around to it.

Signed-off-by: Len Brown <len.brown@intel.com>
2011-07-15 23:39:00 -04:00
Len Brown
aeae1e92da tools/power turbostat: less verbose debugging
dump only the counters which are active

Signed-off-by: Len Brown <len.brown@intel.com>
2011-07-03 21:41:33 -04:00
Jiri Kosina
07f9479a40 Merge branch 'master' into for-next
Fast-forwarded to current state of Linus' tree as there are patches to be
applied for files that didn't exist on the old branch.
2011-04-26 10:22:59 +02:00
Justin P. Mattock
6eab04a876 treewide: remove extra semicolons
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-04-10 17:01:05 +02:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Len Brown
a829eb4d7e tools: turbostat: style updates
Follow kernel coding style traditions more closely.
Delete typedef, re-name "per cpu counters" to
simply be counters etc.

This patch changes no functionality.

Suggested-by: Thiago Farina <tfransosi@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-02-10 23:58:13 -05:00
Thomas Renninger
8209e054b6 tools: turbostat: fix bitwise and operand
A bug could cause a false positive when indicating the
presence of invariant TSC or APERF support.
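
The message does not show the faulty expression. The sketch below assumes the common form of this mistake, a logical "and" used where a bitwise "and" was intended, and illustrates it with the invariant TSC flag (CPUID leaf 0x80000007, EDX bit 8). The function names are hypothetical.

```python
INVARIANT_TSC_BIT = 1 << 8  # CPUID 0x80000007, EDX bit 8


def has_invariant_tsc_buggy(edx):
    """Logical and: true whenever edx is non-zero, whatever bit is set."""
    return bool(edx and INVARIANT_TSC_BIT)


def has_invariant_tsc_fixed(edx):
    """Bitwise and: true only when bit 8 itself is set."""
    return bool(edx & INVARIANT_TSC_BIT)


edx = 0x1  # some other feature bit set, bit 8 clear
print(has_invariant_tsc_buggy(edx), has_invariant_tsc_fixed(edx))
```

With the logical form, any other feature bit in the register is enough to report the feature as present.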

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
2011-02-10 23:58:11 -05:00
Len Brown
eca0bdd326 Merge branches 'turbostat' and 'x86_energy_perf_policy' into tools 2011-01-11 23:06:28 -05:00
Len Brown
d5532ee7b4 tools: create power/x86/x86_energy_perf_policy
MSR_IA32_ENERGY_PERF_BIAS first became available on Westmere Xeon.
It is implemented in all Sandy Bridge processors -- mobile, desktop and server.
It is expected to become increasingly important in subsequent generations.

x86_energy_perf_policy is a user-space utility to set the
hardware energy vs performance policy hint in the processor.
Most systems would benefit from "x86_energy_perf_policy normal"
at system startup, as the hardware default is maximum performance
at the expense of energy efficiency.
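
As a rough illustration of what setting the hint involves, the sketch below maps policy names to a 4-bit value and merges it into an existing register value. The numeric values (0, 6, 15) and the MSR address 0x1B0 are stated from memory of the kernel's MSR definitions rather than from this commit message, so treat them as assumptions. The helper name is hypothetical, and preserving the upper bits is a design choice of the sketch, not a claim about the utility.

```python
MSR_IA32_ENERGY_PERF_BIAS = 0x1B0  # assumed MSR address

# Assumed encoding: lower value favors performance, higher favors energy.
POLICIES = {"performance": 0, "normal": 6, "powersave": 15}


def new_epb_value(old_msr_value, policy):
    """Replace the low 4 bits of the register with the policy hint."""
    return (old_msr_value & ~0xF) | POLICIES[policy]


print(hex(new_epb_value(0x0, "normal")))
```

Writing the result back to the processor would need root and a per-CPU write to the MSR device, which this sketch leaves out.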

See x86_energy_perf_policy.8 man page for more information.

Background:

Linux-2.6.36 added "epb" to /proc/cpuinfo to indicate
if an x86 processor supports MSR_IA32_ENERGY_PERF_BIAS,
without actually modifying the MSR.

In March, 2010, Venkatesh Pallipadi proposed a small driver
that programmed MSR_IA32_ENERGY_PERF_BIAS, based on
the cpufreq governor in use.  It also offered
a boot-time cmdline option to override.
http://lkml.org/lkml/2010/3/4/457
But hiding the hardware policy behind the
governor choice was deemed "kinda icky".

In June, 2010, I proposed a generic user/kernel API to
generalize the power/performance policy trade-off.
"RFC: /sys/power/policy_preference"
http://lkml.org/lkml/2010/6/16/399
That is my preference for implementing this capability,
but I received no support on the list.

So in September, 2010, I sent x86_energy_perf_policy.c to LKML,
a user-space utility that scribbles directly to the MSR.
http://lkml.org/lkml/2010/9/28/246

Here is that same utility, after responding to some review feedback,
to live in tools/power/, where it is easily found.

Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-11 23:02:21 -05:00
Len Brown
103a8fea9b tools: create power/x86/turbostat
turbostat is a Linux tool to observe proper operation
of Intel(R) Turbo Boost Technology.

turbostat displays the actual processor frequency
on x86 processors that include APERF and MPERF MSRs.
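
A minimal sketch of the arithmetic behind "actual processor frequency", assuming MPERF counts at the TSC rate while the CPU is not idle and APERF counts at the actual clock rate over the same time. The function name and sample numbers are hypothetical.

```python
def avg_busy_mhz(tsc_hz, delta_aperf, delta_mperf):
    """Average frequency while not idle, in MHz, over one interval.

    delta_aperf and delta_mperf are counter differences between two
    samples. Returns 0.0 if the CPU never left idle (MPERF did not move).
    """
    if delta_mperf == 0:
        return 0.0
    return tsc_hz / 1e6 * delta_aperf / delta_mperf


# APERF advanced 1.5x as fast as MPERF on a 2.4 GHz TSC.
print(avg_busy_mhz(2_400_000_000, 1500, 1000))
```

A ratio above 1 means the processor ran faster than its base frequency while busy, which is the turbo behavior the tool is meant to observe.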

Note that turbostat is of limited utility on Linux
kernels 2.6.29 and older, as acpi_cpufreq cleared
APERF/MPERF up through that release.

On Intel Core i3/i5/i7 (Nehalem) and newer processors,
turbostat also displays residency in idle power saving states,
which are necessary for diagnosing any cpuidle issues
that may have an effect on turbo-mode.

See the turbostat.8 man page for example usage.

Signed-off-by: Len Brown <len.brown@intel.com>
2011-01-11 22:46:02 -05:00