Pull perf updates from Ingo Molnar:
"Main kernel side changes:
- Big reorganization of the x86 perf support code. The old code grew
organically deep inside arch/x86/kernel/cpu/perf* and its naming
became somewhat messy.
The new location is under arch/x86/events/, using the following
cleaner hierarchy of source code files:
perf/x86: Move perf_event.c .................. => x86/events/core.c
perf/x86: Move perf_event_amd.c .............. => x86/events/amd/core.c
perf/x86: Move perf_event_amd_ibs.c .......... => x86/events/amd/ibs.c
perf/x86: Move perf_event_amd_iommu.[ch] ..... => x86/events/amd/iommu.[ch]
perf/x86: Move perf_event_amd_uncore.c ....... => x86/events/amd/uncore.c
perf/x86: Move perf_event_intel_bts.c ........ => x86/events/intel/bts.c
perf/x86: Move perf_event_intel.c ............ => x86/events/intel/core.c
perf/x86: Move perf_event_intel_cqm.c ........ => x86/events/intel/cqm.c
perf/x86: Move perf_event_intel_cstate.c ..... => x86/events/intel/cstate.c
perf/x86: Move perf_event_intel_ds.c ......... => x86/events/intel/ds.c
perf/x86: Move perf_event_intel_lbr.c ........ => x86/events/intel/lbr.c
perf/x86: Move perf_event_intel_pt.[ch] ...... => x86/events/intel/pt.[ch]
perf/x86: Move perf_event_intel_rapl.c ....... => x86/events/intel/rapl.c
perf/x86: Move perf_event_intel_uncore.[ch] .. => x86/events/intel/uncore.[ch]
perf/x86: Move perf_event_intel_uncore_nhmex.c => x86/events/intel/uncore_nmhex.c
perf/x86: Move perf_event_intel_uncore_snb.c => x86/events/intel/uncore_snb.c
perf/x86: Move perf_event_intel_uncore_snbep.c => x86/events/intel/uncore_snbep.c
perf/x86: Move perf_event_knc.c .............. => x86/events/intel/knc.c
perf/x86: Move perf_event_p4.c ............... => x86/events/intel/p4.c
perf/x86: Move perf_event_p6.c ............... => x86/events/intel/p6.c
perf/x86: Move perf_event_msr.c .............. => x86/events/msr.c
(Borislav Petkov)
- Update various x86 PMU constraint and hw support details (Stephane
Eranian)
- Optimize kprobes for BPF execution (Martin KaFai Lau)
- Rewrite, refactor and fix the Intel uncore PMU driver code (Thomas
Gleixner)
- Rewrite, refactor and fix the Intel RAPL PMU code (Thomas Gleixner)
- Various fixes and smaller cleanups.
There are lots of perf tooling updates as well. A few highlights:
perf report/top:
- Hierarchy histogram mode for 'perf top' and 'perf report',
showing multiple levels, one per --sort entry: (Namhyung Kim)
On a mostly idle system:
# perf top --hierarchy -s comm,dso
Then expand some levels and use 'P' to take a snapshot:
# cat perf.hist.0
- 92.32% perf
58.20% perf
22.29% libc-2.22.so
5.97% [kernel]
4.18% libelf-0.165.so
1.69% [unknown]
- 4.71% qemu-system-x86
3.10% [kernel]
1.60% qemu-system-x86_64 (deleted)
+ 2.97% swapper
#
- Add 'L' hotkey to dynamicly set the percent threshold for
histogram entries and callchains, i.e. dynamicly do what the
--percent-limit command line option to 'top' and 'report' does.
(Namhyung Kim)
perf mem:
- Allow specifying events via -e in 'perf mem record', also listing
what events can be specified via 'perf mem record -e list' (Jiri
Olsa)
perf record:
- Add 'perf record' --all-user/--all-kernel options, so that one
can tell that all the events in the command line should be
restricted to the user or kernel levels (Jiri Olsa), i.e.:
perf record -e cycles:u,instructions:u
is equivalent to:
perf record --all-user -e cycles,instructions
- Make 'perf record' collect CPU cache info in the perf.data file header:
$ perf record usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
$ perf report --header-only -I | tail -10 | head -8
# CPU cache info:
# L1 Data 32K [0-1]
# L1 Instruction 32K [0-1]
# L1 Data 32K [2-3]
# L1 Instruction 32K [2-3]
# L2 Unified 256K [0-1]
# L2 Unified 256K [2-3]
# L3 Unified 4096K [0-3]
Will be used in 'perf c2c' and eventually in 'perf diff' to
allow, for instance running the same workload in multiple
machines and then when using 'diff' show the hardware difference.
(Jiri Olsa)
- Improved support for Java, using the JVMTI agent library to do
jitdumps that then will be inserted in synthesized
PERF_RECORD_MMAP2 events via 'perf inject' pointed to synthesized
ELF files stored in ~/.debug and keyed with build-ids, to allow
symbol resolution and even annotation with source line info, see
the changeset comments to see how to use it (Stephane Eranian)
perf script/trace:
- Decode data_src values (e.g. perf.data files generated by 'perf
mem record') in 'perf script': (Jiri Olsa)
# perf script
perf 693 [1] 4.088652: 1 cpu/mem-loads,ldlat=30/P: ffff88007d0b0f40 68100142 L1 hit|SNP None|TLB L1 or L2 hit|LCK No <SNIP>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- Improve support to 'data_src', 'weight' and 'addr' fields in
'perf script' (Jiri Olsa)
- Handle empty print fmts in 'perf script -s' i.e. when running
python or perl scripts (Taeung Song)
perf stat:
- 'perf stat' now shows shadow metrics (insn per cycle, etc) in
interval mode too. E.g:
# perf stat -I 1000 -e instructions,cycles sleep 1
# time counts unit events
1.000215928 519,620 instructions # 0.69 insn per cycle
1.000215928 752,003 cycles
<SNIP>
- Port 'perf kvm stat' to PowerPC (Hemant Kumar)
- Implement CSV metrics output in 'perf stat' (Andi Kleen)
perf BPF support:
- Support converting data from bpf events in 'perf data' (Wang Nan)
- Print bpf-output events in 'perf script': (Wang Nan).
# perf record -e bpf-output/no-inherit,name=evt/ -e ./test_bpf_output_3.c/map:channel.event=evt/ usleep 1000
# perf script
usleep 4882 21384.532523: evt: ffffffff810e97d1 sys_nanosleep ([kernel.kallsyms])
BPF output: 0000: 52 61 69 73 65 20 61 20 Raise a
0008: 42 50 46 20 65 76 65 6e BPF even
0010: 74 21 00 00 t!..
BPF string: "Raise a BPF event!"
#
- Add API to set values of map entries in a BPF object, be it
individual map slots or ranges (Wang Nan)
- Introduce support for the 'bpf-output' event (Wang Nan)
- Add glue to read perf events in a BPF program (Wang Nan)
- Improve support for bpf-output events in 'perf trace' (Wang Nan)
... and tons of other changes as well - see the shortlog and git log
for details!"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (342 commits)
perf stat: Add --metric-only support for -A
perf stat: Implement --metric-only mode
perf stat: Document CSV format in manpage
perf hists browser: Check sort keys before hot key actions
perf hists browser: Allow thread filtering for comm sort key
perf tools: Add sort__has_comm variable
perf tools: Recalc total periods using top-level entries in hierarchy
perf tools: Remove nr_sort_keys field
perf hists browser: Cleanup hist_browser__fprintf_hierarchy_entry()
perf tools: Remove hist_entry->fmt field
perf tools: Fix command line filters in hierarchy mode
perf tools: Add more sort entry check functions
perf tools: Fix hist_entry__filter() for hierarchy
perf jitdump: Build only on supported archs
tools lib traceevent: Add '~' operation within arg_num_eval()
perf tools: Omit unnecessary cast in perf_pmu__parse_scale
perf tools: Pass perf_hpp_list all the way through setup_sort_list
perf tools: Fix perf script python database export crash
perf jitdump: DWARF is also needed
perf bench mem: Prepare the x86-64 build for upstream memcpy_mcsafe() changes
...
Pull read-only kernel memory updates from Ingo Molnar:
"This tree adds two (security related) enhancements to the kernel's
handling of read-only kernel memory:
- extend read-only kernel memory to a new class of formerly writable
kernel data: 'post-init read-only memory' via the __ro_after_init
attribute, and mark the ARM and x86 vDSO as such read-only memory.
This kind of attribute can be used for data that requires a once
per bootup initialization sequence, but is otherwise never modified
after that point.
This feature was based on the work by PaX Team and Brad Spengler.
(by Kees Cook, the ARM vDSO bits by David Brown.)
- make CONFIG_DEBUG_RODATA always enabled on x86 and remove the
Kconfig option. This simplifies the kernel and also signals that
read-only memory is the default model and a first-class citizen.
(Kees Cook)"
* 'mm-readonly-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ARM/vdso: Mark the vDSO code read-only after init
x86/vdso: Mark the vDSO code read-only after init
lkdtm: Verify that '__ro_after_init' works correctly
arch: Introduce post-init read-only memory
x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option
mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings
asm-generic: Consolidate mark_rodata_ro()
Pull locking changes from Ingo Molnar:
"Various updates:
- Futex scalability improvements: remove page lock use for shared
futex get_futex_key(), which speeds up 'perf bench futex hash'
benchmarks by over 40% on a 60-core Westmere. This makes anon-mem
shared futexes perform close to private futexes. (Mel Gorman)
- lockdep hash collision detection and fix (Alfredo Alvarez
Fernandez)
- lockdep testing enhancements (Alfredo Alvarez Fernandez)
- robustify lockdep init by using hlists (Andrew Morton, Andrey
Ryabinin)
- mutex and csd_lock micro-optimizations (Davidlohr Bueso)
- small x86 barriers tweaks (Michael S Tsirkin)
- qspinlock updates (Waiman Long)"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
locking/csd_lock: Use smp_cond_acquire() in csd_lock_wait()
locking/csd_lock: Explicitly inline csd_lock*() helpers
futex: Replace barrier() in unqueue_me() with READ_ONCE()
locking/lockdep: Detect chain_key collisions
locking/lockdep: Prevent chain_key collisions
tools/lib/lockdep: Fix link creation warning
tools/lib/lockdep: Add tests for AA and ABBA locking
tools/lib/lockdep: Add userspace version of READ_ONCE()
tools/lib/lockdep: Fix the build on recent kernels
locking/qspinlock: Move __ARCH_SPIN_LOCK_UNLOCKED to qspinlock_types.h
locking/mutex: Allow next waiter lockless wakeup
locking/pvqspinlock: Enable slowpath locking count tracking
locking/qspinlock: Use smp_cond_acquire() in pending code
locking/pvqspinlock: Move lock stealing count tracking code into pv_queued_spin_steal_lock()
locking/mcs: Fix mcs_spin_lock() ordering
futex: Remove requirement for lock_page() in get_futex_key()
futex: Rename barrier references in ordering guarantees
locking/atomics: Update comment about READ_ONCE() and structures
locking/lockdep: Eliminate lockdep_init()
locking/lockdep: Convert hash tables to hlists
...
Pull ram resource handling changes from Ingo Molnar:
"Core kernel resource handling changes to support NVDIMM error
injection.
This tree introduces a new I/O resource type, IORESOURCE_SYSTEM_RAM,
for System RAM while keeping the current IORESOURCE_MEM type bit set
for all memory-mapped ranges (including System RAM) for backward
compatibility.
With this resource flag it no longer takes a strcmp() loop through the
resource tree to find "System RAM" resources.
The new resource type is then used to extend ACPI/APEI error injection
facility to also support NVDIMM"
* 'core-resources-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ACPI/EINJ: Allow memory error injection to NVDIMM
resource: Kill walk_iomem_res()
x86/kexec: Remove walk_iomem_res() call with GART type
x86, kexec, nvdimm: Use walk_iomem_res_desc() for iomem search
resource: Add walk_iomem_res_desc()
memremap: Change region_intersects() to take @flags and @desc
arm/samsung: Change s3c_pm_run_res() to use System RAM type
resource: Change walk_system_ram() to use System RAM type
drivers: Initialize resource entry to zero
xen, mm: Set IORESOURCE_SYSTEM_RAM to System RAM
kexec: Set IORESOURCE_SYSTEM_RAM for System RAM
arch: Set IORESOURCE_SYSTEM_RAM flag for System RAM
ia64: Set System RAM type and descriptor
x86/e820: Set System RAM type and descriptor
resource: Add I/O resource descriptor
resource: Handle resource flags properly
resource: Add System RAM resource type
i486 derived cores like Intel Quark support only the very old,
legacy x87 FPU (FSAVE/FRSTOR, CPUID bit FXSR is not set), and
our FPU code wasn't handling the saving and restoring there
properly in the 'eagerfpu' case.
So after we made eagerfpu the default for all CPU types:
58122bf1d8 x86/fpu: Default eagerfpu=on on all CPUs
these old FPU designs broke. First, Andy Shevchenko reported a splat:
WARNING: CPU: 0 PID: 823 at arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160
which was us trying to execute FXRSTOR on those machines even though
they don't support it.
After taking care of that, Bryan O'Donoghue reported that a simple FPU
test still failed because we weren't initializing the FPU state properly
on those machines.
Take care of all that.
Reported-and-tested-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20160311113206.GD4312@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Leonid Shatz noticed that the SDM interpretation of the following
recent commit:
394db20ca2 ("x86/fpu: Disable AVX when eagerfpu is off")
... is incorrect and that the original behavior of the FPU code was correct.
Because AVX is not stated in CR0 TS bit description, it was mistakenly
believed to be not supported for lazy context switch. This turns out
to be false:
Intel Software Developer's Manual Vol. 3A, Sec. 2.5 Control Registers:
'TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the x87 FPU/
MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch to be delayed until
an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is actually executed
by the new task.'
Intel Software Developer's Manual Vol. 2A, Sec. 2.4 Instruction Exception
Specification:
'AVX instructions refer to exceptions by classes that include #NM
"Device Not Available" exception for lazy context switch.'
So revert the commit.
Reported-by: Leonid Shatz <leonid.shatz@ravellosystems.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1457569734-3785-1-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
After fixing FPU option parsing, we now parse the 'no387' boot option
too early: no387 clears X86_FEATURE_FPU before it's even probed, so
the boot CPU promptly re-enables it.
I suspect it gets even more confused on SMP.
Fix the probing code to leave X86_FEATURE_FPU off if it's been
disabled by setup_clear_cpu_cap().
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Fixes: 4f81cbafcc ("x86/fpu: Fix early FPU command-line parsing")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
ignore_nmis is used in two distinct places:
1. modified through {stop,restart}_nmi by alternative_instructions
2. read by do_nmi to determine if default_do_nmi should be called or not
thus the access pattern conforms to __read_mostly and do_nmi() is a fastpath.
Signed-off-by: Kostenzer Felix <fkostenzer@live.at>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pause/unpause graph tracing around do_suspend_lowlevel as it has
inconsistent call/return info after it jumps to the wakeup vector.
The graph trace buffer will otherwise become misaligned and
may eventually crash and hang on suspend.
To reproduce the issue and test the fix:
Run a function_graph trace over suspend/resume and set the graph
function to suspend_devices_and_enter. This consistently hangs the
system without this fix.
Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
For per package oriented services we must be able to rely on the number of CPU
packages to be within bounds. Create a tracking facility, which
- calculates the number of possible packages depending on nr_cpu_ids after boot
- makes sure that the package id is within the number of possible packages. If
the apic id is outside we map it to a logical package id if there is enough
space available.
Provide interfaces for drivers to query the mapping and do translations from
physcial to logical ids.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221011.541071755@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This removes the CONFIG_DEBUG_RODATA option and makes it always enabled.
This simplifies the code and also makes it clearer that read-only mapped
memory is just as fundamental a security feature in kernel-space as it is
in user-space.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455748879-21872-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that all functionality has been moved to arch/x86/events/, move the
perf_event.h header and adjust include paths.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1455098123-11740-18-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In the error path of amd_uncore_cpu_up_prepare() the newly allocated uncore
struct is freed, but the percpu pointer still references it. Set it to NULL.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1602162302170.19512@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Also, keep the churn at minimum by adjusting the include "perf_event.h"
when each file gets moved.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1454947748-28629-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- Use the more current logging style pr_<level>(...) instead of the old
printk(KERN_<LEVEL> ...).
- Convert pr_warning() to pr_warn().
Signed-off-by: Chen Yucong <slaoub@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1454384702-21707-1-git-send-email-slaoub@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Most of the magic numbers in x86_capability[] have been converted to
'enum cpuid_leafs', and this patch updates the remaining part.
Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: lguest@lists.ozlabs.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1453750913-4781-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes from Thomas Gleixner:
"A bit on the largish side due to a series of fixes for a regression in
the x86 vector management which was introduced in 4.3. This work was
started in December already, but it took some time to fix all corner
cases and a couple of older bugs in that area which were detected
while at it
Aside of that a few platform updates for intel-mid, quark and UV and
two fixes for in the mm code:
- Use proper types for pgprot values to avoid truncation
- Prevent a size truncation in the pageattr code when setting page
attributes for large mappings"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
x86/mm/pat: Avoid truncation when converting cpa->numpages to address
x86/mm: Fix types used in pgprot cacheability flags translations
x86/platform/quark: Print boundaries correctly
x86/platform/UV: Remove EFI memmap quirk for UV2+
x86/platform/intel-mid: Join string and fix SoC name
x86/platform/intel-mid: Enable 64-bit build
x86/irq: Plug vector cleanup race
x86/irq: Call irq_force_move_complete with irq descriptor
x86/irq: Remove outgoing CPU from vector cleanup mask
x86/irq: Remove the cpumask allocation from send_cleanup_vector()
x86/irq: Clear move_in_progress before sending cleanup IPI
x86/irq: Remove offline cpus from vector cleanup
x86/irq: Get rid of code duplication
x86/irq: Copy vectormask instead of an AND operation
x86/irq: Check vector allocation early
x86/irq: Reorganize the search in assign_irq_vector
x86/irq: Reorganize the return path in assign_irq_vector
x86/irq: Do not use apic_chip_data.old_domain as temporary buffer
x86/irq: Validate that irq descriptor is still active
x86/irq: Fix a race in x86_vector_free_irqs()
...
Pull perf fixes from Thomas Gleixner:
"This is much bigger than typical fixes, but Peter found a category of
races that spurred more fixes and more debugging enhancements. Work
started before the merge window, but got finished only now.
Aside of that this contains the usual small fixes to perf and tools.
Nothing particular exciting"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
perf: Remove/simplify lockdep annotation
perf: Synchronously clean up child events
perf: Untangle 'owner' confusion
perf: Add flags argument to perf_remove_from_context()
perf: Clean up sync_child_event()
perf: Robustify event->owner usage and SMP ordering
perf: Fix STATE_EXIT usage
perf: Update locking order
perf: Remove __free_event()
perf/bpf: Convert perf_event_array to use struct file
perf: Fix NULL deref
perf/x86: De-obfuscate code
perf/x86: Fix uninitialized value usage
perf: Fix race in perf_event_exit_task_context()
perf: Fix orphan hole
perf stat: Do not clean event's private stats
perf hists: Fix HISTC_MEM_DCACHELINE width setting
perf annotate browser: Fix behaviour of Shift-Tab with nothing focussed
perf tests: Remove wrong semicolon in while loop in CQM test
perf: Synchronously free aux pages in case of allocation failure
...
There is no longer any driver inserting a "GART" region in the
kernel since
707d4eefbd ("Revert "[PATCH] Insert GART region into resource map"").
Remove the call to walk_iomem_res() with "GART" type, its
callback function, and GART-specific variables set by the
callback.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chun-Yi <joeyli.kernel@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Lee, Chun-Yi <joeyli.kernel@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Minfei Huang <mnfhuang@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Takao Indoh <indou.takao@jp.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: kexec@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mm <linux-mm@kvack.org>
Link: http://lkml.kernel.org/r/1453841853-11383-16-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change the callers of walk_iomem_res() scanning for the
following resources by name to use walk_iomem_res_desc()
instead.
"ACPI Tables"
"ACPI Non-volatile Storage"
"Persistent Memory (legacy)"
"Crash kernel"
Note, the caller of walk_iomem_res() with "GART" will be removed
in a later patch.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chun-Yi <joeyli.kernel@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Lee, Chun-Yi <joeyli.kernel@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Minfei Huang <mnfhuang@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Takao Indoh <indou.takao@jp.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: kexec@lists.infradead.org
Cc: linux-arch@vger.kernel.org
Cc: linux-mm <linux-mm@kvack.org>
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/1453841853-11383-15-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change e820_reserve_resources() to set 'flags' and 'desc' from
e820 types.
Set E820_RESERVED_KERN and E820_RAM's (System RAM) io resource
type to IORESOURCE_SYSTEM_RAM.
Do the same for "Kernel data", "Kernel code", and "Kernel bss",
which are child nodes of System RAM.
I/O resource descriptor is set to 'desc' for entries that are
(and will be) target ranges of walk_iomem_res() and
region_intersects().
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm <linux-mm@kvack.org>
Link: http://lkml.kernel.org/r/1453841853-11383-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following commit:
f8e617f458 ("sched/idle/x86: Optimize unnecessary mwait_idle() resched IPIs")
adds memory barriers around clflush(), but this seems wrong for UP since
barrier() has no effect on clflush(). We really want MFENCE, so switch
to mb() instead.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization <virtualization@lists.linux-foundation.org>
Link: http://lkml.kernel.org/r/1453921746-16178-5-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When calling intel_alt_er() with .idx != EXTRA_REG_RSP_* we will not
initialize alt_idx and then use this uninitialized value to index an
array.
When that is not fatal, it can result in an infinite loop in its
caller __intel_shared_reg_get_constraints(), with IRQs disabled.
Alternative error modes are random memory corruption due to the
cpuc->shared_regs->regs[] array overrun, which manifest in either
get_constraints or put_constraints doing weird stuff.
Only took 6 hours of painful debugging to find this. Neither GCC nor
Smatch warnings flagged this bug.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: ae3f011fc2 ("perf/x86/intel: Fix SLM MSR_OFFCORE_RSP1 valid_mask")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch enables the uncore_imc PMU for Intel
SkyLake Desktop processors (Core i7-6700, model 94).
It is possible to compute memory read/write bandwidth
using:
$ perf stat -a -e uncore_imc/data_reads/,uncore_imc/data_writes/ ....
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: kan.liang@intel.com
Link: http://lkml.kernel.org/r/1452151546-8853-1-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move the stuff currently only used by the kexec file code within
CONFIG_KEXEC_FILE (and CONFIG_KEXEC_VERIFY_SIG).
Also move internal "struct kexec_sha_region" and "struct kexec_buf" into
"kexec_internal.h".
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Young <dyoung@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>