Commit Graph

2992 Commits

Author SHA1 Message Date
Kan Liang
687805e4a6 perf/x86/intel: Filter branches for PEBS event
For supporting Intel LBR branches filtering, Intel LBR sharing logic
mechanism is introduced from commit b36817e886 ("perf/x86: Add Intel
LBR sharing logic"). It modifies __intel_shared_reg_get_constraints() to
config lbr_sel, which is finally used to set LBR_SELECT.

However, the intel_shared_regs_constraints() function is called after
intel_pebs_constraints(). The PEBS event will return immediately after
intel_pebs_constraints(). So it's impossible to filter branches for PEBS
events.

This patch moves intel_shared_regs_constraints() ahead of
intel_pebs_constraints().

We can safely do that because the intel_shared_regs_constraints() function
only returns empty constraint if its rejecting the event, otherwise it
returns NULL such that we continue calling intel_pebs_constraints() and
x86_get_event_constraint().

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1427467105-9260-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-02 17:07:42 +02:00
Ingo Molnar
55474c48b4 x86/asm/entry: Remove user_mode_ignore_vm86()
user_mode_ignore_vm86() can be used instead of user_mode(), in
places where we have already done a v8086_mode() security
check of ptregs.

But doing this check in the wrong place would be a bug that
could result in security problems, and also the naming still
isn't very clear.

Furthermore, it only affects 32-bit kernels, while most
development happens on 64-bit kernels.

If we replace them with user_mode() checks then the cost is only
a very minor increase in various slowpaths:

   text             data   bss     dec              hex    filename
   10573391         703562 1753042 13029995         c6d26b vmlinux.o.before
   10573423         703562 1753042 13030027         c6d28b vmlinux.o.after

So lets get rid of this distinction once and for all.

Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150329090233.GA1963@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-31 11:45:19 +02:00
Hector Marco-Gisbert
4e26d11f52 x86/mm: Improve AMD Bulldozer ASLR workaround
The ASLR implementation needs to special-case AMD F15h processors by
clearing out bits [14:12] of the virtual address in order to avoid I$
cross invalidations and thus performance penalty for certain workloads.
For details, see:

  dfb09f9b7a ("x86, amd: Avoid cache aliasing penalties on AMD family 15h")

This special case reduces the mmapped file's entropy by 3 bits.

The following output is the run on an AMD Opteron 62xx class CPU
processor under x86_64 Linux 4.0.0:

  $ for i in `seq 1 10`; do cat /proc/self/maps | grep "r-xp.*libc" ; done
  b7588000-b7736000 r-xp 00000000 00:01 4924       /lib/i386-linux-gnu/libc.so.6
  b7570000-b771e000 r-xp 00000000 00:01 4924       /lib/i386-linux-gnu/libc.so.6
  b75d0000-b777e000 r-xp 00000000 00:01 4924       /lib/i386-linux-gnu/libc.so.6
  b75b0000-b775e000 r-xp 00000000 00:01 4924       /lib/i386-linux-gnu/libc.so.6
  b7578000-b7726000 r-xp 00000000 00:01 4924       /lib/i386-linux-gnu/libc.so.6
  ...

Bits [12:14] are always 0, i.e. the address always ends in 0x8000 or
0x0000.

32-bit systems, as in the example above, are especially sensitive
to this issue because 32-bit randomness for VA space is 8 bits (see
mmap_rnd()). With the Bulldozer special case, this diminishes to only 32
different slots of mmap virtual addresses.

This patch randomizes per boot the three affected bits rather than
setting them to zero. Since all the shared pages have the same value
at bits [12..14], there is no cache aliasing problems. This value gets
generated during system boot and it is thus not known to a potential
remote attacker. Therefore, the impact from the Bulldozer workaround
gets diminished and ASLR randomness increased.

More details at:

  http://hmarco.org/bugs/AMD-Bulldozer-linux-ASLR-weakness-reducing-mmaped-files-by-eight.html

Original white paper by AMD dealing with the issue:

  http://developer.amd.com/wordpress/media/2012/10/SharedL1InstructionCacheonAMD15hCPU.pdf

Mentored-by: Ismael Ripoll <iripoll@disca.upv.es>
Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan-Simon <dl9pf@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-fsdevel@vger.kernel.org
Link: http://lkml.kernel.org/r/1427456301-3764-1-git-send-email-hecmargi@upv.es
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-31 10:01:17 +02:00
Michael S. Tsirkin
46423ffaf4 x86/microcode/amd: Drop the pci_ids.h dependency
This file doesn't use any macros from pci_ids.h anymore, drop the include.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1427635734-24786-80-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-31 09:54:32 +02:00
Denys Vlasenko
487d1edb9a x86/asm/entry/64: Fix comment about SYSENTER MSRs
The comment is ancient, it dates to the time when only AMD's
x86_64 implementation existed. AMD wasn't (and still isn't)
supporting SYSENTER, so these writes were "just in case" back
then.

This has changed: Intel's x86_64 appeared, and Intel does
support SYSENTER in long mode. "Some future 64-bit CPU" is here
already.

The code may appear "buggy" for AMD as it stands, since
MSR_IA32_SYSENTER_EIP is only 32-bit for AMD CPUs. Writing a
kernel function's address to it would drop high bits. Subsequent
use of this MSR for branch via SYSENTER seem to allow user to
transition to CPL0 while executing his code. Scary, eh?

Explain why that is not a bug: because SYSENTER insn would not
work on AMD CPU.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1427453956-21931-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 12:23:16 +01:00
Peter Zijlstra
34f439278c perf: Add per event clockid support
While thinking on the whole clock discussion it occurred to me we have
two distinct uses of time:

 1) the tracking of event/ctx/cgroup enabled/running/stopped times
    which includes the self-monitoring support in struct
    perf_event_mmap_page.

 2) the actual timestamps visible in the data records.

And we've been conflating them.

The first is all about tracking time deltas, nobody should really care
in what time base that happens, its all relative information, as long
as its internally consistent it works.

The second however is what people are worried about when having to
merge their data with external sources. And here we have the
discussion on MONOTONIC vs MONOTONIC_RAW etc..

Where MONOTONIC is good for correlating between machines (static
offset), MONOTNIC_RAW is required for correlating against a fixed rate
hardware clock.

This means configurability; now 1) makes that hard because it needs to
be internally consistent across groups of unrelated events; which is
why we had to have a global perf_clock().

However, for 2) it doesn't really matter, perf itself doesn't care
what it writes into the buffer.

The below patch makes the distinction between these two cases by
adding perf_event_clock() which is used for the second case. It
further makes this configurable on a per-event basis, but adds a few
sanity checks such that we cannot combine events with different clocks
in confusing ways.

And since we then have per-event configurability we might as well
retain the 'legacy' behaviour as a default.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 10:13:22 +01:00
David Ahern
9332d250b4 perf/x86: Remove redundant calls to perf_pmu_{dis|en}able()
perf_pmu_disable() is called before pmu->add() and perf_pmu_enable() is called
afterwards. No need to call these inside of x86_pmu_add() as well.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424281543-67335-1-git-send-email-dsahern@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:49:44 +01:00
Ingo Molnar
936c663aed Merge branch 'perf/x86' into perf/core, because it's ready
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:46:19 +01:00
Ingo Molnar
072e5a1cfa Merge branch 'perf/urgent' into perf/core, to pick up fixes and to refresh the tree
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:46:03 +01:00
Andi Kleen
294fe0f52a perf/x86/intel: Add INST_RETIRED.ALL workarounds
On Broadwell INST_RETIRED.ALL cannot be used with any period
that doesn't have the lowest 6 bits cleared. And the period
should not be smaller than 128.

This is erratum BDM11 and BDM55:

  http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5th-gen-core-family-spec-update.pdf

BDM11: When using a period < 100; we may get incorrect PEBS/PMI
interrupts and/or an invalid counter state.
BDM55: When bit0-5 of the period are !0 we may get redundant PEBS
records on overflow.

Add a new callback to enforce this, and set it for Broadwell.

How does this handle the case when an app requests a specific
period with some of the bottom bits set?

Short answer:

Any useful instruction sampling period needs to be 4-6 orders
of magnitude larger than 128, as an PMI every 128 instructions
would instantly overwhelm the system and be throttled.
So the +-64 error from this is really small compared to the
period, much smaller than normal system jitter.

Long answer (by Peterz):

IFF we guarantee perf_event_attr::sample_period >= 128.

Suppose we start out with sample_period=192; then we'll set period_left
to 192, we'll end up with left = 128 (we truncate the lower bits). We
get an interrupt, find that period_left = 64 (>0 so we return 0 and
don't get an overflow handler), up that to 128. Then we trigger again,
at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get
an overflow). We increment with sample_period so we get left = 128. We
fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an
overflow). And on and on.

So while the individual interrupts are 'wrong' we get then with
interval=256,128 in exactly the right ratio to average out at 192. And
this works for everything >=128.

So the num_samples*fixed_period thing is still entirely correct +- 127,
which is good enough I'd say, as you already have that error anyhow.

So no need to 'fix' the tools, al we need to do is refuse to create
INST_RETIRED:ALL events with sample_period < 128.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
[ Updated comments and changelog a bit. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424225886-18652-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:14:03 +01:00
Andi Kleen
91f1b70582 perf/x86/intel: Add Broadwell core support
Add Broadwell support for Broadwell to perf.

The basic support is very similar to Haswell. We use the new cache
event list added for Haswell earlier. The only differences
are a few bits related to remote nodes. To avoid an extra,
mostly identical, table these are patched up in the initialization code.

The constraint list has one new event that needs to be handled over Haswell.

Includes code and testing from Kan Liang.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424225886-18652-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:14:02 +01:00
Andi Kleen
0f1b5ca240 perf/x86/intel: Add new cache events table for Haswell
Haswell offcore events are quite different from Sandy Bridge.
Add a new table to handle Haswell properly.

Note that the offcore bits listed in the SDM are not quite correct
(this is currently being fixed). An uptodate list of bits is
in the patch.

The basic setup is similar to Sandy Bridge. The prefetch columns
have been removed, as prefetch counting is not very reliable
on Haswell. One L1 event that is not in the event list anymore
has been also removed.

- data reads do not include code reads (comparable to earlier Sandy Bridge tables)
- data counts include speculative execution (except L1 write, dtlb, bpu)
- remote node access includes both remote memory, remote cache, remote mmio.
- prefetches are not included in the counts for consistency
  (different from Sandy Bridge, which includes prefetches in the remote node)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
[ Removed the HSM30 comments; we don't have them for SNB/IVB either. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1424225886-18652-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-27 09:14:01 +01:00
Ingo Molnar
d56fe4bf5f x86/asm/entry/64: Always set up SYSENTER MSRs
On CONFIG_IA32_EMULATION=y kernels we set up
MSR_IA32_SYSENTER_CS/ESP/EIP, but on !CONFIG_IA32_EMULATION
kernels we leave them unchanged.

Clear them to make sure the instruction is disabled properly.

SYSCALL is set up properly in both cases.

Acked-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24 20:57:25 +01:00
Denys Vlasenko
ef593260f0 x86/asm/entry: Get rid of KERNEL_STACK_OFFSET
PER_CPU_VAR(kernel_stack) was set up in a way where it points
five stack slots below the top of stack.

Presumably, it was done to avoid one "sub $5*8,%rsp"
in syscall/sysenter code paths, where iret frame needs to be
created by hand.

Ironically, none of them benefits from this optimization,
since all of them need to allocate additional data on stack
(struct pt_regs), so they still have to perform subtraction.

This patch eliminates KERNEL_STACK_OFFSET.

PER_CPU_VAR(kernel_stack) now points directly to top of stack.
pt_regs allocations are adjusted to allocate iret frame as well.
Hopefully we can merge it later with 32-bit specific
PER_CPU_VAR(cpu_current_top_of_stack) variable...

Net result in generated code is that constants in several insns
are changed.

This change is necessary for changing struct pt_regs creation
in SYSCALL64 code path from MOV to PUSH instructions.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1426785469-15125-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24 19:42:38 +01:00
Aravind Gopalakrishnan
43eaa2a1ad x86/mce: Define mce_severity function pointer
Rename mce_severity() to mce_severity_intel() and assign the
mce_severity function pointer to mce_severity_amd() during init on AMD.
This way, we can avoid a test to call mce_severity_amd every time we get
into mce_severity(). And it's cleaner to do it this way.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Suggested-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1427125373-2918-3-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-24 12:14:15 +01:00
Aravind Gopalakrishnan
bf80bbd7dc x86/mce: Add an AMD severities-grading function
Add a severities function that caters to AMD processors. This allows us
to do some vendor-specific work within the function if necessary.

Also, introduce a vendor flag bitfield for vendor-specific settings. The
severities code uses this to define error scope based on the prescence
of the flags field.

This is based off of work by Boris Petkov.

Testing details:
Fam10h, Model 9h (Greyhound)
Fam15h: Models 0h-0fh (Orochi), 30h-3fh (Kaveri) and 60h-6fh (Carrizo),
Fam16h Model 00h-0fh (Kabini)

Boris:
Intel SNB
AMD K8 (JH-E0)

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/1427125373-2918-2-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Fixup build, clean up comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-24 12:13:34 +01:00
Denys Vlasenko
a76c7f4604 x86/asm/entry/64: Fold syscall32_cpu_init() into its sole user
Having syscall32/sysenter32 initialization in a separate tiny
function, called from within a function that is already syscall
init specific, serves no real purpose.

Its existense also caused an unintended effect of having
wrmsrl(MSR_CSTAR) performed twice: once we set it to a dummy
function returning -ENOSYS, and immediately after
(if CONFIG_IA32_EMULATION), we set it to point to the proper
syscall32 entry point, ia32_cstar_target.

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-24 08:20:51 +01:00
Andy Lutomirski
383f3af3f8 x86/asm/entry, perf: Explicitly optimize vm86 handling in code_segment_base()
There's no point in checking the VM bit on 64-bit, and, since
we're explicitly checking it, we can use user_mode_ignore_vm86()
after the check.

While we're at it, rearrange the #ifdef slightly to make the code
flow a bit clearer.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/dc1457a734feccd03a19bb3538a7648582f57cdd.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 11:13:41 +01:00
Peter Zijlstra
50f16a8bf9 perf: Remove type specific target pointers
The only reason CQM had to use a hard-coded pmu type was so it could use
cqm_target in hw_perf_event.

Do away with the {tp,bp,cqm}_target pointers and provide a non type
specific one.

This allows us to do away with that silly pmu type as well.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Vince Weaver <vince@deater.net>
Cc: acme@kernel.org
Cc: acme@redhat.com
Cc: hpa@zytor.com
Cc: jolsa@redhat.com
Cc: kanaka.d.juvva@intel.com
Cc: matt.fleming@intel.com
Cc: tglx@linutronix.de
Cc: torvalds@linux-foundation.org
Cc: vikas.shivappa@linux.intel.com
Link: http://lkml.kernel.org/r/20150305211019.GU21418@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:58:04 +01:00
Matt Fleming
4e16ed9941 perf/x86/intel: Fix Makefile to actually build the cqm driver
Someone fat fingered a merge conflict and lost the Makefile hunk.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <acme@redhat.com>
Cc: <hpa@zytor.com>
Cc: <jolsa@redhat.com>
Cc: <kanaka.d.juvva@intel.com>
Cc: <tglx@linutronix.de>
Cc: <torvalds@linux-foundation.org>
Cc: <vikas.shivappa@linux.intel.com>
Link: http://lkml.kernel.org/r/1424976420.15321.35.camel@mfleming-mobl1.ger.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:58:03 +01:00
Sudeep Holla
37dea8c52c x86/cpu/cacheinfo: Fix cache_get_priv_group() for Intel processors
The private pointer provided by the cacheinfo code is used to implement
the AMD L3 cache-specific attributes using a pointer to the northbridge
descriptor. It is needed for performing L3-specific operations and for
that we need a couple of PCI devices and other service information, all
contained in the northbridge descriptor.

This results in failure of cacheinfo setup as shown below as
cache_get_priv_group() returns the uninitialised private attributes which
are not valid for Intel processors.

  ------------[ cut here ]------------
  WARNING: CPU: 3 PID: 1 at fs/sysfs/group.c:102
  internal_create_group+0x151/0x280()
  sysfs: (bin_)attrs not set by subsystem for group: index3/
  Modules linked in:
  CPU: 3 PID: 1 Comm: swapper/0 Not tainted 4.0.0-rc3+ #1
  Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A13 05/11/2014
  ...
  Call Trace:
    dump_stack
    warn_slowpath_common
    warn_slowpath_fmt
    internal_create_group
    sysfs_create_groups
    device_add
    cpu_device_create
    ? __kmalloc
    cache_add_dev
    cacheinfo_sysfs_init
    ? container_dev_init
    do_one_initcall
    kernel_init_freeable
    ? rest_init
    kernel_init
    ret_from_fork
    ? rest_init

This patch fixes the issue by checking if the L3 cache indices are
populated correctly (AMD-specific) before initializing the private
attributes.

Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:22:38 +01:00
Borislav Petkov
c9ce871283 x86/mce: Reindent __mcheck_cpu_apply_quirks() properly
Had some strange 3 tabs + 2 chars indentation, probably from me. Fix it.

No code changed:

  # arch/x86/kernel/cpu/mcheck/mce.o:

   text    data     bss     dec     hex filename
  21371    5923     264   27558    6ba6 mce.o.before
  21371    5923     264   27558    6ba6 mce.o.after

md5:
   eb3996c84d15e08ed836f043df2cbb01  mce.o.before.asm
   eb3996c84d15e08ed836f043df2cbb01  mce.o.after.asm

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:16:44 +01:00
Jesse Larrew
f77ac507f8 x86/mce: Use safe MSR accesses for AMD quirk
Certain MSRs are only relevant to a kernel in host mode, and kvm had
chosen not to implement these MSRs at all for guests. If a guest kernel
ever tried to access these MSRs, the result was a general protection
fault.

KVM will be separately patched to return 0 when these MSRs are read,
and this patch ensures that MSR accesses are tolerant of exceptions.

Signed-off-by: Jesse Larrew <jesse.larrew@amd.com>
[ Drop {} braces around loop ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joel Schopp <joel.schopp@amd.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac@vger.kernel.org
Link: http://lkml.kernel.org/r/1426262619-5016-1-git-send-email-jesse.larrew@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:16:43 +01:00
Andy Lutomirski
c56716af8d x86/asm/entry, perf: Fix incorrect TIF_IA32 check in code_segment_base()
We want to check whether user code is in 32-bit mode, not
whether the task is nominally 32-bit.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/33e5107085ce347a8303560302b15c2cadd62c4c.1426728647.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-23 10:08:21 +01:00
Ingo Molnar
8b6c0ab1a1 x86/asm/entry: Document and clean up the enable_sep_cpu() and syscall32_cpu_init() functions
Clean up the flow and document the functions a bit better.

Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:29 +01:00
Denys Vlasenko
d828c71fba x86/asm/entry/32: Document the 32-bit SYSENTER "emergency stack" better
Before the patch, the 'tss_struct::stack' field was not referenced anywhere.

It was used only to set SYSENTER's stack to point after the last byte
of tss_struct, thus the trailing field, stack[64], was used.

But grep would not know it. You can comment it out, compile,
and kernel will even run until an unlucky NMI corrupts
io_bitmap[] (which is also not easily detectable).

This patch changes code so that the purpose and usage of this
field is not mysterious anymore, and can be easily grepped for.

This does change generated code, for a subtle reason:
since tss_struct is ____cacheline_aligned, there happens to be
5 longs of padding at the end. Old code was using the padding
too; new code will strictly use it only for SYSENTER_stack[].

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Drewry <wad@chromium.org>
Link: http://lkml.kernel.org/r/1425912738-559-2-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-17 09:25:29 +01:00
Ingo Molnar
56544d29c3 Linux 4.0-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJU/NacAAoJEHm+PkMAQRiGdUcIAJU5dHclwd9HRc7LX5iOwYN6
 mN0aCsYjMD8Pjx2VcPCgJvkIoESQO5pkwYpFFWCwILup1bVEidqXfr8EPOdThzdh
 kcaT0FwUvd19K+0jcKVNCX1RjKBtlUfUKONk6sS2x4RrYZpv0Ur8Gh+yXV8iMWtf
 fAusNEYlxQJvEz5+NSKw86EZTr4VVcykKLNvj+/t/JrXEuue7IG8EyoAO/nLmNd2
 V/TUKKttqpE6aUVBiBDmcMQl2SUVAfp5e+KJAHmizdDpSE80nU59UC1uyV8VCYdM
 qwHXgttLhhKr8jBPOkvUxl4aSXW7S0QWO8TrMpNdEOeB3ZB8AKsiIuhe1JrK0ro=
 =Xkue
 -----END PGP SIGNATURE-----

Merge tag 'v4.0-rc3' into x86/build, to refresh an older tree before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-13 14:21:04 +01:00
Sudeep Holla
0d55ba46bf x86/cacheinfo: Move cacheinfo sysfs code to generic infrastructure
This patch removes the redundant sysfs cacheinfo code by reusing
the newly introduced generic cacheinfo infrastructure through the
commit

  246246cbde ("drivers: base: support cpu cache information
		 interface to userspace via sysfs")

The private pointer provided by the cacheinfo is used to implement
the AMD L3 cache-specific attributes.

Note that with v4.0-rc1, commit

  513e3d2d11 ("cpumask: always use nr_cpu_ids in formatting and parsing
		 functions")

in particular changes from long format to shorter one for all cpumasks
sysfs entries. As the consequence of the same, even the shared_cpu_map
in the cacheinfo sysfs was also changed.

This patch neither alters any existing sysfs entries nor their
formating, however since the generic cacheinfo has switched to use the
device attributes instead of the traditional raw kobjects, a directory
named "power" along with its standard attributes are added similar to
any other device.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Link: http://lkml.kernel.org/r/1425470416-20691-1-git-send-email-sudeep.holla@arm.com
[ Add a check for uninitialized this_cpu_ci for the cpu_has_topoext case too
  in __cache_amd_cpumap_setup() ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-09 09:32:24 +01:00
Andy Lutomirski
a7fcf28d43 x86/asm/entry: Replace this_cpu_sp0() with current_top_of_stack() and fix it on x86_32
I broke 32-bit kernels.  The implementation of sp0 was correct
as far as I can tell, but sp0 was much weirder on x86_32 than I
realized.  It has the following issues:

 - Init's sp0 is inconsistent with everything else's: non-init tasks
   are offset by 8 bytes.  (I have no idea why, and the comment is unhelpful.)

 - vm86 does crazy things to sp0.

Fix it up by replacing this_cpu_sp0() with
current_top_of_stack() and using a new percpu variable to track
the top of the stack on x86_32.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 75182b1632 ("x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0()")
Link: http://lkml.kernel.org/r/d09dbe270883433776e0cbee3c7079433349e96d.1425692936.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-07 09:34:03 +01:00
Andy Lutomirski
24933b82c0 x86/asm/entry: Rename 'init_tss' to 'cpu_tss'
It has nothing to do with init -- there's only one TSS per cpu.

Other names considered include:

 - current_tss: Confusing because we never switch the tss.
 - singleton_tss: Too long.

This patch was generated with 's/init_tss/cpu_tss/g'.  Followup
patches will fix INIT_TSS and INIT_TSS_IST by hand.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/da29fb2a793e4f649d93ce2d1ed320ebe8516262.1425611534.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-06 08:32:58 +01:00
Ingo Molnar
f8e92fb4b0 A more involved rework of the alternatives framework to be able to
pad instructions and thus make using the alternatives macros more
 straightforward and without having to figure out old and new instruction
 sizes but have the toolchain figure that out for us.
 
 Furthermore, it optimizes JMPs used so that fetch and decode can be
 relieved with smaller versions of the JMPs, where possible.
 
 Some stats:
 
 x86_64 defconfig:
 
 Alternatives sites total:               2478
 Total padding added (in Bytes):         6051
 
 The padding is currently done for:
 
 X86_FEATURE_ALWAYS
 X86_FEATURE_ERMS
 X86_FEATURE_LFENCE_RDTSC
 X86_FEATURE_MFENCE_RDTSC
 X86_FEATURE_SMAP
 
 This is with the latest version of the patchset. Of course, on each
 machine the alternatives sites actually being patched are a proper
 subset of the total number.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU9ekpAAoJEBLB8Bhh3lVKyjYP/AiHEiHkkjnpwTt49kUtUMI6
 GIlGfJVNjp5LLnSRD/fkL/wdkBgQtMzr9O1g8Qi/lbFqxsOFteU9f1OtLx34ZwZw
 MhtdiHcrKGMsaIxTJh4FaqPHBT5ussm2yn1jlAX+LgILd3dpqe3oytsO8JihcK9j
 t2u9V/Lq92TV7zXxGgWJsPc86WhhgdldlU3X96S++Di18bnDaKbGkzthU6WzZG/H
 qtFZ5bfK8TlVHYduft+D9ZPzFYGp1WCOa03qU4+Djaxw02HDB6Ltysend9zg0lB1
 RT/BP0PwHD3mOL11qpgtV1ChCbR8FJMN/z5+YdSNJgzDQA0H5Sf0UueTweosfAz+
 /iC5t/wkegdYtqtA0nKVypYOJCS+UdfMZXenYgtSUJl6drB6I5BCW4mVft3AuWo+
 EilPGpblvmjWRx1HiF4/Q/5zrSWHzmKQDyXuyxI9m0OUxAGAM0+8CY6wOqRA5pX+
 /f5MjZ1hXELQGhl5Qdj4nqJacICGevJ8WYdZ53B+uYVxz7fbXk9hSYcZKT94UshD
 qSdaV4XJSuC7pDKqiWoNWXp5N1g+D2BgfwoQEr/RnodFZRlfc+cmOv/visak0OLr
 E/pp1vJvCi3+T3ImX1MCDiXmflQtFctiL3hNgMXYK2IGhJb2RDC2bFeZkksOHuAE
 BGgrn+usQDjVlikEnfI3
 =0KXp
 -----END PGP SIGNATURE-----

Merge tag 'alternatives_padding' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/asm

Pull alternative instructions framework improvements from Borislav Petkov:

 "A more involved rework of the alternatives framework to be able to
  pad instructions and thus make using the alternatives macros more
  straightforward and without having to figure out old and new instruction
  sizes but have the toolchain figure that out for us.

  Furthermore, it optimizes JMPs used so that fetch and decode can be
  relieved with smaller versions of the JMPs, where possible.

  Some stats:

    x86_64 defconfig:

    Alternatives sites total:               2478
    Total padding added (in Bytes):         6051

  The padding is currently done for:

    X86_FEATURE_ALWAYS
    X86_FEATURE_ERMS
    X86_FEATURE_LFENCE_RDTSC
    X86_FEATURE_MFENCE_RDTSC
    X86_FEATURE_SMAP

  This is with the latest version of the patchset. Of course, on each
  machine the alternatives sites actually being patched are a proper
  subset of the total number."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-04 06:36:15 +01:00
Ingo Molnar
25efdcb43c The first part of the scrubbing of the intel early microcode loader.
There's more work to come but let's unload this pile first.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU9LsjAAoJEBLB8Bhh3lVKVIMP/0xUfRb/wV8P+HtJ6St41G8/
 OygkO/D4UJcZiZP2xrxNix5/waExpegtEIqJPbO6Wlyq0N+0imCZxcsPsgmdw2Zo
 CQv6eu4p0on/v8EUFTJV9+ZHqs6zhch1tNQMfLuq+nXH7f8okSzbDL25RWoo8QU5
 qOOHhHwjyQzivC1KpEodwVte1nT/KLNFio3moRwONKM0/1xCBHyvK4us6QqifWow
 hsDnVNdoXqTzqhY43u7zxNcSzo/RMCq/4sc90augdCdZFAVbKzPGM0o0Pq5FQTmv
 MbMuGF80LhfMFln0tBv30IGFuEc54BBD/x3d7YYadyeX0jIE4T27Pe4xbVLoTDTM
 T8PnaNn/sUyiBUGYO5ff5niMwzpqnuwgKi3wSe4fJ0HIHVE/SAHQogoomP1EKb59
 n66RTWV5eE9KkzdZCTdXhm8aalLK8QfbSlElOrbqwr7/qfFslNpnNUzRwhaHoN1k
 kk5PJ8PipZR/YmWapIU7K6lEZQoRixAb+StMiOvX5n++d+Z7d2/Mu3UqebpqlvUc
 nVFpbPpB9FuH0XQPfICQEUvfWf+MTP9cPw5OkMu+Zo4ok8fIt7MdHCtQJVa9d0R/
 3ZMc9daDgwU8bEYoMRn6xHSGaUjtyO6AUeFhDM7b7If9YWDg59H7nufa5ySkM5KB
 xhWDhQYiSjhdvToKKF/e
 =Jlz9
 -----END PGP SIGNATURE-----

Merge tag 'intel_microcode_cleanup_p1' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/microcode

Pull x86 microcode loader code cleanups from Borislav Petkov:

  "The first part of the scrubbing of the intel early microcode loader.
   There's more work to come but let's unload this pile first."

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-03-03 13:53:16 +01:00
Borislav Petkov
a858b5e504 x86/microcode/intel: Fix printing of microcode blobs in show_saved_mc()
When doing

  echo 1 > /sys/devices/system/cpu/microcode/reload

in order to reload microcode, I get:

  microcode: Total microcode saved: 1
  BUG: using smp_processor_id() in preemptible [00000000] code: bash/2606
  caller is debug_smp_processor_id+0x17/0x20
  CPU: 1 PID: 2606 Comm: bash Not tainted 3.19.0-rc7+ #9
  Hardware name: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012
   ffffffff81a4266d ffff8802131db808 ffffffff81666588 0000000000000007
   0000000000000001 ffff8802131db838 ffffffff812e6eef ffff8802131db868
   00000000000306a9 0000000000000010 0000000000000015 ffff8802131db848
  Call Trace:
   dump_stack
   check_preemption_disabled
   debug_smp_processor_id
   show_saved_mc
   ? save_microcode.constprop.8
   save_mc_for_early
   ? print_context_stack
   ? dump_trace
   ? __bfs
   ? mark_held_locks
   ? get_page_from_freelist
   ? trace_hardirqs_on_caller
   ? trace_hardirqs_on
   ? __alloc_pages_nodemask
   ? __get_vm_area_node
   ? map_vm_area
   ? __vmalloc_node_range
   ? generic_load_microcode
   generic_load_microcode
   ? microcode_fini_cpu
   request_microcode_fw
   reload_store
   dev_attr_store
   sysfs_kf_write
   kernfs_fop_write
   vfs_write
   ? sysret_check
   SyS_write
   system_call_fastpath
  microcode: CPU1: sig=0x306a9, pf=0x10, rev=0x15
  microcode: mc_saved[0]: sig=0x306a9, pf=0x12, rev=0x1b, toal size=0x3000, date = 2014-05-29

because we're using smp_processor_id() in preemtible context. And we
don't really need to use it there because the microcode container we're
dumping is global and CPU-specific info is irrelevant.

While at it, make pr_* stuff use "microcode: " prefix for easier
grepping and document how to enable the DEBUG build.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:32:34 +01:00
Borislav Petkov
4f1f605cfe x86/microcode/intel: Check scan_microcode()'s retval
... and do not attempt to load anything in case of error.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:32:20 +01:00
Borislav Petkov
140f74fced x86/microcode/intel: Sanitize microcode_pointer()
Shorten variable names and rename it to what it does.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:32:16 +01:00
Borislav Petkov
e3d8f67476 x86/microcode/intel: Move mc arg last in get_matching_{microcode|sig}
... arguments list so that it comes more natural for those functions to
have the signature, processor flags and revision together, before the
rest of the args.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:32:13 +01:00
Borislav Petkov
9e02bb46d3 x86/microcode/intel: Simplify generic_load_microcode_early()
* remove state variable and out label
* get rid of completely unused mc_size
* shorten variable names
* get rid of local variables
* don't do assignments in local var declarations for less cluttered code
* finally rename it to the shorter and perfectly fine load_microcode_early()

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:32:10 +01:00
Borislav Petkov
58ce8d6d3a x86/microcode: Consolidate family,model, ... code
... to the header. Split the family acquiring function into a
main one, doing CPUID and a helper which computes the extended
family and is used in multiple places. Get rid of the locally-grown
get_x86_{family,model}().

While at it, rename local variables to something more descriptive and
vertically align assignments for better readability.

There should be no functionality change resulting from this patch.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:32:07 +01:00
Borislav Petkov
4f5e5f2b57 x86/microcode/intel: Rename update_match_revision()
... to revision_is_newer() and push it up into the header and make it an
inline function.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:32:03 +01:00
Borislav Petkov
c868570e74 x86/microcode/intel: Sanitize _save_mc()
Shorten local variable names for better readability and flatten loop
indentation levels.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:32:00 +01:00
Borislav Petkov
a5de5e242b x86/microcode/intel: Make _save_mc() return the updated saved count
... of microcode patches instead of handing in a pointer which is used
for I/O in an otherwise void function.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:31:56 +01:00
Borislav Petkov
02f35177fb x86/microcode/intel: Simplify load_ucode_intel_bsp()
Don't compute start and end from start and size in order to compute size
again down the path in scan_microcode(). So pass size directly instead
and simplify a bunch. Shorten variable names and remove useless ones.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:31:51 +01:00
Borislav Petkov
2d48bb9b6e x86/microcode/intel: Get rid of last arg to load_ucode_intel_bsp()
Allocate it on the helper's _load_ucode_intel_bsp() stack instead and do
not hand it down.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:31:48 +01:00
Borislav Petkov
f9524e6f54 x86/microcode/intel: Do the mc_saved_src NULL check first
... and only then deref it. Also, shorten some variable names and rename
others so as to diminish the ubiquitous presence of the "mc_" prefix
everywhere and make it a bit more readable.

Use kcalloc so that we don't kfree() uninitialized memory on the unwind
path, as suggested by Quentin.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
2015-03-02 20:31:11 +01:00
Borislav Petkov
776d3cdc93 x86/microcode/intel: Check if microcode was found before applying
We should check the return value of the routines fishing out the proper
microcode and not try to apply if we haven't found a suitable blob.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:31:03 +01:00
Quentin Casasnovas
d496a002ae x86/microcode/intel: Fix out of bounds memory access to the extended header
Improper pointer arithmetics when calculating the address of the
extended header could lead to an out of bounds memory read and kernel
panic.

Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Link: http://lkml.kernel.org/r/20150225094125.GB30434@chrystal.uk.oracle.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-02 20:30:42 +01:00
Steven Rostedt
5b2bdbc845 x86: Init per-cpu shadow copy of CR4 on 32-bit CPUs too
Commit:

   1e02ce4ccc ("x86: Store a per-cpu shadow copy of CR4")

added a shadow CR4 such that reads and writes that do not
modify the CR4 execute much faster than always reading the
register itself.

The change modified cpu_init() in common.c, so that the
shadow CR4 gets initialized before anything uses it.

Unfortunately, there's two cpu_init()s in common.c. There's
one for 64-bit and one for 32-bit. The commit only added
the shadow init to the 64-bit path, but the 32-bit path
needs the init too.

Link: http://lkml.kernel.org/r/20150227125208.71c36402@gandalf.local.home Fixes: 1e02ce4ccc "x86: Store a per-cpu shadow copy of CR4"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20150227145019.2bdd4354@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-28 08:04:20 +01:00
Ingo Molnar
5838d18955 Merge branch 'linus' into x86/urgent, to merge dependent patch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-28 08:03:10 +01:00
Ingo Molnar
e9e4e44309 Linux 34.0-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJU6pFJAAoJEHm+PkMAQRiG2OwH/24nDK+l9zkaRs0xJsVh+qiW
 8A2N1od0ickz43iMk48jfeWGkFOkd4izyvan/daJshJOE1Y5lCdSs7jq/OXVOv9L
 G0+KQUoC5NL0hqYKn1XJPFluNQ1yqMvrDwQt99grDGzruNGBbwHuBhAQmgzpj1nU
 do8KrGjr7ft1Rzm4mOAdET/ExWiF+mRSJSxxOv598HbsIRdM5wgn0hHjPlqDxmLN
 KH4r3YYEm0cHyjf4Krse0+YdhqdamRGJlmYxJgEsYNwCoMwkmHlLTc71diseUhrg
 r/VYIYQvpAA6Yvgw8rJ0N5gk/sJJig+WyyPhfQuc2bD5sbL9eO7mPnz2UP7z7ss=
 =vXB6
 -----END PGP SIGNATURE-----

Merge tag 'v4.0-rc1' into perf/core, to refresh the tree

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-26 12:24:50 +01:00
Matt Fleming
59bf7fd45c perf/x86/intel: Enable conflicting event scheduling for CQM
We can leverage the workqueue that we use for RMID rotation to support
scheduling of conflicting monitoring events. Allowing events that
monitor conflicting things is done at various other places in the perf
subsystem, so there's precedent there.

An example of two conflicting events would be monitoring a cgroup and
simultaneously monitoring a task within that cgroup.

This uses the cache_groups list as a queuing mechanism, where every
event that reaches the front of the list gets the chance to be scheduled
in, possibly descheduling any conflicting events that are running.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Link: http://lkml.kernel.org/r/1422038748-21397-10-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-25 13:53:36 +01:00
Matt Fleming
bff671dba7 perf/x86/intel: Perform rotation on Intel CQM RMIDs
There are many use cases where people will want to monitor more tasks
than there exist RMIDs in the hardware, meaning that we have to perform
some kind of multiplexing.

We do this by "rotating" the RMIDs in a workqueue, and assigning an RMID
to a waiting event when the RMID becomes unused.

This scheme reserves one RMID at all times for rotation. When we need to
schedule a new event we give it the reserved RMID, pick a victim event
from the front of the global CQM list and wait for the victim's RMID to
drop to zero occupancy, before it becomes the new reserved RMID.

We put the victim's RMID onto the limbo list, where it resides for a
"minimum queue time", which is intended to save ourselves an expensive
smp IPI when the RMID is unlikely to have a occupancy value below
__intel_cqm_threshold.

If we fail to recycle an RMID, even after waiting the minimum queue time
then we need to increment __intel_cqm_threshold. There is an upper bound
on this threshold, __intel_cqm_max_threshold, which is programmable from
userland as /sys/devices/intel_cqm/max_recycling_threshold.

The comments above __intel_cqm_rmid_rotate() have more details.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Link: http://lkml.kernel.org/r/1422038748-21397-9-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-25 13:53:35 +01:00
Matt Fleming
bfe1fcd268 perf/x86/intel: Support task events with Intel CQM
Add support for task events as well as system-wide events. This change
has a big impact on the way that we gather LLC occupancy values in
intel_cqm_event_read().

Currently, for system-wide (per-cpu) events we defer processing to
userspace which knows how to discard all but one cpu result per package.

Things aren't so simple for task events because we need to do the value
aggregation ourselves. To do this, we defer updating the LLC occupancy
value in event->count from intel_cqm_event_read() and do an SMP
cross-call to read values for all packages in intel_cqm_event_count().
We need to ensure that we only do this for one task event per cache
group, otherwise we'll report duplicate values.

If we're a system-wide event we want to fallback to the default
perf_event_count() implementation. Refactor this into a common function
so that we don't duplicate the code.

Also, introduce PERF_TYPE_INTEL_CQM, since we need a way to track an
event's task (if the event isn't per-cpu) inside of the Intel CQM PMU
driver.  This task information is only availble in the upper layers of
the perf infrastructure.

Other perf backends stash the target task in event->hw.*target so we
need to do something similar. The task is used to determine whether
events should share a cache group and an RMID.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Cc: linux-api@vger.kernel.org
Link: http://lkml.kernel.org/r/1422038748-21397-8-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-25 13:53:34 +01:00
Matt Fleming
35298e554c perf/x86/intel: Implement LRU monitoring ID allocation for CQM
It's possible to run into issues with re-using unused monitoring IDs
because there may be stale cachelines associated with that ID from a
previous allocation. This can cause the LLC occupancy values to be
inaccurate.

To attempt to mitigate this problem we place the IDs on a least recently
used list, essentially a FIFO. The basic idea is that the longer the
time period between ID re-use the lower the probability that stale
cachelines exist in the cache.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Link: http://lkml.kernel.org/r/1422038748-21397-7-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-25 13:53:33 +01:00
Matt Fleming
4afbb24ce5 perf/x86/intel: Add Intel Cache QoS Monitoring support
Future Intel Xeon processors support a Cache QoS Monitoring feature that
allows tracking of the LLC occupancy for a task or task group, i.e. the
amount of data in pulled into the LLC for the task (group).

Currently the PMU only supports per-cpu events. We create an event for
each cpu and read out all the LLC occupancy values.

Because this results in duplicate values being written out to userspace,
we also export a .per-pkg event file so that the perf tools only
accumulate values for one cpu per package.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Link: http://lkml.kernel.org/r/1422038748-21397-6-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-25 13:53:32 +01:00
Peter P Waskiewicz Jr
cbc82b1726 x86: Add support for Intel Cache QoS Monitoring (CQM) detection
This patch adds support for the new Cache QoS Monitoring (CQM)
feature found in future Intel Xeon processors.  It includes the
new values to track CQM resources to the cpuinfo_x86 structure,
plus the CPUID detection routines for CQM.

CQM allows a process, or set of processes, to be tracked by the CPU
to determine the cache usage of that task group.  Using this data
from the CPU, software can be written to extract this data and
report cache usage and occupancy for a particular process, or
group of processes.

More information about Cache QoS Monitoring can be found in the
Intel (R) x86 Architecture Software Developer Manual, section 17.14.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chris Webb <chris@arachsys.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Honeyman <stevenhoneyman@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Link: http://lkml.kernel.org/r/1422038748-21397-5-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-25 13:53:31 +01:00
Borislav Petkov
a930dc4543 x86/asm: Cleanup prefetch primitives
This is based on a patch originally by hpa.

With the current improvements to the alternatives, we can simply use %P1
as a mem8 operand constraint and rely on the toolchain to generate the
proper instruction sizes. For example, on 32-bit, where we use an empty
old instruction we get:

  apply_alternatives: feat: 6*32+8, old: (c104648b, len: 4), repl: (c195566c, len: 4)
  c104648b: alt_insn: 90 90 90 90
  c195566c: rpl_insn: 0f 0d 4b 5c

  ...

  apply_alternatives: feat: 6*32+8, old: (c18e09b4, len: 3), repl: (c1955948, len: 3)
  c18e09b4: alt_insn: 90 90 90
  c1955948: rpl_insn: 0f 0d 08

  ...

  apply_alternatives: feat: 6*32+8, old: (c1190cf9, len: 7), repl: (c1955a79, len: 7)
  c1190cf9: alt_insn: 90 90 90 90 90 90 90
  c1955a79: rpl_insn: 0f 0d 0d a0 d4 85 c1

all with the proper padding done depending on the size of the
replacement instruction the compiler generates.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@linux.intel.com>
2015-02-23 13:44:17 +01:00
Yannick Guerrini
a927792c19 x86/cpu/intel: Fix trivial typo in intel_tlb_table[]
Change 'ssociative' to 'associative'

Signed-off-by: Yannick Guerrini <yguerrini@tomshardware.fr>
Cc: Borislav Petkov <bp@suse.de>
Cc: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: Chris Bainbridge <chris.bainbridge@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Steven Honeyman <stevenhoneyman@gmail.com>
Cc: trivial@kernel.org
Link: http://lkml.kernel.org/r/1424558510-1420-1-git-send-email-yguerrini@tomshardware.fr
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-22 08:55:58 +01:00
Linus Torvalds
5fbe4c224c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:
 "This contains:

   - EFI fixes
   - a boot printout fix
   - ASLR/kASLR fixes
   - intel microcode driver fixes
   - other misc fixes

  Most of the linecount comes from an EFI revert"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/ASLR: Avoid PAGE_SIZE redefinition for UML subarch
  x86/microcode/intel: Handle truncated microcode images more robustly
  x86/microcode/intel: Guard against stack overflow in the loader
  x86, mm/ASLR: Fix stack randomization on 64-bit systems
  x86/mm/init: Fix incorrect page size in init_memory_mapping() printks
  x86/mm/ASLR: Propagate base load address calculation
  Documentation/x86: Fix path in zero-page.txt
  x86/apic: Fix the devicetree build in certain configs
  Revert "efi/libstub: Call get_memory_map() to obtain map and desc sizes"
  x86/efi: Avoid triple faults during EFI mixed mode calls
2015-02-21 10:41:29 -08:00
Ingo Molnar
1fbe23e0de * Two fixes hardening microcode data handling. (Quentin Casasnovas)
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU5cyVAAoJEBLB8Bhh3lVKJWEP/1eK+XyiVdxV7FRuPmmgjGUC
 mD6MypFCwc942orTdltm9vlRFTU6OE1AkfEVX3NKawy+lzt/mE+TbWzwx+mr26un
 pqyKgSGGKqACDBADgUiVxubXffhAx9Ke5obScZoFA/Yp+l7os8wkwr6AMjwU+XgU
 FMGKWra0yeZsfCSkQgQ+q+RjQe2TOjh3YYVcwpPRaU6jkJ3CR+MNQ2tVmJnEVMAq
 Q3xEce8mMN+xpuyTlCyvpSIid8M9klAeXb5kjqfffJGSBmtVJ+nn3mDV1a0ejeYQ
 aA6X6SBwpIBPPjhwJrsgUcGC0GeF4X0TKjg1F6ZEW0lN9/mipiM+t3OEgxcBH0G7
 SOAUtQTRDasj0bJd5qKOhAWWmFoXjSc61XiMYUreOWDPoaje76oql+iN1auZsRSh
 RS6KCwYgdqQYscN05L/l4iHgJXeGUTm45BJ6rJb1wEJ9OldO5yK4O42Tn0IZyQ4g
 w10poQY4jkjPnHVUWvk5IQpu7AcBiZtov201a89QpRyPGFoGgOOu7n5y0nDLxuvK
 m3L8LrEve8xO8xdqyidQKE3KGLnDcuuTx9XscbEGtoNWQ8oGIYuYW9DvKzCK/kmU
 u24tx65tygcQ6NJoUW/S3mIwnlyM1egqziXjpzmfR2TvGraqNCkIkocYyf1etVjh
 c0Mem02eJTvarxEpvHAa
 =qoLG
 -----END PGP SIGNATURE-----

Merge tag 'microcode_fixes_for-3.21' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent

Pull microcode fixes from Borislav Petkov:

  - Two fixes hardening microcode data handling. (Quentin Casasnovas)

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-19 13:32:42 +01:00
Ingo Molnar
fa45a45ca3 Merge tag 'ras_for_3.21' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras
Pull RAS updates from Borislav Petkov:

 "- Enable AMD thresholding IRQ by default if supported. (Aravind Gopalakrishnan)

  - Unify mce_panic() message pattern. (Derek Che)

  - A bit more involved simplification of the CMCI logic after yet another
    report about race condition with the adaptive logic. (Borislav Petkov)

  - ACPI APEI EINJ fleshing out of the user documentation. (Borislav Petkov)

  - Minor cleanup. (Jan Beulich.)"

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-19 13:31:33 +01:00
Aravind Gopalakrishnan
d79f931f1c x86/MCE/AMD: Enable thresholding interrupts by default if supported
We setup APIC vectors for threshold errors if interrupt_capable.
However, we don't set interrupt_enable by default. Rework
threshold_restart_bank() so that when we set up lvt_offset, we also set
IntType to APIC and also enable thresholding interrupts for banks which
support it by default.

User is still allowed to disable interrupts through sysfs.

While at it, check if status is valid before we proceed to log error
using mce_log. This is because, in multi-node platforms, only the NBC
(Node Base Core, i.e. the first core in the node) has valid status info
in its MCA registers. So, the decoding of status values on the non-NBC
leads to noise on kernel logs like so:

  EDAC DEBUG: amd64_inject_write_store: section=0x80000000 word_bits=0x10020001
  [Hardware Error]: Corrected error, no action required.
  [Hardware Error]: CPU:25 (15:2:0) MC4_STATUS[-|CE|-|-|-
  [Hardware Error]: Corrected error, no action required.
  [Hardware Error]: CPU:26 (15:2:0) MC4_STATUS[-|CE|-|-|-
  <...>
  WARNING: CPU: 25 PID: 0 at drivers/edac/amd64_edac.c:2147 decode_bus_error+0x1ba/0x2a0()
  WARNING: CPU: 26 PID: 0 at drivers/edac/amd64_edac.c:2147 decode_bus_error+0x1ba/0x2a0()
  Something is rotten in the state of Denmark.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1422896561-7695-1-git-send-email-aravind.gopalakrishnan@amd.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-19 13:24:47 +01:00
Derek Che
8af7043a3c x86/MCE: Make mce_panic() fatal machine check msg in the same pattern
There is another mce_panic call with "Fatal machine check on current CPU" in
the same mce.c file, why not keep them all in same pattern

	mce_panic("Fatal machine check on current CPU", &m, msg);

Signed-off-by: Derek Che <drc@yahoo-inc.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-19 13:24:47 +01:00
Borislav Petkov
3f2f0680d1 x86/MCE/intel: Cleanup CMCI storm logic
Initially, this started with the yet another report about a race
condition in the CMCI storm adaptive period length thing. Yes, we have
to admit, it is fragile and error prone. So let's simplify it.

The simpler logic is: now, after we enter storm mode, we go straight to
polling with CMCI_STORM_INTERVAL, i.e. once a second. We remain in storm
mode as long as we see errors being logged while polling.

Theoretically, if we see an uninterrupted error stream, we will remain
in storm mode indefinitely and keep polling the MSRs.

However, when the storm is actually a burst of errors, once we have
logged them all, we back out of it after ~5 mins of polling and no more
errors logged.

If we encounter an error during those 5 minutes, we reset the polling
interval to 5 mins.

Making machine_check_poll() return a bool and denoting whether it has
seen an error or not lets us simplify a bunch of code and move the storm
handling private to mce_intel.c.

Some minor cleanups while at it.

Reported-by: Calvin Owens <calvinowens@fb.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1417746575-23299-1-git-send-email-calvinowens@fb.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-19 13:24:25 +01:00
Quentin Casasnovas
35a9ff4eec x86/microcode/intel: Handle truncated microcode images more robustly
We do not check the input data bounds containing the microcode before
copying a struct microcode_intel_header from it. A specially crafted
microcode could cause the kernel to read invalid memory and lead to a
denial-of-service.

Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1422964824-22056-3-git-send-email-quentin.casasnovas@oracle.com
[ Made error message differ from the next one and flipped comparison. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-19 12:42:23 +01:00
Quentin Casasnovas
f84598bd7c x86/microcode/intel: Guard against stack overflow in the loader
mc_saved_tmp is a static array allocated on the stack, we need to make
sure mc_saved_count stays within its bounds, otherwise we're overflowing
the stack in _save_mc(). A specially crafted microcode header could lead
to a kernel crash or potentially kernel execution.

Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1422964824-22056-1-git-send-email-quentin.casasnovas@oracle.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-19 12:41:37 +01:00
Jan Beulich
2cd4c303a7 x86/MCE/AMD: Drop bogus const modifier from AMD's bank4_names()
The compiler validly warns about it being ignored.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/54C21511020000780005890E@mail.emea.novell.com
Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-19 12:30:47 +01:00
Sylvain BERTRAND
e85bd9892c x86/build: Fix mkcapflags.sh bash-ism
Chocked while compiling linux with dash shell instead of bash
shell. See:

  http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_09_05

Signed-off-by: Sylvain BERTRAND <sylvain.bertrand@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20141229154324.GA27533@dhcppc1
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-19 02:21:00 +01:00
Peter Zijlstra
2c44b1936b perf/x86/intel: Expose LBR callstack to user space tooling
With LBR call stack feature enable, there are three callchain options.
Enable the 3rd callchain option (LBR callstack) to user space tooling.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-api@vger.kernel.org
Link: http://lkml.kernel.org/r/20141105093759.GQ10501@worktop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 17:16:15 +01:00
Yan, Zheng
aa54ae9b87 perf/x86/intel: Discard zero length call entries in LBR call stack
"Zero length call" uses the attribute of the call instruction to push
the immediate instruction pointer on to the stack and then pops off
that address into a register. This is accomplished without any matching
return instruction. It confuses the hardware and make the recorded call
stack incorrect.

We can partially resolve this issue by: decode call instructions and
discard any zero length call entry in the LBR stack.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: eranian@google.com
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/1415156173-10035-16-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 17:16:14 +01:00
Yan, Zheng
2c70d0086e perf/x86/intel: Disable FREEZE_LBRS_ON_PMI when LBR operates in callstack mode
LBR callstack is designed for PEBS, It does not work well with
FREEZE_LBRS_ON_PMI for non PEBS event. If FREEZE_LBRS_ON_PMI is set for
non PEBS event, PMIs near call/return instructions may cause superfluous
increase/decrease of LBR_TOS.

This patch modifies __intel_pmu_lbr_enable() to not enable
FREEZE_LBRS_ON_PMI when LBR operates in callstack mode. We currently
don't use LBR callstack to capture kernel space callchain, so disabling
FREEZE_LBRS_ON_PMI should not be a problem.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: eranian@google.com
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/1415156173-10035-15-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 17:16:13 +01:00
Yan, Zheng
4b85490099 perf/x86/intel: Re-organize code that implicitly enables LBR/PEBS
Make later patch more readable, no logic change.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: eranian@google.com
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/1415156173-10035-13-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 17:16:12 +01:00
Yan, Zheng
a46a230001 perf: Simplify the branch stack check
Use event->attr.branch_sample_type to replace
intel_pmu_needs_lbr_smpl() for avoiding duplicated code that
implicitly enables the LBR.

Currently, branch stack can be enabled by user explicitly requesting
branch sampling or implicit branch sampling to correct PEBS skid.

For user explicitly requested branch sampling, the branch_sample_type
is explicitly set by user. For PEBS case, the branch_sample_type is also
implicitly set to PERF_SAMPLE_BRANCH_ANY in x86_pmu_hw_config.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: eranian@google.com
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/1415156173-10035-11-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 17:16:11 +01:00
Yan, Zheng
76cb2c617f perf/x86/intel: Save/restore LBR stack during context switch
When the LBR call stack is enabled, it is necessary to save/restore
the LBR stack on context switch. The solution is saving/restoring
the LBR stack to/from task's perf event context.

The LBR stack is saved/restored only when there are events that use
the LBR call stack. If no event uses LBR call stack, the LBR stack
is reset when task is scheduled in.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: eranian@google.com
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/1415156173-10035-10-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 17:16:10 +01:00
Yan, Zheng
63f0c1d841 perf/x86/intel: Track number of events that use the LBR callstack
When enabling/disabling an event, check if the event uses the LBR
callstack feature, adjust the LBR callstack usage count accordingly.
Later patch will use the usage count to decide if LBR stack should
be saved/restored.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: eranian@google.com
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/1415156173-10035-9-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 17:16:09 +01:00
Yan, Zheng
e18bf52642 perf/x86/intel: Allocate space for storing LBR stack
When the LBR call stack is enabled, it is necessary to save/restore
the LBR stack on context switch. We can use pmu specific data to
store LBR stack when task is scheduled out. This patch adds code
that allocates the pmu specific data.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/1415156173-10035-8-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 17:16:08 +01:00
Yan, Zheng
e9d7f7cd97 perf/x86/intel: Add basic Haswell LBR call stack support
Haswell has a new feature that utilizes the existing LBR facility to
record call chains. To enable this feature, bits (JCC, NEAR_IND_JMP,
NEAR_REL_JMP, FAR_BRANCH, EN_CALLSTACK) in LBR_SELECT must be set to 1,
bits (NEAR_REL_CALL, NEAR-IND_CALL, NEAR_RET) must be cleared. Due to
a hardware bug of Haswell, this feature doesn't work well with
FREEZE_LBRS_ON_PMI.

When the call stack feature is enabled, the LBR stack will capture
unfiltered call data normally, but as return instructions are executed,
the last captured branch record is flushed from the on-chip registers
in a last-in first-out (LIFO) manner. Thus, branch information relative
to leaf functions will not be captured, while preserving the call stack
information of the main line execution path.

This patch defines a separate lbr_sel map for Haswell. The map contains
a new entry for the call stack feature.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: eranian@google.com
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/1415156173-10035-5-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 17:16:04 +01:00
Yan, Zheng
2a0ad3b326 perf/x86/intel: Use context switch callback to flush LBR stack
Previous commit introduces context switch callback, its function
overlaps with the flush branch stack callback. So we can use the
context switch callback to flush LBR stack.

This patch adds code that uses the flush branch callback to
flush the LBR stack when task is being scheduled in. The callback
is enabled only when there are events use the LBR hardware. This
patch also removes all old flush branch stack code.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: eranian@google.com
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/1415156173-10035-4-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 17:16:03 +01:00
Yan, Zheng
ba532500c5 perf: Introduce pmu context switch callback
The callback is invoked when process is scheduled in or out.
It provides mechanism for later patches to save/store the LBR
stack. For the schedule in case, the callback is invoked at
the same place that flush branch stack callback is invoked.
So it also can replace the flush branch stack callback. To
avoid unnecessary overhead, the callback is enabled only when
there are events use the LBR stack.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: eranian@google.com
Cc: jolsa@redhat.com
Link: http://lkml.kernel.org/r/1415156173-10035-3-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 17:16:02 +01:00
Yan, Zheng
27ac905b8f perf/x86/intel: Reduce lbr_sel_map[] size
The index of lbr_sel_map is bit value of perf branch_sample_type.
PERF_SAMPLE_BRANCH_MAX is 1024 at present, so each lbr_sel_map uses
4096 bytes. By using bit shift as index, we can reduce lbr_sel_map
size to 40 bytes. This patch defines 'bit shift' for branch types,
and use 'bit shift' to define lbr_sel_maps.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: jolsa@redhat.com
Cc: linux-api@vger.kernel.org
Link: http://lkml.kernel.org/r/1415156173-10035-2-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 17:16:01 +01:00
Aravind Gopalakrishnan
c796b205b8 perf/x86/amd/ibs: Convert force_ibs_eilvt_setup() to void
The caller of force_ibs_eilvt_setup() is ibs_eilvt_setup()
which does not care about the return values.

So mark it void and clean up the return statements.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <hpa@zytor.com>
Cc: <paulus@samba.org>
Cc: <tglx@linutronix.de>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1422037175-20957-1-git-send-email-aravind.gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 17:01:46 +01:00
Markus Elfring
8e57c586c6 perf/x86/intel/uncore: Delete an unnecessary check before pci_dev_put() call
The pci_dev_put() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/54D0B59C.2060106@users.sourceforge.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 17:01:42 +01:00
Linus Torvalds
d96c757efa Fix regression - functions on the mce notifier chain should
not be able to decide that an event should not be logged
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU47aGAAoJEKurIx+X31iBiX4P+wfq7uUKwQ4riD2jFppvhrcm
 W2Qx/iIv9QN77ZIw5I45VqGbDKXmlThl41ISem9BlKd8jKaldY3lUlQMfrPC5V11
 9bl/7LsZoQLlbuwYR6uiLdKqW9wd6d5Y1mczdSDM5wCtfMw/s+C/ETzuRVHsQZjF
 1LTB0rb0NPouX+y3D8aDrvk9Os5ozZsz3N/y6e3TsI/wV8d3rqwH8C8x3RjB2Evx
 3WRSwoSOq9kHEbeg1r7PMKYKWAoJs97Kwo4EgJELqn8fxYMWnSsoDZGr9P2PX8oT
 TKgSFPnhgCLw+qlWy81MM8hutnHKnN6oXcJKzE0nHtD8JlJ/M/HdDAPIg8G9aLIn
 ABxPg6OORs/4YJQYGFA8ixx3TfIMspMU2m9KGoCcerpGaHCHhrlylJyheUvhRkPP
 u8pjGz+31d3bVVRzCLJt1eqo3H/y0wcURWaemk23lcUIsdDqisjZDzZrZxyZuWaH
 eDTKmHsZB/I4wnOs4Ke+U7oo/u+NtBzPmBSJcshgKSONLPd7bSJtjckLoa3wSf5I
 q5DkZgxrUYkO6tIoAAi/N2tc/2qkjTOug79BP9YKN3elmv3nxW4gieSaUb5p16bd
 ORpZLt3SDKEPv+79a/1e7ZyR7ik9Evhzc+/72M1IZvmdr3uOjj+xkuE1uRU7vuvX
 44Jmx6mEtPK1mVBfwkdG
 =jAUs
 -----END PGP SIGNATURE-----

Merge tag 'please-pull-fixmcelog' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull mcelog regression fix from Tony Luck:
 "Fix regression - functions on the mce notifier chain should not be
  able to decide that an event should not be logged"

* tag 'please-pull-fixmcelog' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  x86/mce: Fix regression. All error records should report via /dev/mcelog
2015-02-17 17:03:07 -08:00
Linus Torvalds
37507717de Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 perf updates from Ingo Molnar:
 "This series tightens up RDPMC permissions: currently even highly
  sandboxed x86 execution environments (such as seccomp) have permission
  to execute RDPMC, which may leak various perf events / PMU state such
  as timing information and other CPU execution details.

  This 'all is allowed' RDPMC mode is still preserved as the
  (non-default) /sys/devices/cpu/rdpmc=2 setting.  The new default is
  that RDPMC access is only allowed if a perf event is mmap-ed (which is
  needed to correctly interpret RDPMC counter values in any case).

  As a side effect of these changes CR4 handling is cleaned up in the
  x86 code and a shadow copy of the CR4 value is added.

  The extra CR4 manipulation adds ~ <50ns to the context switch cost
  between rdpmc-capable and rdpmc-non-capable mms"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks
  perf/x86: Only allow rdpmc if a perf_event is mapped
  perf: Pass the event to arch_perf_update_userpage()
  perf: Add pmu callbacks to track event mapping and unmapping
  x86: Add a comment clarifying LDT context switching
  x86: Store a per-cpu shadow copy of CR4
  x86: Clean up cr4 manipulation
2015-02-16 14:58:12 -08:00
Tejun Heo
bf58b4879c x86: use %*pb[l] to print bitmaps including cpumasks and nodemasks
printk and friends can now format bitmaps using '%*pb[l]'.  cpumask
and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
respectively which can be used to generate the two printf arguments
necessary to format the specified cpu/nodemask.

* Unnecessary buffer size calculation and condition on the lenght
  removed from intel_cacheinfo.c::show_shared_cpu_map_func().

* uv_nmi_nr_cpus_pr() got overly smart and implemented "..."
  abbreviation if the output stretched over the predefined 1024 byte
  buffer.  Replaced with plain printk.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mike Travis <travis@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:37 -08:00
Linus Torvalds
e07e0d4cb0 Merge branch 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS update from Ingo Molnar:
 "The changes in this cycle were:

   - allow mmcfg access to APEI error injection handlers

   - improve MCE error messages

   - smaller cleanups"

* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, mce: Fix sparse errors
  x86, mce: Improve timeout error messages
  ACPI, EINJ: Enhance error injection tolerance level
2015-02-09 18:22:04 -08:00
Linus Torvalds
80f33a5fdf Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "Misc cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/rtc: Remove duplicate const specifier
  x86, early_serial_console: Remove unnecessary check
  x86, early_serial_console: Remove unused macro XMTRDY
  x86, setup: Rename BOOT_ISDIGIT_H to BOOT_CTYPE_H
  x86, CPU: Fix trivial printk formatting issues with dmesg
2015-02-09 17:50:09 -08:00
Linus Torvalds
7453311d68 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm changes from Ingo Molnar:
 "The main changes in this cycle were the x86/entry and sysret
  enhancements from Andy Lutomirski, see merge commits 772a9aca12 and
  b57c0b5175 for details"

[ Exectutive summary: IST exceptions that interrupt user space will run
  on the regular kernel stack instead of the IST stack.  Which
  simplifies things particularly on return to user space.

  The sysret cleanup ends up simplifying the logic on when we can use
  sysret vs when we have to use iret.                - Linus ]

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86_64, entry: Remove the syscall exit audit and schedule optimizations
  x86_64, entry: Use sysret to return to userspace when possible
  x86, traps: Fix ist_enter from userspace
  x86, vdso: teach 'make clean' remove vdso64 binaries
  x86_64 entry: Fix RCX for ptraced syscalls
  x86: entry_64.S: fold SAVE_ARGS_IRQ macro into its sole user
  x86: ia32entry.S: fix wrong symbolic constant usage: R11->ARGOFFSET
  x86: entry_64.S: delete unused code
  x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricks
  x86, traps: Add ist_begin_non_atomic and ist_end_non_atomic
  x86: Clean up current_stack_pointer
  x86, traps: Track entry into and exit from IST context
  x86, entry: Switch stacks on a paranoid entry from userspace
2015-02-09 17:16:44 -08:00
Linus Torvalds
9d43bade34 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 APIC updates from Ingo Molnar:
 "Continued fallout of the conversion of the x86 IRQ code to the
  hierarchical irqdomain framework: more cleanups, simplifications,
  memory allocation behavior enhancements, mainly in the interrupt
  remapping and APIC code"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  x86, init: Fix UP boot regression on x86_64
  iommu/amd: Fix irq remapping detection logic
  x86/acpi: Make acpi_[un]register_gsi_ioapic() depend on CONFIG_X86_LOCAL_APIC
  x86: Consolidate boot cpu timer setup
  x86/apic: Reuse apic_bsp_setup() for UP APIC setup
  x86/smpboot: Sanitize uniprocessor init
  x86/smpboot: Move apic init code to apic.c
  init: Get rid of x86isms
  x86/apic: Move apic_init_uniprocessor code
  x86/smpboot: Cleanup ioapic handling
  x86/apic: Sanitize ioapic handling
  x86/ioapic: Add proper checks to setp/enable_IO_APIC()
  x86/ioapic: Provide stub functions for IOAPIC%3Dn
  x86/smpboot: Move smpboot inlines to code
  x86/x2apic: Use state information for disable
  x86/x2apic: Split enable and setup function
  x86/x2apic: Disable x2apic from nox2apic setup
  x86/x2apic: Add proper state tracking
  x86/x2apic: Clarify remapping mode for x2apic enablement
  x86/x2apic: Move code in conditional region
  ...
2015-02-09 16:57:56 -08:00
Linus Torvalds
a4cbbf549a Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - AMD range breakpoints support:

     Extend breakpoint tools and core to support address range through
     perf event with initial backend support for AMD extended
     breakpoints.

     The syntax is:

         perf record -e mem:addr/len:type

     For example set write breakpoint from 0x1000 to 0x1200 (0x1000 + 512)

         perf record -e mem:0x1000/512:w

   - event throttling/rotating fixes

   - various event group handling fixes, cleanups and general paranoia
     code to be more robust against bugs in the future.

    - kernel stack overhead fixes

  User-visible tooling side changes:

   - Show precise number of samples in at the end of a 'record' session,
     if processing build ids, since we will then traverse the whole
     perf.data file and see all the PERF_RECORD_SAMPLE records,
     otherwise stop showing the previous off-base heuristicly counted
     number of "samples" (Namhyung Kim).

   - Support to read compressed module from build-id cache (Namhyung
     Kim)

   - Enable sampling loads and stores simultaneously in 'perf mem'
     (Stephane Eranian)

   - 'perf diff' output improvements (Namhyung Kim)

   - Fix error reporting for evsel pgfault constructor (Arnaldo Carvalho
     de Melo)

  Tooling side infrastructure changes:

   - Cache eh/debug frame offset for dwarf unwind (Namhyung Kim)

   - Support parsing parameterized events (Cody P Schafer)

   - Add support for IP address formats in libtraceevent (David Ahern)

  Plus other misc fixes"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
  perf: Decouple unthrottling and rotating
  perf: Drop module reference on event init failure
  perf: Use POLLIN instead of POLL_IN for perf poll data in flag
  perf: Fix put_event() ctx lock
  perf: Fix move_group() order
  perf: Fix event->ctx locking
  perf: Add a bit of paranoia
  perf symbols: Convert lseek + read to pread
  perf tools: Use perf_data_file__fd() consistently
  perf symbols: Support to read compressed module from build-id cache
  perf evsel: Set attr.task bit for a tracking event
  perf header: Set header version correctly
  perf record: Show precise number of samples
  perf tools: Do not use __perf_session__process_events() directly
  perf callchain: Cache eh/debug frame offset for dwarf unwind
  perf tools: Provide stub for missing pthread_attr_setaffinity_np
  perf evsel: Don't rely on malloc working for sz 0
  tools lib traceevent: Add support for IP address formats
  perf ui/tui: Show fatal error message only if exists
  perf tests: Fix typo in sample-parsing.c
  ...
2015-02-09 15:43:55 -08:00
Tony Luck
a2413d8b29 x86/mce: Fix regression. All error records should report via /dev/mcelog
I'm getting complaints from validation teams that have updated their
Linux kernels from ancient versions to current. They don't see the
error logs they expect. I tell the to unload any EDAC drivers[1], and
things start working again.  The problem is that we short-circuit
the logging process if any function on the decoder chain claims to
have dealt with the problem:

	ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m);
	if (ret == NOTIFY_STOP)
		return;

The logic we used when we added this code was that we did not want
to confuse users with double reports of the same error.

But it turns out users are not confused - they are upset that they
don't see a log where their tools used to find a log.

I could also get into a long description of how the consumer of this
log does more than just decode model specific details of the error.
It keeps counts, tracks thresholds, takes actions and runs scripts
that can alert administrators to problems.

[1] We've recently compounded the problem because the acpi_extlog
driver also registers for this notifier and also returns NOTIFY_STOP.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2015-02-09 09:36:53 -08:00
Linus Torvalds
26cdd1f76a Merge branches 'timers-urgent-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer and x86 fix from Ingo Molnar:
 "A CLOCK_TAI early expiry fix and an x86 microcode driver oops fix"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hrtimer: Fix incorrect tai offset calculation for non high-res timer systems

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode: Return error from driver init code when loader is disabled
2015-02-06 13:56:02 -08:00
Andy Lutomirski
a66734297f perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks
While perfmon2 is a sufficiently evil library (it pokes MSRs
directly) that breaking it is fair game, it's still useful, so we
might as well try to support it.  This allows users to write 2 to
/sys/devices/cpu/rdpmc to disable all rdpmc protection so that hack
like perfmon2 can continue to work.

At some point, if perf_event becomes fast enough to replace
perfmon2, then this can go.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/caac3c1c707dcca48ecbc35f4def21495856f479.1414190806.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 12:10:49 +01:00
Andy Lutomirski
7911d3f7af perf/x86: Only allow rdpmc if a perf_event is mapped
We currently allow any process to use rdpmc.  This significantly
weakens the protection offered by PR_TSC_DISABLED, and it could be
helpful to users attempting to exploit timing attacks.

Since we can't enable access to individual counters, use a very
coarse heuristic to limit access to rdpmc: allow access only when
a perf_event is mmapped.  This protects seccomp sandboxes.

There is plenty of room to further tighen these restrictions.  For
example, this allows rdpmc for any x86_pmu event, but it's only
useful for self-monitoring tasks.

As a side effect, cap_user_rdpmc will now be false for AMD uncore
events.  This isn't a real regression, since .event_idx is disabled
for these events anyway for the time being.  Whenever that gets
re-added, the cap_user_rdpmc code can be adjusted or refactored
accordingly.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/a2bdb3cf3a1d70c26980d7c6dddfbaa69f3182bf.1414190806.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 12:10:47 +01:00
Andy Lutomirski
c1317ec2b9 perf: Pass the event to arch_perf_update_userpage()
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/0fea9a7fac3c1eea86cb0a5954184e74f4213666.1414190806.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 12:10:46 +01:00
Andy Lutomirski
1e02ce4ccc x86: Store a per-cpu shadow copy of CR4
Context switches and TLB flushes can change individual bits of CR4.
CR4 reads take several cycles, so store a shadow copy of CR4 in a
per-cpu variable.

To avoid wasting a cache line, I added the CR4 shadow to
cpu_tlbstate, which is already touched in switch_mm.  The heaviest
users of the cr4 shadow will be switch_mm and __switch_to_xtra, and
__switch_to_xtra is called shortly after switch_mm during context
switch, so the cacheline is likely to be hot.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/3a54dd3353fffbf84804398e00dfdc5b7c1afd7d.1414190806.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 12:10:42 +01:00
Andy Lutomirski
375074cc73 x86: Clean up cr4 manipulation
CR4 manipulation was split, seemingly at random, between direct
(write_cr4) and using a helper (set/clear_in_cr4).  Unfortunately,
the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code,
which only a small subset of users actually wanted.

This patch replaces all cr4 access in functions that don't leave cr4
exactly the way they found it with new helpers cr4_set_bits,
cr4_clear_bits, and cr4_set_bits_and_update_boot.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 12:10:41 +01:00
Ingo Molnar
0967160ad6 Merge branch 'x86/asm' into perf/x86, to avoid conflicts with upcoming patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-04 09:01:12 +01:00
Ingo Molnar
8dbcb8737c Linux 3.19-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUzvgKAAoJEHm+PkMAQRiG8XQH/1qVbHI4pP0KcnzfZUHq/mXq
 RuS4aJMwLm/Y6cXFraXBDaPde1A3CPtwtpob2C6giKcfu2zXGunY65haOEeJWNpX
 lCbBsLkNC3oDNkygBpVr5Zd6yibaw63WBjjLnpAi7pn2G2Zm2zB8DfILWWWMb7yz
 MH8ZXV+/xIYCTkjNWGWA1iMjmdYqu0PQHPeOgLsYQ+u7rxfM1zb/wHEkjqUZS6iu
 IaaZv7PV2PnFYnqib/iIPYjAEDvSQ4vN/7b82zlFd2Culm9j/568KCCWUPhJTb2l
 X0u4QYs49GnMTWVRa3bgYxS/nTUaE/6DeWs2y2WzqTt0/XDntVUnok0blUeDxGk=
 =o2kS
 -----END PGP SIGNATURE-----

Merge tag 'v3.19-rc7' into x86/asm, to refresh the branch before pulling in new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-03 12:22:18 +01:00
Ingo Molnar
b3890e4704 Merge branch 'perf/hw_breakpoints' into perf/core
The new hw_breakpoint bits are now ready for v3.20, merge them
into the main branch, to avoid conflicts.

Conflicts:
	tools/perf/Documentation/perf-record.txt

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-28 15:48:59 +01:00
Ingo Molnar
772a9aca12 This is my accumulated x86 entry work, part 1, for 3.20. The meat
of this is an IST rework.  When an IST exception interrupts user
 space, we will handle it on the per-thread kernel stack instead of
 on the IST stack.  This sounds messy, but it actually simplifies the
 IST entry/exit code, because it eliminates some ugly games we used
 to play in order to handle rescheduling, signal delivery, etc on the
 way out of an IST exception.
 
 The IST rework introduces proper context tracking to IST exception
 handlers.  I haven't seen any bug reports, but the old code could
 have incorrectly treated an IST exception handler as an RCU extended
 quiescent state.
 
 The memory failure change (included in this pull request with
 Borislav and Tony's permission) eliminates a bunch of code that
 is no longer needed now that user memory failure handlers are
 called in process context.
 
 Finally, this includes a few on Denys' uncontroversial and Obviously
 Correct (tm) cleanups.
 
 The IST and memory failure changes have been in -next for a while.
 
 LKML references:
 
 IST rework:
 http://lkml.kernel.org/r/cover.1416604491.git.luto@amacapital.net
 
 Memory failure change:
 http://lkml.kernel.org/r/54ab2ffa301102cd6e@agluck-desk.sc.intel.com
 
 Denys' cleanups:
 http://lkml.kernel.org/r/1420927210-19738-1-git-send-email-dvlasenk@redhat.com
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUtvkFAAoJEK9N98ZeDfrkcfsIAJxZ0UBUCEDvulbqgk/iPGOa
 fIpKLMowS7CpKtw6Wdc/YvAIkeHXWm1vU44Hj0TrjSrXCgVF8yCngs/xlXtOjoa1
 dosXQqgqVJJ+hyui7chAEWyalLW7bEO8raq/6snhiMrhiuEkVKpEr7Fer4FVVCZL
 4VALmNQQsbV+Qq4pXIhuagZC0Nt/XKi/+/cKvhS4p//q1F/TbHTz0FpDUrh0jPMh
 18WFy0jWgxdkMRnSp/wJhekvdXX6PwUy5BdES9fjw8LQJZxxFpqN3Fe1kgfyzV0k
 yuvEHw1hPt2aBGj3q69wQvDVyyn4OqMpRDBhk4S+GJYmVh7mFyFMN4BDMEy/EY8=
 =LXVl
 -----END PGP SIGNATURE-----

Merge tag 'pr-20150114-x86-entry' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux into x86/asm

Pull x86/entry enhancements from Andy Lutomirski:

" This is my accumulated x86 entry work, part 1, for 3.20.  The meat
  of this is an IST rework.  When an IST exception interrupts user
  space, we will handle it on the per-thread kernel stack instead of
  on the IST stack.  This sounds messy, but it actually simplifies the
  IST entry/exit code, because it eliminates some ugly games we used
  to play in order to handle rescheduling, signal delivery, etc on the
  way out of an IST exception.

  The IST rework introduces proper context tracking to IST exception
  handlers.  I haven't seen any bug reports, but the old code could
  have incorrectly treated an IST exception handler as an RCU extended
  quiescent state.

  The memory failure change (included in this pull request with
  Borislav and Tony's permission) eliminates a bunch of code that
  is no longer needed now that user memory failure handlers are
  called in process context.

  Finally, this includes a few on Denys' uncontroversial and Obviously
  Correct (tm) cleanups.

  The IST and memory failure changes have been in -next for a while.

  LKML references:

  IST rework:
  http://lkml.kernel.org/r/cover.1416604491.git.luto@amacapital.net

  Memory failure change:
  http://lkml.kernel.org/r/54ab2ffa301102cd6e@agluck-desk.sc.intel.com

  Denys' cleanups:
  http://lkml.kernel.org/r/1420927210-19738-1-git-send-email-dvlasenk@redhat.com
"

This tree semantically depends on and is based on the following RCU commit:

  734d168013 ("rcu: Make rcu_nmi_enter() handle nesting")

... and for that reason won't be pushed upstream before the RCU bits hit Linus's tree.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-28 15:33:26 +01:00