Commit Graph

11905 Commits

Author SHA1 Message Date
Linus Torvalds
ba33ea811e Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "This is another big update. Main changes are:

   - lots of x86 system call (and other traps/exceptions) entry code
     enhancements.  In particular the complex parts of the 64-bit entry
     code have been migrated to C code as well, and a number of dusty
     corners have been refreshed.  (Andy Lutomirski)

   - vDSO special mapping robustification and general cleanups (Andy
     Lutomirski)

   - cpufeature refactoring, cleanups and speedups (Borislav Petkov)

   - lots of other changes ..."

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  x86/cpufeature: Enable new AVX-512 features
  x86/entry/traps: Show unhandled signal for i386 in do_trap()
  x86/entry: Call enter_from_user_mode() with IRQs off
  x86/entry/32: Change INT80 to be an interrupt gate
  x86/entry: Improve system call entry comments
  x86/entry: Remove TIF_SINGLESTEP entry work
  x86/entry/32: Add and check a stack canary for the SYSENTER stack
  x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
  x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed
  x86/entry: Vastly simplify SYSENTER TF (single-step) handling
  x86/entry/traps: Clear DR6 early in do_debug() and improve the comment
  x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions
  x86/entry/32: Restore FLAGS on SYSEXIT
  x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
  x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
  selftests/x86: In syscall_nt, test NT|TF as well
  x86/asm-offsets: Remove PARAVIRT_enabled
  x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled
  uprobes: __create_xol_area() must nullify xol_mapping.fault
  x86/cpufeature: Create a new synthetic cpu capability for machine check recovery
  ...
2016-03-15 09:32:27 -07:00
Linus Torvalds
d88bfe1d68 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Ingo Molnar:
 "Various RAS updates:

   - AMD MCE support updates for future CPUs, fixes and 'SMCA' (Scalable
     MCA) error decoding support (Aravind Gopalakrishnan)

   - x86 memcpy_mcsafe() support, to enable smart(er) hardware error
     recovery in NVDIMM drivers, based on an extension of the x86
     exception handling code.  (Tony Luck)"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  EDAC/sb_edac: Fix computation of channel address
  x86/mm, x86/mce: Add memcpy_mcsafe()
  x86/mce/AMD: Document some functionality
  x86/mce: Clarify comments regarding deferred error
  x86/mce/AMD: Fix logic to obtain block address
  x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors
  x86/mce: Move MCx_CONFIG MSR definitions
  x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries
  x86/mm: Expand the exception table logic to allow new handling options
  x86/mce/AMD: Set MCAX Enable bit
  x86/mce/AMD: Carve out threshold block preparation
  x86/mce/AMD: Fix LVT offset configuration for thresholding
  x86/mce/AMD: Reduce number of blocks scanned per bank
  x86/mce/AMD: Do not perform shared bank check for future processors
  x86/mce: Fix order of AMD MCE init function call
2016-03-14 18:43:51 -07:00
Linus Torvalds
e71c2c1eeb Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Main kernel side changes:

   - Big reorganization of the x86 perf support code.  The old code grew
     organically deep inside arch/x86/kernel/cpu/perf* and its naming
     became somewhat messy.

     The new location is under arch/x86/events/, using the following
     cleaner hierarchy of source code files:

       perf/x86: Move perf_event.c .................. => x86/events/core.c
       perf/x86: Move perf_event_amd.c .............. => x86/events/amd/core.c
       perf/x86: Move perf_event_amd_ibs.c .......... => x86/events/amd/ibs.c
       perf/x86: Move perf_event_amd_iommu.[ch] ..... => x86/events/amd/iommu.[ch]
       perf/x86: Move perf_event_amd_uncore.c ....... => x86/events/amd/uncore.c
       perf/x86: Move perf_event_intel_bts.c ........ => x86/events/intel/bts.c
       perf/x86: Move perf_event_intel.c ............ => x86/events/intel/core.c
       perf/x86: Move perf_event_intel_cqm.c ........ => x86/events/intel/cqm.c
       perf/x86: Move perf_event_intel_cstate.c ..... => x86/events/intel/cstate.c
       perf/x86: Move perf_event_intel_ds.c ......... => x86/events/intel/ds.c
       perf/x86: Move perf_event_intel_lbr.c ........ => x86/events/intel/lbr.c
       perf/x86: Move perf_event_intel_pt.[ch] ...... => x86/events/intel/pt.[ch]
       perf/x86: Move perf_event_intel_rapl.c ....... => x86/events/intel/rapl.c
       perf/x86: Move perf_event_intel_uncore.[ch] .. => x86/events/intel/uncore.[ch]
       perf/x86: Move perf_event_intel_uncore_nhmex.c => x86/events/intel/uncore_nmhex.c
       perf/x86: Move perf_event_intel_uncore_snb.c   => x86/events/intel/uncore_snb.c
       perf/x86: Move perf_event_intel_uncore_snbep.c => x86/events/intel/uncore_snbep.c
       perf/x86: Move perf_event_knc.c .............. => x86/events/intel/knc.c
       perf/x86: Move perf_event_p4.c ............... => x86/events/intel/p4.c
       perf/x86: Move perf_event_p6.c ............... => x86/events/intel/p6.c
       perf/x86: Move perf_event_msr.c .............. => x86/events/msr.c

     (Borislav Petkov)

   - Update various x86 PMU constraint and hw support details (Stephane
     Eranian)

   - Optimize kprobes for BPF execution (Martin KaFai Lau)

   - Rewrite, refactor and fix the Intel uncore PMU driver code (Thomas
     Gleixner)

   - Rewrite, refactor and fix the Intel RAPL PMU code (Thomas Gleixner)

   - Various fixes and smaller cleanups.

  There are lots of perf tooling updates as well.  A few highlights:

  perf report/top:

     - Hierarchy histogram mode for 'perf top' and 'perf report',
       showing multiple levels, one per --sort entry: (Namhyung Kim)

       On a mostly idle system:

         # perf top --hierarchy -s comm,dso

       Then expand some levels and use 'P' to take a snapshot:

         # cat perf.hist.0
         -  92.32%         perf
               58.20%         perf
               22.29%         libc-2.22.so
                5.97%         [kernel]
                4.18%         libelf-0.165.so
                1.69%         [unknown]
         -   4.71%         qemu-system-x86
                3.10%         [kernel]
                1.60%         qemu-system-x86_64 (deleted)
         +   2.97%         swapper
         #

     - Add 'L' hotkey to dynamicly set the percent threshold for
       histogram entries and callchains, i.e.  dynamicly do what the
       --percent-limit command line option to 'top' and 'report' does.
       (Namhyung Kim)

  perf mem:

     - Allow specifying events via -e in 'perf mem record', also listing
       what events can be specified via 'perf mem record -e list' (Jiri
       Olsa)

  perf record:

     - Add 'perf record' --all-user/--all-kernel options, so that one
       can tell that all the events in the command line should be
       restricted to the user or kernel levels (Jiri Olsa), i.e.:

         perf record -e cycles:u,instructions:u

       is equivalent to:

         perf record --all-user -e cycles,instructions

     - Make 'perf record' collect CPU cache info in the perf.data file header:

         $ perf record usleep 1
         [ perf record: Woken up 1 times to write data ]
         [ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
         $ perf report --header-only -I | tail -10 | head -8
         # CPU cache info:
         #  L1 Data                 32K [0-1]
         #  L1 Instruction          32K [0-1]
         #  L1 Data                 32K [2-3]
         #  L1 Instruction          32K [2-3]
         #  L2 Unified             256K [0-1]
         #  L2 Unified             256K [2-3]
         #  L3 Unified            4096K [0-3]

       Will be used in 'perf c2c' and eventually in 'perf diff' to
       allow, for instance running the same workload in multiple
       machines and then when using 'diff' show the hardware difference.
       (Jiri Olsa)

     - Improved support for Java, using the JVMTI agent library to do
       jitdumps that then will be inserted in synthesized
       PERF_RECORD_MMAP2 events via 'perf inject' pointed to synthesized
       ELF files stored in ~/.debug and keyed with build-ids, to allow
       symbol resolution and even annotation with source line info, see
       the changeset comments to see how to use it (Stephane Eranian)

  perf script/trace:

     - Decode data_src values (e.g.  perf.data files generated by 'perf
       mem record') in 'perf script': (Jiri Olsa)

         # perf script
           perf 693 [1] 4.088652: 1 cpu/mem-loads,ldlat=30/P: ffff88007d0b0f40 68100142 L1 hit|SNP None|TLB L1 or L2 hit|LCK No <SNIP>
                                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     - Improve support to 'data_src', 'weight' and 'addr' fields in
       'perf script' (Jiri Olsa)

     - Handle empty print fmts in 'perf script -s' i.e. when running
       python or perl scripts (Taeung Song)

  perf stat:

     - 'perf stat' now shows shadow metrics (insn per cycle, etc) in
       interval mode too.  E.g:

         # perf stat -I 1000 -e instructions,cycles sleep 1
         #         time   counts unit events
            1.000215928  519,620      instructions     #  0.69 insn per cycle
            1.000215928  752,003      cycles
         <SNIP>

     - Port 'perf kvm stat' to PowerPC (Hemant Kumar)

     - Implement CSV metrics output in 'perf stat' (Andi Kleen)

  perf BPF support:

     - Support converting data from bpf events in 'perf data' (Wang Nan)

     - Print bpf-output events in 'perf script': (Wang Nan).

         # perf record -e bpf-output/no-inherit,name=evt/ -e ./test_bpf_output_3.c/map:channel.event=evt/ usleep 1000
         # perf script
            usleep  4882 21384.532523:   evt:  ffffffff810e97d1 sys_nanosleep ([kernel.kallsyms])
             BPF output: 0000: 52 61 69 73 65 20 61 20  Raise a
                         0008: 42 50 46 20 65 76 65 6e  BPF even
                         0010: 74 21 00 00              t!..
             BPF string: "Raise a BPF event!"
         #

     - Add API to set values of map entries in a BPF object, be it
       individual map slots or ranges (Wang Nan)

     - Introduce support for the 'bpf-output' event (Wang Nan)

     - Add glue to read perf events in a BPF program (Wang Nan)

     - Improve support for bpf-output events in 'perf trace' (Wang Nan)

  ... and tons of other changes as well - see the shortlog and git log
  for details!"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (342 commits)
  perf stat: Add --metric-only support for -A
  perf stat: Implement --metric-only mode
  perf stat: Document CSV format in manpage
  perf hists browser: Check sort keys before hot key actions
  perf hists browser: Allow thread filtering for comm sort key
  perf tools: Add sort__has_comm variable
  perf tools: Recalc total periods using top-level entries in hierarchy
  perf tools: Remove nr_sort_keys field
  perf hists browser: Cleanup hist_browser__fprintf_hierarchy_entry()
  perf tools: Remove hist_entry->fmt field
  perf tools: Fix command line filters in hierarchy mode
  perf tools: Add more sort entry check functions
  perf tools: Fix hist_entry__filter() for hierarchy
  perf jitdump: Build only on supported archs
  tools lib traceevent: Add '~' operation within arg_num_eval()
  perf tools: Omit unnecessary cast in perf_pmu__parse_scale
  perf tools: Pass perf_hpp_list all the way through setup_sort_list
  perf tools: Fix perf script python database export crash
  perf jitdump: DWARF is also needed
  perf bench mem: Prepare the x86-64 build for upstream memcpy_mcsafe() changes
  ...
2016-03-14 17:58:53 -07:00
Linus Torvalds
d09e356ad0 Merge branch 'mm-readonly-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull read-only kernel memory updates from Ingo Molnar:
 "This tree adds two (security related) enhancements to the kernel's
  handling of read-only kernel memory:

   - extend read-only kernel memory to a new class of formerly writable
     kernel data: 'post-init read-only memory' via the __ro_after_init
     attribute, and mark the ARM and x86 vDSO as such read-only memory.

     This kind of attribute can be used for data that requires a once
     per bootup initialization sequence, but is otherwise never modified
     after that point.

     This feature was based on the work by PaX Team and Brad Spengler.

     (by Kees Cook, the ARM vDSO bits by David Brown.)

   - make CONFIG_DEBUG_RODATA always enabled on x86 and remove the
     Kconfig option.  This simplifies the kernel and also signals that
     read-only memory is the default model and a first-class citizen.
     (Kees Cook)"

* 'mm-readonly-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ARM/vdso: Mark the vDSO code read-only after init
  x86/vdso: Mark the vDSO code read-only after init
  lkdtm: Verify that '__ro_after_init' works correctly
  arch: Introduce post-init read-only memory
  x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option
  mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings
  asm-generic: Consolidate mark_rodata_ro()
2016-03-14 16:58:50 -07:00
Linus Torvalds
fbed0bc091 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking changes from Ingo Molnar:
 "Various updates:

   - Futex scalability improvements: remove page lock use for shared
     futex get_futex_key(), which speeds up 'perf bench futex hash'
     benchmarks by over 40% on a 60-core Westmere.  This makes anon-mem
     shared futexes perform close to private futexes.  (Mel Gorman)

   - lockdep hash collision detection and fix (Alfredo Alvarez
     Fernandez)

   - lockdep testing enhancements (Alfredo Alvarez Fernandez)

   - robustify lockdep init by using hlists (Andrew Morton, Andrey
     Ryabinin)

   - mutex and csd_lock micro-optimizations (Davidlohr Bueso)

   - small x86 barriers tweaks (Michael S Tsirkin)

   - qspinlock updates (Waiman Long)"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  locking/csd_lock: Use smp_cond_acquire() in csd_lock_wait()
  locking/csd_lock: Explicitly inline csd_lock*() helpers
  futex: Replace barrier() in unqueue_me() with READ_ONCE()
  locking/lockdep: Detect chain_key collisions
  locking/lockdep: Prevent chain_key collisions
  tools/lib/lockdep: Fix link creation warning
  tools/lib/lockdep: Add tests for AA and ABBA locking
  tools/lib/lockdep: Add userspace version of READ_ONCE()
  tools/lib/lockdep: Fix the build on recent kernels
  locking/qspinlock: Move __ARCH_SPIN_LOCK_UNLOCKED to qspinlock_types.h
  locking/mutex: Allow next waiter lockless wakeup
  locking/pvqspinlock: Enable slowpath locking count tracking
  locking/qspinlock: Use smp_cond_acquire() in pending code
  locking/pvqspinlock: Move lock stealing count tracking code into pv_queued_spin_steal_lock()
  locking/mcs: Fix mcs_spin_lock() ordering
  futex: Remove requirement for lock_page() in get_futex_key()
  futex: Rename barrier references in ordering guarantees
  locking/atomics: Update comment about READ_ONCE() and structures
  locking/lockdep: Eliminate lockdep_init()
  locking/lockdep: Convert hash tables to hlists
  ...
2016-03-14 15:50:44 -07:00
Linus Torvalds
d37a14bb5f Merge branch 'core-resources-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull ram resource handling changes from Ingo Molnar:
 "Core kernel resource handling changes to support NVDIMM error
  injection.

  This tree introduces a new I/O resource type, IORESOURCE_SYSTEM_RAM,
  for System RAM while keeping the current IORESOURCE_MEM type bit set
  for all memory-mapped ranges (including System RAM) for backward
  compatibility.

  With this resource flag it no longer takes a strcmp() loop through the
  resource tree to find "System RAM" resources.

  The new resource type is then used to extend ACPI/APEI error injection
  facility to also support NVDIMM"

* 'core-resources-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ACPI/EINJ: Allow memory error injection to NVDIMM
  resource: Kill walk_iomem_res()
  x86/kexec: Remove walk_iomem_res() call with GART type
  x86, kexec, nvdimm: Use walk_iomem_res_desc() for iomem search
  resource: Add walk_iomem_res_desc()
  memremap: Change region_intersects() to take @flags and @desc
  arm/samsung: Change s3c_pm_run_res() to use System RAM type
  resource: Change walk_system_ram() to use System RAM type
  drivers: Initialize resource entry to zero
  xen, mm: Set IORESOURCE_SYSTEM_RAM to System RAM
  kexec: Set IORESOURCE_SYSTEM_RAM for System RAM
  arch: Set IORESOURCE_SYSTEM_RAM flag for System RAM
  ia64: Set System RAM type and descriptor
  x86/e820: Set System RAM type and descriptor
  resource: Add I/O resource descriptor
  resource: Handle resource flags properly
  resource: Add System RAM resource type
2016-03-14 15:15:51 -07:00
Fenghua Yu
d050049442 x86/cpufeature: Enable new AVX-512 features
A few new AVX-512 instruction groups/features are added in cpufeatures.h
for enuermation: AVX512DQ, AVX512BW, and AVX512VL.

Clear the flags in fpu__xstate_clear_all_cpu_caps().

The specification for latest AVX-512 including the features can be found at:

  https://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf

Note, I didn't enable the flags in KVM. Hopefully the KVM guys can pick up
the flags and enable them in KVM.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1457667498-37357-1-git-send-email-fenghua.yu@intel.com
[ Added more detailed feature descriptions. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-12 17:30:53 +01:00
Borislav Petkov
6e6867093d x86/fpu: Fix eager-FPU handling on legacy FPU machines
i486 derived cores like Intel Quark support only the very old,
legacy x87 FPU (FSAVE/FRSTOR, CPUID bit FXSR is not set), and
our FPU code wasn't handling the saving and restoring there
properly in the 'eagerfpu' case.

So after we made eagerfpu the default for all CPU types:

  58122bf1d8 x86/fpu: Default eagerfpu=on on all CPUs

these old FPU designs broke. First, Andy Shevchenko reported a splat:

  WARNING: CPU: 0 PID: 823 at arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160

which was us trying to execute FXRSTOR on those machines even though
they don't support it.

After taking care of that, Bryan O'Donoghue reported that a simple FPU
test still failed because we weren't initializing the FPU state properly
on those machines.

Take care of all that.

Reported-and-tested-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Reported-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20160311113206.GD4312@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-12 16:13:55 +01:00
Jianyu Zhan
10ee73865e x86/entry/traps: Show unhandled signal for i386 in do_trap()
Commit abd4f7505b ("x86: i386-show-unhandled-signals-v3") did turn on
the showing-unhandled-signal behaviour for i386 for some exception handlers,
but for no reason do_trap() is left out (my naive guess is because turning it on
for do_trap() would be too noisy since do_trap() is shared by several exceptions).

And since the same commit make "show_unhandled_signals" a debug tunable(in
/proc/sys/debug/exception-trace), and x86 by default turning it on.

So it would be strange for i386 users who turing it on manually and expect
seeing the unhandled signal output in log, but nothing.

This patch turns it on for i386 in do_trap() as well.

Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@suse.de
Cc: dave.hansen@linux.intel.com
Cc: heukelum@fastmail.fm
Cc: jbeulich@novell.com
Cc: jdike@addtoit.com
Cc: joe@perches.com
Cc: luto@kernel.org
Link: http://lkml.kernel.org/r/1457612398-4568-1-git-send-email-nasa4836@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 18:37:25 +01:00
Andy Lutomirski
a798f09111 x86/entry/32: Change INT80 to be an interrupt gate
We want all of the syscall entries to run with interrupts off so that
we can efficiently run context tracking before enabling interrupts.

This will regress int $0x80 performance on 32-bit kernels by a
couple of cycles.  This shouldn't matter much -- int $0x80 is not a
fast path.

This effectively reverts:

  657c1eea00 ("x86/entry/32: Fix entry_INT80_32() to expect interrupts to be on")

... and fixes the same issue differently.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/59b4f90c9ebfccd8c937305dbbbca680bc74b905.1457558566.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 10:53:26 +01:00
Ingo Molnar
6cbe9e4a22 Merge branch 'linus' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 10:28:27 +01:00
Yu-cheng Yu
a65050c6f1 x86/fpu: Revert ("x86/fpu: Disable AVX when eagerfpu is off")
Leonid Shatz noticed that the SDM interpretation of the following
recent commit:

  394db20ca2 ("x86/fpu: Disable AVX when eagerfpu is off")

... is incorrect and that the original behavior of the FPU code was correct.

Because AVX is not stated in CR0 TS bit description, it was mistakenly
believed to be not supported for lazy context switch. This turns out
to be false:

  Intel Software Developer's Manual Vol. 3A, Sec. 2.5 Control Registers:

   'TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the x87 FPU/
    MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch to be delayed until
    an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is actually executed
    by the new task.'

  Intel Software Developer's Manual Vol. 2A, Sec. 2.4 Instruction Exception
  Specification:

   'AVX instructions refer to exceptions by classes that include #NM
    "Device Not Available" exception for lazy context switch.'

So revert the commit.

Reported-by: Leonid Shatz <leonid.shatz@ravellosystems.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1457569734-3785-1-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 10:15:58 +01:00
Andy Lutomirski
2a41aa4feb x86/entry/32: Add and check a stack canary for the SYSENTER stack
The first instruction of the SYSENTER entry runs on its own tiny
stack.  That stack can be used if a #DB or NMI is delivered before
the SYSENTER prologue switches to a real stack.

We have code in place to prevent us from overflowing the tiny stack.
For added paranoia, add a canary to the stack and check it in
do_debug() -- that way, if something goes wrong with the #DB logic,
we'll eventually notice.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/6ff9a806f39098b166dc2c41c1db744df5272f29.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 09:48:14 +01:00
Andy Lutomirski
7536656f08 x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
Right after SYSENTER, we can get a #DB or NMI.  On x86_32, there's no IST,
so the exception handler is invoked on the temporary SYSENTER stack.

Because the SYSENTER stack is very small, we have a fixup to switch
off the stack quickly when this happens.  The old fixup had several issues:

 1. It checked the interrupt frame's CS and EIP.  This wasn't
    obviously correct on Xen or if vm86 mode was in use [1].

 2. In the NMI handler, it did some frightening digging into the
    stack frame.  I'm not convinced this digging was correct.

 3. The fixup didn't switch stacks and then switch back.  Instead, it
    synthesized a brand new stack frame that would redirect the IRET
    back to the SYSENTER code.  That frame was highly questionable.
    For one thing, if NMI nested inside #DB, we would effectively
    abort the #DB prologue, which was probably safe but was
    frightening.  For another, the code used PUSHFL to write the
    FLAGS portion of the frame, which was simply bogus -- by the time
    PUSHFL was called, at least TF, NT, VM, and all of the arithmetic
    flags were clobbered.

Simplify this considerably.  Instead of looking at the saved frame
to see where we came from, check the hardware ESP register against
the SYSENTER stack directly.  Malicious user code cannot spoof the
kernel ESP register, and by moving the check after SAVE_ALL, we can
use normal PER_CPU accesses to find all the relevant addresses.

With this patch applied, the improved syscall_nt_32 test finally
passes on 32-bit kernels.

[1] It isn't obviously correct, but it is nonetheless safe from vm86
    shenanigans as far as I can tell.  A user can't point EIP at
    entry_SYSENTER_32 while in vm86 mode because entry_SYSENTER_32,
    like all kernel addresses, is greater than 0xffff and would thus
    violate the CS segment limit.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/b2cdbc037031c07ecf2c40a96069318aec0e7971.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 09:48:14 +01:00
Andy Lutomirski
f2b375756c x86/entry: Vastly simplify SYSENTER TF (single-step) handling
Due to a blatant design error, SYSENTER doesn't clear TF (single-step).

As a result, if a user does SYSENTER with TF set, we will single-step
through the kernel until something clears TF.  There is absolutely
nothing we can do to prevent this short of turning off SYSENTER [1].

Simplify the handling considerably with two changes:

  1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
     add TF to that list of flags to sanitize with no overhead whatsoever.

  2. Teach do_debug() to ignore single-step traps in the SYSENTER prologue.

That's all we need to do.

Don't get too excited -- our handling is still buggy on 32-bit
kernels.  There's nothing wrong with the SYSENTER code itself, but
the #DB prologue has a clever fixup for traps on the very first
instruction of entry_SYSENTER_32, and the fixup doesn't work quite
correctly.  The next two patches will fix that.

[1] We could probably prevent it by forcing BTF on at all times and
    making sure we clear TF before any branches in the SYSENTER
    code.  Needless to say, this is a bad idea.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a30d2ea06fe4b621fe6a9ef911b02c0f38feb6f2.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 09:48:13 +01:00
Andy Lutomirski
8bb5643686 x86/entry/traps: Clear DR6 early in do_debug() and improve the comment
Leaving any bits set in DR6 on return from a debug exception is
asking for trouble.  Prevent it by writing zero right away and
clarify the comment.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3857676e1be8fb27db4b89bbb1e2052b7f435ff4.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 09:48:13 +01:00
Andy Lutomirski
81edd9f69a x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions
The SDM says that debug exceptions clear BTF, and we need to keep
TIF_BLOCKSTEP in sync with BTF.  Clear it unconditionally and improve
the comment.

I suspect that the fact that kmemcheck could cause TIF_BLOCKSTEP not
to be cleared was just an oversight.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/fa86e55d196e6dde5b38839595bde2a292c52fdc.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 09:48:13 +01:00
Andy Lutomirski
f363938c70 x86/fpu: Fix 'no387' regression
After fixing FPU option parsing, we now parse the 'no387' boot option
too early: no387 clears X86_FEATURE_FPU before it's even probed, so
the boot CPU promptly re-enables it.

I suspect it gets even more confused on SMP.

Fix the probing code to leave X86_FEATURE_FPU off if it's been
disabled by setup_clear_cpu_cap().

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Fixes: 4f81cbafcc ("x86/fpu: Fix early FPU command-line parsing")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-09 13:54:40 +01:00
Tony Luck
92b0729c34 x86/mm, x86/mce: Add memcpy_mcsafe()
Make use of the EXTABLE_FAULT exception table entries to write
a kernel copy routine that doesn't crash the system if it
encounters a machine check. Prime use case for this is to copy
from large arrays of non-volatile memory used as storage.

We have to use an unrolled copy loop for now because current
hardware implementations treat a machine check in "rep mov"
as fatal. When that is fixed we can simplify.

Return type is a "bool". True means that we copied OK, false means
that it didn't.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@gmail.com>
Link: http://lkml.kernel.org/r/a44e1055efc2d2a9473307b22c91caa437aa3f8b.1456439214.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 17:54:38 +01:00
Andy Lutomirski
0dd0036f6e x86/asm-offsets: Remove PARAVIRT_enabled
It no longer has any users.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: david.vrabel@citrix.com
Cc: konrad.wilk@oracle.com
Cc: lguest@lists.ozlabs.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 14:16:44 +01:00
Andy Lutomirski
58a5aac533 x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled
x86_64 has very clean espfix handling on paravirt: espfix64 is set
up in native_iret, so paravirt systems that override iret bypass
espfix64 automatically.  This is robust and straightforward.

x86_32 is messier.  espfix is set up before the IRET paravirt patch
point, so it can't be directly conditionalized on whether we use
native_iret.  We also can't easily move it into native_iret without
regressing performance due to a bizarre consideration.  Specifically,
on 64-bit kernels, the logic is:

  if (regs->ss & 0x4)
          setup_espfix;

On 32-bit kernels, the logic is:

  if ((regs->ss & 0x4) && (regs->cs & 0x3) == 3 &&
      (regs->flags & X86_EFLAGS_VM) == 0)
          setup_espfix;

The performance of setup_espfix itself is essentially irrelevant, but
the comparison happens on every IRET so its performance matters.  On
x86_64, there's no need for any registers except flags to implement
the comparison, so we fold the whole thing into native_iret.  On
x86_32, we don't do that because we need a free register to
implement the comparison efficiently.  We therefore do espfix setup
before restoring registers on x86_32.

This patch gets rid of the explicit paravirt_enabled check by
introducing X86_BUG_ESPFIX on 32-bit systems and using an ALTERNATIVE
to skip espfix on paravirt systems where iret != native_iret.  This is
also messy, but it's at least in line with other things we do.

This improves espfix performance by removing a branch, but no one
cares.  More importantly, it removes a paravirt_enabled user, which is
good because paravirt_enabled is ill-defined and is going away.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: david.vrabel@citrix.com
Cc: konrad.wilk@oracle.com
Cc: lguest@lists.ozlabs.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 14:16:44 +01:00
Kostenzer Felix
8e2a7f5b9a x86/nmi: Mark 'ignore_nmis' as __read_mostly
ignore_nmis is used in two distinct places:

 1. modified through {stop,restart}_nmi by alternative_instructions
 2. read by do_nmi to determine if default_do_nmi should be called or not

thus the access pattern conforms to __read_mostly and do_nmi() is a fastpath.

Signed-off-by: Kostenzer Felix <fkostenzer@live.at>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 12:48:19 +01:00
Aravind Gopalakrishnan
ea2ca36b65 x86/mce/AMD: Document some functionality
In an attempt to aid in understanding of what the threshold_block
structure holds, provide comments to describe the members here. Also,
trim comments around threshold_restart_bank() and update copyright info.

No functional change is introduced.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Shorten comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1457021458-2522-6-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 11:48:15 +01:00
Aravind Gopalakrishnan
8dd1e17a55 x86/mce/AMD: Fix logic to obtain block address
In upcoming processors, the BLKPTR field is no longer used to indicate
the MSR number of the additional register. Insted, it simply indicates
the prescence of additional MSRs.

Fix the logic here to gather MSR address from MSR_AMD64_SMCA_MCx_MISC()
for newer processors and fall back to existing logic for older
processors.

[ Drop nextaddr_out label; style cleanups. ]
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1457021458-2522-4-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 11:48:14 +01:00
Aravind Gopalakrishnan
be0aec23bf x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors
For Scalable MCA enabled processors, errors are listed per IP block. And
since it is not required for an IP to map to a particular bank, we need
to use HWID and McaType values from the MCx_IPID register to figure out
which IP a given bank represents.

We also have a new bit (TCC) in the MCx_STATUS register to indicate Task
context is corrupt.

Add logic here to decode errors from all known IP blocks for Fam17h
Model 00-0fh and to print TCC errors.

[ Minor fixups. ]
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1457021458-2522-3-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 11:48:14 +01:00
Ingo Molnar
a1a8ba2d4a Merge branch 'linus' into ras/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 11:48:00 +01:00
Ingo Molnar
ec87e1cf7d Linux 4.5-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW3LO0AAoJEHm+PkMAQRiGhewIAIVHA1+qSSXEHTFeuLRuYpiz
 +ptQUIjPJdakWm/XqOnwSG8SWUuD4XL6ysfNmLSZIdqXYBAPpAuwT1UA2FZhz0dN
 soZxMNleAvzHWRDFLqwjVdOVlTxS6CTTdEQNzi+3R0ZCADllsRcuj/GBIY+M8cr6
 LvxK8BnhDU+Au3gZQjaujTMO7fKG6gOq4wKz/U7RIG37A6rwW577kEfLg4ZgFwt9
 RVjsky5mrX9+4l3QFtox9ZC383P/0VZ6+vXwN2QH1/joDK4EvA8pCwsGTyjRJiqi
 fArHbS+mHyAtbPWJmDbVlQ5dkZJAqRgtWBydjQYoC16S4Bwdce2/FbhBiTgEQAo=
 =sqln
 -----END PGP SIGNATURE-----

Merge tag 'v4.5-rc7' into x86/asm, to pick up SMAP fix

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-07 09:27:30 +01:00
Ingo Molnar
bc94b99636 Linux 4.5-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW0yM6AAoJEHm+PkMAQRiGeUwIAJRTHFPJTFpJcJjeZEV4/EL1
 7Pl0WSHs/CWBkXIevAg2HgkECSQ9NI9FAUFvoGxCldDpFAnL1U2QV8+Ur2qhiXMG
 5v0jILJuiw57qT/NfhEudZolerlRoHILmB3JRTb+DUV4GHZuWpTkJfUSI9j5aTEl
 w83XUgtK4bKeIyFbHdWQk6xqfzfFBSuEITuSXreOMwkFfMmeScE0WXOPLBZWyhPa
 v0rARJLYgM+vmRAnJjnG8unH+SgnqiNcn2oOFpevKwmpVcOjcEmeuxh/HdeZf7HM
 /R8F86OwdmXsO+z8dQxfcucLg+I9YmKfFr8b6hopu1sRztss2+Uk6H1j2J7IFIg=
 =tvkh
 -----END PGP SIGNATURE-----

Merge tag 'v4.5-rc6' into core/resources, to resolve conflict

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-04 12:12:08 +01:00
Todd E Brandt
92f9e179a7 PM / sleep / x86: Fix crash on graph trace through x86 suspend
Pause/unpause graph tracing around do_suspend_lowlevel as it has
inconsistent call/return info after it jumps to the wakeup vector.
The graph trace buffer will otherwise become misaligned and
may eventually crash and hang on suspend.

To reproduce the issue and test the fix:
Run a function_graph trace over suspend/resume and set the graph
function to suspend_devices_and_enter. This consistently hangs the
system without this fix.

Signed-off-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-03-03 02:28:28 +01:00
Ingo Molnar
39a1142dbb Linux 4.5-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW0yM6AAoJEHm+PkMAQRiGeUwIAJRTHFPJTFpJcJjeZEV4/EL1
 7Pl0WSHs/CWBkXIevAg2HgkECSQ9NI9FAUFvoGxCldDpFAnL1U2QV8+Ur2qhiXMG
 5v0jILJuiw57qT/NfhEudZolerlRoHILmB3JRTb+DUV4GHZuWpTkJfUSI9j5aTEl
 w83XUgtK4bKeIyFbHdWQk6xqfzfFBSuEITuSXreOMwkFfMmeScE0WXOPLBZWyhPa
 v0rARJLYgM+vmRAnJjnG8unH+SgnqiNcn2oOFpevKwmpVcOjcEmeuxh/HdeZf7HM
 /R8F86OwdmXsO+z8dQxfcucLg+I9YmKfFr8b6hopu1sRztss2+Uk6H1j2J7IFIg=
 =tvkh
 -----END PGP SIGNATURE-----

Merge tag 'v4.5-rc6' into locking/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:55:22 +01:00
Thomas Gleixner
1f12e32f4c x86/topology: Create logical package id
For per package oriented services we must be able to rely on the number of CPU
packages to be within bounds. Create a tracking facility, which

- calculates the number of possible packages depending on nr_cpu_ids after boot

- makes sure that the package id is within the number of possible packages. If
  the apic id is outside we map it to a logical package id if there is enough
  space available.

Provide interfaces for drivers to query the mapping and do translations from
physcial to logical ids.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221011.541071755@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:18 +01:00
Kees Cook
9ccaf77cf0 x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option
This removes the CONFIG_DEBUG_RODATA option and makes it always enabled.

This simplifies the code and also makes it clearer that read-only mapped
memory is just as fundamental a security feature in kernel-space as it is
in user-space.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455748879-21872-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-22 08:51:38 +01:00
Tony Luck
0f68c088c0 x86/cpufeature: Create a new synthetic cpu capability for machine check recovery
The Intel Software Developer Manual describes bit 24 in the MCG_CAP
MSR:

   MCG_SER_P (software error recovery support present) flag,
   bit 24 — Indicates (when set) that the processor supports
   software error recovery

But only some models with this capability bit set will actually
generate recoverable machine checks.

Check the model name and set a synthetic capability bit. Provide
a command line option to set this bit anyway in case the kernel
doesn't recognise the model name.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2e5bfb23c89800a036fb8a45fa97a74bb16bc362.1455732970.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 09:28:47 +01:00
Ingo Molnar
3a2f2ac9b9 Merge branch 'x86/urgent' into x86/asm, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 09:28:03 +01:00
Tony Luck
b2f9d678e2 x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries
Extend the severity checking code to add a new context IN_KERN_RECOV
which is used to indicate that the machine check was triggered by code
in the kernel tagged with _ASM_EXTABLE_FAULT() so that the ex_handler_fault()
handler will provide the fixup code with the trap number.

Major re-work to the tail code in do_machine_check() to make all this
readable/maintainable. One functional change is that tolerant=3 no longer
stops recovery actions. Revert to only skipping sending SIGBUS to the
current process.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/89d243d05a7943bb187d1074bb30d9c4f482d5f5.1455732970.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 09:22:42 +01:00
Tony Luck
548acf1923 x86/mm: Expand the exception table logic to allow new handling options
Huge amounts of help from  Andy Lutomirski and Borislav Petkov to
produce this. Andy provided the inspiration to add classes to the
exception table with a clever bit-squeezing trick, Boris pointed
out how much cleaner it would all be if we just had a new field.

Linus Torvalds blessed the expansion with:

  ' I'd rather not be clever in order to save just a tiny amount of space
    in the exception table, which isn't really criticial for anybody. '

The third field is another relative function pointer, this one to a
handler that executes the actions.

We start out with three handlers:

 1: Legacy - just jumps the to fixup IP
 2: Fault - provide the trap number in %ax to the fixup code
 3: Cleaned up legacy for the uaccess error hack

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f6af78fcbd348cf4939875cfda9c19689b5e50b8.1455732970.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 09:21:46 +01:00
Borislav Petkov
27f6d22b03 perf/x86: Move perf_event.h to its new home
Now that all functionality has been moved to arch/x86/events/, move the
perf_event.h header and adjust include paths.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1455098123-11740-18-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 10:11:36 +01:00
Borislav Petkov
65a27a3510 perf/x86: Move perf_event_msr.c .............. => x86/events/msr.c
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1455098123-11740-17-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 10:11:36 +01:00
Borislav Petkov
5e865ed44b perf/x86: Move perf_event_p6.c ............... => x86/events/intel/p6.c
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1455098123-11740-16-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 10:11:36 +01:00
Borislav Petkov
f03e97dbd2 perf/x86: Move perf_event_p4.c ............... => x86/events/intel/p4.c
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1455098123-11740-15-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 10:11:35 +01:00
Borislav Petkov
edbb591870 perf/x86: Move perf_event_knc.c .............. => x86/events/intel/knc.c
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1455098123-11740-14-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 10:11:35 +01:00
Borislav Petkov
ed367e6ca4 perf/x86: Move perf_event_intel_uncore_snbep.c => x86/events/intel/uncore_snbep.c
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1455098123-11740-13-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 10:11:34 +01:00
Borislav Petkov
92553e40c6 perf/x86: Move perf_event_intel_uncore_snb.c => x86/events/intel/uncore_snb.c
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1455098123-11740-12-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 10:11:26 +01:00
Borislav Petkov
35bf705c25 perf/x86: Move perf_event_intel_uncore_nhmex.c => x86/events/intel/uncore_nmhex.c
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1455098123-11740-11-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 10:09:48 +01:00
Borislav Petkov
6bcb2db547 perf/x86: Move perf_event_intel_uncore.[ch] .. => x86/events/intel/uncore.[ch]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1455098123-11740-10-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 10:09:47 +01:00
Borislav Petkov
609d809f83 perf/x86: Move perf_event_intel_rapl.c ....... => x86/events/intel/rapl.c
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1455098123-11740-9-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 10:09:47 +01:00
Borislav Petkov
fd1c601c25 perf/x86: Move perf_event_intel_pt.[ch] ...... => x86/events/intel/pt.[ch]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1455098123-11740-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 10:09:47 +01:00
Borislav Petkov
c85cc4497f perf/x86: Move perf_event_intel_lbr.c ........ => x86/events/intel/lbr.c
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1455098123-11740-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 10:09:46 +01:00
Borislav Petkov
7010d12913 perf/x86: Move perf_event_intel_ds.c ......... => x86/events/intel/ds.c
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1455098123-11740-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 10:09:46 +01:00
Borislav Petkov
6aec1ad736 perf/x86: Move perf_event_intel_cstate.c ..... => x86/events/intel/cstate.c
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1455098123-11740-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 10:09:46 +01:00