Commit Graph

25465 Commits

Author SHA1 Message Date
Peter Zijlstra
7c4788950b x86/uaccess, sched/preempt: Verify access_ok() context
I recently encountered wreckage because access_ok() was used where it
should not be, add an explicit WARN when access_ok() is used wrongly.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-12-06 10:32:40 +01:00
Ingo Molnar
0a21fc1214 sched/x86: Make CONFIG_SCHED_MC_PRIO=y easier to enable
Right now CONFIG_SCHED_MC_PRIO has X86_INTEL_PSTATE as a dependency,
which is not enabled by default and which hides the CONFIG_SCHED_MC_PRIO
hardware-enabling feature.

Select X86_INTEL_PSTATE instead, plus its dependency (CPU_FREQ), if the
user enables CONFIG_SCHED_MC_PRIO=y.

(Also align the CONFIG_SCHED_MC_PRIO Kconfig help text in standard style.)

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: bp@suse.de
Cc: jolsa@redhat.com
Cc: linux-acpi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: rjw@rjwysocki.net
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-30 08:36:10 +01:00
Tim Chen
de966cf4a4 sched/x86: Change CONFIG_SCHED_ITMT to CONFIG_SCHED_MC_PRIO
Rename CONFIG_SCHED_ITMT for Intel Turbo Boost Max Technology 3.0
to CONFIG_SCHED_MC_PRIO.  This makes the configuration extensible
in future to other architectures that wish to similarly establish
CPU core priorities support in the scheduler.

The description in Kconfig is updated to reflect this change with
added details for better clarity.  The configuration is explicitly
default-y, to enable the feature on CPUs that have this feature.

It has no effect on non-TBM3 CPUs.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@suse.de
Cc: jolsa@redhat.com
Cc: linux-acpi@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/2b2ee29d93e3f162922d72d0165a1405864fbb23.1480444902.git.tim.c.chen@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-30 08:27:08 +01:00
Ingo Molnar
a293b3954a x86/sched: Use #include <linux/mutex.h> instead of #include <asm/mutex.h>
asm/mutex.h is gone from the locking tree, which makes sched/core break the build.

Use linux/mutex.h instead, which is the canonical method.

Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: bp@suse.de
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-28 09:43:49 +01:00
Tim Chen
d3d37d850d x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU
Some Intel cores in a package can be boosted to a higher turbo frequency
with ITMT 3.0 technology. The scheduler can use the asymmetric packing
feature to move tasks to the more capable cores.

If ITMT is enabled, add SD_ASYM_PACKING flag to the thread and core
sched domains to enable asymmetric packing.

Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/9bbb885bedbef4eb50e197305eb16b160cff0831.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-24 20:44:20 +01:00
Tim Chen
f9793e3495 x86/sysctl: Add sysctl for ITMT scheduling feature
Intel Turbo Boost Max Technology 3.0 (ITMT) feature
allows some cores to be boosted to higher turbo
frequency than others.

Add /proc/sys/kernel/sched_itmt_enabled so operator
can enable/disable scheduling of tasks that favor cores
with higher turbo boost frequency potential.

By default, system that is ITMT capable and single
socket has this feature turned on.  It is more likely
to be lightly loaded and operates in Turbo range.

When there is a change in the ITMT scheduling operation
desired, a rebuild of the sched domain is initiated
so the scheduler can set up sched domains with appropriate
flag to enable/disable ITMT scheduling operations.

Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/07cc62426a28bad57b01ab16bb903a9c84fa5421.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-24 20:44:19 +01:00
Tim Chen
5e76b2ab36 x86: Enable Intel Turbo Boost Max Technology 3.0
On platforms supporting Intel Turbo Boost Max Technology 3.0, the maximum
turbo frequencies of some cores in a CPU package may be higher than for
the other cores in the same package.  In that case, better performance
(and possibly lower energy consumption as well) can be achieved by
making the scheduler prefer to run tasks on the CPUs with higher max
turbo frequencies.

To that end, set up a core priority metric to abstract the core
preferences based on the maximum turbo frequency.  In that metric,
the cores with higher maximum turbo frequencies are higher-priority
than the other cores in the same package and that causes the scheduler
to favor them when making load-balancing decisions using the asymmertic
packing approach.  At the same time, the priority of SMT threads with a
higher CPU number is reduced so as to avoid scheduling tasks on all of
the threads that belong to a favored core before all of the other cores
have been given a task to run.

The priority metric will be initialized by the P-state driver with the
help of the sched_set_itmt_core_prio() function.  The P-state driver
will also determine whether or not ITMT is supported by the platform
and will call sched_set_itmt_support() to indicate that.

Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/cd401ccdff88f88c8349314febdc25d51f7c48f7.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-24 20:44:19 +01:00
Tim Chen
7d25127cef x86/topology: Define x86's arch_update_cpu_topology
The scheduler calls arch_update_cpu_topology() to check whether the
scheduler domains have to be rebuilt.

So far x86 has no requirement for this, but the upcoming ITMT support
makes this necessary.

Request the rebuild when the x86 internal update flag is set.

Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/bfbf5591276ec60b2af2da798adc1060df1e2a5f.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-24 20:44:19 +01:00
Ingo Molnar
ec84f00567 Merge branch 'linus' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-23 10:23:09 +01:00
Linus Torvalds
7cfc4317ea Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:
   - two fixes to make (very) old Intel CPUs boot reliably
   - fix the intel-mid driver and rename it
   - two KASAN false positive fixes
   - an FPU fix
   - two sysfb fixes
   - two build fixes related to new toolchain versions"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform/intel-mid: Rename platform_wdt to platform_mrfld_wdt
  x86/build: Build compressed x86 kernels as PIE when !CONFIG_RELOCATABLE as well
  x86/platform/intel-mid: Register watchdog device after SCU
  x86/fpu: Fix invalid FPU ptrace state after execve()
  x86/boot: Fail the boot if !M486 and CPUID is missing
  x86/traps: Ignore high word of regs->cs in early_fixup_exception()
  x86/dumpstack: Prevent KASAN false positive warnings
  x86/unwind: Prevent KASAN false positive warnings in guess unwinder
  x86/boot: Avoid warning for zero-filling .bss
  x86/sysfb: Fix lfb_size calculation
  x86/sysfb: Add support for 64bit EFI lfb_base
2016-11-22 12:17:49 -08:00
Andy Shevchenko
e5dce28688 x86/platform/intel-mid: Rename platform_wdt to platform_mrfld_wdt
Rename the watchdog platform library file to explicitly show that is used only
on Intel Merrifield platforms.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161118172723.179761-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-21 11:07:11 +01:00
H.J. Lu
a980ce352f x86/build: Build compressed x86 kernels as PIE when !CONFIG_RELOCATABLE as well
Since the bootloader may load the compressed x86 kernel at any address,
it should always be built as PIE, not just when CONFIG_RELOCATABLE=y.

Otherwise, linker in binutils 2.27 will optimize GOT load into the
absolute address when building the compressed x86 kernel as a non-PIE
executable.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
[ Small wording changes. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-21 11:05:28 +01:00
Andy Shevchenko
8c5c86fb6a x86/platform/intel-mid: Register watchdog device after SCU
Watchdog device in Intel Tangier relies on SCU to be present. It uses the SCU
IPC channel to send commands and receive responses. If watchdog driver is
initialized quite before SCU and a command has been sent the result is always
an error like the following:

	intel_mid_wdt: Error stopping watchdog: 0xffffffed

Register watchdog device whne SCU is ready to avoid described issue.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161118165224.175514-1-andriy.shevchenko@linux.intel.com
[ Small cleanups. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-21 10:59:14 +01:00
Yu-cheng Yu
b22cbe404a x86/fpu: Fix invalid FPU ptrace state after execve()
Robert O'Callahan reported that after an execve PTRACE_GETREGSET
NT_X86_XSTATE continues to return the pre-exec register values
until the exec'ed task modifies FPU state.

The test code is at:

  https://bugzilla.redhat.com/attachment.cgi?id=1164286.

What is happening is fpu__clear() does not properly clear fpstate.
Fix it by doing just that.

Reported-by: Robert O'Callahan <robert@ocallahan.org>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1479402695-6553-1-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-21 10:38:35 +01:00
Andy Lutomirski
ed68d7e9b9 x86/boot: Fail the boot if !M486 and CPUID is missing
Linux will have all kinds of sporadic problems on systems that don't
have the CPUID instruction unless CONFIG_M486=y.  In particular,
sync_core() will explode.

I believe that these kernels had a better chance of working before
commit 05fb3c199b ("x86/boot: Initialize FPU and X86_FEATURE_ALWAYS
even if we don't have CPUID").  That commit inadvertently fixed a
serious bug: we used to fail to detect the FPU if CPUID wasn't
present.  Because we also used to forget to set X86_FEATURE_ALWAYS, we
end up with no cpu feature bits set at all.  This meant that
alternative patching didn't do anything and, if paravirt was disabled,
we could plausibly finish the entire boot process without calling
sync_core().

Rather than trying to work around these issues, just have the kernel
fail loudly if it's running on a CPUID-less 486, doesn't have CPUID,
and doesn't have CONFIG_M486 set.

Reported-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/70eac6639f23df8be5fe03fa1984aedd5d40077a.1479598603.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-21 09:04:32 +01:00
Andy Lutomirski
fc0e81b2be x86/traps: Ignore high word of regs->cs in early_fixup_exception()
On the 80486 DX, it seems that some exceptions may leave garbage in
the high bits of CS.  This causes sporadic failures in which
early_fixup_exception() refuses to fix up an exception.

As far as I can tell, this has been buggy for a long time, but the
problem seems to have been exacerbated by commits:

  1e02ce4ccc ("x86: Store a per-cpu shadow copy of CR4")
  e1bfc11c5a ("x86/init: Fix cr4_init_shadow() on CR4-less machines")

This appears to have broken for as long as we've had early
exception handling.

[ Note to stable maintainers: This patch is needed all the way back to 3.4,
  but it will only apply to 4.6 and up, as it depends on commit:

    0e861fbb5b ("x86/head: Move early exception panic code into early_fixup_exception()")

  If you want to backport to kernels before 4.6, please don't backport the
  prerequisites (there was a big chain of them that rewrote a lot of the
  early exception machinery); instead, ask me and I can send you a one-liner
  that will apply. ]

Reported-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 4c5023a3fa ("x86-32: Handle exception table entries during early boot")
Link: http://lkml.kernel.org/r/cb32c69920e58a1a58e7b5cad975038a69c0ce7d.1479609510.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-21 08:06:54 +01:00
Linus Torvalds
dce9ce3615 KVM fixes for v4.9-rc6
ARM:
  - Fix handling of the 32bit cycle counter
  - Fix cycle counter filtering
 
 x86:
  - Fix a race leading to double unregistering of user notifiers
  - Amend oversight in kvm_arch_set_irq that turned Hyper-V code dead
  - Use SRCU around kvm_lapic_set_vapic_addr
  - Avoid recursive flushing of asynchronous page faults
  - Do not rely on deferred update in KVM_GET_CLOCK, which fixes #GP
  - Let userspace know that KVM_GET_CLOCK is useful with master clock;
    4.9 changed the return value to better match the guest clock, but
    didn't provide means to let guests take advantage of it
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJYMKbdAAoJEED/6hsPKofoPcEIAJF7hsuO3B2dMfUTz1EK+4IH
 B7JXr9mlAAEG61y82EY06Es+3gt69XBiE5iKBpxlL6jIJJiUOd+oOdygV0hv4D0K
 G6A03DsCWX16yJKjS7oGq4WOAiDGOpk7SU5YYlFZGqCzhaqScY2ecQFKEUYayJtt
 nXG+i22eFKccrD8wlkm3ZYEjl1Hif7bUmHfxL/CBec1cDNxOys1dB24VsZl90n89
 7pMUtzOTskUXjbNX+cKmFtR18/XUdlucnn0w9AApf3M8GnmUxIjIaeFSLbzuNz84
 U2o3LdxrYysSKSsc7VleHtWVfCbPbC62vpUI51XdNw0u7BHlKkVdvBfJEUmSpkw=
 =Crjd
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
 "ARM:
   - Fix handling of the 32bit cycle counter
   - Fix cycle counter filtering

  x86:
   - Fix a race leading to double unregistering of user notifiers
   - Amend oversight in kvm_arch_set_irq that turned Hyper-V code dead
   - Use SRCU around kvm_lapic_set_vapic_addr
   - Avoid recursive flushing of asynchronous page faults
   - Do not rely on deferred update in KVM_GET_CLOCK, which fixes #GP
   - Let userspace know that KVM_GET_CLOCK is useful with master clock;
     4.9 changed the return value to better match the guest clock, but
     didn't provide means to let guests take advantage of it"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: x86: merge kvm_arch_set_irq and kvm_arch_set_irq_inatomic
  KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr
  KVM: async_pf: avoid recursive flushing of work items
  kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use
  KVM: Disable irq while unregistering user notifier
  KVM: x86: do not go through vcpu in __get_kvmclock_ns
  KVM: arm64: Fix the issues when guest PMCCFILTR is configured
  arm64: KVM: pmu: Fix AArch32 cycle counter access
2016-11-19 13:31:40 -08:00
Paolo Bonzini
a2b07739ff kvm: x86: merge kvm_arch_set_irq and kvm_arch_set_irq_inatomic
kvm_arch_set_irq is unused since commit b97e6de9c9.  Merge
its functionality with kvm_arch_set_irq_inatomic.

Reported-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-19 19:04:19 +01:00
Paolo Bonzini
7301d6abae KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr
Reported by syzkaller:

    [ INFO: suspicious RCU usage. ]
    4.9.0-rc4+ #47 Not tainted
    -------------------------------
    ./include/linux/kvm_host.h:536 suspicious rcu_dereference_check() usage!

    stack backtrace:
    CPU: 1 PID: 6679 Comm: syz-executor Not tainted 4.9.0-rc4+ #47
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
     ffff880039e2f6d0 ffffffff81c2e46b ffff88003e3a5b40 0000000000000000
     0000000000000001 ffffffff83215600 ffff880039e2f700 ffffffff81334ea9
     ffffc9000730b000 0000000000000004 ffff88003c4f8420 ffff88003d3f8000
    Call Trace:
     [<     inline     >] __dump_stack lib/dump_stack.c:15
     [<ffffffff81c2e46b>] dump_stack+0xb3/0x118 lib/dump_stack.c:51
     [<ffffffff81334ea9>] lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4445
     [<     inline     >] __kvm_memslots include/linux/kvm_host.h:534
     [<     inline     >] kvm_memslots include/linux/kvm_host.h:541
     [<ffffffff8105d6ae>] kvm_gfn_to_hva_cache_init+0xa1e/0xce0 virt/kvm/kvm_main.c:1941
     [<ffffffff8112685d>] kvm_lapic_set_vapic_addr+0xed/0x140 arch/x86/kvm/lapic.c:2217

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: fda4e2e855
Cc: Andrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-19 19:04:18 +01:00
Paolo Bonzini
e3fd9a93a1 kvm: kvmclock: let KVM_GET_CLOCK return whether the master clock is in use
Userspace can read the exact value of kvmclock by reading the TSC
and fetching the timekeeping parameters out of guest memory.  This
however is brittle and not necessary anymore with KVM 4.11.  Provide
a mechanism that lets userspace know if the new KVM_GET_CLOCK
semantics are in effect, and---since we are at it---if the clock
is stable across all VCPUs.

Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-19 19:04:16 +01:00
Ignacio Alvarado
1650b4ebc9 KVM: Disable irq while unregistering user notifier
Function user_notifier_unregister should be called only once for each
registered user notifier.

Function kvm_arch_hardware_disable can be executed from an IPI context
which could cause a race condition with a VCPU returning to user mode
and attempting to unregister the notifier.

Signed-off-by: Ignacio Alvarado <ikalvarado@google.com>
Cc: stable@vger.kernel.org
Fixes: 18863bdd60 ("KVM: x86 shared msr infrastructure")
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-19 19:04:04 +01:00
Paolo Bonzini
8b95344064 KVM: x86: do not go through vcpu in __get_kvmclock_ns
Going through the first VCPU is wrong if you follow a KVM_SET_CLOCK with
a KVM_GET_CLOCK immediately after, without letting the VCPU run and
call kvm_guest_time_update.

To fix this, compute the kvmclock value ourselves, using the master
clock (tsc, nsec) pair as the base and the host CPU frequency as
the scale.

Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-11-19 18:03:03 +01:00
Linus Torvalds
04e36857d6 Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild fixes from Michal Marek:
 "Here are some regression fixes for kbuild:

   - modversion support for exported asm symbols (Nick Piggin). The
     affected architectures need separate patches adding
     asm-prototypes.h.

   - fix rebuilds of lib-ksyms.o (Nick Piggin)

   - -fno-PIE builds (Sebastian Siewior and Borislav Petkov). This is
     not a kernel regression, but one of the Debian gcc package.
     Nevertheless, it's quite annoying, so I think it should go into
     mainline and stable now"

* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kbuild: Steal gcc's pie from the very beginning
  kbuild: be more careful about matching preprocessed asm ___EXPORT_SYMBOL
  x86/kexec: add -fno-PIE
  scripts/has-stack-protector: add -fno-PIE
  kbuild: add -fno-PIE
  kbuild: modversions for EXPORT_SYMBOL() for asm
  kbuild: prevent lib-ksyms.o rebuilds
2016-11-18 16:45:21 -08:00
Josh Poimboeuf
91e08ab0c8 x86/dumpstack: Prevent KASAN false positive warnings
The oops stack dump code scans the entire stack, which can cause KASAN
"stack-out-of-bounds" false positive warnings.  Tell KASAN to ignore it.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: davej@codemonkey.org.uk
Cc: dvyukov@google.com
Link: http://lkml.kernel.org/r/5f6e80c4b0c7f7f0b6211900847a247cdaad753c.1479398226.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-18 09:38:00 +01:00
Josh Poimboeuf
c2d75e03d6 x86/unwind: Prevent KASAN false positive warnings in guess unwinder
The guess unwinder scans the entire stack, which can cause KASAN
"stack-out-of-bounds" false positive warnings.  Tell KASAN to ignore it.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: davej@codemonkey.org.uk
Cc: dvyukov@google.com
Link: http://lkml.kernel.org/r/61939c0b2b2d63ce97ba59cba3b00fd47c2962cf.1479398226.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-18 09:38:00 +01:00
Arnd Bergmann
553bbc11aa x86/boot: Avoid warning for zero-filling .bss
The latest binutils are warning about a .fill directive with an explicit
value in a .bss section:

  arch/x86/kernel/head_32.S: Assembler messages:
  arch/x86/kernel/head_32.S:677: Warning: ignoring fill value in section `.bss..page_aligned'
  arch/x86/kernel/head_32.S:679: Warning: ignoring fill value in section `.bss..page_aligned'

This comes from the 'ENTRY()' macro padding the space between the symbols
with 'nop' via:

  .align 4,0x90

Open-coding the .globl directive without the padding avoids that warning,
as all the symbols are already page aligned.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161116141726.2013389-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-17 07:34:58 +01:00
Martin Schwidefsky
f285144f81 sched/x86: Do not clear PREEMPT_NEED_RESCHED on preempt count reset
The per-cpu preempt count of x86 contains two values, the actual preempt
count and the inverted PREEMPT_NEED_RESCHED bit. If a corrupted preempt
count is detected the preempt_count_set() function is used to reset the
preempt count.

In case the inverted PREEMPT_NEED_RESCHED bit is zero at the time of the
reset, the preemption indication is lost. Use raw_cpu_cmpxchg_4() to reset
only the count part and leave the PREEMPT_NEED_RESCHED bit as it is.

This improves the kernel's behavior when it runs into preempt count leaks
and tries to fix them up.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1478523660-733-1-git-send-email-schwidefsky@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16 10:29:04 +01:00
Ingo Molnar
0acbc7aa47 Merge branch 'linus' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16 10:16:28 +01:00
David Herrmann
f96acec8c8 x86/sysfb: Fix lfb_size calculation
The screen_info.lfb_size field is shifted by 16 bits *only* in case of
VBE. This has historical reasons since VBE advertised it similarly.
However, in case of EFI framebuffers, the size is no longer shifted. Fix
the x86 simple-framebuffer setup code to use the correct size in the
non-VBE case.

While at it, avoid variable abbreviations and rename 'len' to 'length',
and use the correct types matching the screen_info definition.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tom Gundersen <teg@jklm.no>
Link: http://lkml.kernel.org/r/20161115120158.15388-3-dh.herrmann@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16 09:38:23 +01:00
David Herrmann
9164b4ceb7 x86/sysfb: Add support for 64bit EFI lfb_base
The screen_info object was extended to support 64-bit lfb_base addresses
in:

  ae2ee627dc ("efifb: Add support for 64-bit frame buffer addresses")

However, the x86 simple-framebuffer setup code never made use of it. Fix
it to properly assemble and verify the lfb_base before advertising
simple-framebuffer devices.

In particular, this means if VIDEO_CAPABILITY_64BIT_BASE is set, the
screen_info->ext_lfb_base field will contain the upper 32bit of the
actual lfb_base. Make sure the address is not 0 (i.e., unset), as well as
does not overflow the physical address type.

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tom Gundersen <teg@jklm.no>
Link: http://lkml.kernel.org/r/20161115120158.15388-2-dh.herrmann@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-16 09:38:22 +01:00
Stanislaw Gruszka
353c50ebe3 sched/cputime: Simplify task_cputime()
Now since fetch_task_cputime() has no other users than task_cputime(),
its code could be used directly in task_cputime().

Moreover since only 2 task_cputime() calls of 17 use a NULL argument,
we can add dummy variables to those calls and remove NULL checks from
task_cputimes().

Also remove NULL checks from task_cputimes_scaled().

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1479175612-14718-5-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-15 09:51:05 +01:00
Linus Torvalds
8528d66248 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - fix an Intel/MID boot crash/hang bug

   - fix a cache topology mis-parsing bug on certain AMD CPUs

   - fix a virtualization firmware bug by adding a check+quirk
     workaround on the kernel side"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Deal with broken firmware (VMWare/XEN)
  x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems
  x86/platform/intel-mid: Retrofit pci_platform_pm_ops ->get_state hook
2016-11-14 08:39:56 -08:00
Linus Torvalds
5ad62a9e5c Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "An uncore PMU driver hardware enablement change for Intel SkyLake
  uncore PMUs (Skylake Y, U, H and S platforms), plus a number of
  tooling fixes for the histogram handling/displaying code"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Add more Intel uncore IMC PCI IDs for SkyLake
  perf hists: Fix column length on --hierarchy
  perf hists browser: Fix column indentation on --hierarchy
  perf hists browser: Show folded sign properly on --hierarchy
  perf hists browser: Fix indentation of folded sign on --hierarchy
  perf hist browser: Fix hierarchy column counts
2016-11-14 08:30:06 -08:00
Matt Fleming
f6697df36b x86/efi: Prevent mixed mode boot corruption with CONFIG_VMAP_STACK=y
Booting an EFI mixed mode kernel has been crashing since commit:

  e37e43a497 ("x86/mm/64: Enable vmapped stacks (CONFIG_HAVE_ARCH_VMAP_STACK=y)")

The user-visible effect in my test setup was the kernel being unable
to find the root file system ramdisk. This was likely caused by silent
memory or page table corruption.

Enabling CONFIG_DEBUG_VIRTUAL=y immediately flagged the thunking code as
abusing virt_to_phys() because it was passing addresses that were not
part of the kernel direct mapping.

Use the slow version instead, which correctly handles all memory
regions by performing a page table walk.

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20161112210424.5157-3-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-13 08:26:40 +01:00
Borislav Petkov
02e56902e4 x86/efi: Fix EFI memmap pointer size warning
Fix this when building on 32-bit:

  arch/x86/platform/efi/efi.c: In function ‘__efi_enter_virtual_mode’:
  arch/x86/platform/efi/efi.c:911:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
       (efi_memory_desc_t *)pa);
       ^
  arch/x86/platform/efi/efi.c:918:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
       (efi_memory_desc_t *)pa);
       ^

The @pa local variable is declared as phys_addr_t and that is a u64 when
CONFIG_PHYS_ADDR_T_64BIT=y. (The last is enabled on 32-bit on a PAE
build.)

However, its value comes from __pa() which is basically doing pointer
arithmetic and checking, and returns unsigned long as it is the native
pointer width.

So let's use an unsigned long too. It should be fine to do so because
the later users cast it to a pointer too.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20161112210424.5157-2-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-13 08:26:40 +01:00
Arnd Bergmann
beae2c9eb5 crypto: aesni: shut up -Wmaybe-uninitialized warning
The rfc4106 encrypy/decrypt helper functions cause an annoying
false-positive warning in allmodconfig if we turn on
-Wmaybe-uninitialized warnings again:

  arch/x86/crypto/aesni-intel_glue.c: In function ‘helper_rfc4106_decrypt’:
  include/linux/scatterlist.h:67:31: warning: ‘dst_sg_walk.sg’ may be used uninitialized in this function [-Wmaybe-uninitialized]

The problem seems to be that the compiler doesn't track the state of the
'one_entry_in_sg' variable across the kernel_fpu_begin/kernel_fpu_end
section.

This takes the easy way out by adding a bogus initialization, which
should be harmless enough to get the patch into v4.9 so we can turn on
this warning again by default without producing useless output.  A
follow-up patch for v4.10 rearranges the code to make the warning go
away.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-11 08:45:08 -08:00
Arnd Bergmann
3a6d867612 x86: apm: avoid uninitialized data
apm_bios_call() can fail, and return a status in its argument structure.
If that status however is zero during a call from
apm_get_power_status(), we end up using data that may have never been
set, as reported by "gcc -Wmaybe-uninitialized":

  arch/x86/kernel/apm_32.c: In function ‘apm’:
  arch/x86/kernel/apm_32.c:1729:17: error: ‘bx’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  arch/x86/kernel/apm_32.c:1835:5: error: ‘cx’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  arch/x86/kernel/apm_32.c:1730:17: note: ‘cx’ was declared here
  arch/x86/kernel/apm_32.c:1842:27: error: ‘dx’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
  arch/x86/kernel/apm_32.c:1731:17: note: ‘dx’ was declared here

This changes the function to return "APM_NO_ERROR" here, which makes the
code more robust to broken BIOS versions, and avoids the warning.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-11 08:45:08 -08:00
Kan Liang
d786810b2f perf/x86/intel/uncore: Add more Intel uncore IMC PCI IDs for SkyLake
Several uncore IMC PCI IDs are missed for Intel SkyLake.

Add the PCI IDs for SkyLake Y, U, H and S platforms.
Rename the ID macros for 0x191f and 0x190c.

The corresponding bug:

  https://bugzilla.kernel.org/show_bug.cgi?id=187301

The related datasheets are also attached in the bug entry for permanent reference.

Reported-by: Ben Widawsky <benjamin.widawsky@intel.com>
Tested-by: Ben Widawsky <benjamin.widawsky@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1478631281-5061-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-11 08:30:22 +01:00
Sebastian Andrzej Siewior
90944e40ba x86/kexec: add -fno-PIE
If the gcc is configured to do -fPIE by default then the build aborts
later with:
| Unsupported relocation type: unknown type rel type name (29)

Tagging it stable so it is possible to compile recent stable kernels as
well.

Cc: stable@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Michal Marek <mmarek@suse.com>
2016-11-09 22:28:09 +01:00
Thomas Gleixner
d49597fd3b x86/cpu: Deal with broken firmware (VMWare/XEN)
Both ACPI and MP specifications require that the APIC id in the respective
tables must be the same as the APIC id in CPUID.

The kernel retrieves the physical package id from the APIC id during the
ACPI/MP table scan and builds the physical to logical package map. The
physical package id which is used after a CPU comes up is retrieved from
CPUID. So we rely on ACPI/MP tables and CPUID agreeing in that respect.

There exist VMware and XEN implementations which violate the spec. As a
result the physical to logical package map, which relies on the ACPI/MP
tables does not work on those systems, because the CPUID initialized
physical package id does not match the firmware id. This causes system
crashes and malfunction due to invalid package mappings.

The only way to cure this is to sanitize the physical package id after the
CPUID enumeration and yell when the APIC ids are different. Fix up the
initial APIC id, which is fine as it is only used printout purposes.

If the physical package IDs differ yell and use the package information
from the ACPI/MP tables so the existing logical package map just works.

Chas provided the resulting dmesg output for his affected 4 virtual
sockets, 1 core per socket VM:

[Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2
[Firmware Bug]: CPU1: Using firmware package id 1 instead of 2
....

Reported-and-tested-by: "Charles (Chas) Williams" <ciwillia@brocade.com>,
Reported-by: M. Vefa Bicakci <m.v.b@runbox.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: #4.6+ <stable@vger,kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1611091613540.3501@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-09 21:05:01 +01:00
Yazen Ghannam
b0b6e86846 x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems
cpu_llc_id (Last Level Cache ID) derivation on AMD Fam17h has an
underflow bug when extracting the socket_id value. It starts from 0
so subtracting 1 from it will result in an invalid value. This breaks
scheduling topology later on since the cpu_llc_id will be incorrect.

For example, the the cpu_llc_id of the *other* CPU in the loops in
set_cpu_sibling_map() underflows and we're generating the funniest
thread_siblings masks and then when I run 8 threads of nbench, they get
spread around the LLC domains in a very strange pattern which doesn't
give you the normal scheduling spread one would expect for performance.

Other things like EDAC use cpu_llc_id so they will be b0rked too.

So, the APIC ID is preset in APICx020 for bits 3 and above: they contain
the core complex, node and socket IDs.

The LLC is at the core complex level so we can find a unique cpu_llc_id
by right shifting the APICID by 3 because then the least significant bit
will be the Core Complex ID.

Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Cleaned up and extended the commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org> # v4.4..
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 3849e91f57 ("x86/AMD: Fix last level cache topology for AMD Fam17h systems")
Link: http://lkml.kernel.org/r/20161108083506.rvqb5h4chrcptj7d@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-09 17:06:08 +01:00
Lukas Wunner
e8a6123e9e x86/platform/intel-mid: Retrofit pci_platform_pm_ops ->get_state hook
Commit cc7cc02bad ("PCI: Query platform firmware for device power
state") augmented struct pci_platform_pm_ops with a ->get_state hook and
implemented it for acpi_pci_platform_pm, the only pci_platform_pm_ops
existing till v4.7.

However v4.8 introduced another pci_platform_pm_ops for Intel Mobile
Internet Devices with commit 5823d0893e ("x86/platform/intel-mid: Add
Power Management Unit driver").  It is missing the ->get_state hook,
which is fatal since pci_set_platform_pm() enforces its presence.  Andy
Shevchenko reports that without the present commit, such a device
"crashes without even a character printed out on serial console and
reboots (since watchdog)".

Retrofit mid_pci_platform_pm with the missing callback to fix the
breakage.

Acked-and-tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Fixes: cc7cc02bad ("PCI: Query platform firmware for device power state")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: http://lkml.kernel.org/r/7c1567d4c49303a4aada94ba16275cbf56b8976b.1477221514.git.lukas@wunner.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-11-07 13:06:59 +01:00
Linus Torvalds
66cecb6789 One NULL pointer dereference, and two fixes for regressions introduced
during the merge window.  The rest are fixes for MIPS, s390 and nested VMX.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJYG2H5AAoJEL/70l94x66DK/cH/0jEQ3ynuLAd5CKux7JxI/EP
 msSJh1Xqr4+XhXZnuDpGQWrdsBlxoiqA6PsJrUTtyi4nQCDXlT8g+2MDuvqhWIHz
 7vw58j/EMJDCVQzYAbN5VDUfk13uB5aSWTo3M9Rf09v0hU1Ql7z8u4CtKEdLpN5Y
 LY9bT9fxUmXO7REKP7bdW6ZrDX/hUShYHgMqzXGFMyGBG3ym3a9bggXEzTCD6eNQ
 ioogQIWqg+icdhta0iLNAwFClPlcKB2/xo4IUuNgrPwGoHFGJN/8+qxT4+sVbp2B
 v8u1zOXlCFXBcskWE+yRRsGe72+mIzz6QScCyO+5HbhKYVfbE9H7KBlFX9rZZ2c=
 =IbKx
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "One NULL pointer dereference, and two fixes for regressions introduced
  during the merge window.

  The rest are fixes for MIPS, s390 and nested VMX"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: x86: Check memopp before dereference (CVE-2016-8630)
  kvm: nVMX: VMCLEAR an active shadow VMCS after last use
  KVM: x86: drop TSC offsetting kvm_x86_ops to fix KVM_GET/SET_CLOCK
  KVM: x86: fix wbinvd_dirty_mask use-after-free
  kvm/x86: Show WRMSR data is in hex
  kvm: nVMX: Fix kernel panics induced by illegal INVEPT/INVVPID types
  KVM: document lock orders
  KVM: fix OOPS on flush_work
  KVM: s390: Fix STHYI buffer alignment for diag224
  KVM: MIPS: Precalculate MMIO load resume PC
  KVM: MIPS: Make ERET handle ERL before EXL
  KVM: MIPS: Fix lazy user ASID regenerate for SMP
2016-11-04 13:08:05 -07:00
Owen Hofmann
d9092f52d7 kvm: x86: Check memopp before dereference (CVE-2016-8630)
Commit 41061cdb98 ("KVM: emulate: do not initialize memopp") removes a
check for non-NULL under incorrect assumptions. An undefined instruction
with a ModR/M byte with Mod=0 and R/M-5 (e.g. 0xc7 0x15) will attempt
to dereference a null pointer here.

Fixes: 41061cdb98
Message-Id: <1477592752-126650-2-git-send-email-osh@google.com>
Signed-off-by: Owen Hofmann <osh@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02 21:31:53 +01:00
Jim Mattson
355f4fb140 kvm: nVMX: VMCLEAR an active shadow VMCS after last use
After a successful VM-entry with the "VMCS shadowing" VM-execution
control set, the shadow VMCS referenced by the VMCS link pointer field
in the current VMCS becomes active on the logical processor.

A VMCS that is made active on more than one logical processor may become
corrupted. Therefore, before an active VMCS can be migrated to another
logical processor, the first logical processor must execute a VMCLEAR
for the active VMCS. VMCLEAR both ensures that all VMCS data are written
to memory and makes the VMCS inactive.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-By: David Matlack <dmatlack@google.com>
Message-Id: <1477668579-22555-1-git-send-email-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02 20:03:17 +01:00
Paolo Bonzini
ea26e4ec08 KVM: x86: drop TSC offsetting kvm_x86_ops to fix KVM_GET/SET_CLOCK
Since commit a545ab6a00 ("kvm: x86: add tsc_offset field to struct
kvm_vcpu_arch", 2016-09-07) the offset between host and L1 TSC is
cached and need not be fished out of the VMCS or VMCB.  This means
that we can implement adjust_tsc_offset_guest and read_l1_tsc
entirely in generic code.  The simplification is particularly
significant for VMX code, where vmx->nested.vmcs01_tsc_offset
was duplicating what is now in vcpu->arch.tsc_offset.  Therefore
the vmcs01_tsc_offset can be dropped completely.

More importantly, this fixes KVM_GET_CLOCK/KVM_SET_CLOCK
which, after commit 108b249c45 ("KVM: x86: introduce get_kvmclock_ns",
2016-09-01) called read_l1_tsc while the VMCS was not loaded.
It thus returned bogus values on Intel CPUs.

Fixes: 108b249c45
Reported-by: Roman Kagan <rkagan@virtuozzo.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-11-02 20:03:07 +01:00
Thomas Gleixner
1e90a13d0c x86/smpboot: Init apic mapping before usage
The recent changes, which forced the registration of the boot cpu on UP
systems, which do not have ACPI tables, have been fixed for systems w/o
local APIC, but left a wreckage for systems which have neither ACPI nor
mptables, but the CPU has an APIC, e.g. virtualbox.

The boot process crashes in prefill_possible_map() as it wants to register
the boot cpu, which needs to access the local apic, but the local APIC is
not yet mapped.

There is no reason why init_apic_mapping() can't be invoked before
prefill_possible_map(). So instead of playing another silly early mapping
game, as the ACPI/mptables code does, we just move init_apic_mapping()
before the call to prefill_possible_map().

In hindsight, I should have noticed that combination earlier.

Sorry for the churn (also in stable)!

Fixes: ff8560512b ("x86/boot/smp: Don't try to poke disabled/non-existent APIC")
Reported-and-debugged-by: Michal Necasek <michal.necasek@oracle.com>
Reported-and-tested-by: Wolfgang Bauer <wbauer@tmo.at>
Cc: prarit@redhat.com
Cc: ville.syrjala@linux.intel.com
Cc: michael.thayer@oracle.com
Cc: knut.osmundsen@oracle.com
Cc: frank.mehnert@oracle.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610282114380.5053@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-10-29 14:00:46 +02:00
Linus Torvalds
c067affcd3 ACPI fixes for v4.9-rc3
Specifics:
 
  - Fix three ACPICA issues related to the interpreter locking and
    introduced by recent changes in that area (Lv Zheng).
 
  - Fix a PCI IRQ management regression introduced during the 4.7
    cycle and related to the configuration of shared IRQs on systems
    with an ISA bus (Sinan Kaya).
 
  - Fix up a return value of one function in the APEI code (Punit
    Agrawal).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJYE+wwAAoJEILEb/54YlRxe0wQAKIyO3ktsxxbz2iACxFPZGmn
 ML1+OTBIGQKYDSGCINhMV5PGd98IMBaVCB9RllG/B9iALb8VCGiJ6AuJKoR7q2pZ
 6mr7ioXNfTNlLISykt63cD/Lp/YobZMG6WhoNWzoKslVUQrSWAISV+wGpBxoj08i
 8X7t/QtvRIVWfy4H4reDgpQMIKUeDhk6REeb8FESiXYboOvNhbXZpPS+bv8XXEfD
 bu/ASQIZs3Z9YB2uTij16Tx95eJETHhr9zYIxbi848YDxjelpZNs1QuVoYxq3GO2
 S7vbGHZITMLSEz4jD0w98YvDcb0jywfSXX53NBMaSJqOAleVKNH1rE8KvywG6H0s
 2298yBc0o0ldBb+4nszoL7NyGQKDrmxMEGBRFlk67gZ1LA4cpk+Usv9Q0GmNx38H
 KQfuA144n9ICaM9Kw9CKD8xrQ+PtpoTIBXzKGsdqIwHsS+2XSwzgp4IWEuMRfZNu
 5ermtO2tRz47UDnQ5UxdweB0n2pEMrZXDDzBONdqp3ds4aOvOvVqQP3OB+iMaMrT
 rPvVlYLr2Q+ekIzHPCpB7uBwI//bJ3L2cLqHxp9hlbNeFQWdv/NlMlmbWgYVVpn3
 tCvqqpH+jtof8lnmWZtBdc4M0GhwM+CP6TYd65n2yDqVCfPraXokE0UdNBW+je4Z
 BRuhjVEi0vBsrc3M4IC9
 =wjnL
 -----END PGP SIGNATURE-----

Merge tag 'acpi-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These fix recent ACPICA regressions, an older PCI IRQ management
  regression, and an incorrect return value of a function in the APEI
  code.

  Specifics:

   - Fix three ACPICA issues related to the interpreter locking and
     introduced by recent changes in that area (Lv Zheng).

   - Fix a PCI IRQ management regression introduced during the 4.7 cycle
     and related to the configuration of shared IRQs on systems with an
     ISA bus (Sinan Kaya).

   - Fix up a return value of one function in the APEI code (Punit
     Agrawal)"

* tag 'acpi-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPICA: Dispatcher: Fix interpreter locking around acpi_ev_initialize_region()
  ACPICA: Dispatcher: Fix an unbalanced lock exit path in acpi_ds_auto_serialize_method()
  ACPICA: Dispatcher: Fix order issue of method termination
  ACPI / APEI: Fix incorrect return value of ghes_proc()
  ACPI/PCI: pci_link: Include PIRQ_PENALTY_PCI_USING for ISA IRQs
  ACPI/PCI: pci_link: penalize SCI correctly
  ACPI/PCI/IRQ: assign ISA IRQ directly during early boot stages
2016-10-28 18:34:19 -07:00
Linus Torvalds
b49c3170bf Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc kernel fixes: a virtualization environment related fix, an uncore
  PMU driver removal handling fix, a PowerPC fix and new events for
  Knights Landing"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Honour the CPUID for number of fixed counters in hypervisors
  perf/powerpc: Don't call perf_event_disable() from atomic context
  perf/core: Protect PMU device removal with a 'pmu_bus_running' check, to fix CONFIG_DEBUG_TEST_DRIVER_REMOVE=y kernel panic
  perf/x86/intel/cstate: Add C-state residency events for Knights Landing
2016-10-28 16:27:16 -07:00
Linus Torvalds
c38c04c630 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes: three build fixes, an unwinder fix and a microcode loader
  fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/microcode/AMD: Fix more fallout from CONFIG_RANDOMIZE_MEMORY=y
  x86: Fix export for mcount and __fentry__
  x86/quirks: Hide maybe-uninitialized warning
  x86/build: Fix build with older GCC versions
  x86/unwind: Fix empty stack dereference in guess unwinder
2016-10-28 11:28:14 -07:00