Commit Graph

6597 Commits

Author SHA1 Message Date
Andy Lutomirski
f005f5d860 x86/asm: Make asm/alternative.h safe from assembly
asm/alternative.h isn't directly useful from assembly, but it
shouldn't break the build.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e5b693fcef99fe6e80341c9e97a002fb23871e91.1461698311.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29 11:56:41 +02:00
Andy Lutomirski
35de5b0692 x86/asm: Stop depending on ptrace.h in alternative.h
alternative.h pulls in ptrace.h, which means that alternatives can't
be used in anything referenced from ptrace.h, which is a mess.

Break the dependency by pulling text patching helpers into their own
header.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/99b93b13f2c9eb671f5c98bba4c2cbdc061293a2.1461698311.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-29 11:56:40 +02:00
Linus Torvalds
814dd9481d Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "x86 PMU driver fixes plus a core code race fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Fix incorrect lbr_sel_mask value
  perf/x86/intel/pt: Don't die on VMXON
  perf/core: Fix perf_event_open() vs. execve() race
  perf/x86/amd: Set the size of event map array to PERF_COUNT_HW_MAX
  perf/core: Make sysctl_perf_cpu_time_max_percent conform to documentation
  perf/x86/intel/rapl: Add missing Haswell model
  perf/x86/intel: Add model number for Skylake Server to perf
2016-04-28 20:19:04 -07:00
Andy Lutomirski
078194f8e9 x86/mm, sched/core: Turn off IRQs in switch_mm()
Potential races between switch_mm() and TLB-flush or LDT-flush IPIs
could be very messy.  AFAICT the code is currently okay, whether by
accident or by careful design, but enabling PCID will make it
considerably more complicated and will no longer be obviously safe.

Fix it with a big hammer: run switch_mm() with IRQs off.

To avoid a performance hit in the scheduler, we take advantage of
our knowledge that the scheduler already has IRQs disabled when it
calls switch_mm().

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f19baf759693c9dcae64bbff76189db77cb13398.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 11:44:20 +02:00
Andy Lutomirski
69c0319aab x86/mm, sched/core: Uninline switch_mm()
It's fairly large and it has quite a few callers.  This may also
help untangle some headers down the road.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/54f3367803e7f80b2be62c8a21879aa74b1a5f57.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 11:44:19 +02:00
Mark Rutland
9788375dc4 x86/efi: Enable runtime call flag checking
Define ARCH_EFI_IRQ_FLAGS_MASK for x86, which will enable the generic
runtime wrapper code to detect when firmware erroneously modifies flags
over a runtime services function call.

For x86 (both 32-bit and 64-bit), we only need check the interrupt flag.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Darren Hart <dvhart@infradead.org>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Harald Hoyer harald@redhat.com
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Kweh Hock Leong <hock.leong.kweh@intel.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raphael Hertzog <hertzog@debian.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-40-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 11:34:13 +02:00
Mark Rutland
bc25f9dba1 x86/efi: Move to generic {__,}efi_call_virt()
Now there's a common template for {__,}efi_call_virt(), remove the
duplicate logic from the x86 EFI code.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-35-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 11:34:09 +02:00
Ard Biesheuvel
21289ec02b x86/efi/efifb: Move DMI based quirks handling out of generic code
The efifb quirks handling based on DMI identification of the platform is
specific to x86, so move it to x86 arch code.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Peter Jones <pjones@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-19-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 11:33:57 +02:00
Ard Biesheuvel
2c23b73c2d x86/efi: Prepare GOP handling code for reuse as generic code
In preparation of moving this code to drivers/firmware/efi and reusing
it on ARM and arm64, apply any changes that will be required to make this
code build for other architectures. This should make it easier to track
down problems that this move may cause to its operation on x86.

Note that the generic version uses slightly different ways of casting the
protocol methods and some other variables to the correct types, since such
method calls are not loosely typed on ARM and arm64 as they are on x86.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Jones <pjones@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1461614832-17633-17-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 11:33:56 +02:00
Ingo Molnar
0b20e59cef Merge branch 'perf/urgent' into perf/core, to resolve conflict
Conflicts:
	arch/x86/events/intel/pt.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 10:35:17 +02:00
Alexander Shishkin
1c5ac21a0e perf/x86/intel/pt: Don't die on VMXON
Some versions of Intel PT do not support tracing across VMXON, more
specifically, VMXON will clear TraceEn control bit and any attempt to
set it before VMXOFF will throw a #GP, which in the current state of
things will crash the kernel. Namely:

  $ perf record -e intel_pt// kvm -nographic

on such a machine will kill it.

To avoid this, notify the intel_pt driver before VMXON and after
VMXOFF so that it knows when not to enable itself.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/87oa9dwrfk.fsf@ashishki-desk.ger.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-28 10:32:42 +02:00
Srinivas Pandruvada
dcee75b3b7 perf/x86/intel/rapl: Support Skylake RAPL domains
Add Skylake client support for RAPL domains. In addition to RAPL domains
in Broadwell clients, it has support for platform domain (aka PSys). The
PSys domain controls the entire SoC instead of just a CPU package. Unlike
package domain, PSys support requires more than just processor level
implementation. The other parts in the system need additional HW level
signaling, which OEMs need to support. When not supported, the energy
counter register in PSys domain returns 0.

Also corrected error in comment for GPU counter, which previously was
DRAM counter.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com
[ Cnverted to model_match stuff. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: bp@alien8.de
Cc: hpa@zytor.com
Cc: jacob.jun.pan@linux.intel.com
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/1460930581-29748-2-git-send-email-srinivas.pandruvada@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-23 14:13:36 +02:00
Luis R. Rodriguez
867fe800b4 x86/paravirt: Remove paravirt_enabled()
Now that all previous paravirt_enabled() uses were replaced with proper
x86 semantics by the previous patches we can remove the unused
paravirt_enabled() mechanism.

Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andrew.cooper3@citrix.com
Cc: andriy.shevchenko@linux.intel.com
Cc: bigeasy@linutronix.de
Cc: boris.ostrovsky@oracle.com
Cc: david.vrabel@citrix.com
Cc: ffainelli@freebox.fr
Cc: george.dunlap@citrix.com
Cc: glin@suse.com
Cc: jlee@suse.com
Cc: josh@joshtriplett.org
Cc: julien.grall@linaro.org
Cc: konrad.wilk@oracle.com
Cc: kozerkov@parallels.com
Cc: lenb@kernel.org
Cc: lguest@lists.ozlabs.org
Cc: linux-acpi@vger.kernel.org
Cc: lv.zheng@intel.com
Cc: matt@codeblueprint.co.uk
Cc: mbizon@freebox.fr
Cc: rjw@rjwysocki.net
Cc: robert.moore@intel.com
Cc: rusty@rustcorp.com.au
Cc: tiwai@suse.de
Cc: toshi.kani@hp.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/1460592286-300-15-git-send-email-mcgrof@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-22 10:29:07 +02:00
Luis R. Rodriguez
80dfd83dfa x86, drivers/pnpbios: Replace paravirt_enabled() check with legacy device check
Since we are removing paravirt_enabled() replace it with a
logical equivalent. Even though PNPBIOS is x86 specific we
add an arch-specific type call, which can be implemented by
any architecture to show how other legacy attribute devices
can later be also checked for with other ACPI legacy attribute
flags.

This implicates the first ACPI 5.2.9.3 IA-PC Boot Architecture
ACPI_FADT_LEGACY_DEVICES flag device, and shows how to add more.

The reason pnpbios gets a defined structure and as such uses
a different approach than the RTC legacy quirk is that ACPI
has a respective RTC flag, while pnpbios does not. We fold
the pnpbios quirk under ACPI_FADT_LEGACY_DEVICES ACPI flag
use case, and use a struct of possible devices to enable
future extensions of this.

As per 0-day, this bumps the vmlinux size using i386-tinyconfig as
follows:

TOTAL   TEXT   init.text   x86_early_init_platform_quirks()
+32     +28    +28         +28

That's 4 byte overhead total, the rest is cleared out on init
as its all __init text.

v2: split out subarch handlng on switch to make it easier
    later to add other subarchs. The 'fall-through' switch
    handling can be confusing and we'll remove it later
    when we add handling for X86_SUBARCH_CE4100.
v3: document vmlinux size impact as per 0-day, and also
    explain why pnpbios is treated differently than the
    RTC legacy feature.

Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andrew.cooper3@citrix.com
Cc: andriy.shevchenko@linux.intel.com
Cc: bigeasy@linutronix.de
Cc: boris.ostrovsky@oracle.com
Cc: david.vrabel@citrix.com
Cc: ffainelli@freebox.fr
Cc: george.dunlap@citrix.com
Cc: glin@suse.com
Cc: jgross@suse.com
Cc: jlee@suse.com
Cc: josh@joshtriplett.org
Cc: julien.grall@linaro.org
Cc: konrad.wilk@oracle.com
Cc: kozerkov@parallels.com
Cc: lenb@kernel.org
Cc: lguest@lists.ozlabs.org
Cc: linux-acpi@vger.kernel.org
Cc: lv.zheng@intel.com
Cc: matt@codeblueprint.co.uk
Cc: mbizon@freebox.fr
Cc: rjw@rjwysocki.net
Cc: robert.moore@intel.com
Cc: rusty@rustcorp.com.au
Cc: tiwai@suse.de
Cc: toshi.kani@hp.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/1460592286-300-12-git-send-email-mcgrof@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-22 10:29:05 +02:00
Luis R. Rodriguez
1330e3bc54 x86/init: Use a platform legacy quirk for EBDA
This replaces the paravirt_enabled() check with a
proper x86 legacy platform quirk.

As per 0-day, this bumps the vmlinux size using i386-tinyconfig as
follows:

TOTAL   TEXT   init.text   x86_early_init_platform_quirks()
+39     +35    +35         +25

That's a 4 byte total overhead, the rest is all cleared out
upon init as its all __init text.

v2: document 0-day vmlinux size impact

Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andrew.cooper3@citrix.com
Cc: andriy.shevchenko@linux.intel.com
Cc: bigeasy@linutronix.de
Cc: boris.ostrovsky@oracle.com
Cc: david.vrabel@citrix.com
Cc: ffainelli@freebox.fr
Cc: george.dunlap@citrix.com
Cc: glin@suse.com
Cc: jgross@suse.com
Cc: jlee@suse.com
Cc: josh@joshtriplett.org
Cc: julien.grall@linaro.org
Cc: konrad.wilk@oracle.com
Cc: kozerkov@parallels.com
Cc: lenb@kernel.org
Cc: lguest@lists.ozlabs.org
Cc: linux-acpi@vger.kernel.org
Cc: lv.zheng@intel.com
Cc: matt@codeblueprint.co.uk
Cc: mbizon@freebox.fr
Cc: rjw@rjwysocki.net
Cc: robert.moore@intel.com
Cc: rusty@rustcorp.com.au
Cc: tiwai@suse.de
Cc: toshi.kani@hp.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/1460592286-300-7-git-send-email-mcgrof@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-22 10:29:02 +02:00
Luis R. Rodriguez
8d152e7a5c x86/rtc: Replace paravirt rtc check with platform legacy quirk
We have 4 types of x86 platforms that disable RTC:

  * Intel MID
  * Lguest - uses paravirt
  * Xen dom-U - uses paravirt
  * x86 on legacy systems annotated with an ACPI legacy flag

We can consolidate all of these into a platform specific legacy
quirk set early in boot through i386_start_kernel() and through
x86_64_start_reservations(). This deals with the RTC quirks which
we can rely on through the hardware subarch, the ACPI check can
be dealt with separately.

For Xen things are bit more complex given that the @X86_SUBARCH_XEN
x86_hardware_subarch is shared on for Xen which uses the PV path for
both domU and dom0. Since the semantics for differentiating between
the two are Xen specific we provide a platform helper to help override
default legacy features -- x86_platform.set_legacy_features(). Use
of this helper is highly discouraged, its only purpose should be
to account for the lack of semantics available within your given
x86_hardware_subarch.

As per 0-day, this bumps the vmlinux size using i386-tinyconfig as
follows:

TOTAL   TEXT   init.text    x86_early_init_platform_quirks()
+70     +62    +62          +43

Only 8 bytes overhead total, as the main increase in size is
all removed via __init.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andrew.cooper3@citrix.com
Cc: andriy.shevchenko@linux.intel.com
Cc: bigeasy@linutronix.de
Cc: boris.ostrovsky@oracle.com
Cc: david.vrabel@citrix.com
Cc: ffainelli@freebox.fr
Cc: george.dunlap@citrix.com
Cc: glin@suse.com
Cc: jlee@suse.com
Cc: josh@joshtriplett.org
Cc: julien.grall@linaro.org
Cc: konrad.wilk@oracle.com
Cc: kozerkov@parallels.com
Cc: lenb@kernel.org
Cc: lguest@lists.ozlabs.org
Cc: linux-acpi@vger.kernel.org
Cc: lv.zheng@intel.com
Cc: matt@codeblueprint.co.uk
Cc: mbizon@freebox.fr
Cc: rjw@rjwysocki.net
Cc: robert.moore@intel.com
Cc: rusty@rustcorp.com.au
Cc: tiwai@suse.de
Cc: toshi.kani@hp.com
Cc: xen-devel@lists.xensource.com
Link: http://lkml.kernel.org/r/1460592286-300-5-git-send-email-mcgrof@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-22 10:29:01 +02:00
Ingo Molnar
b2eafe890d Merge branch 'x86/urgent' into x86/asm, to fix semantic conflict
'cpu_has_pse' has changed to boot_cpu_has(X86_FEATURE_PSE), fix this
up in the merge commit when merging the x86/urgent tree that includes
the following commit:

  103f6112f2 ("x86/mm/xen: Suppress hugetlbfs in PV guests")

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-22 10:13:53 +02:00
Jan Beulich
103f6112f2 x86/mm/xen: Suppress hugetlbfs in PV guests
Huge pages are not normally available to PV guests. Not suppressing
hugetlbfs use results in an endless loop of page faults when user mode
code tries to access a hugetlbfs mapped area (since the hypervisor
denies such PTEs to be created, but error indications can't be
propagated out of xen_set_pte_at(), just like for various of its
siblings), and - once killed in an oops like this:

  kernel BUG at .../fs/hugetlbfs/inode.c:428!
  invalid opcode: 0000 [#1] SMP
  ...
  RIP: e030:[<ffffffff811c333b>]  [<ffffffff811c333b>] remove_inode_hugepages+0x25b/0x320
  ...
  Call Trace:
   [<ffffffff811c3415>] hugetlbfs_evict_inode+0x15/0x40
   [<ffffffff81167b3d>] evict+0xbd/0x1b0
   [<ffffffff8116514a>] __dentry_kill+0x19a/0x1f0
   [<ffffffff81165b0e>] dput+0x1fe/0x220
   [<ffffffff81150535>] __fput+0x155/0x200
   [<ffffffff81079fc0>] task_work_run+0x60/0xa0
   [<ffffffff81063510>] do_exit+0x160/0x400
   [<ffffffff810637eb>] do_group_exit+0x3b/0xa0
   [<ffffffff8106e8bd>] get_signal+0x1ed/0x470
   [<ffffffff8100f854>] do_signal+0x14/0x110
   [<ffffffff810030e9>] prepare_exit_to_usermode+0xe9/0xf0
   [<ffffffff814178a5>] retint_user+0x8/0x13

This is CVE-2016-3961 / XSA-174.

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <JGross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: stable@vger.kernel.org
Cc: xen-devel <xen-devel@lists.xenproject.org>
Link: http://lkml.kernel.org/r/57188ED802000078000E431C@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-22 10:05:00 +02:00
Baoquan He
e8581e3d67 x86/KASLR: Drop CONFIG_RANDOMIZE_BASE_MAX_OFFSET
Currently CONFIG_RANDOMIZE_BASE_MAX_OFFSET is used to limit the maximum
offset for kernel randomization. This limit doesn't need to be a CONFIG
since it is tied completely to KERNEL_IMAGE_SIZE, and will make no sense
once physical and virtual offsets are randomized separately. This patch
removes CONFIG_RANDOMIZE_BASE_MAX_OFFSET and consolidates the Kconfig
help text.

[kees: rewrote changelog, dropped KERNEL_IMAGE_SIZE_DEFAULT, rewrote help]
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: H.J. Lu <hjl.tools@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1461185746-8017-3-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-22 10:00:50 +02:00
Michal Hocko
916633a403 locking/rwsem: Provide down_write_killable()
Now that all the architectures implement the necessary glue code
we can introduce down_write_killable(). The only difference wrt. regular
down_write() is that the slow path waits in TASK_KILLABLE state and the
interruption by the fatal signal is reported as -EINTR to the caller.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1460041951-22347-12-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-22 08:58:33 +02:00
Liang Chen
c54cdf141c KVM: x86: optimize steal time calculation
Since accumulate_steal_time is now only called in record_steal_time, it
doesn't quite make sense to put the delta calculation in a separate
function. The function could be called thousands of times before guest
enables the steal time MSR (though the compiler may optimize out this
function call). And after it's enabled, the MSR enable bit is tested twice
every time. Removing the accumulate_steal_time function also avoids the
necessity of having the accum_steal field.

Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
Signed-off-by: Gavin Guo <gavin.guo@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-20 15:29:17 +02:00
Dmitry Safonov
abfb9498ee x86/entry: Rename is_{ia32,x32}_task() to in_{ia32,x32}_syscall()
The is_ia32_task()/is_x32_task() function names are a big misnomer: they
suggests that the compat-ness of a system call is a task property, which
is not true, the compatness of a system call purely depends on how it
was invoked through the system call layer.

A task may call 32-bit and 64-bit and x32 system calls without changing
any of its kernel visible state.

This specific minomer is also actively dangerous, as it might cause kernel
developers to use the wrong kind of security checks within system calls.

So rename it to in_{ia32,x32}_syscall().

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
[ Expanded the changelog. ]
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: 0x7f454c46@gmail.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1460987025-30360-1-git-send-email-dsafonov@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-19 10:44:52 +02:00
Ingo Molnar
6666ea558b Linux 4.6-rc4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXFELfAAoJEHm+PkMAQRiGRYIH+wWsUva7TR9arN1ZrURvI17b
 KqyQH8Ov9zJBsIaq/rFXOr5KfNgx7BU9BL9h7QkBy693HXTWf+GTZ1czHM4N12C3
 0ZdHGrLwTHo2zdisiQaFORZSfhSVTUNGXGHXw13bUMgEqatPgkozXEnsvXXNdt1Z
 HtlcuJn3pcj+QIY7qDXZgTLTwgn248hi1AgNag+ntFcWiz21IYaMIi7/mCY9QUIi
 AY+Y3hqFQM7/8cVyThGS5wZPTg1YzdhsLJpoCk0TbS8FvMEnA+ylcTgc15C78bwu
 AxOwM3OCmH4gMsd7Dd/O+i9lE3K6PFrgzdDisYL3P7eHap+EdiLDvVzPDPPx0xg=
 =Q7r3
 -----END PGP SIGNATURE-----

Merge tag 'v4.6-rc4' into x86/asm, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-19 10:38:52 +02:00
Benjamin LaHaise
b2f680380d x86/mm/32: Add support for 64-bit __get_user() on 32-bit kernels
The existing __get_user() implementation does not support fetching
64-bit values on 32-bit x86.  Implement this in a way that does not
generate any incorrect warnings as cautioned by Russell King.

Test code available at:

  http://www.kvack.org/~bcrl/x86_32-get_user.tar .

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 13:01:03 +02:00
Guenter Roeck
47a541c3e1 x86/platform: Remove unused get_bios_ebda_length() function
get_bios_ebda_length() uses min_t() without including linux/kernel.h.

This may result in build errors with some configurations. Since the
function is not used anywhere in the kernel, let's just drop it.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Waychison <mikew@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459558314-5625-1-git-send-email-linux@roeck-us.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:43:09 +02:00
Andy Lutomirski
b828b79fcc x86/msr: Set the return value to zero when native_rdmsr_safe() fails
This will cause unchecked native_rdmsr_safe() failures to return
deterministic results.

Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/515fb611449a755312a476cfe11675906e7ddf6c.1459605520.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:46 +02:00
Andy Lutomirski
4985ce15a3 x86/paravirt: Make "unsafe" MSR accesses unsafe even if PARAVIRT=y
Enabling CONFIG_PARAVIRT had an unintended side effect: rdmsr() turned
into rdmsr_safe() and wrmsr() turned into wrmsr_safe(), even on bare
metal.  Undo that by using the new unsafe paravirt MSR callbacks.

Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/414fabd6d3527703077c6c2a797223d0a9c3b081.1459605520.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:46 +02:00
Andy Lutomirski
dd2f4a004b x86/paravirt: Add paravirt_{read,write}_msr()
This adds paravirt callbacks for unsafe MSR access.  On native, they
call native_{read,write}_msr().  On Xen, they use xen_{read,write}_msr_safe().

Nothing uses them yet for ease of bisection.  The next patch will
use them in rdmsrl(), wrmsrl(), etc.

I intentionally didn't make them warn on #GP on Xen.  I think that
should be done separately by the Xen maintainers.

Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/880eebc5dcd2ad9f310d41345f82061ea500e9fa.1459605520.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:46 +02:00
Andy Lutomirski
fbd704374d x86/msr: Carry on after a non-"safe" MSR access fails
This demotes an OOPS and likely panic due to a failed non-"safe" MSR
access to a WARN_ONCE() and, for RDMSR, a return value of zero.

To be clear, this type of failure should *not* happen.  This patch
exists to minimize the chance of nasty undebuggable failures
happening when a CONFIG_PARAVIRT=y bug in the non-"safe" MSR helpers
gets fixed.

Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/26567b216aae70e795938f4b567eace5a0eb90ba.1459605520.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:45 +02:00
Andy Lutomirski
c2ee03b2a9 x86/paravirt: Add _safe to the read_ms()r and write_msr() PV callbacks
These callbacks match the _safe variants, so name them accordingly.
This will make room for unsafe PV callbacks.

Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/9ee3fb6a196a514c93325bdfa15594beecf04876.1459605520.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:45 +02:00
Andy Lutomirski
0e861fbb5b x86/head: Move early exception panic code into early_fixup_exception()
This removes a bunch of assembly and adds some C code instead.  It
changes the actual printouts on both 32-bit and 64-bit kernels, but
they still seem okay.

Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/4085070316fc3ab29538d3fcfe282648d1d4ee2e.1459605520.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:44 +02:00
Andy Lutomirski
7bbcdb1ca4 x86/head: Pass a real pt_regs and trapnr to early_fixup_exception()
early_fixup_exception() is limited by the fact that it doesn't have a
real struct pt_regs.  Change both the 32-bit and 64-bit asm and the
C code to pass and accept a real pt_regs.

Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: KVM list <kvm@vger.kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel <Xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/e3fb680fcfd5e23e38237e8328b64a25cc121d37.1459605520.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:44 +02:00
Borislav Petkov
782511b00f x86/cpufeature: Replace cpu_has_xsaves with boot_cpu_has() usage
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <kvm@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459801503-15600-11-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:42 +02:00
Borislav Petkov
d366bf7eb9 x86/cpufeature: Replace cpu_has_xsave with boot_cpu_has() usage
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1459801503-15600-10-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:41 +02:00
Borislav Petkov
01f8fd7379 x86/cpufeature: Replace cpu_has_fxsr with boot_cpu_has() usage
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459801503-15600-9-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:41 +02:00
Borislav Petkov
93984fbd4e x86/cpufeature: Replace cpu_has_apic with boot_cpu_has() usage
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: iommu@lists.linux-foundation.org
Cc: linux-pm@vger.kernel.org
Cc: oprofile-list@lists.sf.net
Link: http://lkml.kernel.org/r/1459801503-15600-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:41 +02:00
Borislav Petkov
59e21e3d00 x86/cpufeature: Replace cpu_has_tsc with boot_cpu_has() usage
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Sailer <t.sailer@alumni.ethz.ch>
Link: http://lkml.kernel.org/r/1459801503-15600-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:41 +02:00
Borislav Petkov
a402a8dffc x86/cpufeature: Replace cpu_has_fpu with boot_cpu_has() usage
Use static_cpu_has() in the timing-sensitive paths in fpstate_init() and
fpu__copy().

While at it, simplify the use in init_cyrix() and get rid of the ternary
operator.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459801503-15600-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:40 +02:00
Borislav Petkov
dda9edf7c1 x86/cpufeature: Replace cpu_has_xmm with boot_cpu_has() usage
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459801503-15600-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:40 +02:00
Borislav Petkov
da154e82af x86/cpufeature: Replace cpu_has_avx with boot_cpu_has() usage
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-crypto@vger.kernel.org
Link: http://lkml.kernel.org/r/1459801503-15600-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:40 +02:00
Borislav Petkov
1f4dd7938e x86/cpufeature: Replace cpu_has_aes with boot_cpu_has() usage
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-crypto@vger.kernel.org
Link: http://lkml.kernel.org/r/1459801503-15600-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:39 +02:00
Borislav Petkov
abcfdfe07d x86/cpufeature: Replace cpu_has_avx2 with boot_cpu_has() usage
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-crypto@vger.kernel.org
Link: http://lkml.kernel.org/r/1459801503-15600-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:37:39 +02:00
Ingo Molnar
95a8e746f8 Merge branch 'x86/urgent' into x86/asm to pick up dependent fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:36:44 +02:00
Ingo Molnar
d8d1c35139 Merge branch 'x86/mm' into x86/asm to resolve conflict and to create common base
Conflicts:
	arch/x86/include/asm/cpufeature.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:36:19 +02:00
Ingo Molnar
cb44d0cfc2 Merge branch 'x86/cpu' into x86/asm, to merge more patches
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 11:15:39 +02:00
Michal Hocko
664b4e24c6 locking/rwsem, x86: Provide __down_write_killable()
which uses the same fast path as __down_write() except it falls back to
call_rwsem_down_write_failed_killable() slow path and return -EINTR if
killed. To prevent from code duplication extract the skeleton of
__down_write() into a helper macro which just takes the semaphore
and the slow path function to be called.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1460041951-22347-11-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 10:42:22 +02:00
Michal Hocko
f8e04d8545 locking/rwsem: Get rid of __down_write_nested()
This is no longer used anywhere and all callers (__down_write()) use
0 as a subclass. Ditch __down_write_nested() to make the code easier
to follow.

This shouldn't introduce any functional change.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1460041951-22347-2-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 10:42:16 +02:00
Andy Lutomirski
1ed95e52d9 x86/vdso: Remove direct HPET access through the vDSO
Allowing user code to map the HPET is problematic.  HPET
implementations are notoriously buggy, and there are probably many
machines on which even MMIO reads from bogus HPET addresses are
problematic.

We have a report that the Dell Precision M2800 with:

  ACPI: HPET 0x00000000C8FE6238 000038 (v01 DELL   CBX3  01072009 AMI. 00000005)

is either so slow when accessing the HPET or actually hangs in some
regard, causing soft lockups to be reported if users do unexpected
things to the HPET.

The vclock HPET code has also always been a questionable speedup.
Accessing an HPET is exceedingly slow (on the order of several
microseconds), so the added overhead in requiring a syscall to read
the HPET is a small fraction of the total code of accessing it.

To avoid future problems, let's just delete the code entirely.

In the long run, this could actually be a speedup.  Waiman Long as a
patch to optimize the case where multiple CPUs contend for the HPET,
but that won't help unless all the accesses are mediated by the
kernel.

Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman Long <Waiman.Long@hpe.com>
Cc: Waiman Long <waiman.long@hpe.com>
Link: http://lkml.kernel.org/r/d2f90bba98db9905041cff294646d290d378f67a.1460074438.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 10:28:34 +02:00
Borislav Petkov
96e5d28ae7 x86/cpu: Add Erratum 88 detection on AMD
Erratum 88 affects old AMD K8s, where a SWAPGS fails to cause an input
dependency on GS. Therefore, we need to MFENCE before it.

But that MFENCE is expensive and unnecessary on the remaining x86 CPUs
out there so patch it out on the CPUs which don't require it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Andy Lutomirski <luto@kernel.org
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rudolf Marek <r.marek@assembler.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/aec6b2df1bfc56101d4e9e2e5d5d570bf41663c6.1460075211.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 10:20:42 +02:00
Andy Lutomirski
7a5d670487 x86/cpu: Probe the behavior of nulling out a segment at boot time
AMD and Intel do different things when writing zero to a segment
selector.  Since neither vendor documents the behavior well and it's
easy to test the behavior, try nulling fs to see what happens.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rudolf Marek <r.marek@assembler.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/61588ba0e0df35beafd363dc8b68a4c5878ef095.1460075211.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 10:20:41 +02:00
Ingo Molnar
889fac6d67 Linux 4.6-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXCva8AAoJEHm+PkMAQRiGXBoIAIkrjxdbuT2nS9A3tHwkiFXa
 6/Th1UjbNaoLuZ+MckQHayAD9NcWY9lVjOUmFsSiSWMCQK/rTWDl8x5ITputrY2V
 VuhrJCwI7huEtu6GpRaJaUgwtdOjhIHz1Ue2MCdNIbKX3l+LjVyyJ9Vo8rruvZcR
 fC7kiivH04fYX58oQ+SHymCg54ny3qJEPT8i4+g26686m11hvZLI3UAs2PAn6ut+
 atCjxdQ4yLN3DWsbjuA7wYGWhTgFloxL4TIoisuOUc3FXnSi/ivIbXZvu4lUfisz
 LA2JBhfII3AEMBWG9xfGbXPijJTT4q7yNlTD0oYcnMtAt/Roh2F04asqB1LetEY=
 =bri6
 -----END PGP SIGNATURE-----

Merge tag 'v4.6-rc3' into perf/core, to refresh the tree

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-13 08:57:03 +02:00
Linus Torvalds
40bca9dbab Power management and ACPI material for v4.6-rc3
- intel_pstate fixes for two issues exposed by the recent switch
    over from using timers and for one issue introduced during the
    4.4 cycle plus new comments describing data structures used by
    the driver (Rafael Wysocki, Srinivas Pandruvada).
 
  - intel_idle fixes related to CPU offline/online (Richard Cochran).
 
  - intel_idle support (new CPU IDs and state definitions mostly) for
    Skylake-X and Kabylake processors (Len Brown).
 
  - PCC mailbox driver fix for an out-of-bounds memory access that
    may cause the kernel to panic() (Shanker Donthineni).
 
  - New (missing) CPU ID for one apparently overlooked Haswell model
    in the Intel RAPL power capping driver (Srinivas Pandruvada).
 
  - Fix for the PM core's wakeup IRQs framework to make it work after
    wakeup settings reconfiguration from sysfs (Grygorii Strashko).
 
  - Runtime PM documentation update to make it describe what needs
    to be done during device removal more precisely (Krzysztof
    Kozlowski).
 
  - Stale comment removal cleanup in the cpufreq-dt driver (Viresh
    Kumar).
 
  - turbostat utility fixes and support for Broxton, Skylake-X
    and Kabylake processors (Len Brown).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJXCBaNAAoJEILEb/54YlRxDrkP/jdCfB3wOcRGp6LN6GHksC3u
 k2yUQl+XJZhggLda+yUK2ovV5tCJwGmP1N1B/MacRr/5MeicGXBvt6ZrHGqOfTKo
 9xAGKOt2xsoAbHkleB733GIRD9TYZpqCTx9n0dka7kRArQA/uhY+TlXYY2Kn+E0B
 1c4lEN3d9j4G2qW55YxnfRUOhwUc03syokOsS2z9ChEAVPSrn7Zv8V+8rPwxIz5w
 uUWphlnEQgaRwzwxACHwE0bht8sl1cJ1UUSVbzf30mzRlGHiteYyI/vuE2hF9SiV
 DEv9mhSB+4U6WuBHr4Cwu+nf9YTlaY1ZaS3r5EDzJaNJb8tMqPZcGitfn+fTGtUz
 9GlCQZIBcP1vWBDybuuPOzAH++QUFrikozuNfUu1d+WFEypRyadMIvUhtmZPG9mh
 9+Vem+ta32eXos07dx4Dvth+yNvmG5bxPnteDp3GPsnCCDlXutcKaaJaaj+1NzHi
 oqHyEynMhJuN/D2h+oIVIUppKBVO55M+lJMJrX891zICs/98K2LwOtFDuNRWQml0
 3yR7Ieoj5gxVfbT6zNeH5z7CxP/MymNAMjbxfiwqtGHhkNoAv92ymMxnptPiLCHn
 Zyycelsve3sneV+iyk1fAOZFvDXbzTUyigxGP7Kc87iia/Yd2TjLlvwJkMNK6x/K
 2OSkjJFqqrqAqcXNIz3G
 =f4MM
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI fixes from Rafael Wysocki:
 "Fixes for some issues discovered after recent changes and for some
  that have just been found lately regardless of those changes
  (intel_pstate, intel_idle, PM core, mailbox/pcc, turbostat) plus
  support for some new CPU models (intel_idle, Intel RAPL driver,
  turbostat) and documentation updates (intel_pstate, PM core).

  Specifics:

   - intel_pstate fixes for two issues exposed by the recent switch over
     from using timers and for one issue introduced during the 4.4 cycle
     plus new comments describing data structures used by the driver
     (Rafael Wysocki, Srinivas Pandruvada).

   - intel_idle fixes related to CPU offline/online (Richard Cochran).

   - intel_idle support (new CPU IDs and state definitions mostly) for
     Skylake-X and Kabylake processors (Len Brown).

   - PCC mailbox driver fix for an out-of-bounds memory access that may
     cause the kernel to panic() (Shanker Donthineni).

   - New (missing) CPU ID for one apparently overlooked Haswell model in
     the Intel RAPL power capping driver (Srinivas Pandruvada).

   - Fix for the PM core's wakeup IRQs framework to make it work after
     wakeup settings reconfiguration from sysfs (Grygorii Strashko).

   - Runtime PM documentation update to make it describe what needs to
     be done during device removal more precisely (Krzysztof Kozlowski).

   - Stale comment removal cleanup in the cpufreq-dt driver (Viresh
     Kumar).

   - turbostat utility fixes and support for Broxton, Skylake-X and
     Kabylake processors (Len Brown)"

* tag 'pm+acpi-4.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (28 commits)
  PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs
  tools/power turbostat: work around RC6 counter wrap
  tools/power turbostat: initial KBL support
  tools/power turbostat: initial SKX support
  tools/power turbostat: decode BXT TSC frequency via CPUID
  tools/power turbostat: initial BXT support
  tools/power turbostat: print IRTL MSRs
  tools/power turbostat: SGX state should print only if --debug
  intel_idle: Add KBL support
  intel_idle: Add SKX support
  intel_idle: Clean up all registered devices on exit.
  intel_idle: Propagate hot plug errors.
  intel_idle: Don't overreact to a cpuidle registration failure.
  intel_idle: Setup the timer broadcast only on successful driver load.
  intel_idle: Avoid a double free of the per-CPU data.
  intel_idle: Fix dangling registration on error path.
  intel_idle: Fix deallocation order on the driver exit path.
  intel_idle: Remove redundant initialization calls.
  intel_idle: Fix a helper function's return value.
  intel_idle: remove useless return from void function.
  ...
2016-04-09 11:03:48 -07:00
Rafael J. Wysocki
73659be769 Merge branches 'pm-core', 'powercap' and 'pm-tools'
* pm-core:
  PM / wakeirq: fix wakeirq setting after wakup re-configuration from sysfs
  PM / runtime: Document steps for device removal

* powercap:
  powercap: intel_rapl: Add missing Haswell model

* pm-tools:
  tools/power turbostat: work around RC6 counter wrap
  tools/power turbostat: initial KBL support
  tools/power turbostat: initial SKX support
  tools/power turbostat: decode BXT TSC frequency via CPUID
  tools/power turbostat: initial BXT support
  tools/power turbostat: print IRTL MSRs
  tools/power turbostat: SGX state should print only if --debug
2016-04-08 21:46:56 +02:00
Len Brown
5a63426e2a tools/power turbostat: print IRTL MSRs
Some processors use the Interrupt Response Time Limit (IRTL) MSR value
to describe the maximum IRQ response time latency for deep
package C-states.  (Though others have the register, but do not use it)
Lets print it out to give insight into the cases where it is used.

IRTL begain in SNB, with PC3/PC6/PC7, and HSW added PC8/PC9/PC10.

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-04-07 22:18:32 +02:00
Linus Torvalds
541d8f4d59 Miscellaneous bugfixes. ARM and s390 are new from the merge window,
others are usual stable material.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJXA8x6AAoJEL/70l94x66D0x8H/RcBnc75994RQ++WmHSvD9GF
 yruGB8soLDdjX+Oceol0aEPHokrBu3JtcdoTBe0GwbCKV/F5NkQZ4EfLxDtR3tte
 7ILkPULLy5GElFpJNQuT4pmXzTEspFvXpqHhFik7WVBga3W9wMFQcjbrgmGBUzLE
 p2aJVhZyErpKxGFkUYWhDnlqWsguTTIzv/pqNhLY4VVc0UrXN9AA0fq9RkvgU3KS
 Hxk4/A6SV/b7dyzvttzITww0f1iu8FmlLj2TXapIEoOz7AnInD6KIN0RYpxbDjxN
 bEzEfpahUtuDeM87/t2kHEj0Gn09iHK7/BbCC1Hrwo1CQhbAQ/D0GIvqYAQixf4=
 =NugZ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Miscellaneous bugfixes.

  The ARM and s390 fixes are for new regressions from the merge window,
  others are usual stable material"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  compiler-gcc: disable -ftracer for __noclone functions
  kvm: x86: make lapic hrtimer pinned
  s390/mm/kvm: fix mis-merge in gmap handling
  kvm: set page dirty only if page has been writable
  KVM: x86: reduce default value of halt_poll_ns parameter
  KVM: Hyper-V: do not do hypercall userspace exits if SynIC is disabled
  KVM: x86: Inject pending interrupt even if pending nmi exist
  arm64: KVM: Register CPU notifiers when the kernel runs at HYP
  arm64: kvm: 4.6-rc1: Fix VTCR_EL2 VS setting
2016-04-05 16:16:00 -07:00
Linus Torvalds
30cebb6ca1 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "This lot contains:

   - Some fixups for the fallout of the topology consolidation which
     unearthed AMD/Intel inconsistencies
   - Documentation for the x86 topology management
   - Support for AMD advanced power management bits
   - Two simple cleanups removing duplicated code"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add advanced power management bits
  x86/thread_info: Merge two !__ASSEMBLY__ sections
  x86/cpufreq: Remove duplicated TDP MSR macro definitions
  x86/Documentation: Start documenting x86 topology
  x86/cpu: Get rid of compute_unit_id
  perf/x86/amd: Cleanup Fam10h NB event constraints
  x86/topology: Fix AMD core count
2016-04-03 06:32:28 -05:00
Nadav Amit
858eaaa711 mm/rmap: batched invalidations should use existing api
The recently introduced batched invalidations mechanism uses its own
mechanism for shootdown.  However, it does wrong accounting of
interrupts (e.g., inc_irq_stat is called for local invalidations),
trace-points (e.g., TLB_REMOTE_SHOOTDOWN for local invalidations) and
may break some platforms as it bypasses the invalidation mechanisms of
Xen and SGI UV.

This patch reuses the existing TLB flushing mechnaisms instead.  We use
NULL as mm to indicate a global invalidation is required.

Fixes 72b252aed5 ("mm: send one IPI per CPU to TLB flush all entries after unmapping pages")
Signed-off-by: Nadav Amit <namit@vmware.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01 17:03:37 -05:00
Jessica Yu
425595a7fc livepatch: reuse module loader code to write relocations
Reuse module loader code to write relocations, thereby eliminating the need
for architecture specific relocation code in livepatch. Specifically, reuse
the apply_relocate_add() function in the module loader to write relocations
instead of duplicating functionality in livepatch's arch-dependent
klp_write_module_reloc() function.

In order to accomplish this, livepatch modules manage their own relocation
sections (marked with the SHF_RELA_LIVEPATCH section flag) and
livepatch-specific symbols (marked with SHN_LIVEPATCH symbol section
index). To apply livepatch relocation sections, livepatch symbols
referenced by relocs are resolved and then apply_relocate_add() is called
to apply those relocations.

In addition, remove x86 livepatch relocation code and the s390
klp_write_module_reloc() function stub. They are no longer needed since
relocation work has been offloaded to module loader.

Lastly, mark the module as a livepatch module so that the module loader
canappropriately identify and initialize it.

Signed-off-by: Jessica Yu <jeyu@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>   # for s390 changes
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-04-01 15:00:11 +02:00
Paolo Bonzini
14ebda3394 KVM: x86: reduce default value of halt_poll_ns parameter
Windows lets applications choose the frequency of the timer tick,
and in Windows 10 the maximum rate was changed from 1024 Hz to
2048 Hz.  Unfortunately, because of the way the Windows API
works, most applications who need a higher rate than the default
64 Hz will just do

   timeGetDevCaps(&tc, sizeof(tc));
   timeBeginPeriod(tc.wPeriodMin);

and pick the maximum rate.  This causes very high CPU usage when
playing media or games on Windows 10, even if the guest does not
actually use the CPU very much, because the frequent timer tick
causes halt_poll_ns to kick in.

There is no really good solution, especially because Microsoft
could sooner or later bump the limit to 4096 Hz, but for now
the best we can do is lower a bit the upper limit for
halt_poll_ns. :-(

Reported-by: Jon Panozzo <jonp@lime-technology.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-04-01 12:10:10 +02:00
Alex Thorlton
1c532e00a0 x86/platform/uv: Disable UV BAU by default
For several years, the common practice has been to boot UVs with the
"nobau" parameter on the command line, to disable the BAU.  We've
decided that it makes more sense to just disable the BAU by default in
the kernel, and provide the option to turn it on, if desired.

For now, having the on/off switch doesn't buy us any more than just
reversing the logic would, but we're working towards having the BAU
enabled by default on UV4.  When those changes are in place, having the
on/off switch will make more sense than an enable flag, since the
default behavior will be different depending on the system version.

I've also added a bit of documentation for the new parameter to
Documentation/kernel-parameters.txt.

Signed-off-by: Alex Thorlton <athorlton@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459451909-121845-1-git-send-email-athorlton@sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-04-01 11:45:54 +02:00
Borislav Petkov
16bf92261b x86/cpufeature: Remove cpu_has_pse
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459266123-21878-11-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-31 13:35:10 +02:00
Borislav Petkov
c109bf9599 x86/cpufeature: Remove cpu_has_pge
Use static_cpu_has() in __flush_tlb_all() due to the time-sensitivity of
this one.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459266123-21878-10-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-31 13:35:09 +02:00
Borislav Petkov
054efb6467 x86/cpufeature: Remove cpu_has_xmm2
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-crypto@vger.kernel.org
Link: http://lkml.kernel.org/r/1459266123-21878-8-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-31 13:35:09 +02:00
Borislav Petkov
906bf7fda2 x86/cpufeature: Remove cpu_has_clflush
Use the fast variant in the DRM code.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dri-devel@lists.freedesktop.org
Cc: intel-gfx@lists.freedesktop.org
Link: http://lkml.kernel.org/r/1459266123-21878-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-31 13:35:09 +02:00
Borislav Petkov
b8291adc19 x86/cpufeature: Remove cpu_has_gbpages
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459266123-21878-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-31 13:35:08 +02:00
Borislav Petkov
62436a4d36 x86/cpufeature: Remove cpu_has_x2apic
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1459266123-21878-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-31 13:35:08 +02:00
Borislav Petkov
ab4a56fa2c x86/cpufeature: Remove cpu_has_osxsave
Use boot_cpu_has() instead.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-crypto@vger.kernel.org
Link: http://lkml.kernel.org/r/1459266123-21878-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-31 13:35:08 +02:00
Borislav Petkov
0c9f3536cc x86/cpufeature: Remove cpu_has_hypervisor
Use boot_cpu_has() instead.

Tested-by: David Kershner <david.kershner@unisys.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: sparmaintainer@unisys.com
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/1459266123-21878-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-31 13:35:07 +02:00
Borislav Petkov
7b5e74e637 x86/cpufeature: Remove cpu_has_arch_perfmon
Use boot_cpu_has() instead.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: oprofile-list@lists.sf.net
Link: http://lkml.kernel.org/r/1459266123-21878-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-31 13:33:17 +02:00
Borislav Petkov
568a58e5df x86/mm/pat, x86/cpufeature: Remove cpu_has_pat
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: intel-gfx@lists.freedesktop.org
Link: http://lkml.kernel.org/r/1459266123-21878-9-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-31 13:32:43 +02:00
Huang Rui
aaf248848d perf/x86/msr: Add AMD IRPERF (Instructions Retired) performance counter
AMD Zeppelin (Family 17h, Model 00h) introduces an instructions
retired performance counter which is indicated by
CPUID.8000_0008H:EBX[1]. A dedicated Instructions Retired MSR register
(MSR 0xC000_000E9) increments once for every instruction retired.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1454056197-5893-3-git-send-email-ray.huang@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-31 10:30:39 +02:00
Huang Rui
8a22426184 perf/x86/msr: Add AMD PTSC (Performance Time-Stamp Counter) support
AMD Carrizo (Family 15h, Model 60h) introduces a time-stamp counter
which is indicated by CPUID.8000_0001H:ECX[27]. It increments at a 100
MHz rate in all P-states, and C states, S0, or S1. The frequency is
about 100MHz. This counter will be used to calculate processor power
and other parts. So add an interface into the MSR PMU to get the PTSC
counter value.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1454056197-5893-2-git-send-email-ray.huang@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-31 10:30:39 +02:00
Toshi Kani
88ba281108 x86/xen, pat: Remove PAT table init code from Xen
Xen supports PAT without MTRRs for its guests.  In order to
enable WC attribute, it was necessary for xen_start_kernel()
to call pat_init_cache_modes() to update PAT table before
starting guest kernel.

Now that the kernel initializes PAT table to the BIOS handoff
state when MTRR is disabled, this Xen-specific PAT init code
is no longer necessary.  Delete it from xen_start_kernel().

Also change __init_cache_modes() to a static function since
PAT table should not be tweaked by other modules.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-7-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-29 12:23:27 +02:00
Toshi Kani
edfe63ec97 x86/mtrr: Fix Xorg crashes in Qemu sessions
A Xorg failure on qemu32 was reported as a regression [1] caused by
commit 9cd25aac1f ("x86/mm/pat: Emulate PAT when it is disabled").

This patch fixes the Xorg crash.

Negative effects of this regression were the following two failures [2]
in Xorg on QEMU with QEMU CPU model "qemu32" (-cpu qemu32), which were
triggered by the fact that its virtual CPU does not support MTRRs.

 #1. copy_process() failed in the check in reserve_pfn_range()

    copy_process
     copy_mm
      dup_mm
       dup_mmap
        copy_page_range
         track_pfn_copy
          reserve_pfn_range

 A WC map request was tracked as WC in memtype, which set a PTE as
 UC (pgprot) per __cachemode2pte_tbl[].  This led to this error in
 reserve_pfn_range() called from track_pfn_copy(), which obtained
 a pgprot from a PTE.  It converts pgprot to page_cache_mode, which
 does not necessarily result in the original page_cache_mode since
 __cachemode2pte_tbl[] redirects multiple types to UC.

 #2. error path in copy_process() then hit WARN_ON_ONCE in
     untrack_pfn().

     x86/PAT: Xorg:509 map pfn expected mapping type uncached-
     minus for [mem 0xfd000000-0xfdffffff], got write-combining
      Call Trace:
     dump_stack
     warn_slowpath_common
     ? untrack_pfn
     ? untrack_pfn
     warn_slowpath_null
     untrack_pfn
     ? __kunmap_atomic
     unmap_single_vma
     ? pagevec_move_tail_fn
     unmap_vmas
     exit_mmap
     mmput
     copy_process.part.47
     _do_fork
     SyS_clone
     do_syscall_32_irqs_on
     entry_INT80_32

These negative effects are caused by two separate bugs, but they
can be addressed in separate patches.  Fixing the pat_init() issue
described below addresses the root cause, and avoids Xorg to hit
these cases.

When the CPU does not support MTRRs, MTRR does not call pat_init(),
which leaves PAT enabled without initializing PAT.  This pat_init()
issue is a long-standing issue, but manifested as issue #1 (and then
hit issue #2) with the above-mentioned commit because the memtype
now tracks cache attribute with 'page_cache_mode'.

This pat_init() issue existed before the commit, but we used pgprot
in memtype.  Hence, we did not have issue #1 before.  But WC request
resulted in WT in effect because WC pgrot is actually WT when PAT
is not initialized.  This is not how it was designed to work.  When
PAT is set to disable properly, WC is converted to UC.  The use of
WT can result in a system crash if the target range does not support
WT.  Fortunately, nobody ran into such issue before.

To fix this pat_init() issue, PAT code has been enhanced to provide
pat_disable() interface.  Call this interface when MTRRs are disabled.
By setting PAT to disable properly, PAT bypasses the memtype check,
and avoids issue #1.

  [1]: https://lkml.org/lkml/2016/3/3/828
  [2]: https://lkml.org/lkml/2016/3/4/775

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-5-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-29 12:23:26 +02:00
Toshi Kani
224bb1e5d6 x86/mm/pat: Add pat_disable() interface
In preparation for fixing a regression caused by:

  9cd25aac1f ("x86/mm/pat: Emulate PAT when it is disabled")

... PAT needs to provide an interface that prevents the OS from
initializing the PAT MSR.

PAT MSR initialization must be done on all CPUs using the specific
sequence of operations defined in the Intel SDM.  This requires MTRRs
to be enabled since pat_init() is called as part of MTRR init
from mtrr_rendezvous_handler().

Make pat_disable() as the interface that prevents the OS from
initializing the PAT MSR.  MTRR will call this interface when it
cannot provide the SDM-defined sequence to initialize PAT.

This also assures that pat_disable() called from pat_bsp_init()
will set the PAT table properly when CPU does not support PAT.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Elliott <elliott@hpe.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-3-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-29 12:23:25 +02:00
Toshi Kani
02f037d641 x86/mm/pat: Add support of non-default PAT MSR setting
In preparation for fixing a regression caused by:

  9cd25aac1f ("x86/mm/pat: Emulate PAT when it is disabled")'

... PAT needs to support a case that PAT MSR is initialized with a
non-default value.

When pat_init() is called and PAT is disabled, it initializes the
PAT table with the BIOS default value. Xen, however, sets PAT MSR
with a non-default value to enable WC. This causes inconsistency
between the PAT table and PAT MSR when PAT is set to disable on Xen.

Change pat_init() to handle the PAT disable cases properly.  Add
init_cache_modes() to handle two cases when PAT is set to disable.

 1. CPU supports PAT: Set PAT table to be consistent with PAT MSR.
 2. CPU does not support PAT: Set PAT table to be consistent with
    PWT and PCD bits in a PTE.

Note, __init_cache_modes(), renamed from pat_init_cache_modes(),
will be changed to a static function in a later patch.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: paul.gortmaker@windriver.com
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1458769323-24491-2-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-29 12:23:25 +02:00
Borislav Petkov
5f870a3f71 x86/thread_info: Merge two !__ASSEMBLY__ sections
We have

  #ifndef __ASSEMBLY__
  ...
  #endif

  #ifndef __ASSEMBLY__
  ...
  #endif

Merge the two.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1459189217-25532-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-29 11:12:10 +02:00
Vladimir Zapolskiy
4a6772f514 x86/cpufreq: Remove duplicated TDP MSR macro definitions
The list of CPU model specific registers contains two copies of TDP
registers, remove the one, which is out of numerical order in the
list.

Fixes: 6a35fc2d6c ("cpufreq: intel_pstate: get P1 from TAR when available")
Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Kristen Carlson
 Accardi <kristen@linux.intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: http://lkml.kernel.org/r/1459018020-24577-1-git-send-email-vladimir_zapolskiy@mentor.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-29 11:12:10 +02:00
Borislav Petkov
8196dab4fc x86/cpu: Get rid of compute_unit_id
It is cpu_core_id anyway.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1458917557-8757-3-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-29 10:45:04 +02:00
Peter Zijlstra
ee6825c80e x86/topology: Fix AMD core count
It turns out AMD gets x86_max_cores wrong when there are compute
units.

The issue is that Linux assumes:

	nr_logical_cpus = nr_cores * nr_siblings

But AMD reports its CU unit as 2 cores, but then sets num_smp_siblings
to 2 as well.

Boris: fixup ras/mce_amd_inj.c too, to compute the Node Base Core
properly, according to the new nomenclature.

Fixes: 1f12e32f4c ("x86/topology: Create logical package id")
Reported-by: Xiong Zhou <jencce.kernel@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andreas Herrmann <aherrmann@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Link: http://lkml.kernel.org/r/20160317095220.GO6344@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-29 10:45:04 +02:00
Dan Williams
fc0c202813 x86, pmem: use memcpy_mcsafe() for memcpy_from_pmem()
Update the definition of memcpy_from_pmem() to return 0 or a negative
error code.  Implement x86/arch_memcpy_from_pmem() with memcpy_mcsafe().

Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-03-28 17:19:31 -07:00
Linus Torvalds
3fa2fe2ce0 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "This tree contains various perf fixes on the kernel side, plus three
  hw/event-enablement late additions:

   - Intel Memory Bandwidth Monitoring events and handling
   - the AMD Accumulated Power Mechanism reporting facility
   - more IOMMU events

  ... and a final round of perf tooling updates/fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  perf llvm: Use strerror_r instead of the thread unsafe strerror one
  perf llvm: Use realpath to canonicalize paths
  perf tools: Unexport some methods unused outside strbuf.c
  perf probe: No need to use formatting strbuf method
  perf help: Use asprintf instead of adhoc equivalents
  perf tools: Remove unused perf_pathdup, xstrdup functions
  perf tools: Do not include stringify.h from the kernel sources
  tools include: Copy linux/stringify.h from the kernel
  tools lib traceevent: Remove redundant CPU output
  perf tools: Remove needless 'extern' from function prototypes
  perf tools: Simplify die() mechanism
  perf tools: Remove unused DIE_IF macro
  perf script: Remove lots of unused arguments
  perf thread: Rename perf_event__preprocess_sample_addr to thread__resolve
  perf machine: Rename perf_event__preprocess_sample to machine__resolve
  perf tools: Add cpumode to struct perf_sample
  perf tests: Forward the perf_sample in the dwarf unwind test
  perf tools: Remove misplaced __maybe_unused
  perf list: Fix documentation of :ppp
  perf bench numa: Fix assertion for nodes bitfield
  ...
2016-03-24 10:02:14 -07:00
Linus Torvalds
d88f48e128 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes:

   - fix hotplug bugs
   - fix irq live lock
   - fix various topology handling bugs
   - fix APIC ACK ordering
   - fix PV iopl handling
   - fix speling
   - fix/tweak memcpy_mcsafe() return value
   - fix fbcon bug
   - remove stray prototypes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/msr: Remove unused native_read_tscp()
  x86/apic: Remove declaration of unused hw_nmi_is_cpu_stuck
  x86/oprofile/nmi: Add missing hotplug FROZEN handling
  x86/hpet: Use proper mask to modify hotplug action
  x86/apic/uv: Fix the hotplug notifier
  x86/apb/timer: Use proper mask to modify hotplug action
  x86/topology: Use total_cpus not nr_cpu_ids for logical packages
  x86/topology: Fix Intel HT disable
  x86/topology: Fix logical package mapping
  x86/irq: Cure live lock in fixup_irqs()
  x86/tsc: Prevent NULL pointer deref in calibrate_delay_is_known()
  x86/apic: Fix suspicious RCU usage in smp_trace_call_function_interrupt()
  x86/iopl: Fix iopl capability check on Xen PV
  x86/iopl/64: Properly context-switch IOPL on Xen PV
  selftests/x86: Add an iopl test
  x86/mm, x86/mce: Fix return type/value for memcpy_mcsafe()
  x86/video: Don't assume all FB devices are PCI devices
  arch/x86/irq: Purge useless handler declarations from hw_irq.h
  x86: Fix misspellings in comments
2016-03-24 09:47:32 -07:00
Prarit Bhargava
9da77666d6 x86/msr: Remove unused native_read_tscp()
After e76b027 ("x86,vdso: Use LSL unconditionally for vgetcpu")
native_read_tscp() is unused in the kernel. The function can be removed like
native_read_tsc() was.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1458687968-9106-1-git-send-email-prarit@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-23 12:34:17 +01:00
Linus Torvalds
a24e3d414e Merge branch 'akpm' (patches from Andrew)
Merge third patch-bomb from Andrew Morton:

 - more ocfs2 changes

 - a few hotfixes

 - Andy's compat cleanups

 - misc fixes to fatfs, ptrace, coredump, cpumask, creds, eventfd,
   panic, ipmi, kgdb, profile, kfifo, ubsan, etc.

 - many rapidio updates: fixes, new drivers.

 - kcov: kernel code coverage feature.  Like gcov, but not
   "prohibitively expensive".

 - extable code consolidation for various archs

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits)
  ia64/extable: use generic search and sort routines
  x86/extable: use generic search and sort routines
  s390/extable: use generic search and sort routines
  alpha/extable: use generic search and sort routines
  kernel/...: convert pr_warning to pr_warn
  drivers: dma-coherent: use memset_io for DMA_MEMORY_IO mappings
  drivers: dma-coherent: use MEMREMAP_WC for DMA_MEMORY_MAP
  memremap: add MEMREMAP_WC flag
  memremap: don't modify flags
  kernel/signal.c: add compile-time check for __ARCH_SI_PREAMBLE_SIZE
  mm/mprotect.c: don't imply PROT_EXEC on non-exec fs
  ipc/sem: make semctl setting sempid consistent
  ubsan: fix tree-wide -Wmaybe-uninitialized false positives
  kfifo: fix sparse complaints
  scripts/gdb: account for changes in module data structure
  scripts/gdb: add cmdline reader command
  scripts/gdb: add version command
  kernel: add kcov code coverage
  profile: hide unused functions when !CONFIG_PROC_FS
  hpwdt: use nmi_panic() when kernel panics in NMI handler
  ...
2016-03-22 17:09:14 -07:00
Ard Biesheuvel
29934b0fb8 x86/extable: use generic search and sort routines
Replace the arch specific versions of search_extable() and
sort_extable() with calls to the generic ones, which now support
relative exception tables as well.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Andy Lutomirski
f970165bee x86/compat: remove is_compat_task()
x86's is_compat_task always checked the current syscall type, not the
task type.  It has no non-arch users any more, so just remove it to
avoid confusion.

On x86, nothing should really be checking the task ABI.  There are
legitimate users for the syscall ABI and for the mm ABI.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Huaitong Han
b9baba8614 KVM, pkeys: expose CPUID/CR4 to guest
X86_FEATURE_PKU is referred to as "PKU" in the hardware documentation:
CPUID.7.0.ECX[3]:PKU. X86_FEATURE_OSPKE is software support for pkeys,
enumerated with CPUID.7.0.ECX[4]:OSPKE, and it reflects the setting of
CR4.PKE(bit 22).

This patch disables CPUID:PKU without ept, because pkeys is not yet
implemented for shadow paging.

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:38:17 +01:00
Huaitong Han
be94f6b710 KVM, pkeys: add pkeys support for permission_fault
Protection keys define a new 4-bit protection key field (PKEY) in bits
62:59 of leaf entries of the page tables, the PKEY is an index to PKRU
register(16 domains), every domain has 2 bits(write disable bit, access
disable bit).

Static logic has been produced in update_pkru_bitmask, dynamic logic need
read pkey from page table entries, get pkru value, and deduce the correct
result.

[ Huaitong: Xiao helps to modify many sections. ]

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:23:37 +01:00
Huaitong Han
2d344105f5 KVM, pkeys: introduce pkru_mask to cache conditions
PKEYS defines a new status bit in the PFEC. PFEC.PK (bit 5), if some
conditions is true, the fault is considered as a PKU violation.
pkru_mask indicates if we need to check PKRU.ADi and PKRU.WDi, and
does cache some conditions for permission_fault.

[ Huaitong: Xiao helps to modify many sections. ]

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:21:06 +01:00
Xiao Guangrong
9e90199c25 x86: pkey: introduce write_pkru() for KVM
KVM will use it to switch pkru between guest and host.

CC: Ingo Molnar <mingo@redhat.com>
CC: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-22 16:21:05 +01:00
Huang Rui
01fe03ff1c x86/cpufeature, perf/x86: Add AMD Accumulated Power Mechanism feature flag
AMD CPU family 15h model 0x60 introduces a mechanism for measuring
accumulated power. It is used to report the processor power consumption
and support for it is indicated by CPUID Fn8000_0007_EDX[12].

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wan Zongshun <Vincent.Wan@amd.com>
Cc: spg_linux_kernel@amd.com
Link: http://lkml.kernel.org/r/1452739808-11871-4-git-send-email-ray.huang@amd.com
[ Resolved conflict and moved the synthetic CPUID slot to 19. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:35:29 +01:00
Vikas Shivappa
33c3cc7acf perf/x86/mbm: Add Intel Memory B/W Monitoring enumeration and init
The MBM init patch enumerates the Intel MBM (Memory b/w monitoring)
and initializes the perf events and datastructures for monitoring the
memory b/w.

Its based on original patch series by Tony Luck and Kanaka Juvva.

Memory bandwidth monitoring (MBM) provides OS/VMM a way to monitor
bandwidth from one level of cache to another. The current patches
support L3 external bandwidth monitoring. It supports both 'local
bandwidth' and 'total bandwidth' monitoring for the socket. Local
bandwidth measures the amount of data sent through the memory controller
on the socket and total b/w measures the total system bandwidth.

Extending the cache quality of service monitoring (CQM) we add two
more events to the perf infrastructure:

  intel_cqm_llc/local_bytes - bytes sent through local socket memory controller
  intel_cqm_llc/total_bytes - total L3 external bytes sent

The tasks are associated with a Resouce Monitoring ID (RMID) just like
in CQM and OS uses a MSR write to indicate the RMID of the task during
scheduling.

Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: fenghua.yu@intel.com
Cc: h.peter.anvin@intel.com
Cc: ravi.v.shankar@intel.com
Cc: vikas.shivappa@intel.com
Link: http://lkml.kernel.org/r/1457652732-4499-4-git-send-email-vikas.shivappa@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-21 09:08:19 +01:00
Linus Torvalds
643ad15d47 Merge branch 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 protection key support from Ingo Molnar:
 "This tree adds support for a new memory protection hardware feature
  that is available in upcoming Intel CPUs: 'protection keys' (pkeys).

  There's a background article at LWN.net:

      https://lwn.net/Articles/643797/

  The gist is that protection keys allow the encoding of
  user-controllable permission masks in the pte.  So instead of having a
  fixed protection mask in the pte (which needs a system call to change
  and works on a per page basis), the user can map a (handful of)
  protection mask variants and can change the masks runtime relatively
  cheaply, without having to change every single page in the affected
  virtual memory range.

  This allows the dynamic switching of the protection bits of large
  amounts of virtual memory, via user-space instructions.  It also
  allows more precise control of MMU permission bits: for example the
  executable bit is separate from the read bit (see more about that
  below).

  This tree adds the MM infrastructure and low level x86 glue needed for
  that, plus it adds a high level API to make use of protection keys -
  if a user-space application calls:

        mmap(..., PROT_EXEC);

  or

        mprotect(ptr, sz, PROT_EXEC);

  (note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
  this special case, and will set a special protection key on this
  memory range.  It also sets the appropriate bits in the Protection
  Keys User Rights (PKRU) register so that the memory becomes unreadable
  and unwritable.

  So using protection keys the kernel is able to implement 'true'
  PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
  PROT_READ as well.  Unreadable executable mappings have security
  advantages: they cannot be read via information leaks to figure out
  ASLR details, nor can they be scanned for ROP gadgets - and they
  cannot be used by exploits for data purposes either.

  We know about no user-space code that relies on pure PROT_EXEC
  mappings today, but binary loaders could start making use of this new
  feature to map binaries and libraries in a more secure fashion.

  There is other pending pkeys work that offers more high level system
  call APIs to manage protection keys - but those are not part of this
  pull request.

  Right now there's a Kconfig that controls this feature
  (CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled
  (like most x86 CPU feature enablement code that has no runtime
  overhead), but it's not user-configurable at the moment.  If there's
  any serious problem with this then we can make it configurable and/or
  flip the default"

* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
  x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
  mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
  x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
  mm/core, x86/mm/pkeys: Add execute-only protection keys support
  x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
  x86/mm/pkeys: Allow kernel to modify user pkey rights register
  x86/fpu: Allow setting of XSAVE state
  x86/mm: Factor out LDT init from context init
  mm/core, x86/mm/pkeys: Add arch_validate_pkey()
  mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
  x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
  x86/mm/pkeys: Add Kconfig prompt to existing config option
  x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
  x86/mm/pkeys: Dump PKRU with other kernel registers
  mm/core, x86/mm/pkeys: Differentiate instruction fetches
  x86/mm/pkeys: Optimize fault handling in access_error()
  mm/core: Do not enforce PKEY permissions on remote mm access
  um, pkeys: Add UML arch_*_access_permitted() methods
  mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
  x86/mm/gup: Simplify get_user_pages() PTE bit handling
  ...
2016-03-20 19:08:56 -07:00
Linus Torvalds
24b5e20f11 Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "The main changes are:

   - Use separate EFI page tables when executing EFI firmware code.
     This isolates the EFI context from the rest of the kernel, which
     has security and general robustness advantages.  (Matt Fleming)

   - Run regular UEFI firmware with interrupts enabled.  This is already
     the status quo under other OSs.  (Ard Biesheuvel)

   - Various x86 EFI enhancements, such as the use of non-executable
     attributes for EFI memory mappings.  (Sai Praneeth Prakhya)

   - Various arm64 UEFI enhancements.  (Ard Biesheuvel)

   - ... various fixes and cleanups.

  The separate EFI page tables feature got delayed twice already,
  because it's an intrusive change and we didn't feel confident about
  it - third time's the charm we hope!"

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  x86/mm/pat: Fix boot crash when 1GB pages are not supported by the CPU
  x86/efi: Only map kernel text for EFI mixed mode
  x86/efi: Map EFI_MEMORY_{XP,RO} memory region bits to EFI page tables
  x86/mm/pat: Don't implicitly allow _PAGE_RW in kernel_map_pages_in_pgd()
  efi/arm*: Perform hardware compatibility check
  efi/arm64: Check for h/w support before booting a >4 KB granular kernel
  efi/arm: Check for LPAE support before booting a LPAE kernel
  efi/arm-init: Use read-only early mappings
  efi/efistub: Prevent __init annotations from being used
  arm64/vmlinux.lds.S: Handle .init.rodata.xxx and .init.bss sections
  efi/arm64: Drop __init annotation from handle_kernel_image()
  x86/mm/pat: Use _PAGE_GLOBAL bit for EFI page table mappings
  efi/runtime-wrappers: Run UEFI Runtime Services with interrupts enabled
  efi: Reformat GUID tables to follow the format in UEFI spec
  efi: Add Persistent Memory type name
  efi: Add NV memory attribute
  x86/efi: Show actual ending addresses in efi_print_memmap
  x86/efi/bgrt: Don't ignore the BGRT if the 'valid' bit is 0
  efivars: Use to_efivar_entry
  efi: Runtime-wrapper: Get rid of the rtc_lock spinlock
  ...
2016-03-20 18:58:18 -07:00
Linus Torvalds
26660a4046 Merge branch 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull 'objtool' stack frame validation from Ingo Molnar:
 "This tree adds a new kernel build-time object file validation feature
  (ONFIG_STACK_VALIDATION=y): kernel stack frame correctness validation.
  It was written by and is maintained by Josh Poimboeuf.

  The motivation: there's a category of hard to find kernel bugs, most
  of them in assembly code (but also occasionally in C code), that
  degrades the quality of kernel stack dumps/backtraces.  These bugs are
  hard to detect at the source code level.  Such bugs result in
  incorrect/incomplete backtraces most of time - but can also in some
  rare cases result in crashes or other undefined behavior.

  The build time correctness checking is done via the new 'objtool'
  user-space utility that was written for this purpose and which is
  hosted in the kernel repository in tools/objtool/.  The tool's (very
  simple) UI and source code design is shaped after Git and perf and
  shares quite a bit of infrastructure with tools/perf (which tooling
  infrastructure sharing effort got merged via perf and is already
  upstream).  Objtool follows the well-known kernel coding style.

  Objtool does not try to check .c or .S files, it instead analyzes the
  resulting .o generated machine code from first principles: it decodes
  the instruction stream and interprets it.  (Right now objtool supports
  the x86-64 architecture.)

  From tools/objtool/Documentation/stack-validation.txt:

   "The kernel CONFIG_STACK_VALIDATION option enables a host tool named
    objtool which runs at compile time.  It has a "check" subcommand
    which analyzes every .o file and ensures the validity of its stack
    metadata.  It enforces a set of rules on asm code and C inline
    assembly code so that stack traces can be reliable.

    Currently it only checks frame pointer usage, but there are plans to
    add CFI validation for C files and CFI generation for asm files.

    For each function, it recursively follows all possible code paths
    and validates the correct frame pointer state at each instruction.

    It also follows code paths involving special sections, like
    .altinstructions, __jump_table, and __ex_table, which can add
    alternative execution paths to a given instruction (or set of
    instructions).  Similarly, it knows how to follow switch statements,
    for which gcc sometimes uses jump tables."

  When this new kernel option is enabled (it's disabled by default), the
  tool, if it finds any suspicious assembly code pattern, outputs
  warnings in compiler warning format:

    warning: objtool: rtlwifi_rate_mapping()+0x2e7: frame pointer state mismatch
    warning: objtool: cik_tiling_mode_table_init()+0x6ce: call without frame pointer save/setup
    warning: objtool:__schedule()+0x3c0: duplicate frame pointer save
    warning: objtool:__schedule()+0x3fd: sibling call from callable instruction with changed frame pointer

  ... so that scripts that pick up compiler warnings will notice them.
  All known warnings triggered by the tool are fixed by the tree, most
  of the commits in fact prepare the kernel to be warning-free.  Most of
  them are bugfixes or cleanups that stand on their own, but there are
  also some annotations of 'special' stack frames for justified cases
  such entries to JIT-ed code (BPF) or really special boot time code.

  There are two other long-term motivations behind this tool as well:

   - To improve the quality and reliability of kernel stack frames, so
     that they can be used for optimized live patching.

   - To create independent infrastructure to check the correctness of
     CFI stack frames at build time.  CFI debuginfo is notoriously
     unreliable and we cannot use it in the kernel as-is without extra
     checking done both on the kernel side and on the build side.

  The quality of kernel stack frames matters to debuggability as well,
  so IMO we can merge this without having to consider the live patching
  or CFI debuginfo angle"

* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  objtool: Only print one warning per function
  objtool: Add several performance improvements
  tools: Copy hashtable.h into tools directory
  objtool: Fix false positive warnings for functions with multiple switch statements
  objtool: Rename some variables and functions
  objtool: Remove superflous INIT_LIST_HEAD
  objtool: Add helper macros for traversing instructions
  objtool: Fix false positive warnings related to sibling calls
  objtool: Compile with debugging symbols
  objtool: Detect infinite recursion
  objtool: Prevent infinite recursion in noreturn detection
  objtool: Detect and warn if libelf is missing and don't break the build
  tools: Support relative directory path for 'O='
  objtool: Support CROSS_COMPILE
  x86/asm/decoder: Use explicitly signed chars
  objtool: Enable stack metadata validation on 64-bit x86
  objtool: Add CONFIG_STACK_VALIDATION option
  objtool: Add tool to perform compile-time stack metadata validation
  x86/kprobes: Mark kretprobe_trampoline() stack frame as non-standard
  sched: Always inline context_switch()
  ...
2016-03-20 18:23:21 -07:00
Linus Torvalds
1200b6809d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support more Realtek wireless chips, from Jes Sorenson.

   2) New BPF types for per-cpu hash and arrap maps, from Alexei
      Starovoitov.

   3) Make several TCP sysctls per-namespace, from Nikolay Borisov.

   4) Allow the use of SO_REUSEPORT in order to do per-thread processing
   of incoming TCP/UDP connections.  The muxing can be done using a
   BPF program which hashes the incoming packet.  From Craig Gallek.

   5) Add a multiplexer for TCP streams, to provide a messaged based
      interface.  BPF programs can be used to determine the message
      boundaries.  From Tom Herbert.

   6) Add 802.1AE MACSEC support, from Sabrina Dubroca.

   7) Avoid factorial complexity when taking down an inetdev interface
      with lots of configured addresses.  We were doing things like
      traversing the entire address less for each address removed, and
      flushing the entire netfilter conntrack table for every address as
      well.

   8) Add and use SKB bulk free infrastructure, from Jesper Brouer.

   9) Allow offloading u32 classifiers to hardware, and implement for
      ixgbe, from John Fastabend.

  10) Allow configuring IRQ coalescing parameters on a per-queue basis,
      from Kan Liang.

  11) Extend ethtool so that larger link mode masks can be supported.
      From David Decotigny.

  12) Introduce devlink, which can be used to configure port link types
      (ethernet vs Infiniband, etc.), port splitting, and switch device
      level attributes as a whole.  From Jiri Pirko.

  13) Hardware offload support for flower classifiers, from Amir Vadai.

  14) Add "Local Checksum Offload".  Basically, for a tunneled packet
      the checksum of the outer header is 'constant' (because with the
      checksum field filled into the inner protocol header, the payload
      of the outer frame checksums to 'zero'), and we can take advantage
      of that in various ways.  From Edward Cree"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
  bonding: fix bond_get_stats()
  net: bcmgenet: fix dma api length mismatch
  net/mlx4_core: Fix backward compatibility on VFs
  phy: mdio-thunder: Fix some Kconfig typos
  lan78xx: add ndo_get_stats64
  lan78xx: handle statistics counter rollover
  RDS: TCP: Remove unused constant
  RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
  net: smc911x: convert pxa dma to dmaengine
  team: remove duplicate set of flag IFF_MULTICAST
  bonding: remove duplicate set of flag IFF_MULTICAST
  net: fix a comment typo
  ethernet: micrel: fix some error codes
  ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
  bpf, dst: add and use dst_tclassid helper
  bpf: make skb->tc_classid also readable
  net: mvneta: bm: clarify dependencies
  cls_bpf: reset class and reuse major in da
  ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
  ldmvsw: Add ldmvsw.c driver code
  ...
2016-03-19 10:05:34 -07:00
Thomas Gleixner
551adc6057 x86/irq: Cure live lock in fixup_irqs()
Harry reported, that he's able to trigger a system freeze with cpu hot
unplug. The freeze turned out to be a live lock caused by recent changes in
irq_force_complete_move().

When fixup_irqs() and from there irq_force_complete_move() is called on the
dying cpu, then all other cpus are in stop machine an wait for the dying cpu
to complete the teardown. If there is a move of an interrupt pending then
irq_force_complete_move() sends the cleanup IPI to the cpus in the old_domain
mask and waits for them to clear the mask. That's obviously impossible as
those cpus are firmly stuck in stop machine with interrupts disabled.

I should have known that, but I completely overlooked it being concentrated on
the locking issues around the vectors. And the existance of the call to
__irq_complete_move() in the code, which actually sends the cleanup IPI made
it reasonable to wait for that cleanup to complete. That call was bogus even
before the recent changes as it was just a pointless distraction.

We have to look at two cases:

1) The move_in_progress flag of the interrupt is set

   This means the ioapic has been updated with the new vector, but it has not
   fired yet. In theory there is a race:

   set_ioapic(new_vector) <-- Interrupt is raised before update is effective,
   			      i.e. it's raised on the old vector. 

   So if the target cpu cannot handle that interrupt before the old vector is
   cleaned up, we get a spurious interrupt and in the worst case the ioapic
   irq line becomes stale, but my experiments so far have only resulted in
   spurious interrupts.

   But in case of cpu hotplug this should be a non issue because if the
   affinity update happens right before all cpus rendevouz in stop machine,
   there is no way that the interrupt can be blocked on the target cpu because
   all cpus loops first with interrupts enabled in stop machine, so the old
   vector is not yet cleaned up when the interrupt fires.

   So the only way to run into this issue is if the delivery of the interrupt
   on the apic/system bus would be delayed beyond the point where the target
   cpu disables interrupts in stop machine. I doubt that it can happen, but at
   least there is a theroretical chance. Virtualization might be able to
   expose this, but AFAICT the IOAPIC emulation is not as stupid as the real
   hardware.

   I've spent quite some time over the weekend to enforce that situation,
   though I was not able to trigger the delayed case.

2) The move_in_progress flag is not set and the old_domain cpu mask is not
   empty.

   That means, that an interrupt was delivered after the change and the
   cleanup IPI has been sent to the cpus in old_domain, but not all CPUs have
   responded to it yet.

In both cases we can assume that the next interrupt will arrive on the new
vector, so we can cleanup the old vectors on the cpus in the old_domain cpu
mask.

Fixes: 98229aa36c "x86/irq: Plug vector cleanup race"
Reported-by: Harry Junior <harryjr@outlook.fr>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603140931430.3657@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-18 14:51:06 +01:00
Dave Jones
7834c10313 x86/apic: Fix suspicious RCU usage in smp_trace_call_function_interrupt()
Since 4.4, I've been able to trigger this occasionally:

===============================
[ INFO: suspicious RCU usage. ]
4.5.0-rc7-think+ #3 Not tainted
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20160315012054.GA17765@codemonkey.org.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

-------------------------------
./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 1
RCU used illegally from extended quiescent state!
no locks held by swapper/3/0.

stack backtrace:
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.5.0-rc7-think+ #3
 ffffffff92f821e0 1f3e5c340597d7fc ffff880468e07f10 ffffffff92560c2a
 ffff880462145280 0000000000000001 ffff880468e07f40 ffffffff921376a6
 ffffffff93665ea0 0000cc7c876d28da 0000000000000005 ffffffff9383dd60
Call Trace:
 <IRQ>  [<ffffffff92560c2a>] dump_stack+0x67/0x9d
 [<ffffffff921376a6>] lockdep_rcu_suspicious+0xe6/0x100
 [<ffffffff925ae7a7>] do_trace_write_msr+0x127/0x1a0
 [<ffffffff92061c83>] native_apic_msr_eoi_write+0x23/0x30
 [<ffffffff92054408>] smp_trace_call_function_interrupt+0x38/0x360
 [<ffffffff92d1ca60>] trace_call_function_interrupt+0x90/0xa0
 <EOI>  [<ffffffff92ac5124>] ? cpuidle_enter_state+0x1b4/0x520

Move the entering_irq() call before ack_APIC_irq(), because entering_irq()
tells the RCU susbstems to end the extended quiescent state, so that the
following trace call in ack_APIC_irq() works correctly.

Suggested-by: Andi Kleen <ak@linux.intel.com>
Fixes: 4787c368a9 "x86/tracing: Add irq_enter/exit() in smp_trace_reschedule_interrupt()"
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
2016-03-18 14:51:06 +01:00
Linus Torvalds
0f49fc95b8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching update from Jiri Kosina:

 - cleanup of module notifiers; this depends on a module.c cleanup which
   has been acked by Rusty; from Jessica Yu

 - small assorted fixes and MAINTAINERS update

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch/module: remove livepatch module notifier
  modules: split part of complete_formation() into prepare_coming_module()
  livepatch: Update maintainers
  livepatch: Fix the error message about unresolvable ambiguity
  klp: remove CONFIG_LIVEPATCH dependency from klp headers
  klp: remove superfluous errors in asm/livepatch.h
2016-03-17 21:46:32 -07:00
Linus Torvalds
1a46712aa9 This is the bulk of GPIO changes for kernel v4.6:
Core changes:
 
 - The gpio_chip is now a *real device*. Until now the gpio chips
   were just piggybacking the parent device or (gasp) floating in
   space outside of the device model. We now finally make GPIO chips
   devices. The gpio_chip will create a gpio_device which contains
   a struct device, and this gpio_device struct is kept private.
   Anything that needs to be kept private from the rest of the kernel
   will gradually be moved over to the gpio_device.
 
 - As a result of making the gpio_device a real device, we have added
   resource management, so devm_gpiochip_add_data() will cut down on
   overhead and reduce code lines. A huge slew of patches convert
   almost all drivers in the subsystem to use this.
 
 - Building on making the GPIO a real device, we add the first step
   of a new userspace ABI: the GPIO character device. We take small
   steps here, so we first add a pure *information* ABI and the tool
   "lsgpio" that will list all GPIO devices on the system and all
   lines on these devices. We can now discover GPIOs properly from
   userspace. We still have not come up with a way to actually *use*
   GPIOs from userspace.
 
 - To encourage people to use the character device for the future,
   we have it always-enabled when using GPIO. The old sysfs ABI is
   still opt-in (and can be used in parallel), but is marked as
   deprecated. We will keep it around for the foreseeable future,
   but it will not be extended to cover ever more use cases.
 
 Cleanup:
 
 - Bjorn Helgaas removed a whole slew of per-architecture <asm/gpio.h>
   includes. This dates back to when GPIO was an opt-in feature and
   no shared library even existed: just a header file with proper
   prototypes was provided and all semantics were up to the arch to
   implement. These patches make the GPIO chip even more a proper
   device and cleans out leftovers of the old in-kernel API here
   and there. Still some cruft is left but it's very little now.
 
 - There is still some clamping of return values for .get() going
   on, but we now return sane values in the vast majority of drivers
   and the errorpath is sanitized. Some patches for powerpc, blackfin
   and unicore still drop in.
 
 - We continue to switch the ARM, MIPS, blackfin, m68k local GPIO
   implementations to use gpiochip_add_data() and cut down on code
   lines.
 
 - MPC8xxx is converted to use the generic GPIO helpers.
 
 - ATH79 is converted to use the generic GPIO helpers.
 
 New drivers:
 
 - WinSystems WS16C48
 
 - Acces 104-DIO-48E
 
 - F81866 (a F7188x variant)
 
 - Qoric (a MPC8xxx variant)
 
 - TS-4800
 
 - SPI serializers (pisosr): simple 74xx shift registers connected
   to SPI to obtain a dirt-cheap output-only GPIO expander.
 
 - Texas Instruments TPIC2810
 
 - Texas Instruments TPS65218
 
 - Texas Instruments TPS65912
 
 - X-Gene (ARM64) standby GPIO controller
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW6m24AAoJEEEQszewGV1zUasP/RpTrjRcNI5QFHjudd2oioDx
 R/IljC06Q072ZqVy/MR7QxwhoU8jUnCgKgv4rgMa1OcfHblxC2R1+YBKOUSij831
 E+SYmYDYmoMhN7j5Aslr66MXg1rLdFSdCZWemuyNruAK8bx6cTE1AWS8AELQzzTn
 Re/CPpCDbujLy0ZK2wJHgr9ZkdcBGICtDRCrOR3Kyjpwk/DSZcruK1PDN+VQMI3k
 bJlwgtGenOHINgCq/16edpwj/hzmoJXhTOZXJHI5XVR6czTwb3SvCYACvCkauI/a
 /N7b3quG88b5y0OPQPVxp5+VVl9GyVcv5oGzIfTNat/g5QinShZIT4kVV9r0xu6/
 TQHh1HlXleh+QI3yX0oRv9ztHreMf+vdpw1dhIwLqHqfJ7AWdOGk7BbKjwCrsOoq
 t/qUVFnyvooLpyr53Z5JY8+LqyynHF68G+jUQyHLgTZ0GCE+z+1jqNl1T501n3kv
 3CSlNYxSN/YUBN3cnroAIU/ZWcV4YRdxmOtEWP+7xgcdzTE6s/JHb2fuEfVHzWPf
 mHWtJGy8U0IR4VSSEln5RtjhRr0PAjTHeTOGAmivUnaIGDziTowyUVF+X5hwC77E
 DGTuLVx/Kniv173DK7xNAsUZNAETBa3fQZTgu+RfOpMiM1FZc7tI1rd7K7PjbyCc
 d2M0gcq+d11ITJTxC7OM
 =9AJ4
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO updates from Linus Walleij:
 "This is the bulk of GPIO changes for kernel v4.6.  There is quite a
  lot of interesting stuff going on.

  The patches to other subsystems and arch-wide are ACKed as far as
  possible, though I consider things like per-arch <asm/gpio.h> as
  essentially a part of the GPIO subsystem so it should not be needed.

  Core changes:

   - The gpio_chip is now a *real device*.  Until now the gpio chips
     were just piggybacking the parent device or (gasp) floating in
     space outside of the device model.

     We now finally make GPIO chips devices.  The gpio_chip will create
     a gpio_device which contains a struct device, and this gpio_device
     struct is kept private.  Anything that needs to be kept private
     from the rest of the kernel will gradually be moved over to the
     gpio_device.

   - As a result of making the gpio_device a real device, we have added
     resource management, so devm_gpiochip_add_data() will cut down on
     overhead and reduce code lines.  A huge slew of patches convert
     almost all drivers in the subsystem to use this.

   - Building on making the GPIO a real device, we add the first step of
     a new userspace ABI: the GPIO character device.  We take small
     steps here, so we first add a pure *information* ABI and the tool
     "lsgpio" that will list all GPIO devices on the system and all
     lines on these devices.

     We can now discover GPIOs properly from userspace.  We still have
     not come up with a way to actually *use* GPIOs from userspace.

   - To encourage people to use the character device for the future, we
     have it always-enabled when using GPIO.  The old sysfs ABI is still
     opt-in (and can be used in parallel), but is marked as deprecated.

     We will keep it around for the foreseeable future, but it will not
     be extended to cover ever more use cases.

  Cleanup:

   - Bjorn Helgaas removed a whole slew of per-architecture <asm/gpio.h>
     includes.

     This dates back to when GPIO was an opt-in feature and no shared
     library even existed: just a header file with proper prototypes was
     provided and all semantics were up to the arch to implement.  These
     patches make the GPIO chip even more a proper device and cleans out
     leftovers of the old in-kernel API here and there.

     Still some cruft is left but it's very little now.

   - There is still some clamping of return values for .get() going on,
     but we now return sane values in the vast majority of drivers and
     the errorpath is sanitized.  Some patches for powerpc, blackfin and
     unicore still drop in.

   - We continue to switch the ARM, MIPS, blackfin, m68k local GPIO
     implementations to use gpiochip_add_data() and cut down on code
     lines.

   - MPC8xxx is converted to use the generic GPIO helpers.

   - ATH79 is converted to use the generic GPIO helpers.

  New drivers:

   - WinSystems WS16C48

   - Acces 104-DIO-48E

   - F81866 (a F7188x variant)

   - Qoric (a MPC8xxx variant)

   - TS-4800

   - SPI serializers (pisosr): simple 74xx shift registers connected to
     SPI to obtain a dirt-cheap output-only GPIO expander.

   - Texas Instruments TPIC2810

   - Texas Instruments TPS65218

   - Texas Instruments TPS65912

   - X-Gene (ARM64) standby GPIO controller"

* tag 'gpio-v4.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (194 commits)
  Revert "Share upstreaming patches"
  gpio: mcp23s08: Fix clearing of interrupt.
  gpiolib: Fix comment referring to gpio_*() in gpiod_*()
  gpio: pca953x: Fix pca953x_gpio_set_multiple() on 64-bit
  gpio: xgene: Fix kconfig for standby GIPO contoller
  gpio: Add generic serializer DT binding
  gpio: uapi: use 0xB4 as ioctl() major
  gpio: tps65912: fix bad merge
  Revert "gpio: lp3943: Drop pin_used and lp3943_gpio_request/lp3943_gpio_free"
  gpio: omap: drop dev field from gpio_bank structure
  gpio: mpc8xxx: Slightly update the code for better readability
  gpio: mpc8xxx: Remove *read_reg and *write_reg from struct mpc8xxx_gpio_chip
  gpio: mpc8xxx: Fixup setting gpio direction output
  gpio: mcp23s08: Add support for mcp23s18
  dt-bindings: gpio: altera: Fix altr,interrupt-type property
  gpio: add driver for MEN 16Z127 GPIO controller
  gpio: lp3943: Drop pin_used and lp3943_gpio_request/lp3943_gpio_free
  gpio: timberdale: Switch to devm_ioremap_resource()
  gpio: ts4800: Add IMX51 dependency
  gpiolib: rewrite gpiodev_add_to_list
  ...
2016-03-17 21:05:32 -07:00
Linus Torvalds
588ab3f9af arm64 updates for 4.6:
- Initial page table creation reworked to avoid breaking large block
   mappings (huge pages) into smaller ones. The ARM architecture requires
   break-before-make in such cases to avoid TLB conflicts but that's not
   always possible on live page tables
 
 - Kernel virtual memory layout: the kernel image is no longer linked to
   the bottom of the linear mapping (PAGE_OFFSET) but at the bottom of
   the vmalloc space, allowing the kernel to be loaded (nearly) anywhere
   in physical RAM
 
 - Kernel ASLR: position independent kernel Image and modules being
   randomly mapped in the vmalloc space with the randomness is provided
   by UEFI (efi_get_random_bytes() patches merged via the arm64 tree,
   acked by Matt Fleming)
 
 - Implement relative exception tables for arm64, required by KASLR
   (initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c but
   actual x86 conversion to deferred to 4.7 because of the merge
   dependencies)
 
 - Support for the User Access Override feature of ARMv8.2: this allows
   uaccess functions (get_user etc.) to be implemented using LDTR/STTR
   instructions. Such instructions, when run by the kernel, perform
   unprivileged accesses adding an extra level of protection. The
   set_fs() macro is used to "upgrade" such instruction to privileged
   accesses via the UAO bit
 
 - Half-precision floating point support (part of ARMv8.2)
 
 - Optimisations for CPUs with or without a hardware prefetcher (using
   run-time code patching)
 
 - copy_page performance improvement to deal with 128 bytes at a time
 
 - Sanity checks on the CPU capabilities (via CPUID) to prevent
   incompatible secondary CPUs from being brought up (e.g. weird
   big.LITTLE configurations)
 
 - valid_user_regs() reworked for better sanity check of the sigcontext
   information (restored pstate information)
 
 - ACPI parking protocol implementation
 
 - CONFIG_DEBUG_RODATA enabled by default
 
 - VDSO code marked as read-only
 
 - DEBUG_PAGEALLOC support
 
 - ARCH_HAS_UBSAN_SANITIZE_ALL enabled
 
 - Erratum workaround Cavium ThunderX SoC
 
 - set_pte_at() fix for PROT_NONE mappings
 
 - Code clean-ups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW6u95AAoJEGvWsS0AyF7xMyoP/3x2O6bgreSQ84BdO4JChN4+
 RQ9OVdX8u2ItO9sgaCY2AA6KoiBuEjGmPl/XRuK0I7DpODTtRjEXQHuNNhz8AelC
 hn4AEVqamY6Z5BzHFIjs8G9ydEbq+OXcKWEdwSsBhP/cMvI7ss3dps1f5iNPT5Vv
 50E/kUz+aWYy7pKlB18VDV7TUOA3SuYuGknWV8+bOY5uPb8hNT3Y3fHOg/EuNNN3
 DIuYH1V7XQkXtF+oNVIGxzzJCXULBE7egMcWAm1ydSOHK0JwkZAiL7OhI7ceVD0x
 YlDxBnqmi4cgzfBzTxITAhn3OParwN6udQprdF1WGtFF6fuY2eRDSH/L/iZoE4DY
 OulL951OsBtF8YC3+RKLk908/0bA2Uw8ftjCOFJTYbSnZBj1gWK41VkCYMEXiHQk
 EaN8+2Iw206iYIoyvdjGCLw7Y0oakDoVD9vmv12SOaHeQljTkjoN8oIlfjjKTeP7
 3AXj5v9BDMDVh40nkVayysRNvqe48Kwt9Wn0rhVTLxwdJEiFG/OIU6HLuTkretdN
 dcCNFSQrRieSFHpBK9G0vKIpIss1ZwLm8gjocVXH7VK4Mo/TNQe4p2/wAF29mq4r
 xu1UiXmtU3uWxiqZnt72LOYFCarQ0sFA5+pMEvF5W+NrVB0wGpXhcwm+pGsIi4IM
 LepccTgykiUBqW5TRzPz
 =/oS+
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "Here are the main arm64 updates for 4.6.  There are some relatively
  intrusive changes to support KASLR, the reworking of the kernel
  virtual memory layout and initial page table creation.

  Summary:

   - Initial page table creation reworked to avoid breaking large block
     mappings (huge pages) into smaller ones.  The ARM architecture
     requires break-before-make in such cases to avoid TLB conflicts but
     that's not always possible on live page tables

   - Kernel virtual memory layout: the kernel image is no longer linked
     to the bottom of the linear mapping (PAGE_OFFSET) but at the bottom
     of the vmalloc space, allowing the kernel to be loaded (nearly)
     anywhere in physical RAM

   - Kernel ASLR: position independent kernel Image and modules being
     randomly mapped in the vmalloc space with the randomness is
     provided by UEFI (efi_get_random_bytes() patches merged via the
     arm64 tree, acked by Matt Fleming)

   - Implement relative exception tables for arm64, required by KASLR
     (initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c
     but actual x86 conversion to deferred to 4.7 because of the merge
     dependencies)

   - Support for the User Access Override feature of ARMv8.2: this
     allows uaccess functions (get_user etc.) to be implemented using
     LDTR/STTR instructions.  Such instructions, when run by the kernel,
     perform unprivileged accesses adding an extra level of protection.
     The set_fs() macro is used to "upgrade" such instruction to
     privileged accesses via the UAO bit

   - Half-precision floating point support (part of ARMv8.2)

   - Optimisations for CPUs with or without a hardware prefetcher (using
     run-time code patching)

   - copy_page performance improvement to deal with 128 bytes at a time

   - Sanity checks on the CPU capabilities (via CPUID) to prevent
     incompatible secondary CPUs from being brought up (e.g.  weird
     big.LITTLE configurations)

   - valid_user_regs() reworked for better sanity check of the
     sigcontext information (restored pstate information)

   - ACPI parking protocol implementation

   - CONFIG_DEBUG_RODATA enabled by default

   - VDSO code marked as read-only

   - DEBUG_PAGEALLOC support

   - ARCH_HAS_UBSAN_SANITIZE_ALL enabled

   - Erratum workaround Cavium ThunderX SoC

   - set_pte_at() fix for PROT_NONE mappings

   - Code clean-ups"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (99 commits)
  arm64: kasan: Fix zero shadow mapping overriding kernel image shadow
  arm64: kasan: Use actual memory node when populating the kernel image shadow
  arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission
  arm64: Fix misspellings in comments.
  arm64: efi: add missing frame pointer assignment
  arm64: make mrs_s prefixing implicit in read_cpuid
  arm64: enable CONFIG_DEBUG_RODATA by default
  arm64: Rework valid_user_regs
  arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly
  arm64: KVM: Move kvm_call_hyp back to its original localtion
  arm64: mm: treat memstart_addr as a signed quantity
  arm64: mm: list kernel sections in order
  arm64: lse: deal with clobbered IP registers after branch via PLT
  arm64: mm: dump: Use VA_START directly instead of private LOWEST_ADDR
  arm64: kconfig: add submenu for 8.2 architectural features
  arm64: kernel: acpi: fix ioremap in ACPI parking protocol cpu_postboot
  arm64: Add support for Half precision floating point
  arm64: Remove fixmap include fragility
  arm64: Add workaround for Cavium erratum 27456
  arm64: mm: Mark .rodata as RO
  ...
2016-03-17 20:03:47 -07:00
Andy Lutomirski
b7a584598a x86/iopl/64: Properly context-switch IOPL on Xen PV
On Xen PV, regs->flags doesn't reliably reflect IOPL and the
exit-to-userspace code doesn't change IOPL.  We need to context
switch it manually.

I'm doing this without going through paravirt because this is
specific to Xen PV.  After the dust settles, we can merge this with
the 32-bit code, tidy up the iopl syscall implementation, and remove
the set_iopl pvop entirely.

Fixes XSA-171.

Reviewewd-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/693c3bd7aeb4d3c27c92c622b7d0f554a458173c.1458162709.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-17 09:49:26 +01:00
Ingo Molnar
00f5268501 Merge branch 'x86/cleanups' into x86/urgent
Pull in some merge window leftovers.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-17 09:44:57 +01:00
Linus Torvalds
8759957b77 libnvdimm for 4.6
1/ Asynchronous address range scrub:
 Given the capacities of next generation persistent memory devices a
 scrub operation to find all poison may take 10s of seconds.  We want
 this scrub work to be done asynchronously with the rest of system
 initialization, so we move it out of line from the NFIT probing, i.e.
 acpi_nfit_add().
 
 2/ Clear poison:
 ACPI 6.1 introduces the ability to send "clear error" commands to the
 ACPI0012:00 device representing the root of an "nvdimm bus".  Similar to
 relocating a bad block on a disk, this support clears media errors in
 response to a write.
 
 3/ Persistent memory resource tracking:
 A persistent memory range may be designated as simply "reserved" by
 platform firmware in the efi/e820 memory map.  Later when the NFIT
 driver loads it discovers that the range is "Persistent Memory".  The
 NFIT bus driver inserts a resource to advertise that "persistent"
 attribute in the system resource tree for /proc/iomem and
 kernel-internal usages.
 
 4/ Miscellaneous cleanups and fixes:
 Workaround section misaligned pmem ranges when allocating a struct page
 memmap, fix handling of the read-only case in the ioctl path, and clean
 up block device major number allocation.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW6E0QAAoJEB7SkWpmfYgCM9EP/Aibi3BAYlv6KeCgLFXxNIyR
 Y7rA0K5UiJwzQMWmo3xQ4EOvAHVCQ33cVEdXy0zJPLnzQ+GCvaMuD/pxOB+HoZWq
 qUYdVvNomh7VzZDkbONidjuk4kwNHq8HtOo1bdGlPiXjIWEh3uop/rIShPFsRp9i
 RVByTE/9TGoDQ9Q6Aakw1GlvT75tZ36ZqwkM2jyzu1a7fmqfkfAJjjDY6gzm3/fJ
 OVv1SDGwknoTPMZFoAh5iyrzHsShw1l1nZFhP4LiulSUEYv4B1I0YNvzbmY9EkgQ
 LHg/HChXpDCfQN/68k0W7OX6rYPSNjeiX0Y+kqc9owznA32lxsdSMUHcEnGz/3ZE
 2yy0XfGMHYsXaWI514dKp1LceTvWYsuQ+NtYnDzEwMch9YjAJpOkxaJTqoRjD0rI
 2yxPamLrF1RP7r0jUw2OiMBBpf/N6NvwbIUJ4ssR87ryA8axNcs8Teeu1lgDjajS
 Xp2AKP5ViWP+lGdAJBY/fa70nSL6oyrHQlzV/3zAPyrVyhAfOTc5mHamlvzYYSBJ
 EoHDG1A0diP/E4wdiVNrD2fcKie5Vmp4Ws59OCAM8PwOJRXyRGfVB7PP+Q1DSZlc
 Tsh0QFjfGQOhS02VEaQPm7A19BYFgpTMgU6YqPOPyqVYALIqzj21Ov7+2VI73FyG
 ORqEjCAxLVto+3gjN0oD
 =F67V
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:

 - Asynchronous address range scrub:

     Given the capacities of next generation persistent memory devices a
     scrub operation to find all poison may take 10s of seconds.  We
     want this scrub work to be done asynchronously with the rest of
     system initialization, so we move it out of line from the NFIT
     probing, i.e. acpi_nfit_add().

 - Clear poison:

     ACPI 6.1 introduces the ability to send "clear error" commands to
     the ACPI0012:00 device representing the root of an "nvdimm bus".
     Similar to relocating a bad block on a disk, this support clears
     media errors in response to a write.

 - Persistent memory resource tracking:

     A persistent memory range may be designated as simply "reserved" by
     platform firmware in the efi/e820 memory map.  Later when the NFIT
     driver loads it discovers that the range is "Persistent Memory".

     The NFIT bus driver inserts a resource to advertise that
     "persistent" attribute in the system resource tree for /proc/iomem
     and kernel-internal usages.

 - Miscellaneous cleanups and fixes:

     Workaround section misaligned pmem ranges when allocating a struct
     page memmap, fix handling of the read-only case in the ioctl path,
     and clean up block device major number allocation.

* tag 'libnvdimm-for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (26 commits)
  libnvdimm, pmem: clear poison on write
  libnvdimm, pmem: fix kmap_atomic() leak in error path
  nvdimm/btt: don't allocate unused major device number
  nvdimm/blk: don't allocate unused major device number
  pmem: don't allocate unused major device number
  ACPI: Change NFIT driver to insert new resource
  resource: Export insert_resource and remove_resource
  resource: Add remove_resource interface
  resource: Change __request_region to inherit from immediate parent
  libnvdimm, pmem: fix ia64 build, use PHYS_PFN
  nfit, libnvdimm: clear poison command support
  libnvdimm, pfn: 'resource'-address and 'size' attributes for pfn devices
  libnvdimm, pmem: adjust for section collisions with 'System RAM'
  libnvdimm, pmem: fix 'pfn' support for section-misaligned namespaces
  libnvdimm: Fix security issue with DSM IOCTL.
  libnvdimm: Clean-up access mode check.
  tools/testing/nvdimm: expand ars unit testing
  nfit: disable userspace initiated ars during scrub
  nfit: scrub and register regions in a workqueue
  nfit, libnvdimm: async region scrub workqueue
  ...
2016-03-16 17:45:56 -07:00
Linus Torvalds
63e30271b0 PCI changes for the v4.6 merge window:
Enumeration
     Disable IO/MEM decoding for devices with non-compliant BARs (Bjorn Helgaas)
     Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs (Bjorn Helgaas
 
   Resource management
     Mark shadow copy of VGA ROM as IORESOURCE_PCI_FIXED (Bjorn Helgaas)
     Don't assign or reassign immutable resources (Bjorn Helgaas)
     Don't enable/disable ROM BAR if we're using a RAM shadow copy (Bjorn Helgaas)
     Set ROM shadow location in arch code, not in PCI core (Bjorn Helgaas)
     Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs (Bjorn Helgaas)
     ia64: Use ioremap() instead of open-coded equivalent (Bjorn Helgaas)
     ia64: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas)
     MIPS: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas)
     Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY (Bjorn Helgaas)
     Don't leak memory if sysfs_create_bin_file() fails (Bjorn Helgaas)
     rcar: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi)
     designware: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi)
 
   Virtualization
     Wait for up to 1000ms after FLR reset (Alex Williamson)
     Support SR-IOV on any function type (Kelly Zytaruk)
     Add ACS quirk for all Cavium devices (Manish Jaggi)
 
   AER
     Rename pci_ops_aer to aer_inj_pci_ops (Bjorn Helgaas)
     Restore pci_ops pointer while calling original pci_ops (David Daney)
     Fix aer_inject error codes (Jean Delvare)
     Use dev_warn() in aer_inject (Jean Delvare)
     Log actual error causes in aer_inject (Jean Delvare)
     Log aer_inject error injections (Jean Delvare)
 
   VPD
     Prevent VPD access for buggy devices (Babu Moger)
     Move pci_read_vpd() and pci_write_vpd() close to other VPD code (Bjorn Helgaas)
     Move pci_vpd_release() from header file to pci/access.c (Bjorn Helgaas)
     Remove struct pci_vpd_ops.release function pointer (Bjorn Helgaas)
     Rename VPD symbols to remove unnecessary "pci22" (Bjorn Helgaas)
     Fold struct pci_vpd_pci22 into struct pci_vpd (Bjorn Helgaas)
     Sleep rather than busy-wait for VPD access completion (Bjorn Helgaas)
     Update VPD definitions (Hannes Reinecke)
     Allow access to VPD attributes with size 0 (Hannes Reinecke)
     Determine actual VPD size on first access (Hannes Reinecke)
 
   Generic host bridge driver
     Move structure definitions to separate header file (David Daney)
     Add pci_host_common_probe(), based on gen_pci_probe() (David Daney)
     Expose pci_host_common_probe() for use by other drivers (David Daney)
 
   Altera host bridge driver
     Fix altera_pcie_link_is_up() (Ley Foon Tan)
 
   Cavium ThunderX host bridge driver
     Add PCIe host driver for ThunderX processors (David Daney)
     Add driver for ThunderX-pass{1,2} on-chip devices (David Daney)
 
   Freescale i.MX6 host bridge driver
     Add DT bindings to configure PHY Tx driver settings (Justin Waters)
     Move imx6_pcie_reset_phy() near other PHY handling functions (Lucas Stach)
     Move PHY reset into imx6_pcie_establish_link() (Lucas Stach)
     Remove broken Gen2 workaround (Lucas Stach)
     Move link up check into imx6_pcie_wait_for_link() (Lucas Stach)
 
   Freescale Layerscape host bridge driver
     Add "fsl,ls2085a-pcie" compatible ID (Yang Shi)
 
   Intel VMD host bridge driver
     Attach VMD resources to parent domain's resource tree (Jon Derrick)
     Set bus resource start to 0 (Keith Busch)
 
   Microsoft Hyper-V host bridge driver
     Add fwnode_handle to x86 pci_sysdata (Jake Oshins)
     Look up IRQ domain by fwnode_handle (Jake Oshins)
     Add paravirtual PCI front-end for Microsoft Hyper-V VMs (Jake Oshins)
 
   NVIDIA Tegra host bridge driver
     Add pci_ops.{add,remove}_bus() callbacks (Thierry Reding)
     Implement ->{add,remove}_bus() callbacks (Thierry Reding)
     Remove unused struct tegra_pcie.num_ports field (Thierry Reding)
     Track bus -> CPU mapping (Thierry Reding)
     Remove misleading PHYS_OFFSET (Thierry Reding)
 
   Renesas R-Car host bridge driver
     Depend on ARCH_RENESAS, not ARCH_SHMOBILE (Simon Horman)
 
   Synopsys DesignWare host bridge driver
     ARC: Add PCI support (Joao Pinto)
     Add generic dw_pcie_wait_for_link() (Joao Pinto)
     Add default link up check if sub-driver doesn't override (Joao Pinto)
     Add driver for prototyping kits based on ARC SDP (Joao Pinto)
 
   TI Keystone host bridge driver
     Defer probing if devm_phy_get() returns -EPROBE_DEFER (Shawn Lin)
 
   Xilinx AXI host bridge driver
     Use of_pci_get_host_bridge_resources() to parse DT (Bharat Kumar Gogada)
     Remove dependency on ARM-specific struct hw_pci (Bharat Kumar Gogada)
     Don't call pci_fixup_irqs() on Microblaze (Bharat Kumar Gogada)
     Update Zynq binding with Microblaze node (Bharat Kumar Gogada)
     microblaze: Support generic Xilinx AXI PCIe Host Bridge IP driver (Bharat Kumar Gogada)
 
   Xilinx NWL host bridge driver
     Add support for Xilinx NWL PCIe Host Controller (Bharat Kumar Gogada)
 
   Miscellaneous
     Check device_attach() return value always (Bjorn Helgaas)
     Move pci_set_flags() from asm-generic/pci-bridge.h to linux/pci.h (Bjorn Helgaas)
     Remove includes of empty asm-generic/pci-bridge.h (Bjorn Helgaas)
     ARM64: Remove generated include of asm-generic/pci-bridge.h (Bjorn Helgaas)
     Remove empty asm-generic/pci-bridge.h (Bjorn Helgaas)
     Remove includes of asm/pci-bridge.h (Bjorn Helgaas)
     Consolidate PCI DMA constants and interfaces in linux/pci-dma-compat.h (Bjorn Helgaas)
     unicore32: Remove unused HAVE_ARCH_PCI_SET_DMA_MASK definition (Bjorn Helgaas)
     Cleanup pci/pcie/Kconfig whitespace (Andreas Ziegler)
     Include pci/hotplug Kconfig directly from pci/Kconfig (Bjorn Helgaas)
     Include pci/pcie/Kconfig directly from pci/Kconfig (Bogicevic Sasa)
     frv: Remove stray pci_{alloc,free}_consistent() declaration (Christoph Hellwig)
     Move pci_dma_* helpers to common code (Christoph Hellwig)
     Add PCI_CLASS_SERIAL_USB_DEVICE definition (Heikki Krogerus)
     Add QEMU top-level IDs for (sub)vendor & device (Robin H. Johnson)
     Fix broken URL for Dell biosdevname (Naga Venkata Sai Indubhaskar Jupudi)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW6XgMAAoJEFmIoMA60/r8Yq4P/1nNwwZPikU+9Z8k0HyGPll6
 vqXBOYj/wlbAxJTzH2weaoyUamFrwvsKaO3Vap3xHkAeTFPD/Dp0TipCCNMrZ82Z
 j1y83JJpenkRyX6ifLARCNYpOtvnvgzSrO9x7Sb2Xfqb64dPb7+jGAfOpGNzhKsO
 n1nj/L7RGx8Q6fNFGf8ANMXKTsdkdL+1pdwegjUXmD5WdOT+oW8DmqVbhyfSKwl0
 E8r4Ml2lIg7Qd5Wu5iKMIBsR0+5HEyrwV7ch92wXChwKfoRwG70qnn7FGdc0y5ZB
 XvJuj8UD5UeMxEUeoRa9SwU6wWQT3Q9e6BzMS+P+43z36SPYjMfy/Xffv054z/bY
 rQomLjuGxNLESpmfNK5JfKxWoe2YNXjHQIDWMrAHyNlwdKJbYiwPcxnZJhvOa/eB
 p0QYcGS7O43STjibG9PZhzeq8tuSJRshxi0W6iB9QlqO8qs8nJQxIO+sZj/vl4yz
 lSnswWcV9062KITl8Fe9xDw244/RTz1xSVCdldlSoDhJyeMOjRvzS8raUMyyVmbA
 YULsI3l2iCl+fwDm/T21o7hJG966oYdAmgEv7lc7BWfgEAMg//LZXvMzVvrPFB2D
 R77u/0idtOciVJrmnO/x9DnQO2hzro9SLmVH6m0+0YU4wSSpZfGn98PCrtkatOAU
 c8zT9dJgyJVE3Z7cnPJ4
 =otsF
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "PCI changes for v4.6:

  Enumeration:
   - Disable IO/MEM decoding for devices with non-compliant BARs (Bjorn Helgaas)
   - Mark Broadwell-EP Home Agent & PCU as having non-compliant BARs (Bjorn Helgaas

  Resource management:
   - Mark shadow copy of VGA ROM as IORESOURCE_PCI_FIXED (Bjorn Helgaas)
   - Don't assign or reassign immutable resources (Bjorn Helgaas)
   - Don't enable/disable ROM BAR if we're using a RAM shadow copy (Bjorn Helgaas)
   - Set ROM shadow location in arch code, not in PCI core (Bjorn Helgaas)
   - Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs (Bjorn Helgaas)
   - ia64: Use ioremap() instead of open-coded equivalent (Bjorn Helgaas)
   - ia64: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas)
   - MIPS: Keep CPU physical (not virtual) addresses in shadow ROM resource (Bjorn Helgaas)
   - Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY (Bjorn Helgaas)
   - Don't leak memory if sysfs_create_bin_file() fails (Bjorn Helgaas)
   - rcar: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi)
   - designware: Remove PCI_PROBE_ONLY handling (Lorenzo Pieralisi)

  Virtualization:
   - Wait for up to 1000ms after FLR reset (Alex Williamson)
   - Support SR-IOV on any function type (Kelly Zytaruk)
   - Add ACS quirk for all Cavium devices (Manish Jaggi)

  AER:
   - Rename pci_ops_aer to aer_inj_pci_ops (Bjorn Helgaas)
   - Restore pci_ops pointer while calling original pci_ops (David Daney)
   - Fix aer_inject error codes (Jean Delvare)
   - Use dev_warn() in aer_inject (Jean Delvare)
   - Log actual error causes in aer_inject (Jean Delvare)
   - Log aer_inject error injections (Jean Delvare)

  VPD:
   - Prevent VPD access for buggy devices (Babu Moger)
   - Move pci_read_vpd() and pci_write_vpd() close to other VPD code (Bjorn Helgaas)
   - Move pci_vpd_release() from header file to pci/access.c (Bjorn Helgaas)
   - Remove struct pci_vpd_ops.release function pointer (Bjorn Helgaas)
   - Rename VPD symbols to remove unnecessary "pci22" (Bjorn Helgaas)
   - Fold struct pci_vpd_pci22 into struct pci_vpd (Bjorn Helgaas)
   - Sleep rather than busy-wait for VPD access completion (Bjorn Helgaas)
   - Update VPD definitions (Hannes Reinecke)
   - Allow access to VPD attributes with size 0 (Hannes Reinecke)
   - Determine actual VPD size on first access (Hannes Reinecke)

  Generic host bridge driver:
   - Move structure definitions to separate header file (David Daney)
   - Add pci_host_common_probe(), based on gen_pci_probe() (David Daney)
   - Expose pci_host_common_probe() for use by other drivers (David Daney)

  Altera host bridge driver:
   - Fix altera_pcie_link_is_up() (Ley Foon Tan)

  Cavium ThunderX host bridge driver:
   - Add PCIe host driver for ThunderX processors (David Daney)
   - Add driver for ThunderX-pass{1,2} on-chip devices (David Daney)

  Freescale i.MX6 host bridge driver:
   - Add DT bindings to configure PHY Tx driver settings (Justin Waters)
   - Move imx6_pcie_reset_phy() near other PHY handling functions (Lucas Stach)
   - Move PHY reset into imx6_pcie_establish_link() (Lucas Stach)
   - Remove broken Gen2 workaround (Lucas Stach)
   - Move link up check into imx6_pcie_wait_for_link() (Lucas Stach)

  Freescale Layerscape host bridge driver:
   - Add "fsl,ls2085a-pcie" compatible ID (Yang Shi)

  Intel VMD host bridge driver:
   - Attach VMD resources to parent domain's resource tree (Jon Derrick)
   - Set bus resource start to 0 (Keith Busch)

  Microsoft Hyper-V host bridge driver:
   - Add fwnode_handle to x86 pci_sysdata (Jake Oshins)
   - Look up IRQ domain by fwnode_handle (Jake Oshins)
   - Add paravirtual PCI front-end for Microsoft Hyper-V VMs (Jake Oshins)

  NVIDIA Tegra host bridge driver:
   - Add pci_ops.{add,remove}_bus() callbacks (Thierry Reding)
   - Implement ->{add,remove}_bus() callbacks (Thierry Reding)
   - Remove unused struct tegra_pcie.num_ports field (Thierry Reding)
   - Track bus -> CPU mapping (Thierry Reding)
   - Remove misleading PHYS_OFFSET (Thierry Reding)

  Renesas R-Car host bridge driver:
   - Depend on ARCH_RENESAS, not ARCH_SHMOBILE (Simon Horman)

  Synopsys DesignWare host bridge driver:
   - ARC: Add PCI support (Joao Pinto)
   - Add generic dw_pcie_wait_for_link() (Joao Pinto)
   - Add default link up check if sub-driver doesn't override (Joao Pinto)
   - Add driver for prototyping kits based on ARC SDP (Joao Pinto)

  TI Keystone host bridge driver:
   - Defer probing if devm_phy_get() returns -EPROBE_DEFER (Shawn Lin)

  Xilinx AXI host bridge driver:
   - Use of_pci_get_host_bridge_resources() to parse DT (Bharat Kumar Gogada)
   - Remove dependency on ARM-specific struct hw_pci (Bharat Kumar Gogada)
   - Don't call pci_fixup_irqs() on Microblaze (Bharat Kumar Gogada)
   - Update Zynq binding with Microblaze node (Bharat Kumar Gogada)
   - microblaze: Support generic Xilinx AXI PCIe Host Bridge IP driver (Bharat Kumar Gogada)

  Xilinx NWL host bridge driver:
   - Add support for Xilinx NWL PCIe Host Controller (Bharat Kumar Gogada)

  Miscellaneous:
   - Check device_attach() return value always (Bjorn Helgaas)
   - Move pci_set_flags() from asm-generic/pci-bridge.h to linux/pci.h (Bjorn Helgaas)
   - Remove includes of empty asm-generic/pci-bridge.h (Bjorn Helgaas)
   - ARM64: Remove generated include of asm-generic/pci-bridge.h (Bjorn Helgaas)
   - Remove empty asm-generic/pci-bridge.h (Bjorn Helgaas)
   - Remove includes of asm/pci-bridge.h (Bjorn Helgaas)
   - Consolidate PCI DMA constants and interfaces in linux/pci-dma-compat.h (Bjorn Helgaas)
   - unicore32: Remove unused HAVE_ARCH_PCI_SET_DMA_MASK definition (Bjorn Helgaas)
   - Cleanup pci/pcie/Kconfig whitespace (Andreas Ziegler)
   - Include pci/hotplug Kconfig directly from pci/Kconfig (Bjorn Helgaas)
   - Include pci/pcie/Kconfig directly from pci/Kconfig (Bogicevic Sasa)
   - frv: Remove stray pci_{alloc,free}_consistent() declaration (Christoph Hellwig)
   - Move pci_dma_* helpers to common code (Christoph Hellwig)
   - Add PCI_CLASS_SERIAL_USB_DEVICE definition (Heikki Krogerus)
   - Add QEMU top-level IDs for (sub)vendor & device (Robin H. Johnson)
   - Fix broken URL for Dell biosdevname (Naga Venkata Sai Indubhaskar Jupudi)"

* tag 'pci-v4.6-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (94 commits)
  PCI: Add PCI_CLASS_SERIAL_USB_DEVICE definition
  PCI: designware: Add driver for prototyping kits based on ARC SDP
  PCI: designware: Add default link up check if sub-driver doesn't override
  PCI: designware: Add generic dw_pcie_wait_for_link()
  PCI: Cleanup pci/pcie/Kconfig whitespace
  PCI: Simplify pci_create_attr() control flow
  PCI: Don't leak memory if sysfs_create_bin_file() fails
  PCI: Simplify sysfs ROM cleanup
  PCI: Remove unused IORESOURCE_ROM_COPY and IORESOURCE_ROM_BIOS_COPY
  MIPS: Loongson 3: Keep CPU physical (not virtual) addresses in shadow ROM resource
  MIPS: Loongson 3: Use temporary struct resource * to avoid repetition
  ia64/PCI: Keep CPU physical (not virtual) addresses in shadow ROM resource
  ia64/PCI: Use ioremap() instead of open-coded equivalent
  ia64/PCI: Use temporary struct resource * to avoid repetition
  PCI: Clean up pci_map_rom() whitespace
  PCI: Remove arch-specific IORESOURCE_ROM_SHADOW size from sysfs
  PCI: thunder: Add driver for ThunderX-pass{1,2} on-chip devices
  PCI: thunder: Add PCIe host driver for ThunderX processors
  PCI: generic: Expose pci_host_common_probe() for use by other drivers
  PCI: generic: Add pci_host_common_probe(), based on gen_pci_probe()
  ...
2016-03-16 14:45:55 -07:00
Linus Torvalds
277edbabf6 Power management and ACPI material for v4.6-rc1, part 1
- Redesign of cpufreq governors and the intel_pstate driver to
    make them use callbacks invoked by the scheduler to trigger CPU
    frequency evaluation instead of using per-CPU deferrable timers
    for that purpose (Rafael Wysocki).
 
  - Reorganization and cleanup of cpufreq governor code to make it
    more straightforward and fix some concurrency problems in it
    (Rafael Wysocki, Viresh Kumar).
 
  - Cleanup and improvements of locking in the cpufreq core (Viresh
    Kumar).
 
  - Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh
    Kumar, Eric Biggers).
 
  - intel_pstate driver updates including fixes, optimizations and a
    modification to make it enable enable hardware-coordinated P-state
    selection (HWP) by default if supported by the processor (Philippe
    Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe
    Franciosi).
 
  - Operating Performance Points (OPP) framework updates to improve
    its handling of voltage regulators and device clocks and updates
    of the cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter).
 
  - Updates of the powernv cpufreq driver to fix initialization
    and cleanup problems in it and correct its worker thread handling
    with respect to CPU offline, new powernv_throttle tracepoint
    (Shilpasri Bhat).
 
  - ACPI cpufreq driver optimization and cleanup (Rafael Wysocki).
 
  - ACPICA updates including one fix for a regression introduced
    by previos changes in the ACPICA code (Bob Moore, Lv Zheng,
    David Box, Colin Ian King).
 
  - Support for installing ACPI tables from initrd (Lv Zheng).
 
  - Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin
    Chaugule).
 
  - Support for _HID(ACPI0010) devices (ACPI processor containers)
    and ACPI processor driver cleanups (Sudeep Holla).
 
  - Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory,
    Aleksey Makarov).
 
  - Modification of the ACPI PCI IRQ management code to make it treat
    255 in the Interrupt Line register as "not connected" on x86 (as
    per the specification) and avoid attempts to use that value as
    a valid interrupt vector (Chen Fan).
 
  - ACPI APEI fixes related to resource leaks (Josh Hunt).
 
  - Removal of modularity from a few ACPI drivers (BGRT, GHES,
    intel_pmic_crc) that cannot be built as modules in practice (Paul
    Gortmaker).
 
  - PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS
    as a valid resource type (Harb Abdulhamid).
 
  - New device ID (future AMD I2C controller) in the ACPI driver for
    AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu).
 
  - Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin).
 
  - cpuidle menu governor optimization to avoid a square root
    computation in it (Rasmus Villemoes).
 
  - Fix for potential use-after-free in the generic device properties
    framework (Heikki Krogerus).
 
  - Updates of the generic power domains (genpd) framework including
    support for multiple power states of a domain, fixes and debugfs
    output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart,
    Geert Uytterhoeven).
 
  - Intel RAPL power capping driver updates to reduce IPI overhead in
    it (Jacob Pan).
 
  - System suspend/hibernation code cleanups (Eric Biggers, Saurabh
    Sengar).
 
  - Year 2038 fix for the process freezer (Abhilash Jindal).
 
  - turbostat utility updates including new features (decoding of more
    registers and CPUID fields, sub-second intervals support, GFX MHz
    and RC6 printout, --out command line option), fixes (syscall jitter
    detection and workaround, reductioin of the number of syscalls made,
    fixes related to Xeon x200 processors, compiler warning fixes) and
    cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJW50NXAAoJEILEb/54YlRxvr8QAIktC9+ft0y5AmU46hDcBWcK
 QutyWJL9X9BS6DWBJZA2qclDYFmhMfi5Fza1se0gQ9TnLB/KrBwHWLsiYoTsb1k+
 nPKf214aPk+qAhkVuyB4leNWML9Qz9n9jwku/EYxWWpgtbSRf3+0ioIKZeWWc/8V
 JvuaOu4O+g/tkmL7QTrnGWBwhIIssAAV85QPsHkx+g68MrCj4UMMzm7z9G21SPXX
 bmP8yIHsczX/XnRsY0W2NSno7Vdk6ImHpDJ26IAZg28WRNPWICHgGYHvB0TTWMvb
 tts+yqfF7/7QLRjT/M8k9CzDBDE/DnVqoZ0fNJ+aYr7hNKF32mtAN+jH9ZB9dl/P
 fEFapJkPxnWyzAoVoB9Dz0rkcZkYMlbxlLWzUGpaPq0JflUUTzLk0ApSjmMn4HRO
 UddwCDdyHTaYThp3gn6GbOb0pIP0SdOVbI1M2QV2x/4PLcT2Ft8Np1+1RFWOeinZ
 Bdl9AE890big0808mqbBzw/buETwr9FjHtCdDPXpP0vJpkBLu3nIYRNb0LCt39es
 mWMp6dFhGgvGj3D3ahTuV3GI8hdpDkh9SObexa11RCjkTKrXcwEmFxHxLeFXwKYq
 alG278bo6cSChRMziS1lis+W/3tsJRN4TXUSv1PPzJHrFgptQVFRStU9ngBKP+pN
 WB+itPc4Fw0YHOrAFsrx
 =cfty
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI updates from Rafael Wysocki:
 "This time the majority of changes go into cpufreq and they are
  significant.

  First off, the way CPU frequency updates are triggered is different
  now.  Instead of having to set up and manage a deferrable timer for
  each CPU in the system to evaluate and possibly change its frequency
  periodically, cpufreq governors set up callbacks to be invoked by the
  scheduler on a regular basis (basically on utilization updates).  The
  "old" governors, "ondemand" and "conservative", still do all of their
  work in process context (although that is triggered by the scheduler
  now), but intel_pstate does it all in the callback invoked by the
  scheduler with no need for any additional asynchronous processing.

  Of course, this eliminates the overhead related to the management of
  all those timers, but also it allows the cpufreq governor code to be
  simplified quite a bit.  On top of that, the common code and data
  structures used by the "ondemand" and "conservative" governors are
  cleaned up and made more straightforward and some long-standing and
  quite annoying problems are addressed.  In particular, the handling of
  governor sysfs attributes is modified and the related locking becomes
  more fine grained which allows some concurrency problems to be avoided
  (particularly deadlocks with the core cpufreq code).

  In principle, the new mechanism for triggering frequency updates
  allows utilization information to be passed from the scheduler to
  cpufreq.  Although the current code doesn't make use of it, in the
  works is a new cpufreq governor that will make decisions based on the
  scheduler's utilization data.  That should allow the scheduler and
  cpufreq to work more closely together in the long run.

  In addition to the core and governor changes, cpufreq drivers are
  updated too.  Fixes and optimizations go into intel_pstate, the
  cpufreq-dt driver is updated on top of some modification in the
  Operating Performance Points (OPP) framework and there are fixes and
  other updates in the powernv cpufreq driver.

  Apart from the cpufreq updates there is some new ACPICA material,
  including a fix for a problem introduced by previous ACPICA updates,
  and some less significant changes in the ACPI code, like CPPC code
  optimizations, ACPI processor driver cleanups and support for loading
  ACPI tables from initrd.

  Also updated are the generic power domains framework, the Intel RAPL
  power capping driver and the turbostat utility and we have a bunch of
  traditional assorted fixes and cleanups.

  Specifics:

   - Redesign of cpufreq governors and the intel_pstate driver to make
     them use callbacks invoked by the scheduler to trigger CPU
     frequency evaluation instead of using per-CPU deferrable timers for
     that purpose (Rafael Wysocki).

   - Reorganization and cleanup of cpufreq governor code to make it more
     straightforward and fix some concurrency problems in it (Rafael
     Wysocki, Viresh Kumar).

   - Cleanup and improvements of locking in the cpufreq core (Viresh
     Kumar).

   - Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh
     Kumar, Eric Biggers).

   - intel_pstate driver updates including fixes, optimizations and a
     modification to make it enable enable hardware-coordinated P-state
     selection (HWP) by default if supported by the processor (Philippe
     Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe
     Franciosi).

   - Operating Performance Points (OPP) framework updates to improve its
     handling of voltage regulators and device clocks and updates of the
     cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter).

   - Updates of the powernv cpufreq driver to fix initialization and
     cleanup problems in it and correct its worker thread handling with
     respect to CPU offline, new powernv_throttle tracepoint (Shilpasri
     Bhat).

   - ACPI cpufreq driver optimization and cleanup (Rafael Wysocki).

   - ACPICA updates including one fix for a regression introduced by
     previos changes in the ACPICA code (Bob Moore, Lv Zheng, David Box,
     Colin Ian King).

   - Support for installing ACPI tables from initrd (Lv Zheng).

   - Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin
     Chaugule).

   - Support for _HID(ACPI0010) devices (ACPI processor containers) and
     ACPI processor driver cleanups (Sudeep Holla).

   - Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory,
     Aleksey Makarov).

   - Modification of the ACPI PCI IRQ management code to make it treat
     255 in the Interrupt Line register as "not connected" on x86 (as
     per the specification) and avoid attempts to use that value as a
     valid interrupt vector (Chen Fan).

   - ACPI APEI fixes related to resource leaks (Josh Hunt).

   - Removal of modularity from a few ACPI drivers (BGRT, GHES,
     intel_pmic_crc) that cannot be built as modules in practice (Paul
     Gortmaker).

   - PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS
     as a valid resource type (Harb Abdulhamid).

   - New device ID (future AMD I2C controller) in the ACPI driver for
     AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu).

   - Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin).

   - cpuidle menu governor optimization to avoid a square root
     computation in it (Rasmus Villemoes).

   - Fix for potential use-after-free in the generic device properties
     framework (Heikki Krogerus).

   - Updates of the generic power domains (genpd) framework including
     support for multiple power states of a domain, fixes and debugfs
     output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart,
     Geert Uytterhoeven).

   - Intel RAPL power capping driver updates to reduce IPI overhead in
     it (Jacob Pan).

   - System suspend/hibernation code cleanups (Eric Biggers, Saurabh
     Sengar).

   - Year 2038 fix for the process freezer (Abhilash Jindal).

   - turbostat utility updates including new features (decoding of more
     registers and CPUID fields, sub-second intervals support, GFX MHz
     and RC6 printout, --out command line option), fixes (syscall jitter
     detection and workaround, reductioin of the number of syscalls
     made, fixes related to Xeon x200 processors, compiler warning
     fixes) and cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu)"

* tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (182 commits)
  tools/power turbostat: bugfix: TDP MSRs print bits fixing
  tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
  tools/power turbostat: call __cpuid() instead of __get_cpuid()
  tools/power turbostat: indicate SMX and SGX support
  tools/power turbostat: detect and work around syscall jitter
  tools/power turbostat: show GFX%rc6
  tools/power turbostat: show GFXMHz
  tools/power turbostat: show IRQs per CPU
  tools/power turbostat: make fewer systems calls
  tools/power turbostat: fix compiler warnings
  tools/power turbostat: add --out option for saving output in a file
  tools/power turbostat: re-name "%Busy" field to "Busy%"
  tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
  tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
  tools/power turbostat: allow sub-sec intervals
  ACPI / APEI: ERST: Fixed leaked resources in erst_init
  ACPI / APEI: Fix leaked resources
  intel_pstate: Do not skip samples partially
  intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
  intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
  ...
2016-03-16 14:10:53 -07:00
Linus Torvalds
10dc374766 One of the largest releases for KVM... Hardly any generic improvement,
but lots of architecture-specific changes.
 
 * ARM:
 - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
 - PMU support for guests
 - 32bit world switch rewritten in C
 - various optimizations to the vgic save/restore code.
 
 * PPC:
 - enabled KVM-VFIO integration ("VFIO device")
 - optimizations to speed up IPIs between vcpus
 - in-kernel handling of IOMMU hypercalls
 - support for dynamic DMA windows (DDW).
 
 * s390:
 - provide the floating point registers via sync regs;
 - separated instruction vs. data accesses
 - dirty log improvements for huge guests
 - bugfixes and documentation improvements.
 
 * x86:
 - Hyper-V VMBus hypercall userspace exit
 - alternative implementation of lowest-priority interrupts using vector
 hashing (for better VT-d posted interrupt support)
 - fixed guest debugging with nested virtualizations
 - improved interrupt tracking in the in-kernel IOAPIC
 - generic infrastructure for tracking writes to guest memory---currently
 its only use is to speedup the legacy shadow paging (pre-EPT) case, but
 in the future it will be used for virtual GPUs as well
 - much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJW5r3BAAoJEL/70l94x66D2pMH/jTSWWwdTUJMctrDjPVzKzG0
 yOzHW5vSLFoFlwEOY2VpslnXzn5TUVmCAfrdmFNmQcSw6hGb3K/xA/ZX/KLwWhyb
 oZpr123ycahga+3q/ht/dFUBCCyWeIVMdsLSFwpobEBzPL0pMgc9joLgdUC6UpWX
 tmN0LoCAeS7spC4TTiTTpw3gZ/L+aB0B6CXhOMjldb9q/2CsgaGyoVvKA199nk9o
 Ngu7ImDt7l/x1VJX4/6E/17VHuwqAdUrrnbqerB/2oJ5ixsZsHMGzxQ3sHCmvyJx
 WG5L00ubB1oAJAs9fBg58Y/MdiWX99XqFhdEfxq4foZEiQuCyxygVvq3JwZTxII=
 =OUZZ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "One of the largest releases for KVM...  Hardly any generic
  changes, but lots of architecture-specific updates.

  ARM:
   - VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
   - PMU support for guests
   - 32bit world switch rewritten in C
   - various optimizations to the vgic save/restore code.

  PPC:
   - enabled KVM-VFIO integration ("VFIO device")
   - optimizations to speed up IPIs between vcpus
   - in-kernel handling of IOMMU hypercalls
   - support for dynamic DMA windows (DDW).

  s390:
   - provide the floating point registers via sync regs;
   - separated instruction vs.  data accesses
   - dirty log improvements for huge guests
   - bugfixes and documentation improvements.

  x86:
   - Hyper-V VMBus hypercall userspace exit
   - alternative implementation of lowest-priority interrupts using
     vector hashing (for better VT-d posted interrupt support)
   - fixed guest debugging with nested virtualizations
   - improved interrupt tracking in the in-kernel IOAPIC
   - generic infrastructure for tracking writes to guest
     memory - currently its only use is to speedup the legacy shadow
     paging (pre-EPT) case, but in the future it will be used for
     virtual GPUs as well
   - much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits)
  KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch
  KVM: x86: disable MPX if host did not enable MPX XSAVE features
  arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit
  arm64: KVM: vgic-v3: Reset LRs at boot time
  arm64: KVM: vgic-v3: Do not save an LR known to be empty
  arm64: KVM: vgic-v3: Save maintenance interrupt state only if required
  arm64: KVM: vgic-v3: Avoid accessing ICH registers
  KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit
  KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit
  KVM: arm/arm64: vgic-v2: Reset LRs at boot time
  KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty
  KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function
  KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required
  KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers
  KVM: s390: allocate only one DMA page per VM
  KVM: s390: enable STFLE interpretation only if enabled for the guest
  KVM: s390: wake up when the VCPU cpu timer expires
  KVM: s390: step the VCPU timer while in enabled wait
  KVM: s390: protect VCPU cpu timer with a seqcount
  KVM: s390: step VCPU cpu timer during kvm_run ioctl
  ...
2016-03-16 09:55:35 -07:00
Tony Luck
cbf8b5a2b6 x86/mm, x86/mce: Fix return type/value for memcpy_mcsafe()
Returning a 'bool' was very unpopular. Doubly so because the
code was just wrong (returning zero for true, one for false;
great for shell programming, not so good for C).

Change return type to "int". Keep zero as the success indicator
because it matches other similar code and people may be more
comfortable writing:

	if (memcpy_mcsafe(to, from, count)) {
		printk("Sad panda, copy failed\n");
		...
	}

Make the failure return value -EFAULT for now.

Reported by: Mika Penttilä <mika.penttila@nextfour.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: mika.penttila@nextfour.com
Fixes: 92b0729c34 ("x86/mm, x86/mce: Add memcpy_mcsafe()")
Link: http://lkml.kernel.org/r/695f14233fa7a54fcac4406c706d7fec228e3f4c.1457993040.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-16 09:02:18 +01:00
Linus Torvalds
df2e37c814 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
 "The 4.6 pile of irq updates contains:

   - Support for IPI irqdomains to support proper integration of IPIs to
     and from coprocessors.  The first user of this new facility is
     MIPS.  The relevant MIPS patches come with the core to avoid merge
     ordering issues and have been acked by Ralf.

   - A new command line option to set the default interrupt affinity
     mask at boot time.

   - Support for some more new ARM and MIPS interrupt controllers:
     tango, alpine-msix and bcm6345-l1

   - Two small cleanups for x86/apic which we merged into irq/core to
     avoid yet another branch in x86 with two tiny commits.

   - The usual set of updates, cleanups in drivers/irqchip.  Mostly in
     the area of ARM-GIC, arada-37-xp and atmel chips.  Nothing
     outstanding here"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
  irqchip/irq-alpine-msi: Release the correct domain on error
  irqchip/mxs: Fix error check of of_io_request_and_map()
  irqchip/sunxi-nmi: Fix error check of of_io_request_and_map()
  genirq: Export IRQ functions for module use
  irqchip/gic/realview: Support more RealView DCC variants
  Documentation/bindings: Document the Alpine MSIX driver
  irqchip: Add the Alpine MSIX interrupt controller
  irqchip/gic-v3: Always return IRQ_SET_MASK_OK_DONE in gic_set_affinity
  irqchip/gic-v3-its: Mark its_init() and its children as __init
  irqchip/gic-v3: Remove gic_root_node variable from the ITS code
  irqchip/gic-v3: ACPI: Add redistributor support via GICC structures
  irqchip/gic-v3: Add ACPI support for GICv3/4 initialization
  irqchip/gic-v3: Refactor gic_of_init() for GICv3 driver
  x86/apic: Deinline _flat_send_IPI_mask, save ~150 bytes
  x86/apic: Deinline __default_send_IPI_*, save ~200 bytes
  dt-bindings: interrupt-controller: Add SoC-specific compatible string to Marvell ODMI
  irqchip/mips-gic: Add new DT property to reserve IPIs
  MIPS: Delete smp-gic.c
  MIPS: Make smp CMP, CPS and MT use the new generic IPI functions
  MIPS: Add generic SMP IPI support
  ...
2016-03-15 12:48:48 -07:00
Linus Torvalds
8a284c062e Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "The timer department delivers this time:

   - Support for cross clock domain timestamps in the core code plus a
     first user.  That allows more precise timestamping for PTP and
     later for audio and other peripherals.

     The ptp/e1000e patches have been acked by the relevant maintainers
     and are carried in the timer tree to avoid merge ordering issues.

   - Support for unregistering the current clocksource watchdog.  That
     lifts a limitation for switching clocksources which has been there
     from day 1

   - The usual pile of fixes and updates to the core and the drivers.
     Nothing outstanding and exciting"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
  time/timekeeping: Work around false positive GCC warning
  e1000e: Adds hardware supported cross timestamp on e1000e nic
  ptp: Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping
  x86/tsc: Always Running Timer (ART) correlated clocksource
  hrtimer: Revert CLOCK_MONOTONIC_RAW support
  time: Add history to cross timestamp interface supporting slower devices
  time: Add driver cross timestamp interface for higher precision time synchronization
  time: Remove duplicated code in ktime_get_raw_and_real()
  time: Add timekeeping snapshot code capturing system time and counter
  time: Add cycles to nanoseconds translation
  jiffies: Use CLOCKSOURCE_MASK instead of constant
  clocksource: Introduce clocksource_freq2mult()
  clockevents/drivers/exynos_mct: Implement ->set_state_oneshot_stopped()
  clockevents/drivers/arm_global_timer: Implement ->set_state_oneshot_stopped()
  clockevents/drivers/arm_arch_timer: Implement ->set_state_oneshot_stopped()
  clocksource/drivers/arm_global_timer: Register delay timer
  clocksource/drivers/lpc32xx: Support timer-based ARM delay
  clocksource/drivers/lpc32xx: Support periodic mode
  clocksource/drivers/lpc32xx: Don't use the prescaler counter for clockevents
  clocksource/drivers/rockchip: Add err handle for rk_timer_init
  ...
2016-03-15 12:13:56 -07:00
Linus Torvalds
8ab84ef699 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core platform updates from Ingo Molnar:
 "Intel Quark and Geode SoC platform updates"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform/intel/quark: Drop IMR lock bit support
  x86/platform/intel/mid: Remove dead code
  x86/platform: Make platform/geode/net5501.c explicitly non-modular
  x86/platform: Make platform/geode/alix.c explicitly non-modular
  x86/platform: Make platform/geode/geos.c explicitly non-modular
  x86/platform: Make platform/intel-quark/imr_selftest.c explicitly non-modular
  x86/platform: Make platform/intel-quark/imr.c explicitly non-modular
2016-03-15 11:20:44 -07:00
Linus Torvalds
13c76ad872 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Enable full ASLR randomization for 32-bit programs (Hector
     Marco-Gisbert)

   - Add initial minimal INVPCI support, to flush global mappings (Andy
     Lutomirski)

   - Add KASAN enhancements (Andrey Ryabinin)

   - Fix mmiotrace for huge pages (Karol Herbst)

   - ... misc cleanups and small enhancements"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm/32: Enable full randomization on i386 and X86_32
  x86/mm/kmmio: Fix mmiotrace for hugepages
  x86/mm: Avoid premature success when changing page attributes
  x86/mm/ptdump: Remove paravirt_enabled()
  x86/mm: Fix INVPCID asm constraint
  x86/dmi: Switch dmi_remap() from ioremap() [uncached] to ioremap_cache()
  x86/mm: If INVPCID is available, use it to flush global mappings
  x86/mm: Add a 'noinvpcid' boot option to turn off INVPCID
  x86/mm: Add INVPCID helpers
  x86/kasan: Write protect kasan zero shadow
  x86/kasan: Clear kasan_zero_page after TLB flush
  x86/mm/numa: Check for failures in numa_clear_kernel_node_hotplug()
  x86/mm/numa: Clean up numa_clear_kernel_node_hotplug()
  x86/mm: Make kmap_prot into a #define
  x86/mm/32: Set NX in __supported_pte_mask before enabling paging
  x86/mm: Streamline and restore probe_memory_block_size()
2016-03-15 10:45:39 -07:00
Linus Torvalds
9cf8d6360c Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode updates from Ingo Molnar:
 "The biggest change in this cycle was the separation of the microcode
  loading mechanism from the initrd code plus the support of built-in
  microcode images.

  There were also lots cleanups and general restructuring (by Borislav
  Petkov)"

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/microcode/intel: Drop orig_sum from ext signature checksum
  x86/microcode/intel: Improve microcode sanity-checking error messages
  x86/microcode/intel: Merge two consecutive if-statements
  x86/microcode/intel: Get rid of DWSIZE
  x86/microcode/intel: Change checksum variables to u32
  x86/microcode: Use kmemdup() rather than duplicating its implementation
  x86/microcode: Remove unnecessary paravirt_enabled check
  x86/microcode: Document builtin microcode loading method
  x86/microcode/AMD: Issue microcode updated message later
  x86/microcode/intel: Cleanup get_matching_model_microcode()
  x86/microcode/intel: Remove unused arg of get_matching_model_microcode()
  x86/microcode/intel: Rename mc_saved_in_initrd
  x86/microcode/intel: Use *wrmsrl variants
  x86/microcode/intel: Cleanup apply_microcode_intel()
  x86/microcode/intel: Move the BUG_ON up and turn it into WARN_ON
  x86/microcode/intel: Rename mc_intel variable to mc
  x86/microcode/intel: Rename mc_saved_count to num_saved
  x86/microcode/intel: Rename local variables of type struct mc_saved_data
  x86/microcode/AMD: Drop redundant printk prefix
  x86/microcode: Issue update message only once
  ...
2016-03-15 10:39:22 -07:00
Linus Torvalds
ecc026bff6 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Ingo Molnar:
 "The biggest change in terms of impact is the changing of the FPU
  context switch model to 'eagerfpu' for all CPU types, via: commit
  58122bf1d8: "x86/fpu: Default eagerfpu=on on all CPUs"

  This makes all FPU saves and restores synchronous and makes the FPU
  code a lot more obvious to read.  In the next cycle, if this change is
  problem free, we'll remove the old lazy FPU restore code altogether.

  This change flushed out some old bugs, which should all be fixed by
  now, BYMMV"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Default eagerfpu=on on all CPUs
  x86/fpu: Speed up lazy FPU restores slightly
  x86/fpu: Fold fpu_copy() into fpu__copy()
  x86/fpu: Fix FNSAVE usage in eagerfpu mode
  x86/fpu: Fix math emulation in eager fpu mode
2016-03-15 10:23:56 -07:00
Linus Torvalds
ba33ea811e Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "This is another big update. Main changes are:

   - lots of x86 system call (and other traps/exceptions) entry code
     enhancements.  In particular the complex parts of the 64-bit entry
     code have been migrated to C code as well, and a number of dusty
     corners have been refreshed.  (Andy Lutomirski)

   - vDSO special mapping robustification and general cleanups (Andy
     Lutomirski)

   - cpufeature refactoring, cleanups and speedups (Borislav Petkov)

   - lots of other changes ..."

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
  x86/cpufeature: Enable new AVX-512 features
  x86/entry/traps: Show unhandled signal for i386 in do_trap()
  x86/entry: Call enter_from_user_mode() with IRQs off
  x86/entry/32: Change INT80 to be an interrupt gate
  x86/entry: Improve system call entry comments
  x86/entry: Remove TIF_SINGLESTEP entry work
  x86/entry/32: Add and check a stack canary for the SYSENTER stack
  x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
  x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed
  x86/entry: Vastly simplify SYSENTER TF (single-step) handling
  x86/entry/traps: Clear DR6 early in do_debug() and improve the comment
  x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions
  x86/entry/32: Restore FLAGS on SYSEXIT
  x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
  x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
  selftests/x86: In syscall_nt, test NT|TF as well
  x86/asm-offsets: Remove PARAVIRT_enabled
  x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled
  uprobes: __create_xol_area() must nullify xol_mapping.fault
  x86/cpufeature: Create a new synthetic cpu capability for machine check recovery
  ...
2016-03-15 09:32:27 -07:00
Bjorn Helgaas
cfeb8139a1 Merge branch 'pci/host-hv' into next
* pci/host-hv:
  PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs
  PCI: Look up IRQ domain by fwnode_handle
  PCI: Add fwnode_handle to x86 pci_sysdata
2016-03-15 08:56:16 -05:00
Linus Torvalds
d88bfe1d68 Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Ingo Molnar:
 "Various RAS updates:

   - AMD MCE support updates for future CPUs, fixes and 'SMCA' (Scalable
     MCA) error decoding support (Aravind Gopalakrishnan)

   - x86 memcpy_mcsafe() support, to enable smart(er) hardware error
     recovery in NVDIMM drivers, based on an extension of the x86
     exception handling code.  (Tony Luck)"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  EDAC/sb_edac: Fix computation of channel address
  x86/mm, x86/mce: Add memcpy_mcsafe()
  x86/mce/AMD: Document some functionality
  x86/mce: Clarify comments regarding deferred error
  x86/mce/AMD: Fix logic to obtain block address
  x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors
  x86/mce: Move MCx_CONFIG MSR definitions
  x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries
  x86/mm: Expand the exception table logic to allow new handling options
  x86/mce/AMD: Set MCAX Enable bit
  x86/mce/AMD: Carve out threshold block preparation
  x86/mce/AMD: Fix LVT offset configuration for thresholding
  x86/mce/AMD: Reduce number of blocks scanned per bank
  x86/mce/AMD: Do not perform shared bank check for future processors
  x86/mce: Fix order of AMD MCE init function call
2016-03-14 18:43:51 -07:00
Linus Torvalds
e71c2c1eeb Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Main kernel side changes:

   - Big reorganization of the x86 perf support code.  The old code grew
     organically deep inside arch/x86/kernel/cpu/perf* and its naming
     became somewhat messy.

     The new location is under arch/x86/events/, using the following
     cleaner hierarchy of source code files:

       perf/x86: Move perf_event.c .................. => x86/events/core.c
       perf/x86: Move perf_event_amd.c .............. => x86/events/amd/core.c
       perf/x86: Move perf_event_amd_ibs.c .......... => x86/events/amd/ibs.c
       perf/x86: Move perf_event_amd_iommu.[ch] ..... => x86/events/amd/iommu.[ch]
       perf/x86: Move perf_event_amd_uncore.c ....... => x86/events/amd/uncore.c
       perf/x86: Move perf_event_intel_bts.c ........ => x86/events/intel/bts.c
       perf/x86: Move perf_event_intel.c ............ => x86/events/intel/core.c
       perf/x86: Move perf_event_intel_cqm.c ........ => x86/events/intel/cqm.c
       perf/x86: Move perf_event_intel_cstate.c ..... => x86/events/intel/cstate.c
       perf/x86: Move perf_event_intel_ds.c ......... => x86/events/intel/ds.c
       perf/x86: Move perf_event_intel_lbr.c ........ => x86/events/intel/lbr.c
       perf/x86: Move perf_event_intel_pt.[ch] ...... => x86/events/intel/pt.[ch]
       perf/x86: Move perf_event_intel_rapl.c ....... => x86/events/intel/rapl.c
       perf/x86: Move perf_event_intel_uncore.[ch] .. => x86/events/intel/uncore.[ch]
       perf/x86: Move perf_event_intel_uncore_nhmex.c => x86/events/intel/uncore_nmhex.c
       perf/x86: Move perf_event_intel_uncore_snb.c   => x86/events/intel/uncore_snb.c
       perf/x86: Move perf_event_intel_uncore_snbep.c => x86/events/intel/uncore_snbep.c
       perf/x86: Move perf_event_knc.c .............. => x86/events/intel/knc.c
       perf/x86: Move perf_event_p4.c ............... => x86/events/intel/p4.c
       perf/x86: Move perf_event_p6.c ............... => x86/events/intel/p6.c
       perf/x86: Move perf_event_msr.c .............. => x86/events/msr.c

     (Borislav Petkov)

   - Update various x86 PMU constraint and hw support details (Stephane
     Eranian)

   - Optimize kprobes for BPF execution (Martin KaFai Lau)

   - Rewrite, refactor and fix the Intel uncore PMU driver code (Thomas
     Gleixner)

   - Rewrite, refactor and fix the Intel RAPL PMU code (Thomas Gleixner)

   - Various fixes and smaller cleanups.

  There are lots of perf tooling updates as well.  A few highlights:

  perf report/top:

     - Hierarchy histogram mode for 'perf top' and 'perf report',
       showing multiple levels, one per --sort entry: (Namhyung Kim)

       On a mostly idle system:

         # perf top --hierarchy -s comm,dso

       Then expand some levels and use 'P' to take a snapshot:

         # cat perf.hist.0
         -  92.32%         perf
               58.20%         perf
               22.29%         libc-2.22.so
                5.97%         [kernel]
                4.18%         libelf-0.165.so
                1.69%         [unknown]
         -   4.71%         qemu-system-x86
                3.10%         [kernel]
                1.60%         qemu-system-x86_64 (deleted)
         +   2.97%         swapper
         #

     - Add 'L' hotkey to dynamicly set the percent threshold for
       histogram entries and callchains, i.e.  dynamicly do what the
       --percent-limit command line option to 'top' and 'report' does.
       (Namhyung Kim)

  perf mem:

     - Allow specifying events via -e in 'perf mem record', also listing
       what events can be specified via 'perf mem record -e list' (Jiri
       Olsa)

  perf record:

     - Add 'perf record' --all-user/--all-kernel options, so that one
       can tell that all the events in the command line should be
       restricted to the user or kernel levels (Jiri Olsa), i.e.:

         perf record -e cycles:u,instructions:u

       is equivalent to:

         perf record --all-user -e cycles,instructions

     - Make 'perf record' collect CPU cache info in the perf.data file header:

         $ perf record usleep 1
         [ perf record: Woken up 1 times to write data ]
         [ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
         $ perf report --header-only -I | tail -10 | head -8
         # CPU cache info:
         #  L1 Data                 32K [0-1]
         #  L1 Instruction          32K [0-1]
         #  L1 Data                 32K [2-3]
         #  L1 Instruction          32K [2-3]
         #  L2 Unified             256K [0-1]
         #  L2 Unified             256K [2-3]
         #  L3 Unified            4096K [0-3]

       Will be used in 'perf c2c' and eventually in 'perf diff' to
       allow, for instance running the same workload in multiple
       machines and then when using 'diff' show the hardware difference.
       (Jiri Olsa)

     - Improved support for Java, using the JVMTI agent library to do
       jitdumps that then will be inserted in synthesized
       PERF_RECORD_MMAP2 events via 'perf inject' pointed to synthesized
       ELF files stored in ~/.debug and keyed with build-ids, to allow
       symbol resolution and even annotation with source line info, see
       the changeset comments to see how to use it (Stephane Eranian)

  perf script/trace:

     - Decode data_src values (e.g.  perf.data files generated by 'perf
       mem record') in 'perf script': (Jiri Olsa)

         # perf script
           perf 693 [1] 4.088652: 1 cpu/mem-loads,ldlat=30/P: ffff88007d0b0f40 68100142 L1 hit|SNP None|TLB L1 or L2 hit|LCK No <SNIP>
                                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     - Improve support to 'data_src', 'weight' and 'addr' fields in
       'perf script' (Jiri Olsa)

     - Handle empty print fmts in 'perf script -s' i.e. when running
       python or perl scripts (Taeung Song)

  perf stat:

     - 'perf stat' now shows shadow metrics (insn per cycle, etc) in
       interval mode too.  E.g:

         # perf stat -I 1000 -e instructions,cycles sleep 1
         #         time   counts unit events
            1.000215928  519,620      instructions     #  0.69 insn per cycle
            1.000215928  752,003      cycles
         <SNIP>

     - Port 'perf kvm stat' to PowerPC (Hemant Kumar)

     - Implement CSV metrics output in 'perf stat' (Andi Kleen)

  perf BPF support:

     - Support converting data from bpf events in 'perf data' (Wang Nan)

     - Print bpf-output events in 'perf script': (Wang Nan).

         # perf record -e bpf-output/no-inherit,name=evt/ -e ./test_bpf_output_3.c/map:channel.event=evt/ usleep 1000
         # perf script
            usleep  4882 21384.532523:   evt:  ffffffff810e97d1 sys_nanosleep ([kernel.kallsyms])
             BPF output: 0000: 52 61 69 73 65 20 61 20  Raise a
                         0008: 42 50 46 20 65 76 65 6e  BPF even
                         0010: 74 21 00 00              t!..
             BPF string: "Raise a BPF event!"
         #

     - Add API to set values of map entries in a BPF object, be it
       individual map slots or ranges (Wang Nan)

     - Introduce support for the 'bpf-output' event (Wang Nan)

     - Add glue to read perf events in a BPF program (Wang Nan)

     - Improve support for bpf-output events in 'perf trace' (Wang Nan)

  ... and tons of other changes as well - see the shortlog and git log
  for details!"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (342 commits)
  perf stat: Add --metric-only support for -A
  perf stat: Implement --metric-only mode
  perf stat: Document CSV format in manpage
  perf hists browser: Check sort keys before hot key actions
  perf hists browser: Allow thread filtering for comm sort key
  perf tools: Add sort__has_comm variable
  perf tools: Recalc total periods using top-level entries in hierarchy
  perf tools: Remove nr_sort_keys field
  perf hists browser: Cleanup hist_browser__fprintf_hierarchy_entry()
  perf tools: Remove hist_entry->fmt field
  perf tools: Fix command line filters in hierarchy mode
  perf tools: Add more sort entry check functions
  perf tools: Fix hist_entry__filter() for hierarchy
  perf jitdump: Build only on supported archs
  tools lib traceevent: Add '~' operation within arg_num_eval()
  perf tools: Omit unnecessary cast in perf_pmu__parse_scale
  perf tools: Pass perf_hpp_list all the way through setup_sort_list
  perf tools: Fix perf script python database export crash
  perf jitdump: DWARF is also needed
  perf bench mem: Prepare the x86-64 build for upstream memcpy_mcsafe() changes
  ...
2016-03-14 17:58:53 -07:00
Linus Torvalds
d09e356ad0 Merge branch 'mm-readonly-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull read-only kernel memory updates from Ingo Molnar:
 "This tree adds two (security related) enhancements to the kernel's
  handling of read-only kernel memory:

   - extend read-only kernel memory to a new class of formerly writable
     kernel data: 'post-init read-only memory' via the __ro_after_init
     attribute, and mark the ARM and x86 vDSO as such read-only memory.

     This kind of attribute can be used for data that requires a once
     per bootup initialization sequence, but is otherwise never modified
     after that point.

     This feature was based on the work by PaX Team and Brad Spengler.

     (by Kees Cook, the ARM vDSO bits by David Brown.)

   - make CONFIG_DEBUG_RODATA always enabled on x86 and remove the
     Kconfig option.  This simplifies the kernel and also signals that
     read-only memory is the default model and a first-class citizen.
     (Kees Cook)"

* 'mm-readonly-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ARM/vdso: Mark the vDSO code read-only after init
  x86/vdso: Mark the vDSO code read-only after init
  lkdtm: Verify that '__ro_after_init' works correctly
  arch: Introduce post-init read-only memory
  x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option
  mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings
  asm-generic: Consolidate mark_rodata_ro()
2016-03-14 16:58:50 -07:00
Linus Torvalds
fbed0bc091 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking changes from Ingo Molnar:
 "Various updates:

   - Futex scalability improvements: remove page lock use for shared
     futex get_futex_key(), which speeds up 'perf bench futex hash'
     benchmarks by over 40% on a 60-core Westmere.  This makes anon-mem
     shared futexes perform close to private futexes.  (Mel Gorman)

   - lockdep hash collision detection and fix (Alfredo Alvarez
     Fernandez)

   - lockdep testing enhancements (Alfredo Alvarez Fernandez)

   - robustify lockdep init by using hlists (Andrew Morton, Andrey
     Ryabinin)

   - mutex and csd_lock micro-optimizations (Davidlohr Bueso)

   - small x86 barriers tweaks (Michael S Tsirkin)

   - qspinlock updates (Waiman Long)"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  locking/csd_lock: Use smp_cond_acquire() in csd_lock_wait()
  locking/csd_lock: Explicitly inline csd_lock*() helpers
  futex: Replace barrier() in unqueue_me() with READ_ONCE()
  locking/lockdep: Detect chain_key collisions
  locking/lockdep: Prevent chain_key collisions
  tools/lib/lockdep: Fix link creation warning
  tools/lib/lockdep: Add tests for AA and ABBA locking
  tools/lib/lockdep: Add userspace version of READ_ONCE()
  tools/lib/lockdep: Fix the build on recent kernels
  locking/qspinlock: Move __ARCH_SPIN_LOCK_UNLOCKED to qspinlock_types.h
  locking/mutex: Allow next waiter lockless wakeup
  locking/pvqspinlock: Enable slowpath locking count tracking
  locking/qspinlock: Use smp_cond_acquire() in pending code
  locking/pvqspinlock: Move lock stealing count tracking code into pv_queued_spin_steal_lock()
  locking/mcs: Fix mcs_spin_lock() ordering
  futex: Remove requirement for lock_page() in get_futex_key()
  futex: Rename barrier references in ordering guarantees
  locking/atomics: Update comment about READ_ONCE() and structures
  locking/lockdep: Eliminate lockdep_init()
  locking/lockdep: Convert hash tables to hlists
  ...
2016-03-14 15:50:44 -07:00
Rafael J. Wysocki
0d571b62dd Merge branch 'pm-tools'
* pm-tools:
  tools/power turbostat: bugfix: TDP MSRs print bits fixing
  tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
  tools/power turbostat: call __cpuid() instead of __get_cpuid()
  tools/power turbostat: indicate SMX and SGX support
  tools/power turbostat: detect and work around syscall jitter
  tools/power turbostat: show GFX%rc6
  tools/power turbostat: show GFXMHz
  tools/power turbostat: show IRQs per CPU
  tools/power turbostat: make fewer systems calls
  tools/power turbostat: fix compiler warnings
  tools/power turbostat: add --out option for saving output in a file
  tools/power turbostat: re-name "%Busy" field to "Busy%"
  tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
  tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
  tools/power turbostat: allow sub-sec intervals
  tools/power turbostat: Decode MSR_MISC_PWR_MGMT
  tools/power turbostat: decode HWP registers
  x86 msr-index: Simplify syntax for HWP fields
  tools/power turbostat: CPUID(0x16) leaf shows base, max, and bus frequency
  tools/power turbostat: decode more CPUID fields
2016-03-14 14:22:34 +01:00
Alexander Duyck
1e94082963 ipv6: Pass proto to csum_ipv6_magic as __u8 instead of unsigned short
This patch updates csum_ipv6_magic so that it correctly recognizes that
protocol is a unsigned 8 bit value.

This will allow us to better understand what limitations may or may not be
present in how we handle the data.  For example there are a number of
places that call htonl on the protocol value.  This is likely not necessary
and can be replaced with a multiplication by ntohl(1) which will be
converted to a shift by the compiler.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13 23:55:13 -04:00
Alexander Duyck
01cfbad79a ipv4: Update parameters for csum_tcpudp_magic to their original types
This patch updates all instances of csum_tcpudp_magic and
csum_tcpudp_nofold to reflect the types that are usually used as the source
inputs.  For example the protocol field is populated based on nexthdr which
is actually an unsigned 8 bit value.  The length is usually populated based
on skb->len which is an unsigned integer.

This addresses an issue in which the IPv6 function csum_ipv6_magic was
generating a checksum using the full 32b of skb->len while
csum_tcpudp_magic was only using the lower 16 bits.  As a result we could
run into issues when attempting to adjust the checksum as there was no
protocol agnostic way to update it.

With this change the value is still truncated as many architectures use
"(len + proto) << 8", however this truncation only occurs for values
greater than 16776960 in length and as such is unlikely to occur as we stop
the inner headers at ~64K in size.

I did have to make a few minor changes in the arm, mn10300, nios2, and
score versions of the function in order to support these changes as they
were either using things such as an OR to combine the protocol and length,
or were using ntohs to convert the length which would have truncated the
value.

I also updated a few spots in terms of whitespace and type differences for
the addresses.  Most of this was just to make sure all of the definitions
were in sync going forward.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13 23:55:13 -04:00
Rafael J. Wysocki
3fdb74649b Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux into pm-tools
Pull turbostat updates for 4.6 from Len Brown.

* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: bugfix: TDP MSRs print bits fixing
  tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
  tools/power turbostat: call __cpuid() instead of __get_cpuid()
  tools/power turbostat: indicate SMX and SGX support
  tools/power turbostat: detect and work around syscall jitter
  tools/power turbostat: show GFX%rc6
  tools/power turbostat: show GFXMHz
  tools/power turbostat: show IRQs per CPU
  tools/power turbostat: make fewer systems calls
  tools/power turbostat: fix compiler warnings
  tools/power turbostat: add --out option for saving output in a file
  tools/power turbostat: re-name "%Busy" field to "Busy%"
  tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
  tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
  tools/power turbostat: allow sub-sec intervals
  tools/power turbostat: Decode MSR_MISC_PWR_MGMT
  tools/power turbostat: decode HWP registers
  x86 msr-index: Simplify syntax for HWP fields
  tools/power turbostat: CPUID(0x16) leaf shows base, max, and bus frequency
  tools/power turbostat: decode more CPUID fields
2016-03-14 02:13:05 +01:00
Fenghua Yu
d050049442 x86/cpufeature: Enable new AVX-512 features
A few new AVX-512 instruction groups/features are added in cpufeatures.h
for enuermation: AVX512DQ, AVX512BW, and AVX512VL.

Clear the flags in fpu__xstate_clear_all_cpu_caps().

The specification for latest AVX-512 including the features can be found at:

  https://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf

Note, I didn't enable the flags in KVM. Hopefully the KVM guys can pick up
the flags and enable them in KVM.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1457667498-37357-1-git-send-email-fenghua.yu@intel.com
[ Added more detailed feature descriptions. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-12 17:30:53 +01:00
Dave Hansen
0d47638f80 x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
Kirill Shutemov pointed this out to me.

The tip tree currently has commit:

	dfb4a70f2 [x86/cpufeature, x86/mm/pkeys: Add protection keys related CPUID definitions]

whioch added support for two new CPUID bits: X86_FEATURE_PKU and
X86_FEATURE_OSPKE.  But, those bits were mis-merged and put in
cpufeature.h instead of cpufeatures.h.

This didn't cause any breakage *except* it keeps the "ospke" and
"pku" bits from showing up in cpuinfo.

Now cpuinfo has the two new flags:

	flags	: ...  pku ospke

BTW, is it really wise to have cpufeature.h and cpufeatures.h?
It seems like they can only cause confusion and mahem with tab
completion.

Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160310221213.06F9DB53@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-11 09:55:57 +01:00
Andy Lutomirski
9999c8c01f x86/entry: Call enter_from_user_mode() with IRQs off
Now that slow-path syscalls always enter C before enabling
interrupts, it's straightforward to call enter_from_user_mode() before
enabling interrupts rather than doing it as part of entry tracing.

With this change, we should finally be able to retire exception_enter().

This will also enable optimizations based on knowing that we never
change context tracking state with interrupts on.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/bc376ecf87921a495e874ff98139b1ca2f5c5dd7.1457558566.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 10:53:26 +01:00
Yu-cheng Yu
a65050c6f1 x86/fpu: Revert ("x86/fpu: Disable AVX when eagerfpu is off")
Leonid Shatz noticed that the SDM interpretation of the following
recent commit:

  394db20ca2 ("x86/fpu: Disable AVX when eagerfpu is off")

... is incorrect and that the original behavior of the FPU code was correct.

Because AVX is not stated in CR0 TS bit description, it was mistakenly
believed to be not supported for lazy context switch. This turns out
to be false:

  Intel Software Developer's Manual Vol. 3A, Sec. 2.5 Control Registers:

   'TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the x87 FPU/
    MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch to be delayed until
    an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is actually executed
    by the new task.'

  Intel Software Developer's Manual Vol. 2A, Sec. 2.4 Instruction Exception
  Specification:

   'AVX instructions refer to exceptions by classes that include #NM
    "Device Not Available" exception for lazy context switch.'

So revert the commit.

Reported-by: Leonid Shatz <leonid.shatz@ravellosystems.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1457569734-3785-1-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 10:15:58 +01:00
Andy Lutomirski
392a62549f x86/entry: Remove TIF_SINGLESTEP entry work
Now that SYSENTER with TF set puts X86_EFLAGS_TF directly into
regs->flags, we don't need a TIF_SINGLESTEP fixup in the syscall
entry code.  Remove it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2d15f24da52dafc9d2f0b8d76f55544f4779c517.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 09:48:14 +01:00
Andy Lutomirski
2a41aa4feb x86/entry/32: Add and check a stack canary for the SYSENTER stack
The first instruction of the SYSENTER entry runs on its own tiny
stack.  That stack can be used if a #DB or NMI is delivered before
the SYSENTER prologue switches to a real stack.

We have code in place to prevent us from overflowing the tiny stack.
For added paranoia, add a canary to the stack and check it in
do_debug() -- that way, if something goes wrong with the #DB logic,
we'll eventually notice.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/6ff9a806f39098b166dc2c41c1db744df5272f29.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 09:48:14 +01:00
Andy Lutomirski
6dcc94149d x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed
The SYSENTER stack is only used on 32-bit kernels.  Remove it on 64-bit kernels.

( We may end up using it down the road on 64-bit kernels. If so,
  we'll re-enable it for CONFIG_IA32_EMULATION. )

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/9dbd18429f9ff61a76b6eda97a9ea20510b9f6ba.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 09:48:14 +01:00
Andy Lutomirski
f2b375756c x86/entry: Vastly simplify SYSENTER TF (single-step) handling
Due to a blatant design error, SYSENTER doesn't clear TF (single-step).

As a result, if a user does SYSENTER with TF set, we will single-step
through the kernel until something clears TF.  There is absolutely
nothing we can do to prevent this short of turning off SYSENTER [1].

Simplify the handling considerably with two changes:

  1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
     add TF to that list of flags to sanitize with no overhead whatsoever.

  2. Teach do_debug() to ignore single-step traps in the SYSENTER prologue.

That's all we need to do.

Don't get too excited -- our handling is still buggy on 32-bit
kernels.  There's nothing wrong with the SYSENTER code itself, but
the #DB prologue has a clever fixup for traps on the very first
instruction of entry_SYSENTER_32, and the fixup doesn't work quite
correctly.  The next two patches will fix that.

[1] We could probably prevent it by forcing BTF on at all times and
    making sure we clear TF before any branches in the SYSENTER
    code.  Needless to say, this is a bad idea.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a30d2ea06fe4b621fe6a9ef911b02c0f38feb6f2.1457578375.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-10 09:48:13 +01:00
Dan Williams
59e6473980 libnvdimm, pmem: clear poison on write
If a write is directed at a known bad block perform the following:

1/ write the data

2/ send a clear poison command

3/ invalidate the poison out of the cache hierarchy

Cc: <x86@kernel.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2016-03-09 15:15:32 -08:00
Paolo Bonzini
5a5fbdc0e3 KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch
It is now equal to use_eager_fpu(), which simply tests a cpufeature bit.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-09 14:04:36 +01:00
Paolo Bonzini
ab92f30875 KVM/ARM updates for 4.6
- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
 - PMU support for guests
 - 32bit world switch rewritten in C
 - Various optimizations to the vgic save/restore code
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW36xjAAoJECPQ0LrRPXpDGQkQAMDppzcTOixT3e8VPdHAX09a
 Z5PO0gyTMVV7Jyz5Ul3pedPJA2GSK9mxOCwqvIFbdxLAR6ZB00juO5FrTHkSdI91
 1XLPj4bKoMWcVvhL/g5A4Glp/pVMW1k/9Yq8zZAtYlsLRlqG5rLOutSadcqHcYaJ
 cTD/pFf7b2oPtkTPyoFml75KgHBT/8uvAvFDOWA66Id2z6T11+PsBT/6XnGDiwKg
 tpGTNzx3kPIKIzOAOHqVW6UBxFOeabebXLT8wUz3VwNn/UbG6gkumMNApMAyF2q1
 zU0nAh8+7Ek6Dr4OFWE6BfW6sgg/l7i1lA8XoAmqG7ZTrSptCc59fvaZJxPruG+Q
 dMsU6QgR77JJjbZTinf9a1jReZ/liZrx2gZXedVKdILrjmDSq0UnGcxjUOEDZOGy
 2/dbrlJhv+LhpcJtuPpxPCfoqbW5L0ynzmuYuXRdRz3lTHiOWIRx5gugrhO+wH4D
 4gvZhbw3XCiYfpYHYhl8A1EH5kanKgdXDocz9yIm7mZm89gngufF/HkeXS3ZU25T
 yThyBGulGjqN4FCdgf1HolkTfFjnfSx4qJovJ58eHga+HNLXRkTecZZcbFy2OOHv
 8Bx0PIlwj4RgSaRLWQUudAhdhKS2g22DKDDljxFwhkMPNghvqkYMJCRDKLu6GBXQ
 4YsLKM+TaShHFjSpx+ao
 =rpvb
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM updates for 4.6

- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
- PMU support for guests
- 32bit world switch rewritten in C
- Various optimizations to the vgic save/restore code

Conflicts:
	include/uapi/linux/kvm.h
2016-03-09 11:50:42 +01:00
Linus Walleij
0bae2f1732 Merge branch 'ib-mfd-regulator-gpio-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd into devel 2016-03-09 17:40:37 +07:00
David S. Miller
810813c47a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several cases of overlapping changes, as well as one instance
(vxlan) of a bug fix in 'net' overlapping with code movement
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 12:34:12 -05:00
Tony Luck
92b0729c34 x86/mm, x86/mce: Add memcpy_mcsafe()
Make use of the EXTABLE_FAULT exception table entries to write
a kernel copy routine that doesn't crash the system if it
encounters a machine check. Prime use case for this is to copy
from large arrays of non-volatile memory used as storage.

We have to use an unrolled copy loop for now because current
hardware implementations treat a machine check in "rep mov"
as fatal. When that is fixed we can simplify.

Return type is a "bool". True means that we copied OK, false means
that it didn't.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@gmail.com>
Link: http://lkml.kernel.org/r/a44e1055efc2d2a9473307b22c91caa437aa3f8b.1456439214.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 17:54:38 +01:00
Ingo Molnar
14ddde78c7 Merge branch 'linus' into x86/fpu, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 14:25:45 +01:00
Andy Lutomirski
58a5aac533 x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled
x86_64 has very clean espfix handling on paravirt: espfix64 is set
up in native_iret, so paravirt systems that override iret bypass
espfix64 automatically.  This is robust and straightforward.

x86_32 is messier.  espfix is set up before the IRET paravirt patch
point, so it can't be directly conditionalized on whether we use
native_iret.  We also can't easily move it into native_iret without
regressing performance due to a bizarre consideration.  Specifically,
on 64-bit kernels, the logic is:

  if (regs->ss & 0x4)
          setup_espfix;

On 32-bit kernels, the logic is:

  if ((regs->ss & 0x4) && (regs->cs & 0x3) == 3 &&
      (regs->flags & X86_EFLAGS_VM) == 0)
          setup_espfix;

The performance of setup_espfix itself is essentially irrelevant, but
the comparison happens on every IRET so its performance matters.  On
x86_64, there's no need for any registers except flags to implement
the comparison, so we fold the whole thing into native_iret.  On
x86_32, we don't do that because we need a free register to
implement the comparison efficiently.  We therefore do espfix setup
before restoring registers on x86_32.

This patch gets rid of the explicit paravirt_enabled check by
introducing X86_BUG_ESPFIX on 32-bit systems and using an ALTERNATIVE
to skip espfix on paravirt systems where iret != native_iret.  This is
also messy, but it's at least in line with other things we do.

This improves espfix performance by removing a branch, but no one
cares.  More importantly, it removes a paravirt_enabled user, which is
good because paravirt_enabled is ill-defined and is going away.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boris.ostrovsky@oracle.com
Cc: david.vrabel@citrix.com
Cc: konrad.wilk@oracle.com
Cc: lguest@lists.ozlabs.org
Cc: xen-devel@lists.xensource.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 14:16:44 +01:00
Paolo Bonzini
6bb69c9b69 KVM: MMU: simplify last_pte_bitmap
Branch-free code is fun and everybody knows how much Avi loves it,
but last_pte_bitmap takes it a bit to the extreme.  Since the code
is simply doing a range check, like

	(level == 1 ||
	 ((gpte & PT_PAGE_SIZE_MASK) && level < N)

we can make it branch-free without storing the entire truth table;
it is enough to cache N.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-08 12:33:38 +01:00
Denys Vlasenko
1a8aa8acab x86/apic: Deinline __default_send_IPI_*, save ~200 bytes
__default_send_IPI_shortcut: 49 bytes, 2 callsites
__default_send_IPI_dest_field: 108 bytes, 7 callsites

     text     data      bss       dec     hex filename
 96184086 20860488 36122624 153167198 921255e vmlinux_before
 96183823 20860520 36122624 153166967 9212477 vmlinux

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Borislav Petkov <bp@alien.de>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1457287876-6001-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 12:26:41 +01:00
Stephane Eranian
5690ae28e4 perf/x86/intel: Add definition for PT PMI bit
This patch adds a definition for GLOBAL_OVFL_STATUS bit 55
which is used with the Processor Trace (PT) feature.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: adrian.hunter@intel.com
Cc: kan.liang@intel.com
Cc: namhyung@kernel.org
Link: http://lkml.kernel.org/r/1457034642-21837-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 12:18:34 +01:00
Aravind Gopalakrishnan
ea2ca36b65 x86/mce/AMD: Document some functionality
In an attempt to aid in understanding of what the threshold_block
structure holds, provide comments to describe the members here. Also,
trim comments around threshold_restart_bank() and update copyright info.

No functional change is introduced.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Shorten comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1457021458-2522-6-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 11:48:15 +01:00
Aravind Gopalakrishnan
2cd3b5f903 x86/mce: Clarify comments regarding deferred error
Deferred errors indicate errors that hardware could not fix. But it
still does not cause any interruption to program flow. So it does not
generate any #MC and UC bit in MCx_STATUS is not set.

Correct comment.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1457021458-2522-5-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 11:48:15 +01:00
Aravind Gopalakrishnan
8dd1e17a55 x86/mce/AMD: Fix logic to obtain block address
In upcoming processors, the BLKPTR field is no longer used to indicate
the MSR number of the additional register. Insted, it simply indicates
the prescence of additional MSRs.

Fix the logic here to gather MSR address from MSR_AMD64_SMCA_MCx_MISC()
for newer processors and fall back to existing logic for older
processors.

[ Drop nextaddr_out label; style cleanups. ]
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1457021458-2522-4-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 11:48:14 +01:00
Aravind Gopalakrishnan
be0aec23bf x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors
For Scalable MCA enabled processors, errors are listed per IP block. And
since it is not required for an IP to map to a particular bank, we need
to use HWID and McaType values from the MCx_IPID register to figure out
which IP a given bank represents.

We also have a new bit (TCC) in the MCx_STATUS register to indicate Task
context is corrupt.

Add logic here to decode errors from all known IP blocks for Fam17h
Model 00-0fh and to print TCC errors.

[ Minor fixups. ]
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1457021458-2522-3-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 11:48:14 +01:00
Aravind Gopalakrishnan
adc53f2e0a x86/mce: Move MCx_CONFIG MSR definitions
Those MSRs are used only by the MCE code so move them there.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1456785179-14378-2-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 11:48:14 +01:00
Ingo Molnar
a1a8ba2d4a Merge branch 'linus' into ras/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-08 11:48:00 +01:00
Borislav Petkov
c041462217 x86/microcode/intel: Get rid of DWSIZE
sizeof(u32) is perfectly clear as it is.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1457345404-28884-3-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-03-08 09:08:44 +01:00
Christoph Hellwig
bc4b024a8b PCI: Move pci_dma_* helpers to common code
For a long time all architectures implement the pci_dma_* functions using
the generic DMA API, and they all use the same header to do so.

Move this header, pci-dma-compat.h, to include/linux and include it from
the generic pci.h instead of having each arch duplicate this include.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-03-07 10:40:02 -06:00
Ingo Molnar
ec87e1cf7d Linux 4.5-rc7
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW3LO0AAoJEHm+PkMAQRiGhewIAIVHA1+qSSXEHTFeuLRuYpiz
 +ptQUIjPJdakWm/XqOnwSG8SWUuD4XL6ysfNmLSZIdqXYBAPpAuwT1UA2FZhz0dN
 soZxMNleAvzHWRDFLqwjVdOVlTxS6CTTdEQNzi+3R0ZCADllsRcuj/GBIY+M8cr6
 LvxK8BnhDU+Au3gZQjaujTMO7fKG6gOq4wKz/U7RIG37A6rwW577kEfLg4ZgFwt9
 RVjsky5mrX9+4l3QFtox9ZC383P/0VZ6+vXwN2QH1/joDK4EvA8pCwsGTyjRJiqi
 fArHbS+mHyAtbPWJmDbVlQ5dkZJAqRgtWBydjQYoC16S4Bwdce2/FbhBiTgEQAo=
 =sqln
 -----END PGP SIGNATURE-----

Merge tag 'v4.5-rc7' into x86/asm, to pick up SMAP fix

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-07 09:27:30 +01:00
Jiri Kosina
335e073faa klp: remove CONFIG_LIVEPATCH dependency from klp headers
There is no need for livepatch.h (generic and arch-specific) to depend
on CONFIG_LIVEPATCH. Remove that superfluous dependency.

Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-03-06 22:22:10 +01:00
Miroslav Benes
b24b78a113 klp: remove superfluous errors in asm/livepatch.h
There is an #error in asm/livepatch.h for both x86 and s390 in
!CONFIG_LIVEPATCH cases. It does not make much sense as pointed out by
Michael Ellerman. One can happily include asm/livepatch.h with
CONFIG_LIVEPATCH. Remove it as useless.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-03-06 22:19:26 +01:00
Christopher S. Hall
f9677e0f83 x86/tsc: Always Running Timer (ART) correlated clocksource
On modern Intel systems TSC is derived from the new Always Running Timer
(ART). ART can be captured simultaneous to the capture of
audio and network device clocks, allowing a correlation between timebases
to be constructed. Upon capture, the driver converts the captured ART
value to the appropriate system clock using the correlated clocksource
mechanism.

On systems that support ART a new CPUID leaf (0x15) returns parameters
“m” and “n” such that:

TSC_value = (ART_value * m) / n + k [n >= 1]

[k is an offset that can adjusted by a privileged agent. The
IA32_TSC_ADJUST MSR is an example of an interface to adjust k.
See 17.14.4 of the Intel SDM for more details]

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: kevin.b.stanton@intel.com
Cc: kevin.j.clarke@intel.com
Cc: hpa@zytor.com
Cc: jeffrey.t.kirsher@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com>
[jstultz: Tweaked to fix build issue, also reworked math for
64bit division on 32bit systems, as well as !CONFIG_CPU_FREQ build
fixes]
Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-03-03 14:23:34 -08:00
Xiao Guangrong
13d268ca2c KVM: MMU: apply page track notifier
Register the notifier to receive write track event so that we can update
our shadow page table

It makes kvm_mmu_pte_write() be the callback of the notifier, no function
is changed

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03 14:36:24 +01:00
Xiao Guangrong
0eb05bf290 KVM: page track: add notifier support
Notifier list is introduced so that any node wants to receive the track
event can register to the list

Two APIs are introduced here:
- kvm_page_track_register_notifier(): register the notifier to receive
  track event

- kvm_page_track_unregister_notifier(): stop receiving track event by
  unregister the notifier

The callback, node->track_write() is called when a write access on the
write tracked page happens

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03 14:36:22 +01:00
Xiao Guangrong
e5691a81e8 KVM: MMU: clear write-flooding on the fast path of tracked page
If the page fault is caused by write access on write tracked page, the
real shadow page walking is skipped, we lost the chance to clear write
flooding for the page structure current vcpu is using

Fix it by locklessly waking shadow page table to clear write flooding
on the shadow page structure out of mmu-lock. So that we change the
count to atomic_t

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03 14:36:22 +01:00
Xiao Guangrong
3d0c27ad6e KVM: MMU: let page fault handler be aware tracked page
The page fault caused by write access on the write tracked page can not
be fixed, it always need to be emulated. page_fault_handle_page_track()
is the fast path we introduce here to skip holding mmu-lock and shadow
page table walking

However, if the page table is not present, it is worth making the page
table entry present and readonly to make the read access happy

mmu_need_write_protect() need to be cooked to avoid page becoming writable
when making page table present or sync/prefetch shadow page table entries

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03 14:36:21 +01:00
Xiao Guangrong
f29d4d7810 KVM: page track: introduce kvm_slot_page_track_{add,remove}_page
These two functions are the user APIs:
- kvm_slot_page_track_add_page(): add the page to the tracking pool
  after that later specified access on that page will be tracked

- kvm_slot_page_track_remove_page(): remove the page from the tracking
  pool, the specified access on the page is not tracked after the last
  user is gone

Both of these are called under the protection both of mmu-lock and
kvm->srcu or kvm->slots_lock

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03 14:36:21 +01:00
Xiao Guangrong
21ebbedadd KVM: page track: add the framework of guest page tracking
The array, gfn_track[mode][gfn], is introduced in memory slot for every
guest page, this is the tracking count for the gust page on different
modes. If the page is tracked then the count is increased, the page is
not tracked after the count reaches zero

We use 'unsigned short' as the tracking count which should be enough as
shadow page table only can use 2^14 (2^3 for level, 2^1 for cr4_pae, 2^2
for quadrant, 2^3 for access, 2^1 for nxe, 2^1 for cr0_wp, 2^1 for
smep_andnot_wp, 2^1 for smap_andnot_wp, and 2^1 for smm) at most, there
is enough room for other trackers

Two callbacks, kvm_page_track_create_memslot() and
kvm_page_track_free_memslot() are implemented in this patch, they are
internally used to initialize and reclaim the memory of the array

Currently, only write track mode is supported

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03 14:36:20 +01:00
Xiao Guangrong
92f94f1e9e KVM: MMU: rename has_wrprotected_page to mmu_gfn_lpage_is_disallowed
kvm_lpage_info->write_count is used to detect if the large page mapping
for the gfn on the specified level is allowed, rename it to disallow_lpage
to reflect its purpose, also we rename has_wrprotected_page() to
mmu_gfn_lpage_is_disallowed() to make the code more clearer

Later we will extend this mechanism for page tracking: if the gfn is
tracked then large mapping for that gfn on any level is not allowed.
The new name is more straightforward

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-03-03 14:36:19 +01:00
Ingo Molnar
39a1142dbb Linux 4.5-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW0yM6AAoJEHm+PkMAQRiGeUwIAJRTHFPJTFpJcJjeZEV4/EL1
 7Pl0WSHs/CWBkXIevAg2HgkECSQ9NI9FAUFvoGxCldDpFAnL1U2QV8+Ur2qhiXMG
 5v0jILJuiw57qT/NfhEudZolerlRoHILmB3JRTb+DUV4GHZuWpTkJfUSI9j5aTEl
 w83XUgtK4bKeIyFbHdWQk6xqfzfFBSuEITuSXreOMwkFfMmeScE0WXOPLBZWyhPa
 v0rARJLYgM+vmRAnJjnG8unH+SgnqiNcn2oOFpevKwmpVcOjcEmeuxh/HdeZf7HM
 /R8F86OwdmXsO+z8dQxfcucLg+I9YmKfFr8b6hopu1sRztss2+Uk6H1j2J7IFIg=
 =tvkh
 -----END PGP SIGNATURE-----

Merge tag 'v4.5-rc6' into locking/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:55:22 +01:00
Thomas Gleixner
1f12e32f4c x86/topology: Create logical package id
For per package oriented services we must be able to rely on the number of CPU
packages to be within bounds. Create a tracking facility, which

- calculates the number of possible packages depending on nr_cpu_ids after boot

- makes sure that the package id is within the number of possible packages. If
  the apic id is outside we map it to a logical package id if there is enough
  space available.

Provide interfaces for drivers to query the mapping and do translations from
physcial to logical ids.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Harish Chegondi <harish.chegondi@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20160222221011.541071755@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:35:18 +01:00
Ingo Molnar
0a7348925f Linux 4.5-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJW0yM6AAoJEHm+PkMAQRiGeUwIAJRTHFPJTFpJcJjeZEV4/EL1
 7Pl0WSHs/CWBkXIevAg2HgkECSQ9NI9FAUFvoGxCldDpFAnL1U2QV8+Ur2qhiXMG
 5v0jILJuiw57qT/NfhEudZolerlRoHILmB3JRTb+DUV4GHZuWpTkJfUSI9j5aTEl
 w83XUgtK4bKeIyFbHdWQk6xqfzfFBSuEITuSXreOMwkFfMmeScE0WXOPLBZWyhPa
 v0rARJLYgM+vmRAnJjnG8unH+SgnqiNcn2oOFpevKwmpVcOjcEmeuxh/HdeZf7HM
 /R8F86OwdmXsO+z8dQxfcucLg+I9YmKfFr8b6hopu1sRztss2+Uk6H1j2J7IFIg=
 =tvkh
 -----END PGP SIGNATURE-----

Merge tag 'v4.5-rc6' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:04:01 +01:00
Linus Torvalds
a9f8094aae PCI updates for v4.5:
Enumeration
     Revert x86 pcibios_alloc_irq() to fix regression (Bjorn Helgaas)
 
   Marvell MVEBU host bridge driver
     Restrict build to 32-bit ARM (Thierry Reding)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW0gP1AAoJEFmIoMA60/r84qkQAJXOFW20cie2yepQXIk7f5aN
 M2/+iFte8YHf4ZFgZWA/oS+mZAp1OqctSTjWg1KTPZsPHAiB6DkL7WOV6fK+uXr9
 fX8D7Ec2eLgeIFl78iSQaAht4kfmfz8f5LlU6Oi9kvQOt+35gp4lP834HClx7Jep
 XT2qZy/zUQy8GylTzRqueMBpXBCnBQR8iyaD8j4rmklQB3yLXaEMTs7HzwJKBmhM
 ZDnH1xrV5cWYb7niSCBkq4IomCmezJZCvxcDjh/Z8gjDKbVl7TLYOdU8Jh4wNO++
 ng0J8WDSKQJ9Hfv6H+5dgPzoqgrIrWb/Oz5GXd8i6cqv00szG5S/w8nHcO8LPSJv
 dJxxfTlz4KRxdv/sqOVW4cDFUmScODMkDMh+hAeEVYKl9ty5fQ4O2iNwNehzrdNj
 FRrgN1980amYN2n09NZNF863dvVN+DMJ4Ll2VT01rOIUH3bwt4cO6rVWrEUlEKCn
 DiSvJlXHm5nLLCQpkkGKAeq5hYl25DFtYVwLopIbUSHFXCASHPtQewDvgzfn9zYi
 M7J8bDa/uTscSqJsGsb4/gHLEblCfju7Pj2gEHoiK4XtbCuuamFA3nsA7lzcAG9j
 W5pVDQTqctdgHq/UMLKIeoBJ592fhYzKipY8vELOKwkieDR9F3g3u8nWt4ZAUIXE
 /oS5F1eWMkDMvdjyZO4C
 =yQ41
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.5-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
 "Enumeration:
    Revert x86 pcibios_alloc_irq() to fix regression (Bjorn Helgaas)

  Marvell MVEBU host bridge driver:
    Restrict build to 32-bit ARM (Thierry Reding)"

* tag 'pci-v4.5-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: mvebu: Restrict build to 32-bit ARM
  Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()"
  Revert "PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed"
  Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled"
2016-02-27 12:33:42 -08:00
Bjorn Helgaas
6c777e8799 Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()"
991de2e590 ("PCI, x86: Implement pcibios_alloc_irq() and
pcibios_free_irq()") appeared in v4.3 and helps support IOAPIC hotplug.

Олег reported that the Elcus-1553 TA1-PCI driver worked in v4.2 but not
v4.3 and bisected it to 991de2e590.  Sunjin reported that the RocketRAID
272x driver worked in v4.2 but not v4.3.  In both cases booting with
"pci=routirq" is a workaround.

I think the problem is that after 991de2e590, we no longer call
pcibios_enable_irq() for upstream bridges.  Prior to 991de2e590, when a
driver called pci_enable_device(), we recursively called
pcibios_enable_irq() for upstream bridges via pci_enable_bridge().

After 991de2e590, we call pcibios_enable_irq() from pci_device_probe()
instead of the pci_enable_device() path, which does *not* call
pcibios_enable_irq() for upstream bridges.

Revert 991de2e590 to fix these driver regressions.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211
Fixes: 991de2e590 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
Reported-and-tested-by: Олег Мороз <oleg.moroz@mcc.vniiem.ru>
Reported-by: Sunjin Yang <fan4326@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
CC: Jiang Liu <jiang.liu@linux.intel.com>
2016-02-27 08:52:20 -06:00
Nicolai Stange
d89abe2a1f arch/x86/irq: Purge useless handler declarations from hw_irq.h
arch/x86/include/asm/hw_irq.h contains declarations for the C-level handlers
called into directly from the IDT-referenced assembly stubs. These
declarations are never used as they are referenced from assembly only.

Furthermore, these declarations got their attributes wrong: there is no
'__irqentry' (parameter passing via stack) attached to them.

Also, the list of declarations isn't complete: none of the tracing-capable
variants is declared, for example.

Purge the handler declarations.

Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-02-25 10:39:15 +01:00
Ingo Molnar
319e305ca4 Merge branch 'ras/core' into core/objtool, to pick up the new exception table format
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-25 09:01:09 +01:00
Ingo Molnar
c0853867a1 Merge branch 'x86/debug' into core/objtool, to pick up frame pointer fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-25 09:00:38 +01:00
Ard Biesheuvel
48fcb2d021 efi: stub: use high allocation for converted command line
Before we can move the command line processing before the allocation
of the kernel, which is required for detecting the 'nokaslr' option
which controls that allocation, move the converted command line higher
up in memory, to prevent it from interfering with the kernel itself.

Since x86 needs the address to fit in 32 bits, use UINT_MAX as the upper
bound there. Otherwise, use ULONG_MAX (i.e., no limit)

Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-24 14:57:28 +00:00
Adam Buchbinder
6a6256f9e0 x86: Fix misspellings in comments
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: trivial@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-24 08:44:58 +01:00
Josh Poimboeuf
821eae7d14 sched/x86: Add stack frame dependency to __preempt_schedule[_notrace]()
If __preempt_schedule() or __preempt_schedule_notrace() is referenced at
the beginning of a function, gcc can insert the asm inline "call
___preempt_schedule[_notrace]" instruction before setting up a stack
frame, which breaks frame pointer convention if CONFIG_FRAME_POINTER is
enabled and can result in bad stack traces.

Force a stack frame to be created if CONFIG_FRAME_POINTER is enabled by
listing the stack pointer as an output operand for the inline asm
statements.

Specifically this fixes the following stacktool warnings:

  stacktool: drivers/scsi/hpsa.o: hpsa_scsi_do_simple_cmd.constprop.106()+0x79: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_find_first()+0x70: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_find_first()+0x92: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0xff: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0xf5: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_free()+0x11a: call without frame pointer save/setup
  stacktool: fs/mbcache.o: mb_cache_entry_get()+0x225: call without frame pointer save/setup
  stacktool: kernel/locking/percpu-rwsem.o: percpu_up_read()+0x27: call without frame pointer save/setup
  stacktool: kernel/profile.o: do_profile_hits.isra.5()+0x139: call without frame pointer save/setup
  stacktool: lib/nmi_backtrace.o: nmi_trigger_all_cpu_backtrace()+0x2b6: call without frame pointer save/setup
  stacktool: net/rds/ib_cm.o: rds_ib_cq_comp_handler_recv()+0x58: call without frame pointer save/setup
  stacktool: net/rds/ib_cm.o: rds_ib_cq_comp_handler_send()+0x58: call without frame pointer save/setup
  stacktool: net/rds/ib_recv.o: rds_ib_attempt_ack()+0xc1: call without frame pointer save/setup
  stacktool: net/rds/iw_recv.o: rds_iw_attempt_ack()+0xc1: call without frame pointer save/setup
  stacktool: net/rds/iw_recv.o: rds_iw_recv_cq_comp_handler()+0x55: call without frame pointer save/setup

So it only adds a stack frame to 15 call sites out of ~5000 calls to
___preempt_schedule[_notrace]().  All the others already had stack frames.

Oddly, this change actually seems to make things faster in a lot of
cases.  For many smaller functions it causes the stack frame creation to
get moved out of the common path and into the unlikely path.

For example, here's the original cyc2ns_read_end():

  ffffffff8101f8c0 <cyc2ns_read_end>:
  ffffffff8101f8c0:	55                   	push   %rbp
  ffffffff8101f8c1:	48 89 e5             	mov    %rsp,%rbp
  ffffffff8101f8c4:	83 6f 10 01          	subl   $0x1,0x10(%rdi)
  ffffffff8101f8c8:	75 08                	jne    ffffffff8101f8d2 <cyc2ns_read_end+0x12>
  ffffffff8101f8ca:	65 48 89 3d e6 5a ff 	mov    %rdi,%gs:0x7eff5ae6(%rip)        # 153b8 <cyc2ns+0x38>
  ffffffff8101f8d1:	7e
  ffffffff8101f8d2:	65 ff 0d 77 c4 fe 7e 	decl   %gs:0x7efec477(%rip)        # bd50 <__preempt_count>
  ffffffff8101f8d9:	74 02                	je     ffffffff8101f8dd <cyc2ns_read_end+0x1d>
  ffffffff8101f8db:	5d                   	pop    %rbp
  ffffffff8101f8dc:	c3                   	retq
  ffffffff8101f8dd:	e8 1e 37 fe ff       	callq  ffffffff81003000 <___preempt_schedule>
  ffffffff8101f8e2:	5d                   	pop    %rbp
  ffffffff8101f8e3:	c3                   	retq
  ffffffff8101f8e4:	66 66 66 2e 0f 1f 84 	data16 data16 nopw %cs:0x0(%rax,%rax,1)
  ffffffff8101f8eb:	00 00 00 00 00

And here's the same function with the patch:

  ffffffff8101f8c0 <cyc2ns_read_end>:
  ffffffff8101f8c0:	83 6f 10 01          	subl   $0x1,0x10(%rdi)
  ffffffff8101f8c4:	75 08                	jne    ffffffff8101f8ce <cyc2ns_read_end+0xe>
  ffffffff8101f8c6:	65 48 89 3d ea 5a ff 	mov    %rdi,%gs:0x7eff5aea(%rip)        # 153b8 <cyc2ns+0x38>
  ffffffff8101f8cd:	7e
  ffffffff8101f8ce:	65 ff 0d 7b c4 fe 7e 	decl   %gs:0x7efec47b(%rip)        # bd50 <__preempt_count>
  ffffffff8101f8d5:	74 01                	je     ffffffff8101f8d8 <cyc2ns_read_end+0x18>
  ffffffff8101f8d7:	c3                   	retq
  ffffffff8101f8d8:	55                   	push   %rbp
  ffffffff8101f8d9:	48 89 e5             	mov    %rsp,%rbp
  ffffffff8101f8dc:	e8 1f 37 fe ff       	callq  ffffffff81003000 <___preempt_schedule>
  ffffffff8101f8e1:	5d                   	pop    %rbp
  ffffffff8101f8e2:	c3                   	retq
  ffffffff8101f8e3:	66 66 66 66 2e 0f 1f 	data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)
  ffffffff8101f8ea:	84 00 00 00 00 00

Notice that it moved the frame pointer setup code to the unlikely
___preempt_schedule() call path.  Going through a sampling of the
differences in the asm, that's the most common change I see.

Otherwise it has no real effect on callers which already have stack
frames (though it does result in the reordering of some 'mov's).

Reported-by: Jiri Slaby <jslaby@suse.cz>
Tested-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/20160218174158.GA28230@treble.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-24 08:35:45 +01:00
Josh Poimboeuf
16df4ff860 x86/locking: Create stack frame in PV unlock
The assembly PV_UNLOCK function is a callable non-leaf function which
doesn't honor CONFIG_FRAME_POINTER, which can result in bad stack
traces.

Create a stack frame when CONFIG_FRAME_POINTER is enabled.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <Waiman.Long@hpe.com>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/6685a72ddbbd0ad3694337cca0af4b4ea09f5f40.1453405861.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-24 08:35:44 +01:00
Chris J Arges
f05058c4d6 x86/uaccess: Add stack frame output operand in get_user() inline asm
Numerous 'call without frame pointer save/setup' warnings are introduced
by stacktool because of functions using the get_user() macro. Bad stack
traces could occur due to lack of or misplacement of stack frame setup
code.

This patch forces a stack frame to be created before the inline asm code
if CONFIG_FRAME_POINTER is enabled by listing the stack pointer as an
output operand for the get_user() inline assembly statement.

Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/bc85501f221ee512670797c7f110022e64b12c81.1453405861.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-24 08:35:43 +01:00
Josh Poimboeuf
87b240cbe3 x86/paravirt: Create a stack frame in PV_CALLEE_SAVE_REGS_THUNK
A function created with the PV_CALLEE_SAVE_REGS_THUNK macro doesn't set
up a new stack frame before the call instruction, which breaks frame
pointer convention if CONFIG_FRAME_POINTER is enabled and can result in
a bad stack trace.  Also, the thunk functions aren't annotated as ELF
callable functions.

Create a stack frame when CONFIG_FRAME_POINTER is enabled and add the
ELF function type.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/a2cad74e87c4aba7fd0f54a1af312e66a824a575.1453405861.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-24 08:35:42 +01:00
Josh Poimboeuf
bb93eb4cd6 x86/paravirt: Add stack frame dependency to PVOP inline asm calls
If a PVOP call macro is inlined at the beginning of a function, gcc can
insert the call instruction before setting up a stack frame, which
breaks frame pointer convention if CONFIG_FRAME_POINTER is enabled and
can result in a bad stack trace.

Force a stack frame to be created if CONFIG_FRAME_POINTER is enabled by
listing the stack pointer as an output operand for the PVOP inline asm
statements.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/6a13e48c5a8cf2de1aa112ae2d4c0ac194096282.1453405861.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-24 08:35:42 +01:00
Josh Poimboeuf
0e8e2238b5 x86/xen: Add stack frame dependency to hypercall inline asm calls
If a hypercall is inlined at the beginning of a function, gcc can insert
the call instruction before setting up a stack frame, which breaks frame
pointer convention if CONFIG_FRAME_POINTER is enabled and can result in
a bad stack trace.

Force a stack frame to be created if CONFIG_FRAME_POINTER is enabled by
listing the stack pointer as an output operand for the hypercall inline
asm statements.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: live-patching@vger.kernel.org
Link: http://lkml.kernel.org/r/c6face5a46713108bded9c4c103637222abc4528.1453405861.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-24 08:35:41 +01:00
Linus Torvalds
de9e478b9d x86: fix SMAP in 32-bit environments
In commit 11f1a4b975 ("x86: reorganize SMAP handling in user space
accesses") I changed how the stac/clac instructions were generated
around the user space accesses, which then made it possible to do
batched accesses efficiently for user string copies etc.

However, in doing so, I completely spaced out, and didn't even think
about the 32-bit case.  And nobody really even seemed to notice, because
SMAP doesn't even exist until modern Skylake processors, and you'd have
to be crazy to run 32-bit kernels on a modern CPU.

Which brings us to Andy Lutomirski.

He actually tested the 32-bit kernel on new hardware, and noticed that
it doesn't work.  My bad.  The trivial fix is to add the required
uaccess begin/end markers around the raw accesses in <asm/uaccess_32.h>.

I feel a bit bad about this patch, just because that header file really
should be cleaned up to avoid all the duplicated code in it, and this
commit just expands on the problem.  But this just fixes the bug without
any bigger cleanup surgery.

Reported-and-tested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-02-23 16:25:20 -08:00
Bryan O'Donoghue
c637fa5294 x86/platform/intel/quark: Drop IMR lock bit support
Isolated Memory Regions support a lock bit. The lock bit in an IMR prevents
modification of the IMR until the core goes through a warm or cold reset.
The lock bit feature is not useful in the context of the kernel API and is
not really necessary since modification of IMRs is possible only from
ring-zero anyway. This patch drops support for IMR locks bits, it
simplifies the kernel API and removes an unnecessary and needlessly complex
feature.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andriy.shevchenko@linux.intel.com
Cc: boon.leong.ong@intel.com
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/1456190999-12685-3-git-send-email-pure.logic@nexus-software.ie
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-23 07:37:23 +01:00
David S. Miller
b633353115 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/phy/bcm7xxx.c
	drivers/net/phy/marvell.c
	drivers/net/vxlan.c

All three conflicts were cases of simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-23 00:09:14 -05:00
Linus Torvalds
692b8c663c Xen bug fixes for 4.5-rc5
- Two scsiback fixes (resource leak and spurious warning).
 - Fix DMA mapping of compound pages on arm/arm64.
 - Fix some pciback regressions in MSI-X handling.
 - Fix a pcifront crash due to some uninitialize state.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWyvatAAoJEFxbo/MsZsTRBFcH+wWnv0/N+gKib3cKCI4lwmTg
 n8iVgf8dNWwD36M2s/OlzCAglAIt8Xr6ySNvPqTerpm7lT9yXlIVQxGXTbIGuTAA
 h8Kt8WiC0BNLHHlLxBuCz62KR47DvMhsr84lFURE8FmpUiulFjXmRcbrZkHIMYRS
 l/X+xJWO1vxwrSYho0P9n3ksTWHm488DTPvZz3ICNI2G2sndDfbT3gv3tMDaQhcX
 ZaQR93vtIoldqk29Ga59vaVtksbgxHZIbasY9PQ8rqOxHJpDQbPzpjocoLxAzf50
 cioQVyKQ7i9vUvZ+B3TTAOhxisA2hDwNhLGQzmjgxe2TXeKdo3yjYwO6m1dDBzY=
 =VY/S
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.5-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen bug fixes from David Vrabel:

 - Two scsiback fixes (resource leak and spurious warning).

 - Fix DMA mapping of compound pages on arm/arm64.

 - Fix some pciback regressions in MSI-X handling.

 - Fix a pcifront crash due to some uninitialize state.

* tag 'for-linus-4.5-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/pcifront: Fix mysterious crashes when NUMA locality information was extracted.
  xen/pcifront: Report the errors better.
  xen/pciback: Save the number of MSI-X entries to be copied later.
  xen/pciback: Check PF instead of VF for PCI_COMMAND_MEMORY
  xen: fix potential integer overflow in queue_reply
  xen/arm: correctly handle DMA mapping of compound pages
  xen/scsiback: avoid warnings when adding multiple LUNs to a domain
  xen/scsiback: correct frontend counting
2016-02-22 13:57:01 -08:00
Kees Cook
9ccaf77cf0 x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option
This removes the CONFIG_DEBUG_RODATA option and makes it always enabled.

This simplifies the code and also makes it clearer that read-only mapped
memory is just as fundamental a security feature in kernel-space as it is
in user-space.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Brown <david.brown@linaro.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1455748879-21872-4-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-22 08:51:38 +01:00
Kees Cook
e267d97b83 asm-generic: Consolidate mark_rodata_ro()
Instead of defining mark_rodata_ro() in each architecture, consolidate it.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Gross <agross@codeaurora.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ashok Kumar <ashoks@broadcom.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Brown <david.brown@linaro.org>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Link: http://lkml.kernel.org/r/1455748879-21872-2-git-send-email-keescook@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-22 08:51:37 +01:00
Sai Praneeth
6d0cc887d5 x86/efi: Map EFI_MEMORY_{XP,RO} memory region bits to EFI page tables
Now that we have EFI memory region bits that indicate which regions do
not need execute permission or read/write permission in the page tables,
let's use them.

We also check for EFI_NX_PE_DATA and only enforce the restrictive
mappings if it's present (to allow us to ignore buggy firmware that sets
bits it didn't mean to and to preserve backwards compatibility).

Instead of assuming that firmware would set appropriate attributes in
memory descriptor like EFI_MEMORY_RO for code and EFI_MEMORY_XP for
data, we can expect some firmware out there which might only set *type*
in memory descriptor to be EFI_RUNTIME_SERVICES_CODE or
EFI_RUNTIME_SERVICES_DATA leaving away attribute. This will lead to
improper mappings of EFI runtime regions. In order to avoid it, we check
attribute and type of memory descriptor to update mappings and moreover
Windows works this way.

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lee, Chun-Yi <jlee@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Shankar <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1455712566-16727-13-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-22 08:26:28 +01:00
Ingo Molnar
ab876728a9 Linux 4.5-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWyN0eAAoJEHm+PkMAQRiGqIAIAKKodaqX5ACJhTRozj3GN5iV
 dDHU/SJQj4nIyJecaCVAJIBa3gvAX6GyY+Jg4JKJ4TKAdR0Hd/3EwOWIR+0+BQIM
 0MqmB0CRLzq42AOQtpDUdwB+OTE8jFQFQd2gFKuQYJJ61ppykCC36OWV0bTfQLSV
 b2esO4Ry6eoQnDMw8oT52ncUIZEvQ2DZE3L6tNDEPD/0je14GWkV1Fx1+X2jb9cB
 diFA2TmaEEXMHNT1NCLSQ+D7QefXV3mFl85leNlFi5QQNy7ZdSh7kvvOodMQ2uAS
 qa9V8Uk6LZYv5O71+Jr5Rmlqh3GxNRCMXu2tlMd2gtw8ApEvBw6XoL5YZYE13Lk=
 =3HMg
 -----END PGP SIGNATURE-----

Merge tag 'v4.5-rc5' into efi/core, before queueing up new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-22 08:26:05 +01:00
Alexei Starovoitov
568b329a02 perf: generalize perf_callchain
. avoid walking the stack when there is no room left in the buffer
. generalize get_perf_callchain() to be called from bpf helper

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-20 00:21:44 -05:00
Linus Torvalds
705d43dbe1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching fixes from Jiri Kosina:

 - regression (from 4.4) fix for ordering issue, introduced by an
   earlier ftrace change, that broke live patching of modules.

   The fix replaces the ftrace module notifier by direct call in order
   to make the ordering guaranteed and well-defined.  The patch, from
   Jessica Yu, has been acked both by Steven and Rusty

 - error message fix from Miroslav Benes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  ftrace/module: remove ftrace module notifier
  livepatch: change the error message in asm/livepatch.h header files
2016-02-18 16:34:15 -08:00
Dave Hansen
62b5f7d013 mm/core, x86/mm/pkeys: Add execute-only protection keys support
Protection keys provide new page-based protection in hardware.
But, they have an interesting attribute: they only affect data
accesses and never affect instruction fetches.  That means that
if we set up some memory which is set as "access-disabled" via
protection keys, we can still execute from it.

This patch uses protection keys to set up mappings to do just that.
If a user calls:

	mmap(..., PROT_EXEC);
or
	mprotect(ptr, sz, PROT_EXEC);

(note PROT_EXEC-only without PROT_READ/WRITE), the kernel will
notice this, and set a special protection key on the memory.  It
also sets the appropriate bits in the Protection Keys User Rights
(PKRU) register so that the memory becomes unreadable and
unwritable.

I haven't found any userspace that does this today.  With this
facility in place, we expect userspace to move to use it
eventually.  Userspace _could_ start doing this today.  Any
PROT_EXEC calls get converted to PROT_READ inside the kernel, and
would transparently be upgraded to "true" PROT_EXEC with this
code.  IOW, userspace never has to do any PROT_EXEC runtime
detection.

This feature provides enhanced protection against leaking
executable memory contents.  This helps thwart attacks which are
attempting to find ROP gadgets on the fly.

But, the security provided by this approach is not comprehensive.
The PKRU register which controls access permissions is a normal
user register writable from unprivileged userspace.  An attacker
who can execute the 'wrpkru' instruction can easily disable the
protection provided by this feature.

The protection key that is used for execute-only support is
permanently dedicated at compile time.  This is fine for now
because there is currently no API to set a protection key other
than this one.

Despite there being a constant PKRU value across the entire
system, we do not set it unless this feature is in use in a
process.  That is to preserve the PKRU XSAVE 'init state',
which can lead to faster context switches.

PKRU *is* a user register and the kernel is modifying it.  That
means that code doing:

	pkru = rdpkru()
	pkru |= 0x100;
	mmap(..., PROT_EXEC);
	wrpkru(pkru);

could lose the bits in PKRU that enforce execute-only
permissions.  To avoid this, we suggest avoiding ever calling
mmap() or mprotect() when the PKRU value is expected to be
unstable.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Piotr Kwapulinski <kwapulinski.piotr@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: keescook@google.com
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210240.CB4BB5CA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 19:46:33 +01:00
Dave Hansen
8459429693 x86/mm/pkeys: Allow kernel to modify user pkey rights register
The Protection Key Rights for User memory (PKRU) is a 32-bit
user-accessible register.  It contains two bits for each
protection key: one to write-disable (WD) access to memory
covered by the key and another to access-disable (AD).

Userspace can read/write the register with the RDPKRU and WRPKRU
instructions.  But, the register is saved and restored with the
XSAVE family of instructions, which means we have to treat it
like a floating point register.

The kernel needs to write to the register if it wants to
implement execute-only memory or if it implements a system call
to change PKRU.

To do this, we need to create a 'pkru_state' buffer, read the old
contents in to it, modify it, and then tell the FPU code that
there is modified data in there so it can (possibly) move the
buffer back in to the registers.

This uses the fpu__xfeature_set_state() function that we defined
in the previous patch.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210236.0BE13217@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 19:46:32 +01:00
Dave Hansen
b8b9b6ba9d x86/fpu: Allow setting of XSAVE state
We want to modify the Protection Key rights inside the kernel, so
we need to change PKRU's contents.  But, if we do a plain
'wrpkru', when we return to userspace we might do an XRSTOR and
wipe out the kernel's 'wrpkru'.  So, we need to go after PKRU in
the xsave buffer.

We do this by:

  1. Ensuring that we have the XSAVE registers (fpregs) in the
     kernel FPU buffer (fpstate)
  2. Looking up the location of a given state in the buffer
  3. Filling in the stat
  4. Ensuring that the hardware knows that state is present there
     (basically that the 'init optimization' is not in place).
  5. Copying the newly-modified state back to the registers if
     necessary.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210235.5A3139BF@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 19:46:32 +01:00
Dave Hansen
39a0526fb3 x86/mm: Factor out LDT init from context init
The arch-specific mm_context_t is a great place to put
protection-key allocation state.

But, we need to initialize the allocation state because pkey 0 is
always "allocated".  All of the runtime initialization of
mm_context_t is done in *_ldt() manipulation functions.  This
renames the existing LDT functions like this:

	init_new_context() -> init_new_context_ldt()
	destroy_context() -> destroy_context_ldt()

and makes init_new_context() and destroy_context() available for
generic use.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210234.DB34FCC5@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 19:46:31 +01:00
Dave Hansen
66d375709d mm/core, x86/mm/pkeys: Add arch_validate_pkey()
The syscall-level code is passed a protection key and need to
return an appropriate error code if the protection key is bogus.
We will be using this in subsequent patches.

Note that this also begins a series of arch-specific calls that
we need to expose in otherwise arch-independent code.  We create
a linux/pkeys.h header where we will put *all* the stubs for
these functions.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210232.774EEAAB@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 19:46:31 +01:00
Dave Hansen
d61172b4b6 mm/core, x86/mm/pkeys: Differentiate instruction fetches
As discussed earlier, we attempt to enforce protection keys in
software.

However, the code checks all faults to ensure that they are not
violating protection key permissions.  It was assumed that all
faults are either write faults where we check PKRU[key].WD (write
disable) or read faults where we check the AD (access disable)
bit.

But, there is a third category of faults for protection keys:
instruction faults.  Instruction faults never run afoul of
protection keys because they do not affect instruction fetches.

So, plumb the PF_INSTR bit down in to the
arch_vma_access_permitted() function where we do the protection
key checks.

We also add a new FAULT_FLAG_INSTRUCTION.  This is because
handle_mm_fault() is not passed the architecture-specific
error_code where we keep PF_INSTR, so we need to encode the
instruction fetch information in to the arch-generic fault
flags.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210224.96928009@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 19:46:29 +01:00
Dave Hansen
1b2ee1266e mm/core: Do not enforce PKEY permissions on remote mm access
We try to enforce protection keys in software the same way that we
do in hardware.  (See long example below).

But, we only want to do this when accessing our *own* process's
memory.  If GDB set PKRU[6].AD=1 (disable access to PKEY 6), then
tried to PTRACE_POKE a target process which just happened to have
some mprotect_pkey(pkey=6) memory, we do *not* want to deny the
debugger access to that memory.  PKRU is fundamentally a
thread-local structure and we do not want to enforce it on access
to _another_ thread's data.

This gets especially tricky when we have workqueues or other
delayed-work mechanisms that might run in a random process's context.
We can check that we only enforce pkeys when operating on our *own* mm,
but delayed work gets performed when a random user context is active.
We might end up with a situation where a delayed-work gup fails when
running randomly under its "own" task but succeeds when running under
another process.  We want to avoid that.

To avoid that, we use the new GUP flag: FOLL_REMOTE and add a
fault flag: FAULT_FLAG_REMOTE.  They indicate that we are
walking an mm which is not guranteed to be the same as
current->mm and should not be subject to protection key
enforcement.

Thanks to Jerome Glisse for pointing out this scenario.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
Cc: Eric B Munson <emunson@akamai.com>
Cc: Geliang Tang <geliangtang@163.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Low <jason.low2@hp.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 19:46:28 +01:00
Dave Hansen
33a709b25a mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
Today, for normal faults and page table walks, we check the VMA
and/or PTE to ensure that it is compatible with the action.  For
instance, if we get a write fault on a non-writeable VMA, we
SIGSEGV.

We try to do the same thing for protection keys.  Basically, we
try to make sure that if a user does this:

	mprotect(ptr, size, PROT_NONE);
	*ptr = foo;

they see the same effects with protection keys when they do this:

	mprotect(ptr, size, PROT_READ|PROT_WRITE);
	set_pkey(ptr, size, 4);
	wrpkru(0xffffff3f); // access disable pkey 4
	*ptr = foo;

The state to do that checking is in the VMA, but we also
sometimes have to do it on the page tables only, like when doing
a get_user_pages_fast() where we have no VMA.

We add two functions and expose them to generic code:

	arch_pte_access_permitted(pte_flags, write)
	arch_vma_access_permitted(vma, write)

These are, of course, backed up in x86 arch code with checks
against the PTE or VMA's protection key.

But, there are also cases where we do not want to respect
protection keys.  When we ptrace(), for instance, we do not want
to apply the tracer's PKRU permissions to the PTEs from the
process being traced.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20160212210219.14D5D715@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 09:32:44 +01:00
Dave Hansen
a927cb83f3 x86/mm/pkeys: Add functions to fetch PKRU
This adds the raw instruction to access PKRU as well as some
accessor functions that correctly handle when the CPU does not
support the instruction.  We don't use it here, but we will use
read_pkru() in the next patch.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210215.15238D34@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 09:32:43 +01:00
Dave Hansen
019132ff3d x86/mm/pkeys: Fill in pkey field in siginfo
This fills in the new siginfo field: si_pkey to indicate to
userspace which protection key was set on the PTE that we faulted
on.

Note though that *ALL* protection key faults have to be generated
by a valid, present PTE at some point.  But this code does no PTE
lookups which seeds odd.  The reason is that we take advantage of
the way we generate PTEs from VMAs.  All PTEs under a VMA share
some attributes.  For instance, they are _all_ either PROT_READ
*OR* PROT_NONE.  They also always share a protection key, so we
never have to walk the page tables; we just use the VMA.

Note that _pkey is a 64-bit value.  The current hardware only
supports 4-bit protection keys.  We do this because there is
_plenty_ of space in _sigfault and it is possible that future
processors would support more than 4 bits of protection keys.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210213.ABC488FA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 09:32:43 +01:00
Dave Hansen
8f62c88322 x86/mm/pkeys: Add arch-specific VMA protection bits
Lots of things seem to do:

        vma->vm_page_prot = vm_get_page_prot(flags);

and the ptes get created right from things we pull out
of ->vm_page_prot.  So it is very convenient if we can
store the protection key in flags and vm_page_prot, just
like the existing permission bits (_PAGE_RW/PRESENT).  It
greatly reduces the amount of plumbing and arch-specific
hacking we have to do in generic code.

This also takes the new PROT_PKEY{0,1,2,3} flags and
turns *those* in to VM_ flags for vma->vm_flags.

The protection key values are stored in 4 places:
	1. "prot" argument to system calls
	2. vma->vm_flags, filled from the mmap "prot"
	3. vma->vm_page prot, filled from vma->vm_flags
	4. the PTE itself.

The pseudocode for these for steps are as follows:

	mmap(PROT_PKEY*)
	vma->vm_flags 	  = ... | arch_calc_vm_prot_bits(mmap_prot);
	vma->vm_page_prot = ... | arch_vm_get_page_prot(vma->vm_flags);
	pte = pfn | vma->vm_page_prot

Note that this provides a new definitions for x86:

	arch_vm_get_page_prot()

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210210.FE483A42@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 09:31:51 +01:00
Dave Hansen
5c1d90f510 x86/mm/pkeys: Add PTE bits for storing protection key
Previous documentation has referred to these 4 bits as "ignored".
That means that software could have made use of them.  But, as
far as I know, the kernel never used them.

They are still ignored when protection keys is not enabled, so
they could theoretically still get used for software purposes.

We also implement "empty" versions so that code that references
to them can be optimized away by the compiler when the config
option is not enabled.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210205.81E33ED6@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 09:31:44 +01:00
Tony Luck
0f68c088c0 x86/cpufeature: Create a new synthetic cpu capability for machine check recovery
The Intel Software Developer Manual describes bit 24 in the MCG_CAP
MSR:

   MCG_SER_P (software error recovery support present) flag,
   bit 24 — Indicates (when set) that the processor supports
   software error recovery

But only some models with this capability bit set will actually
generate recoverable machine checks.

Check the model name and set a synthetic capability bit. Provide
a command line option to set this bit anyway in case the kernel
doesn't recognise the model name.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/2e5bfb23c89800a036fb8a45fa97a74bb16bc362.1455732970.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 09:28:47 +01:00
Ingo Molnar
3a2f2ac9b9 Merge branch 'x86/urgent' into x86/asm, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 09:28:03 +01:00
Tony Luck
548acf1923 x86/mm: Expand the exception table logic to allow new handling options
Huge amounts of help from  Andy Lutomirski and Borislav Petkov to
produce this. Andy provided the inspiration to add classes to the
exception table with a clever bit-squeezing trick, Boris pointed
out how much cleaner it would all be if we just had a new field.

Linus Torvalds blessed the expansion with:

  ' I'd rather not be clever in order to save just a tiny amount of space
    in the exception table, which isn't really criticial for anybody. '

The third field is another relative function pointer, this one to a
handler that executes the actions.

We start out with three handlers:

 1: Legacy - just jumps the to fixup IP
 2: Fault - provide the trap number in %ax to the fixup code
 3: Cleaned up legacy for the uaccess error hack

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f6af78fcbd348cf4939875cfda9c19689b5e50b8.1455732970.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 09:21:46 +01:00
Ingo Molnar
061f817eb6 Linux 4.5-rc4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWwOwYAAoJEHm+PkMAQRiGjQMH/jy+GLa3KSe/dgdh3IMnhV1Q
 kAFJ6n7o9n7iYtlbrDeDj4eulktuMpsCjFP3nubG35GvnFPyKMg3BlMhx1oZUQcB
 NiN8ulMXZQepmSjART1SxCgMkE7uRg9j61tEDjjSzCX4PbpbjSm50mjefjqI2pAh
 bkjHiepQ4maKlHqCf5S4rtofML0FTt1HXtUIPwlEMlB0pVrRO7yNLp6QuK43caft
 hAn/jZq/c5m7ykggLJOb3EBTlZD7TdTm9KduTqZUBLebwYbwbZLuXLZq3S5IvcHm
 01+MNwD7VsW4p0R+p3WBP0+MdgPo3zlifCQRnUP22DSboXiDVAAsukrj/DiN+DA=
 =qnLX
 -----END PGP SIGNATURE-----

Merge tag 'v4.5-rc4' into ras/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 09:03:57 +01:00
Ingo Molnar
9109dc97b0 Merge branch 'perf/urgent' into perf/core, to queue up dependent patch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 10:37:36 +01:00
Borislav Petkov
053080a9d1 x86/msr: Document msr-index.h rule for addition
In order to keep this file's size sensible and not cause too much
unnecessary churn, make the rule explicit - similar to pci_ids.h - that
only MSRs which are used in multiple compilation units, should get added
to it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: alex.williamson@redhat.com
Cc: gleb@kernel.org
Cc: joro@8bytes.org
Cc: kvm@vger.kernel.org
Cc: sherry.hurwitz@amd.com
Cc: wei@redhat.com
Link: http://lkml.kernel.org/r/1455612202-14414-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 08:47:55 +01:00
Andy Lutomirski
6c25da5ad5 x86/signal/64: Re-add support for SS in the 64-bit signal context
This is a second attempt to make the improvements from c6f2062935
("x86/signal/64: Fix SS handling for signals delivered to 64-bit
programs"), which was reverted by 51adbfbba5c6 ("x86/signal/64: Add
support for SS in the 64-bit signal context").

This adds two new uc_flags flags.  UC_SIGCONTEXT_SS will be set for
all 64-bit signals (including x32).  It indicates that the saved SS
field is valid and that the kernel supports the new behavior.

The goal is to fix a problems with signal handling in 64-bit tasks:
SS wasn't saved in the 64-bit signal context, making it awkward to
determine what SS was at the time of signal delivery and making it
impossible to return to a non-flat SS (as calling sigreturn clobbers
SS).

This also made it extremely difficult for 64-bit tasks to return to
fully-defined 16-bit contexts, because only the kernel can easily do
espfix64, but sigreturn was unable to set a non-flag SS:ESP.
(DOSEMU has a monstrous hack to partially work around this
limitation.)

If we could go back in time, the correct fix would be to make 64-bit
signals work just like 32-bit signals with respect to SS: save it
in signal context, reset it when delivering a signal, and restore
it in sigreturn.

Unfortunately, doing that (as I tried originally) breaks DOSEMU:
DOSEMU wouldn't reset the signal context's SS when clearing the LDT
and changing the saved CS to 64-bit mode, since it predates the SS
context field existing in the first place.

This patch is a bit more complicated, and it tries to balance a
bunch of goals.  It makes most cases of changing ucontext->ss during
signal handling work as expected.

I do this by special-casing the interesting case.  On sigreturn,
ucontext->ss will be honored by default, unless the ucontext was
created from scratch by an old program and had a 64-bit CS
(unfortunately, CRIU can do this) or was the result of changing a
32-bit signal context to 64-bit without resetting SS (as DOSEMU
does).

For the benefit of new 64-bit software that uses segmentation (new
versions of DOSEMU might), the new behavior can be detected with a
new ucontext flag UC_SIGCONTEXT_SS.

To avoid compilation issues, __pad0 is left as an alias for ss in
ucontext.

The nitty-gritty details are documented in the header file.

This patch also re-enables the sigreturn_64 and ldt_gdt_64 selftests,
as the kernel change allows both of them to pass.

Tested-by: Stas Sergeev <stsp@list.ru>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/749149cbfc3e75cd7fcdad69a854b399d792cc6f.1455664054.git.luto@kernel.org
[ Small readability edit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 08:32:11 +01:00
Andy Lutomirski
8ff5bd2e1e x86/signal/64: Fix SS if needed when delivering a 64-bit signal
Signals are always delivered to 64-bit tasks with CS set to a long
mode segment.  In long mode, SS doesn't matter as long as it's a
present writable segment.

If SS starts out invalid (this can happen if the signal was caused
by an IRET fault or was delivered on the way out of set_thread_area
or modify_ldt), then IRET to the signal handler can fail, eventually
killing the task.

The straightforward fix would be to simply reset SS when delivering
a signal.  That breaks DOSEMU, though: 64-bit builds of DOSEMU rely
on SS being set to the faulting SS when signals are delivered.

As a compromise, this patch leaves SS alone so long as it's valid.

The net effect should be that the behavior of successfully delivered
signals is unchanged.  Some signals that would previously have
failed to be delivered will now be delivered successfully.

This has no effect for x32 or 32-bit tasks: their signal handlers
were already called with SS == __USER_DS.

(On Xen, there's a slight hole: if a task sets SS to a writable
 *kernel* data segment, then we will fail to identify it as invalid
 and we'll still kill the task.  If anyone cares, this could be fixed
 with a new paravirt hook.)

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stas Sergeev <stsp@list.ru>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/163c6e1eacde41388f3ff4d2fe6769be651d7b6e.1455664054.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-17 08:32:11 +01:00
Len Brown
670e27d809 x86 msr-index: Simplify syntax for HWP fields
syntax only, no functional change

Signed-off-by: Len Brown <len.brown@intel.com>
2016-02-17 01:42:26 -05:00
Jake Oshins
92016ba5c1 PCI: Add fwnode_handle to x86 pci_sysdata
Add an fwnode_handle to the x86 struct pci_sysdata, which will be used to
locate an IRQ domain associated with a root PCI bus.

[bhelgaas: changelog]
Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-02-16 16:48:11 -06:00
Dave Hansen
c8df400984 x86/fpu, x86/mm/pkeys: Add PKRU xsave fields and data structures
The protection keys register (PKRU) is saved and restored using
xsave.  Define the data structure that we will use to access it
inside the xsave buffer.

Note that we also have to widen the printk of the xsave feature
masks since this is feature 0x200 and we only did two characters
before.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210204.56DF8F7B@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16 10:11:14 +01:00
Dave Hansen
dfb4a70f20 x86/cpufeature, x86/mm/pkeys: Add protection keys related CPUID definitions
There are two CPUID bits for protection keys.  One is for whether
the CPU contains the feature, and the other will appear set once
the OS enables protection keys.  Specifically:

	Bit 04: OSPKE. If 1, OS has set CR4.PKE to enable
	Protection keys (and the RDPKRU/WRPKRU instructions)

This is because userspace can not see CR4 contents, but it can
see CPUID contents.

X86_FEATURE_PKU is referred to as "PKU" in the hardware documentation:

	CPUID.(EAX=07H,ECX=0H):ECX.PKU [bit 3]

X86_FEATURE_OSPKE is "OSPKU":

	CPUID.(EAX=07H,ECX=0H):ECX.OSPKE [bit 4]

These are the first CPU features which need to look at the
ECX word in CPUID leaf 0x7, so this patch also includes
fetching that word in to the cpuinfo->x86_capability[] array.

Add it to the disabled-features mask when its config option is
off.  Even though we are not using it here, we also extend the
REQUIRED_MASK_BIT_SET() macro to keep it mirroring the
DISABLED_MASK_BIT_SET() version.

This means that in almost all code, you should use:

	cpu_has(c, X86_FEATURE_PKU)

and *not* the CONFIG option.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210201.7714C250@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16 10:11:13 +01:00
Dave Hansen
1f96b1efba x86/fpu: Add placeholder for 'Processor Trace' XSAVE state
There is an XSAVE state component for Intel Processor Trace (PT).
But, we do not currently use it.

We add a placeholder in the code for it so it is not a mystery and
also so we do not need an explicit enum initialization for Protection
Keys in a moment.

Why don't we use it?

We might end up using this at _some_ point in the future.  But,
this is a "system" state which requires using the currently
unsupported XSAVES feature.  Unlike all the other XSAVE states,
PT state is also not directly tied to a thread.  You might
context-switch between threads, but not want to change any of the
PT state.  Or, you might switch between threads, and *do* want to
change PT state, all depending on what is being traced.

We currently just manually set some MSRs to do this PT context
switching, and it is unclear whether replacing our direct MSR use
with XSAVE will be a net win or loss, both in code complexity and
performance.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: fenghua.yu@intel.com
Cc: linux-mm@kvack.org
Cc: yu-cheng.yu@intel.com
Link: http://lkml.kernel.org/r/20160212210158.5E4BCAE2@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16 10:11:13 +01:00
Ingo Molnar
1fe3f29e4a Merge branches 'x86/fpu', 'x86/mm' and 'x86/asm' into x86/pkeys
Provide a stable basis for the pkeys patches, which touches various
x86 details.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16 09:37:37 +01:00
Borislav Petkov
f2cc8e0791 x86/cpufeature: Speed up cpu_feature_enabled()
When GCC cannot do constant folding for this macro, it falls back to
cpu_has(). But static_cpu_has() is optimal and it works at all times
now. So use it and speedup the fallback case.

Before we had this:

  mov    0x99d674(%rip),%rdx        # ffffffff81b0d9f4 <boot_cpu_data+0x34>
  shr    $0x2e,%rdx
  and    $0x1,%edx
  jne    ffffffff811704e9 <do_munmap+0x3f9>

After alternatives patching, it turns into:

		  jmp    0xffffffff81170390
		  nopl   (%rax)
		  ...
		  callq  ffffffff81056e00 <mpx_notify_unmap>
ffffffff81170390: mov    0x170(%r12),%rdi

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1455578358-28347-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-16 08:45:15 +01:00
Bjorn Helgaas
ed07247dbf gpio: Remove unused asm/gpio.h files
asm/gpio.h is included only by linux/gpio.h, and then only when the arch
selects ARCH_HAVE_CUSTOM_GPIO_H.  Only the following arches select it: arm
avr32 blackfin m68k (COLDFIRE only) sh unicore32.

Remove the unused asm/gpio.h files for the arches that do not select
ARCH_HAVE_CUSTOM_GPIO_H.

This is a follow-on to 7563bbf89d ("gpiolib/arches: Centralise
bolierplate asm/gpio.h").

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2016-02-16 00:20:04 +01:00
Konrad Rzeszutek Wilk
2cfec6a2f9 xen/pcifront: Report the errors better.
The messages should be different depending on the type of error.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2016-02-15 14:21:18 +00:00
Borislav Petkov
e2c7698cd6 x86/mm: Fix INVPCID asm constraint
So we want to specify the dependency on both @pcid and @addr so that the
compiler doesn't reorder accesses to them *before* the TLB flush. But
for that to work, we need to express this properly in the inline asm and
deref the whole desc array, not the pointer to it. See clwb() for an
example.

This fixes the build error on 32-bit:

  arch/x86/include/asm/tlbflush.h: In function ‘__invpcid’:
  arch/x86/include/asm/tlbflush.h:26:18: error: memory input 0 is not directly addressable

which gcc4.7 caught but 5.x didn't. Which is strange. :-\

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Michael Matz <matz@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-14 11:19:06 +01:00
Andy Lutomirski
4ecd16ec70 x86/fpu: Fix math emulation in eager fpu mode
Systems without an FPU are generally old and therefore use lazy FPU
switching. Unsurprisingly, math emulation in eager FPU mode is a
bit buggy. Fix it.

There were two bugs involving kernel code trying to use the FPU
registers in eager mode even if they didn't exist and one BUG_ON()
that was incorrect.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/b4b8d112436bd6fab866e1b4011131507e8d7fbe.1453675014.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09 15:42:55 +01:00
Andy Lutomirski
ce1143aa60 x86/dmi: Switch dmi_remap() from ioremap() [uncached] to ioremap_cache()
DMI cacheability is very confused on x86.

dmi_early_remap() uses early_ioremap(), which uses FIXMAP_PAGE_IO,
which is __PAGE_KERNEL_IO, which is __PAGE_KERNEL, which is cached.

Don't ask me why this makes any sense.

dmi_remap() uses ioremap(), which requests an uncached mapping.

However, on non-EFI systems, the DMI data generally lives between
0xf0000 and 0x100000, which is in the legacy ISA range, which
triggers a special case in the PAT code that overrides the cache
mode requested by ioremap() and forces a WB mapping.

On a UEFI boot, however, the DMI table can live at any physical
address.  On my laptop, it's around 0x77dd0000.  That's nowhere near
the legacy ISA range, so the ioremap() implicit uncached type is
honored and we end up with a UC- mapping.

UC- is a very, very slow way to read from main memory, so dmi_walk()
is likely to take much longer than necessary.

Given that, even on UEFI, we do early cached DMI reads, it seems
safe to just ask for cached access.  Switch to ioremap_cache().

I haven't tried to benchmark this, but I'd guess it saves several
milliseconds of boot time.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jean Delvare <jdelvare@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/3147c38e51f439f3c8911db34c7d4ab22d854915.1453791969.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09 14:36:43 +01:00
Andy Lutomirski
d8bced79af x86/mm: If INVPCID is available, use it to flush global mappings
On my Skylake laptop, INVPCID function 2 (flush absolutely
everything) takes about 376ns, whereas saving flags, twiddling
CR4.PGE to flush global mappings, and restoring flags takes about
539ns.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/ed0ef62581c0ea9c99b9bf6df726015e96d44743.1454096309.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09 13:36:11 +01:00
Andy Lutomirski
060a402a1d x86/mm: Add INVPCID helpers
This adds helpers for each of the four currently-specified INVPCID
modes.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/8a62b23ad686888cee01da134c91409e22064db9.1454096309.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09 13:36:10 +01:00
Feng Wu
520040146a KVM: x86: Use vector-hashing to deliver lowest-priority interrupts
Use vector-hashing to deliver lowest-priority interrupts, As an
example, modern Intel CPUs in server platform use this method to
handle lowest-priority interrupts.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-02-09 13:24:40 +01:00
Borislav Petkov
5f9c01aa7c x86/microcode: Untangle from BLK_DEV_INITRD
Thomas Voegtle reported that doing oldconfig with a .config which has
CONFIG_MICROCODE enabled but BLK_DEV_INITRD disabled prevents the
microcode loading mechanism from being built.

So untangle it from the BLK_DEV_INITRD dependency so that oldconfig
doesn't turn it off and add an explanatory text to its Kconfig help what
the supported methods for supplying microcode are.

Reported-by: Thomas Voegtle <tv@lio96.de>
Tested-by: Thomas Voegtle <tv@lio96.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # 4.4
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1454499225-21544-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09 11:41:15 +01:00
Denys Vlasenko
8dd5032d9c x86/asm/bitops: Force inlining of test_and_set_bit and friends
Sometimes GCC mysteriously doesn't inline very small functions
we expect to be inlined, see:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

Arguably, GCC should do better, but GCC people aren't willing
to invest time into it and are asking to use __always_inline
instead.

With this .config:

  http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os

here's an example of functions getting deinlined many times:

  test_and_set_bit (166 copies, ~1260 calls)
         55                      push   %rbp
         48 89 e5                mov    %rsp,%rbp
         f0 48 0f ab 3e          lock bts %rdi,(%rsi)
         72 04                   jb     <test_and_set_bit+0xf>
         31 c0                   xor    %eax,%eax
         eb 05                   jmp    <test_and_set_bit+0x14>
         b8 01 00 00 00          mov    $0x1,%eax
         5d                      pop    %rbp
         c3                      retq

  test_and_clear_bit (124 copies, ~1000 calls)
         55                      push   %rbp
         48 89 e5                mov    %rsp,%rbp
         f0 48 0f b3 3e          lock btr %rdi,(%rsi)
         72 04                   jb     <test_and_clear_bit+0xf>
         31 c0                   xor    %eax,%eax
         eb 05                   jmp    <test_and_clear_bit+0x14>
         b8 01 00 00 00          mov    $0x1,%eax
         5d                      pop    %rbp
         c3                      retq

  change_bit (3 copies, 8 calls)
         55                      push   %rbp
         48 89 e5                mov    %rsp,%rbp
         f0 48 0f bb 3e          lock btc %rdi,(%rsi)
         5d                      pop    %rbp
         c3                      retq

  clear_bit_unlock (2 copies, 11 calls)
         55                      push   %rbp
         48 89 e5                mov    %rsp,%rbp
         f0 48 0f b3 3e          lock btr %rdi,(%rsi)
         5d                      pop    %rbp
         c3                      retq

This patch works it around via s/inline/__always_inline/.

Code size decrease by ~13.5k after the patch:

      text     data      bss       dec    filename
  92110727 20826144 36417536 149354407    vmlinux.before
  92097234 20826176 36417536 149340946    vmlinux.after

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Graf <tgraf@suug.ch>
Link: http://lkml.kernel.org/r/1454881887-1367-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09 10:31:54 +01:00
Ingo Molnar
2a96fd7417 Linux 4.5-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWt9V+AAoJEHm+PkMAQRiGvqUH/iKVFxRB+MpfwRteRv22gfxJ
 8Y5jDuCt2qkePHieN5JoY4FqWIA+hR+UnctHBvOCLC7DNbY79yaYuWeuShfnRVw2
 91Kspk2tgnWkjlZDZ273VNKAEveFYUdhHvHFUlItWubVIlzrKRGKdAcUpecQ33YD
 IERr4VFxdQDgksGPzH2FLJwcAKilWWygQ9ATTsY0xJepTC6PwxAFxrNx5qPiqOoI
 aHJBzczk50y9hHEPkYeeWloyzIZUmf81BCi/VwOsbB30q4ENwkS2MMnebhOmBjLp
 g9Bml8m/Fk4N+h2vbcLj6Bu+rE+bYDZDZ2Fmbupy4NbEkng3x3iaTPkG+R2hErQ=
 =IdcS
 -----END PGP SIGNATURE-----

Merge tag 'v4.5-rc3' into locking/core, to refresh the tree

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-09 10:26:02 +01:00
Ingo Molnar
b349e9a916 Merge branch 'x86/urgent' into x86/mm, to pick up dependent fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-08 12:13:22 +01:00
Dmitry Vyukov
75edb54a1d x86: Fix KASAN false positives in thread_saved_pc()
thread_saved_pc() reads stack of a potentially running task.
This can cause false KASAN stack-out-of-bounds reports,
because the running task concurrently poisons and unpoisons
own stack.

The same happens in get_wchan(), and get get_wchan() was fixed
by using READ_ONCE_NOCHECK(). Do the same here.

Example KASAN report triggered by sysrq-t:

  BUG: KASAN: out-of-bounds in sched_show_task+0x306/0x3b0 at addr ffff880043c97c18
  Read of size 8 by task syz-executor/23839
  [...]
  page dumped because: kasan: bad access detected
  [...]
  Call Trace:
   [<ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40
   [<ffffffff813e7a26>] sched_show_task+0x306/0x3b0
   [<ffffffff813e7bf4>] show_state_filter+0x124/0x1a0
   [<ffffffff82d2ca00>] fn_show_state+0x10/0x20
   [<ffffffff82d2cf98>] k_spec+0xa8/0xe0
   [<ffffffff82d3354f>] kbd_event+0xb9f/0x4000
   [<ffffffff843ca8a7>] input_to_handler+0x3a7/0x4b0
   [<ffffffff843d1954>] input_pass_values.part.5+0x554/0x6b0
   [<ffffffff843d29bc>] input_handle_event+0x2ac/0x1070
   [<ffffffff843d3a47>] input_inject_event+0x237/0x280
   [<ffffffff843e8c28>] evdev_write+0x478/0x680
   [<ffffffff817ac653>] __vfs_write+0x113/0x480
   [<ffffffff817ae0e7>] vfs_write+0x167/0x4a0
   [<ffffffff817b13d1>] SyS_write+0x111/0x220

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: glider@google.com
Cc: kasan-dev@googlegroups.com
Cc: kcc@google.com
Cc: linux-kernel@vger.kernel.org
Cc: ryabinin.a.a@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-05 08:41:52 +01:00
Ingo Molnar
03e075b38e Merge branch 'linus' into efi/core, to refresh the branch and to pick up recent fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-03 11:30:36 +01:00
Aravind Gopalakrishnan
e6c8f1873b x86/mce/AMD: Set MCAX Enable bit
It is required for the OS to acknowledge that it is using the
MCAX register set and its associated fields by setting the
'McaXEnable' bit in each bank's MCi_CONFIG register. If it is
not set, then all UC errors will cause a system panic.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1453750913-4781-9-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-01 10:53:59 +01:00
Huaitong Han
16aaa53756 x86/cpufeature: Use enum cpuid_leafs instead of magic numbers
Most of the magic numbers in x86_capability[] have been converted to
'enum cpuid_leafs', and this patch updates the remaining part.

Signed-off-by: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: lguest@lists.ozlabs.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1453750913-4781-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-01 10:46:48 +01:00
Linus Torvalds
d517be5fcf Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A bit on the largish side due to a series of fixes for a regression in
  the x86 vector management which was introduced in 4.3.  This work was
  started in December already, but it took some time to fix all corner
  cases and a couple of older bugs in that area which were detected
  while at it

  Aside of that a few platform updates for intel-mid, quark and UV and
  two fixes for in the mm code:
   - Use proper types for pgprot values to avoid truncation
   - Prevent a size truncation in the pageattr code when setting page
     attributes for large mappings"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86/mm/pat: Avoid truncation when converting cpa->numpages to address
  x86/mm: Fix types used in pgprot cacheability flags translations
  x86/platform/quark: Print boundaries correctly
  x86/platform/UV: Remove EFI memmap quirk for UV2+
  x86/platform/intel-mid: Join string and fix SoC name
  x86/platform/intel-mid: Enable 64-bit build
  x86/irq: Plug vector cleanup race
  x86/irq: Call irq_force_move_complete with irq descriptor
  x86/irq: Remove outgoing CPU from vector cleanup mask
  x86/irq: Remove the cpumask allocation from send_cleanup_vector()
  x86/irq: Clear move_in_progress before sending cleanup IPI
  x86/irq: Remove offline cpus from vector cleanup
  x86/irq: Get rid of code duplication
  x86/irq: Copy vectormask instead of an AND operation
  x86/irq: Check vector allocation early
  x86/irq: Reorganize the search in assign_irq_vector
  x86/irq: Reorganize the return path in assign_irq_vector
  x86/irq: Do not use apic_chip_data.old_domain as temporary buffer
  x86/irq: Validate that irq descriptor is still active
  x86/irq: Fix a race in x86_vector_free_irqs()
  ...
2016-01-31 16:17:19 -08:00
Brian Gerst
2476f2fa20 x86/alternatives: Discard dynamic check after init
Move the code to do the dynamic check to the altinstr_aux
section so that it is discarded after alternatives have run and
a static branch has been chosen.

This way we're changing the dynamic branch from C code to
assembly, which makes it *substantially* smaller while avoiding
a completely unnecessary call to an out of line function.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
[ Changed it to do TESTB, as hpa suggested. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1452972124-7380-1-git-send-email-brgerst@gmail.com
Link: http://lkml.kernel.org/r/20160127084525.GC30712@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30 11:22:22 +01:00
Borislav Petkov
a362bf9f5e x86/cpufeature: Get rid of the non-asm goto variant
I can simply quote hpa from the mail:

  "Get rid of the non-asm goto variant and just fall back to
   dynamic if asm goto is unavailable. It doesn't make any sense,
   really, if it is supposed to be safe, and by now the asm
   goto-capable gcc is in more wide use. (Originally the gcc 3.x
   fallback to pure dynamic didn't exist, either.)"

Booy, am I lazy.

Cleanup the whole CC_HAVE_ASM_GOTO ifdeffery too, while at it.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160127084325.GB30712@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30 11:22:19 +01:00
Borislav Petkov
bc696ca05f x86/cpufeature: Replace the old static_cpu_has() with safe variant
So the old one didn't work properly before alternatives had run.
And it was supposed to provide an optimized JMP because the
assumption was that the offset it is jumping to is within a
signed byte and thus a two-byte JMP.

So I did an x86_64 allyesconfig build and dumped all possible
sites where static_cpu_has() was used. The optimization amounted
to all in all 12(!) places where static_cpu_has() had generated
a 2-byte JMP. Which has saved us a whopping 36 bytes!

This clearly is not worth the trouble so we can remove it. The
only place where the optimization might count - in __switch_to()
- we will handle differently. But that's not subject of this
patch.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1453842730-28463-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30 11:22:18 +01:00
Borislav Petkov
cd4d09ec6f x86/cpufeature: Carve out X86_FEATURE_*
Move them to a separate header and have the following
dependency:

  x86/cpufeatures.h <- x86/processor.h <- x86/cpufeature.h

This makes it easier to use the header in asm code and not
include the whole cpufeature.h and add guards for asm.

Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1453842730-28463-5-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30 11:22:17 +01:00
Ingo Molnar
78726ee5ff Merge branch 'x86/cpu' into x86/asm, to avoid conflict
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-30 11:21:40 +01:00
Ingo Molnar
76b36fa896 Linux 4.5-rc1
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWpTzxAAoJEHm+PkMAQRiGKJEH/0vq8pgt1F4UYSMZLZ0bot5B
 iGNq/hPW91xcCVYXf5xfc6LzePd9L1rnKpP0ml+qmTInYw8YaCI/hCY6w32QfhP9
 3V3q1052T2eZJALqQQd0UH+F/ylTB8dHAPB+n8PBRxPEqpHb/ox+Ry70xbZefvaQ
 eOKSNBkZEIOFjURZZfeU0NrIzf8nKti8Dw84utGU2N+OICKGXzUmPLoObR0BiMHn
 2Xu54S4OPFKB49yfnW55PGiI+dawbVD+iSNEJtK4vMk5Ue7lxHXZ1njVeOdXd2Ls
 ggy3PPRt0LhDYLHQvr8Ir9uySLw7vUI6bhpvFm/freN4rxGvgxOZbhoQgtzqG/k=
 =1oU3
 -----END PGP SIGNATURE-----

Merge tag 'v4.5-rc1' into x86/asm, to refresh the branch before merging new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29 09:41:18 +01:00
Michael S. Tsirkin
57d9b1b434 locking/x86: Tweak the comment about use of wmb() for IO
On x86, we *do* still use the non-NOP rmb()/wmb() for IO barriers,
but even that is generally questionable.

Leave them around as historial unless somebody can point to a
case where they care about the performance, but tweak the
comment so people don't think they are strictly required in all
cases.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization <virtualization@lists.linux-foundation.org>
Link: http://lkml.kernel.org/r/1453921746-16178-4-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29 09:40:10 +01:00
Michael S. Tsirkin
e37cee133c locking/x86: Drop a comment left over from X86_OOSTORE
The comment about wmb being non-NOP to deal with non-Intel CPUs
is a left over from before the following commit:

  09df7c4c80 ("x86: Remove CONFIG_X86_OOSTORE")

It makes no sense now: in particular, wmb() is not a NOP even for
regular Intel CPUs because of weird use-cases e.g. dealing with
WC memory.

Drop this comment.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization <virtualization@lists.linux-foundation.org>
Link: http://lkml.kernel.org/r/1453921746-16178-3-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29 09:40:10 +01:00
Michael S. Tsirkin
bd922477d9 locking/x86: Add cc clobber for ADDL
ADDL clobbers flags (such as CF) but barrier.h didn't tell this
to GCC. Historically, GCC doesn't need one on x86, and always
considers flags clobbered. We are probably missing the cc
clobber in a *lot* of places for this reason.

But even if not necessary, it's probably a good thing to add for
documentation, and in case GCC semantcs ever change.

Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization <virtualization@lists.linux-foundation.org>
Link: http://lkml.kernel.org/r/1453921746-16178-2-git-send-email-mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-29 09:40:10 +01:00
Jan Beulich
3625c2c234 x86/mm: Fix types used in pgprot cacheability flags translations
For PAE kernels "unsigned long" is not suitable to hold page protection
flags, since _PAGE_NX doesn't fit there. This is the reason for quite a
few W+X pages getting reported as insecure during boot (observed namely
for the entire initrd range).

Fixes: 281d4078be ("x86: Make page cache mode a real type")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <JGross@suse.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/56A7635602000078000CAFF1@prv-mh.provo.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-01-26 21:05:36 +01:00
Ross Zwisler
3f4a2670de pmem: add wb_cache_pmem() to the PMEM API
__arch_wb_cache_pmem() was already an internal implementation detail of
the x86 PMEM API, but this functionality needs to be exported as part of
the general PMEM API to handle the fsync/msync case for DAX mmaps.

One thing worth noting is that we really do want this to be part of the
PMEM API as opposed to a stand-alone function like clflush_cache_range()
because of ordering restrictions.  By having wb_cache_pmem() as part of
the PMEM API we can leave it unordered, call it multiple times to write
back large amounts of memory, and then order the multiple calls with a
single wmb_pmem().

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.com>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-22 17:02:18 -08:00
Linus Torvalds
404a47410c Merge branch 'uaccess' (batched user access infrastructure)
Expose an interface to allow users to mark several accesses together as
being user space accesses, allowing batching of the surrounding user
space access markers (SMAP on x86, PAN on arm64, domain register
switching on arm).

This is currently only used for the user string lenth and copying
functions, where the SMAP overhead on x86 drowned the actual user
accesses (only noticeable on newer microarchitectures that support SMAP
in the first place, of course).

* user access batching branch:
  Use the new batched user accesses in generic user string handling
  Add 'unsafe' user access functions for batched accesses
  x86: reorganize SMAP handling in user space accesses
2016-01-21 13:02:41 -08:00
Linus Torvalds
eae21770b4 Merge branch 'akpm' (patches from Andrew)
Merge third patch-bomb from Andrew Morton:
 "I'm pretty much done for -rc1 now:

   - the rest of MM, basically

   - lib/ updates

   - checkpatch, epoll, hfs, fatfs, ptrace, coredump, exit

   - cpu_mask simplifications

   - kexec, rapidio, MAINTAINERS etc, etc.

   - more dma-mapping cleanups/simplifications from hch"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (109 commits)
  MAINTAINERS: add/fix git URLs for various subsystems
  mm: memcontrol: add "sock" to cgroup2 memory.stat
  mm: memcontrol: basic memory statistics in cgroup2 memory controller
  mm: memcontrol: do not uncharge old page in page cache replacement
  Documentation: cgroup: add memory.swap.{current,max} description
  mm: free swap cache aggressively if memcg swap is full
  mm: vmscan: do not scan anon pages if memcg swap limit is hit
  swap.h: move memcg related stuff to the end of the file
  mm: memcontrol: replace mem_cgroup_lruvec_online with mem_cgroup_online
  mm: vmscan: pass memcg to get_scan_count()
  mm: memcontrol: charge swap to cgroup2
  mm: memcontrol: clean up alloc, online, offline, free functions
  mm: memcontrol: flatten struct cg_proto
  mm: memcontrol: rein in the CONFIG space madness
  net: drop tcp_memcontrol.c
  mm: memcontrol: introduce CONFIG_MEMCG_LEGACY_KMEM
  mm: memcontrol: allow to disable kmem accounting for cgroup2
  mm: memcontrol: account "kmem" consumers in cgroup2 memory controller
  mm: memcontrol: move kmem accounting code to CONFIG_MEMCG
  mm: memcontrol: separate kmem code from legacy tcp accounting code
  ...
2016-01-21 12:32:08 -08:00
Linus Torvalds
d43421565b PCI changes for the v4.5 merge window:
Enumeration
     Simplify config space size computation (Bjorn Helgaas)
     Avoid iterating through ROM outside the resource window (Edward O'Callaghan)
     Support PCIe devices with short cfg_size (Jason S. McMullan)
     Add Netronome vendor and device IDs (Jason S. McMullan)
     Limit config space size for Netronome NFP6000 family (Jason S. McMullan)
     Add Netronome NFP4000 PF device ID (Simon Horman)
     Limit config space size for Netronome NFP4000 (Simon Horman)
     Print warnings for all invalid expansion ROM headers (Vladis Dronov)
 
   Resource management
     Fix minimum allocation address overwrite (Christoph Biedl)
 
   PCI device hotplug
     acpiphp_ibm: Fix null dereferences on null ibm_slot (Colin Ian King)
     pciehp: Always protect pciehp_disable_slot() with hotplug mutex (Guenter Roeck)
     shpchp: Constify hpc_ops structure (Julia Lawall)
     ibmphp: Remove unneeded NULL test (Julia Lawall)
 
   Power management
     Make ASPM sysfs link_state_store() consistent with link_state_show() (Andy Lutomirski)
 
   Virtualization
     Add function 1 DMA alias quirk for Lite-On/Plextor M6e/Marvell 88SS9183 (Tim Sander)
 
   MSI
     Remove empty pci_msi_init_pci_dev() (Bjorn Helgaas)
     Mark PCIe/PCI (MSI) IRQ cascade handlers as IRQF_NO_THREAD (Grygorii Strashko)
     Initialize MSI capability for all architectures (Guilherme G. Piccoli)
     Relax msi_domain_alloc() to support parentless MSI irqdomains (Liu Jiang)
 
   ARM Versatile host bridge driver
     Remove unused pci_sys_data structures (Lorenzo Pieralisi)
 
   Broadcom iProc host bridge driver
     Hide CONFIG_PCIE_IPROC (Arnd Bergmann)
     Do not use 0x in front of %pap (Dmitry V. Krivenok)
     Update iProc PCIe device tree binding (Ray Jui)
     Add PAXC interface support (Ray Jui)
     Add iProc PCIe MSI device tree binding (Ray Jui)
     Add iProc PCIe MSI support (Ray Jui)
 
   Freescale i.MX6 host bridge driver
     Use gpio_set_value_cansleep() (Fabio Estevam)
     Add support for active-low reset GPIO (Petr Štetiar)
 
   HiSilicon host bridge driver
     Add support for HiSilicon Hip06 PCIe host controllers (Gabriele Paoloni)
 
   Intel VMD host bridge driver
     Export irq_domain_set_info() for module use (Keith Busch)
     x86/PCI: Allow DMA ops specific to a PCI domain (Keith Busch)
     Use 32 bit PCI domain numbers (Keith Busch)
     Add driver for Intel Volume Management Device (VMD) (Keith Busch)
 
   Qualcomm host bridge driver
     Document PCIe devicetree bindings (Stanimir Varbanov)
     Add Qualcomm PCIe controller driver (Stanimir Varbanov)
     dts: apq8064: add PCIe devicetree node (Stanimir Varbanov)
     dts: ifc6410: enable PCIe DT node for this board (Stanimir Varbanov)
 
   Renesas R-Car host bridge driver
     Add support for R-Car H3 to pcie-rcar (Harunobu Kurokawa)
     Allow DT to override default window settings (Phil Edworthy)
     Convert to DT resource parsing API (Phil Edworthy)
     Revert "PCI: rcar: Build pcie-rcar.c only on ARM" (Phil Edworthy)
     Remove unused pci_sys_data struct from pcie-rcar (Phil Edworthy)
     Add runtime PM support to pcie-rcar (Phil Edworthy)
     Add Gen2 PHY setup to pcie-rcar (Phil Edworthy)
     Add gen2 fallback compatibility string for pci-rcar-gen2 (Simon Horman)
     Add gen2 fallback compatibility string for pcie-rcar (Simon Horman)
 
   Synopsys DesignWare host bridge driver
     Simplify control flow (Bjorn Helgaas)
     Make config accessor override checking symmetric (Bjorn Helgaas)
     Ensure ATU is enabled before IO/conf space accesses (Stanimir Varbanov)
 
   Miscellaneous
     Add of_pci_get_host_bridge_resources() stub (Arnd Bergmann)
     Check for PCI_HEADER_TYPE_BRIDGE equality, not bitmask (Bjorn Helgaas)
     Fix all whitespace issues (Bogicevic Sasa)
     x86/PCI: Simplify pci_bios_{read,write} (Geliang Tang)
     Use to_pci_dev() instead of open-coding it (Geliang Tang)
     Use kobj_to_dev() instead of open-coding it (Geliang Tang)
     Use list_for_each_entry() to simplify code (Geliang Tang)
     Fix typos in <linux/msi.h> (Thomas Petazzoni)
     x86/PCI: Clarify AMD Fam10h config access restrictions comment (Tomasz Nowicki)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWoQIuAAoJEFmIoMA60/r8ckYP/0ZrkANeN1SB5cQVi2k7aceq
 kQb1Hk6ifxohJvgpJ/iwmVCHoApyeBfUBfrC+fUpIC2f7ncPsE5HNyjqpAWzFzj2
 sYWwY029yjBQ9g4mPhkvjBXfha+lNtLthWc+Xxcat5pdcyG63Dg4SfJKWm2ZYnbN
 0GJzyRZXIwAMnNf0KIr61Aqru0nXeHvi5wblyJ08UZ7AcNzCtB0wKLmE3S6SeZVF
 f2fry35zcGu+TFvQ1hAYemfl3XyDBJ87nPiKzJAwYSaKcWPFWt+72PBDPO6X9squ
 6prm4nmAgeG2Oo4Zu0fbkDlB2bEsWUc14/xT0i5Wfs35vcwzF+S1zirJAtVqoNir
 NgC7fSbEHbsS7FZOz0rBOBIvIkbb6NdfLFuZqUFv0X1M5bRFywjo8lZRfAYoGJzK
 Mmus0uKbklx5m6RT5adf9+Plev1YJT6XZW9XrDpGnxrwRyPjHmyvuTWsYkumxY7Q
 CE5Wr3p7q2I2+MtrQVv2D9Nzsb+4zQ6BgHrd2vwR/IxTsfdXLU7+B691wkUDX8No
 UKFxBd0FiVCn+srG96u7lWQvdoUqoNCogTZSVzGR5gFBv3zAN9gi8HS7NbV558Mg
 Io3Xw+6dcbG33uvWdU6jHEDLMQsohZcp05Q5esCgRQNV4cGJbPxBDtOZEO/ezvW4
 FAI7lfgYTFiQK3NzE3Ng
 =z9mQ
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "PCI changes for the v4.5 merge window:

  Enumeration:
   - Simplify config space size computation (Bjorn Helgaas)
   - Avoid iterating through ROM outside the resource window (Edward O'Callaghan)
   - Support PCIe devices with short cfg_size (Jason S. McMullan)
   - Add Netronome vendor and device IDs (Jason S. McMullan)
   - Limit config space size for Netronome NFP6000 family (Jason S. McMullan)
   - Add Netronome NFP4000 PF device ID (Simon Horman)
   - Limit config space size for Netronome NFP4000 (Simon Horman)
   - Print warnings for all invalid expansion ROM headers (Vladis Dronov)

  Resource management:
   - Fix minimum allocation address overwrite (Christoph Biedl)

  PCI device hotplug:
   - acpiphp_ibm: Fix null dereferences on null ibm_slot (Colin Ian King)
   - pciehp: Always protect pciehp_disable_slot() with hotplug mutex (Guenter Roeck)
   - shpchp: Constify hpc_ops structure (Julia Lawall)
   - ibmphp: Remove unneeded NULL test (Julia Lawall)

  Power management:
   - Make ASPM sysfs link_state_store() consistent with link_state_show() (Andy Lutomirski)

  Virtualization
   - Add function 1 DMA alias quirk for Lite-On/Plextor M6e/Marvell 88SS9183 (Tim Sander)

  MSI:
   - Remove empty pci_msi_init_pci_dev() (Bjorn Helgaas)
   - Mark PCIe/PCI (MSI) IRQ cascade handlers as IRQF_NO_THREAD (Grygorii Strashko)
   - Initialize MSI capability for all architectures (Guilherme G. Piccoli)
   - Relax msi_domain_alloc() to support parentless MSI irqdomains (Liu Jiang)

  ARM Versatile host bridge driver:
   - Remove unused pci_sys_data structures (Lorenzo Pieralisi)

  Broadcom iProc host bridge driver:
   - Hide CONFIG_PCIE_IPROC (Arnd Bergmann)
   - Do not use 0x in front of %pap (Dmitry V. Krivenok)
   - Update iProc PCIe device tree binding (Ray Jui)
   - Add PAXC interface support (Ray Jui)
   - Add iProc PCIe MSI device tree binding (Ray Jui)
   - Add iProc PCIe MSI support (Ray Jui)

  Freescale i.MX6 host bridge driver:
   - Use gpio_set_value_cansleep() (Fabio Estevam)
   - Add support for active-low reset GPIO (Petr Štetiar)

  HiSilicon host bridge driver:
   - Add support for HiSilicon Hip06 PCIe host controllers (Gabriele Paoloni)

  Intel VMD host bridge driver:
   - Export irq_domain_set_info() for module use (Keith Busch)
   - x86/PCI: Allow DMA ops specific to a PCI domain (Keith Busch)
   - Use 32 bit PCI domain numbers (Keith Busch)
   - Add driver for Intel Volume Management Device (VMD) (Keith Busch)

  Qualcomm host bridge driver:
   - Document PCIe devicetree bindings (Stanimir Varbanov)
   - Add Qualcomm PCIe controller driver (Stanimir Varbanov)
   - dts: apq8064: add PCIe devicetree node (Stanimir Varbanov)
   - dts: ifc6410: enable PCIe DT node for this board (Stanimir Varbanov)

  Renesas R-Car host bridge driver:
   - Add support for R-Car H3 to pcie-rcar (Harunobu Kurokawa)
   - Allow DT to override default window settings (Phil Edworthy)
   - Convert to DT resource parsing API (Phil Edworthy)
   - Revert "PCI: rcar: Build pcie-rcar.c only on ARM" (Phil Edworthy)
   - Remove unused pci_sys_data struct from pcie-rcar (Phil Edworthy)
   - Add runtime PM support to pcie-rcar (Phil Edworthy)
   - Add Gen2 PHY setup to pcie-rcar (Phil Edworthy)
   - Add gen2 fallback compatibility string for pci-rcar-gen2 (Simon Horman)
   - Add gen2 fallback compatibility string for pcie-rcar (Simon Horman)

  Synopsys DesignWare host bridge driver:
   - Simplify control flow (Bjorn Helgaas)
   - Make config accessor override checking symmetric (Bjorn Helgaas)
   - Ensure ATU is enabled before IO/conf space accesses (Stanimir Varbanov)

  Miscellaneous:
   - Add of_pci_get_host_bridge_resources() stub (Arnd Bergmann)
   - Check for PCI_HEADER_TYPE_BRIDGE equality, not bitmask (Bjorn Helgaas)
   - Fix all whitespace issues (Bogicevic Sasa)
   - x86/PCI: Simplify pci_bios_{read,write} (Geliang Tang)
   - Use to_pci_dev() instead of open-coding it (Geliang Tang)
   - Use kobj_to_dev() instead of open-coding it (Geliang Tang)
   - Use list_for_each_entry() to simplify code (Geliang Tang)
   - Fix typos in <linux/msi.h> (Thomas Petazzoni)
   - x86/PCI: Clarify AMD Fam10h config access restrictions comment (Tomasz Nowicki)"

* tag 'pci-v4.5-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (58 commits)
  PCI: Add function 1 DMA alias quirk for Lite-On/Plextor M6e/Marvell 88SS9183
  PCI: Limit config space size for Netronome NFP4000
  PCI: Add Netronome NFP4000 PF device ID
  x86/PCI: Add driver for Intel Volume Management Device (VMD)
  PCI/AER: Use 32 bit PCI domain numbers
  x86/PCI: Allow DMA ops specific to a PCI domain
  irqdomain: Export irq_domain_set_info() for module use
  PCI: host: Add of_pci_get_host_bridge_resources() stub
  genirq/MSI: Relax msi_domain_alloc() to support parentless MSI irqdomains
  PCI: rcar: Add Gen2 PHY setup to pcie-rcar
  PCI: rcar: Add runtime PM support to pcie-rcar
  PCI: designware: Make config accessor override checking symmetric
  PCI: ibmphp: Remove unneeded NULL test
  ARM: dts: ifc6410: enable PCIe DT node for this board
  ARM: dts: apq8064: add PCIe devicetree node
  PCI: hotplug: Use list_for_each_entry() to simplify code
  PCI: rcar: Remove unused pci_sys_data struct from pcie-rcar
  PCI: hisi: Add support for HiSilicon Hip06 PCIe host controllers
  PCI: Avoid iterating through memory outside the resource window
  PCI: acpiphp_ibm: Fix null dereferences on null ibm_slot
  ...
2016-01-21 11:52:16 -08:00
Christoph Hellwig
e1c7e32453 dma-mapping: always provide the dma_map_ops based implementation
Move the generic implementation to <linux/dma-mapping.h> now that all
architectures support it and remove the HAVE_DMA_ATTR Kconfig symbol now
that everyone supports them.

[valentinrothberg@gmail.com: remove leftovers in Kconfig]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Helge Deller <deller@gmx.de>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-20 17:09:18 -08:00
Andy Lutomirski
7c360572b4 x86/mm: Make kmap_prot into a #define
The value (once we initialize it) is a foregone conclusion.
Make it a #define to save a tiny amount of text and data size
and to make it more comprehensible.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/0850eb0213de9da88544ff7fae72dc6d06d2b441.1453239349.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-20 11:39:14 +01:00
Linus Torvalds
2b4015e9fb platform-drivers-x86 for 4.5-1
Add intel punit and telemetry driver for APL SoCs.
 Add intel-hid driver for various laptop hotkey support.
 Add asus-wireless radio control driver.
 Keyboard backlight support/improvements for ThinkPads, Vaio, and Toshiba.
 Several hotkey related fixes and improvements for dell and toshiba.
 Fix oops on dual GPU Macs in apple-gmux.
 A few new device IDs and quirks.
 Various minor config related build issues and cleanups.
 
 surface pro 4:
  - fix compare_const_fl.cocci warnings
  - Add support for Surface Pro 4 Buttons
 
 platform/x86:
  - Add Intel Telemetry Debugfs interfaces
  - Add Intel telemetry platform device
  - Add Intel telemetry platform driver
  - Add Intel Telemetry Core Driver
  - add NULL check for input parameters
  - add Intel P-Unit mailbox IPC driver
  - update acpi resource structure for Punit
 
 thinkpad_acpi:
  - Add support for keyboard backlight
 
 dell-wmi:
  - Process only one event on devices with interface version 0
  - Check if Dell WMI descriptor structure is valid
  - Improve unknown hotkey handling
  - Use a C99-style array for bios_to_linux_keycode
 
 tc1100-wmi:
  - fix build warning when CONFIG_PM not enabled
 
 asus-wireless:
  - Add ACPI HID ATK4001
  - Add Asus Wireless Radio Control driver
 
 asus-wmi:
  - drop to_platform_driver macro
 
 intel-hid:
  - new hid event driver for hotkeys
 
 sony-laptop:
  - Keyboard backlight control for some Vaio Fit models
 
 ideapad-laptop:
  - Add Lenovo ideapad Y700-17ISK to no_hw_rfkill dmi list
 
 apple-gmux:
  - Assign apple_gmux_data before registering
 
 toshiba_acpi:
  - Add rfkill dependency to ACPI_TOSHIBA entry
  - Fix keyboard backlight sysfs entries not being updated
  - Add WWAN RFKill support
  - Add support for WWAN devices
  - Fix blank screen at boot if transflective backlight is supported
  - Propagate the hotkey value via genetlink
 
 toshiba_bluetooth:
  - Add missing newline in toshiba_bluetooth_present function
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWnuSYAAoJEKbMaAwKp364xckH/A/INtwr1LzmSw8fsEIe6IGQ
 MKEzCHZOUdSPcZSHcTW/fE/J8kt1j4YdR2kkxNbivtxST5O81E6KfAmo37TRrTNx
 bunVt7Swj7FcPsm6t0m3PQkbWkOy3UMUncRaZjeofzhRl2m+bHEycEIJsHs7OU+e
 7UPjbWZjRz6/LahcryKDH/lRhweMEsgZSFzjNlby8Wzz4LRCu8HLwNGUQa0vQfNI
 2UW+s7g2ZkFXx9fbmWu9twkyP7LwNoaZ3IDGsEffIYjmXeZKe+yTPy/TueYYzpon
 aclJSjhxLNnfWUaO0eUt/H4T5VyUag3v3mp6PVSwm2TA3Cpuh1SDYrJcurls6Mw=
 =b4Wa
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v4.5-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86

Pull x86 platform driver updates from Darren Hart:
 "Add intel punit and telemetry driver for APL SoCs.
  Add intel-hid driver for various laptop hotkey support.
  Add asus-wireless radio control driver.
  Keyboard backlight support/improvements for ThinkPads, Vaio, and Toshiba.
  Several hotkey related fixes and improvements for dell and toshiba.
  Fix oops on dual GPU Macs in apple-gmux.
  A few new device IDs and quirks.
  Various minor config related build issues and cleanups.

  surface pro 4:
   - fix compare_const_fl.cocci warnings
   - Add support for Surface Pro 4 Buttons

  platform/x86:
   - Add Intel Telemetry Debugfs interfaces
   - Add Intel telemetry platform device
   - Add Intel telemetry platform driver
   - Add Intel Telemetry Core Driver
   - add NULL check for input parameters
   - add Intel P-Unit mailbox IPC driver
   - update acpi resource structure for Punit

  thinkpad_acpi:
   - Add support for keyboard backlight

  dell-wmi:
   - Process only one event on devices with interface version 0
   - Check if Dell WMI descriptor structure is valid
   - Improve unknown hotkey handling
   - Use a C99-style array for bios_to_linux_keycode

  tc1100-wmi:
   - fix build warning when CONFIG_PM not enabled

  asus-wireless:
   - Add ACPI HID ATK4001
   - Add Asus Wireless Radio Control driver

  asus-wmi:
   - drop to_platform_driver macro

  intel-hid:
   - new hid event driver for hotkeys

  sony-laptop:
   - Keyboard backlight control for some Vaio Fit models

  ideapad-laptop:
   - Add Lenovo ideapad Y700-17ISK to no_hw_rfkill dmi list

  apple-gmux:
   - Assign apple_gmux_data before registering

  toshiba_acpi:
   - Add rfkill dependency to ACPI_TOSHIBA entry
   - Fix keyboard backlight sysfs entries not being updated
   - Add WWAN RFKill support
   - Add support for WWAN devices
   - Fix blank screen at boot if transflective backlight is supported
   - Propagate the hotkey value via genetlink

  toshiba_bluetooth:
   - Add missing newline in toshiba_bluetooth_present function"

* tag 'platform-drivers-x86-v4.5-1' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: (29 commits)
  surface pro 4: fix compare_const_fl.cocci warnings
  surface pro 4: Add support for Surface Pro 4 Buttons
  platform:x86: Add Intel Telemetry Debugfs interfaces
  platform:x86: Add Intel telemetry platform device
  platform:x86: Add Intel telemetry platform driver
  platform/x86: Add Intel Telemetry Core Driver
  intel_punit_ipc: add NULL check for input parameters
  thinkpad_acpi: Add support for keyboard backlight
  dell-wmi: Process only one event on devices with interface version 0
  dell-wmi: Check if Dell WMI descriptor structure is valid
  tc1100-wmi: fix build warning when CONFIG_PM not enabled
  asus-wireless: Add ACPI HID ATK4001
  platform/x86: Add Asus Wireless Radio Control driver
  asus-wmi: drop to_platform_driver macro
  intel-hid: new hid event driver for hotkeys
  Keyboard backlight control for some Vaio Fit models
  platform/x86: Add rfkill dependency to ACPI_TOSHIBA entry
  platform:x86: add Intel P-Unit mailbox IPC driver
  intel_pmc_ipc: update acpi resource structure for Punit
  ideapad-laptop: Add Lenovo ideapad Y700-17ISK to no_hw_rfkill dmi list
  ...
2016-01-19 17:54:15 -08:00
Souvik Kumar Chakravarty
378f956e3f platform/x86: Add Intel Telemetry Core Driver
Intel PM Telemetry is a software mechanism via which various SoC
PM and performance related parameters like PM counters, firmware
trace verbosity, the status of different devices inside the SoC, etc.
can be monitored and analyzed. The different samples that may be
monitored can be configured at runtime via exported APIs.

This patch adds the telemetry core driver that implements basic
exported APIs.

Signed-off-by: Souvik Kumar Chakravarty <souvik.k.chakravarty@intel.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19 17:35:50 -08:00
Qipeng Zha
fdca4f16f5 platform:x86: add Intel P-Unit mailbox IPC driver
This driver provides support for P-Unit mailbox IPC on Intel platforms.
The heart of the P-Unit is the Foxton microcontroller and its firmware,
which provide mailbox interface for power management usage.

Signed-off-by: Qipeng Zha <qipeng.zha@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
2016-01-19 15:49:36 -08:00
Josh Poimboeuf
ec5186557a x86/asm: Add C versions of frame pointer macros
Add C versions of the frame pointer macros which can be used to
create a stack frame in inline assembly.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f6786a282bf232ede3e2866414eae3cf02c7d662.1450442274.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-19 12:59:07 +01:00
Josh Poimboeuf
997963edd9 x86/asm: Clean up frame pointer macros
The asm macros for setting up and restoring the frame pointer
aren't currently being used.  However, they will be needed soon
to help asm functions to comply with stacktool.

Rename FRAME/ENDFRAME to FRAME_BEGIN/FRAME_END for more
symmetry.  Also make the code more readable and improve the
comments.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Chris J Arges <chris.j.arges@canonical.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3f488a8e3bfc8ac7d4d3d350953e664e7182b044.1450442274.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-19 12:59:07 +01:00
Borislav Petkov
a1ff572608 x86/cpufeature: Add AMD AVIC bit
CPUID Fn8000_000A_EDX[13] denotes support for AMD's Virtual
Interrupt controller, i.e., APIC virtualization.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Kaplan <david.kaplan@amd.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Link: http://lkml.kernel.org/r/1452938292-12327-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-19 08:29:28 +01:00
Linus Torvalds
a200dcb346 virtio: barrier rework+fixes
This adds a new kind of barrier, and reworks virtio and xen
 to use it.
 Plus some fixes here and there.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWlU2kAAoJECgfDbjSjVRpZ6IH/Ra19ecG8sCQo9zskr4zo22Z
 DZXC3u0sJDBYjjBAiw3IY1FKh7wx2Fr1RhUOj1bteBgcFCMCV1zInP5ITiCyzd1H
 YYh1w9C2tZaj2T4t9L4hIrAdtIF8fGS+oI2IojXPjOuDLEt6pfFBEjHp/sfl3UJq
 ZmZvw4OXviSNej7jBw8Xni3Uv18yfmLGXvMdkvMSPC1/XL29voGDqTVwhqJwxLVz
 k/ZLcKFOzIs9N7Nja0Jl1EiZtC2Y9cpItqweicNAzszlpkSL44vQxmCSefB+WyQ4
 gt0O3+AxYkLfrxzCBhUA4IpRex3/XPW1b+1e/V1XjfR2n/FlyLe+AIa8uPJElFc=
 =ukaV
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio barrier rework+fixes from Michael Tsirkin:
 "This adds a new kind of barrier, and reworks virtio and xen to use it.

  Plus some fixes here and there"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (44 commits)
  checkpatch: add virt barriers
  checkpatch: check for __smp outside barrier.h
  checkpatch.pl: add missing memory barriers
  virtio: make find_vqs() checkpatch.pl-friendly
  virtio_balloon: fix race between migration and ballooning
  virtio_balloon: fix race by fill and leak
  s390: more efficient smp barriers
  s390: use generic memory barriers
  xen/events: use virt_xxx barriers
  xen/io: use virt_xxx barriers
  xenbus: use virt_xxx barriers
  virtio_ring: use virt_store_mb
  sh: move xchg_cmpxchg to a header by itself
  sh: support 1 and 2 byte xchg
  virtio_ring: update weak barriers to use virt_xxx
  Revert "virtio_ring: Update weak barriers to use dma_wmb/rmb"
  asm-generic: implement virt_xxx memory barriers
  x86: define __smp_xxx
  xtensa: define __smp_xxx
  tile: define __smp_xxx
  ...
2016-01-18 16:44:24 -08:00
Miroslav Benes
383bf44d1a livepatch: change the error message in asm/livepatch.h header files
If anyone includes asm/livepatch.h when CONFIG_LIVEPATCH=n the build
fails with the existing error message. Change it to something saner.

[jkosina@suse.cz: fixed changelog typo spotted by Josh]
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-01-18 21:35:43 +01:00
Linus Torvalds
0cbeafb245 Merge branch 'akpm' (patches from Andrew)
Merge second patch-bomb from Andrew Morton:

 - more MM stuff:

    - Kirill's page-flags rework

    - Kirill's now-allegedly-fixed THP rework

    - MADV_FREE implementation

    - DAX feature work (msync/fsync).  This isn't quite complete but DAX
      is new and it's good enough and the guys have a handle on what
      needs to be done - I expect this to be wrapped in the next week or
      two.

  - some vsprintf maintenance work

  - various other misc bits

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (145 commits)
  printk: change recursion_bug type to bool
  lib/vsprintf: factor out %pN[F] handler as netdev_bits()
  lib/vsprintf: refactor duplicate code to special_hex_number()
  printk-formats.txt: remove unimplemented %pT
  printk: help pr_debug and pr_devel to optimize out arguments
  lib/test_printf.c: test dentry printing
  lib/test_printf.c: add test for large bitmaps
  lib/test_printf.c: account for kvasprintf tests
  lib/test_printf.c: add a few number() tests
  lib/test_printf.c: test precision quirks
  lib/test_printf.c: check for out-of-bound writes
  lib/test_printf.c: don't BUG
  lib/kasprintf.c: add sanity check to kvasprintf
  lib/vsprintf.c: warn about too large precisions and field widths
  lib/vsprintf.c: help gcc make number() smaller
  lib/vsprintf.c: expand field_width to 24 bits
  lib/vsprintf.c: eliminate potential race in string()
  lib/vsprintf.c: move string() below widen_string()
  lib/vsprintf.c: pull out padding code from dentry_name()
  printk: do cond_resched() between lines while outputting to consoles
  ...
2016-01-17 12:58:52 -08:00
Linus Torvalds
a016af2e70 sound updates for 4.5-rc1
We've had quite busy weeks in this cycle.  Looking at ALSA core, the
 significant changes are a few fixes wrt timer and sequencer ioctls
 that have been revealed by fuzzer recently.  Other than that, ASoC
 core got a few updates about DAI link handling, but these are rather
 straightforward refactoring.
 
 In drivers scene, ASoC received quite lots of new drivers in addition
 to bunch of updates for still ongoing Intel Skylake support and
 topology API.  HD-audio gained a new HDMI/DP hotplug notification via
 component.  FireWire got a pile of code refactoring/updates with
 SCS.1x driver integration.
 
 More highlights are shown below.
 
 [NOTE: this contains also many commits for DRM.  This is due to the
  pull of drm stable branch into sound tree, as the base of i915 audio
  component work for HD-audio.  The highlights below don't contain
  these DRM changes, as these are supposed to be pulled via drm tree in
  anyway sooner or later.]
 
 Core
  - Handful fixes to harden ALSA timer and sequencer ioctls against
    races reported by syzkaller fuzzer
  - Irq description string can be unique to each card; only for
    HD-audio for now
 
 ASoC
  - Conversion of the array of DAI links to a list for supporting
    dynamically adding and removing DAI links
  - Topology API enhancements to make everything more component based
    and being able to specify PCM links via topology
  - Some more fixes for the topology code, though it is still not final
    and ready for enabling in production; we really need to get to the
    point where that can be done
  - A pile of changes for Intel SkyLake drivers which hopefully deliver
    some useful initial functionality for systems with this chipset,
    though there is more work still to come
  - Lots of new features and cleanups for the Renesas drivers
  - ANC support for WM5110
  - New drivers: Imagination Technologies IPs, Atmel class D speaker,
    Cirrus CS47L24 and WM1831, Dialog DA7128, Realtek RT5659 and
    RT56156, Rockchip RK3036, TI PC3168A, and AMD ACP
  - Rename PCM1792a driver to be generic pcm179x
 
 HD-Audio
  - Use audio component for i915 HDMI/DP hotplug handling
  - On-demand binding with i915 driver
  - bdl_pos_adj parameter adjustment for Baytrail controllers
  - Enable power_save_node for CX20722; this shouldn't lead to
    regression, hopefully
  - Kabylake HDMI/DP codec support
  - Quirks for Lenovo E50-80, Dell Latitude E-series, and other Dell
    machines
  - A few code refactoring
 
 FireWire
  - Lots of code cleanup and refactoring
  - Integrate the support of SCS.1x devices into snd-oxfw driver;
    snd-scs1x driver is obsoleted
 
 USB-audio
  - Fix possible NULL dereference at disconnection
  - A regression fix for Native Instruments devices
 
 Misc
  - A few code cleanups of fm801 driver
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJWmmhNAAoJEGwxgFQ9KSmk/wsP/3eO+giAT9VRPa6qxR6VdT6I
 dZwTxcp4ZzUrgLxk9k5VYjqey6QL+1xWfl3Abrd+NzXDj1wo4KsDh2XCKG1btO9K
 UpIZf76Nzt7o91pzHbsU6mrjDeoVNqloZoGbg1utAmmegaXH3owd18p/ZHfE3sz2
 BbaHmYW/R8lnaBgBhzqJB97+zRaLJmMWpWHfpHaIPjdfw8/V4j76jtPnpmv2hDZl
 BHXVHcQXjVGunFRzxdzBLuTC+FmhzUeTAbbAdOT4fEoOCv5MtZqYppNxdhj+b9l5
 mrsXe5FBTNmrt9Z5TtfCuzgJPkzoDperFb0aKd7wI1jVMtLzkNCMlanHr9U6B6fr
 jSrs6l25xrpF1BBfRMfHjNudA5vng/XC5dtW00JofXSrIxtwPNUoDDiqJgw7xVm5
 aVWK7KkQIjRbHdCQaeTymv70oHHKei92hbCrXUobXZ7wLeJMXNVPT25ttChWrgAI
 7cu5h+K5PjReI/sJFTMPL4aHZ+jAn9quQl7vK8EXiL9E6G8lLiuBiVW6hjGd9At+
 Z6UyGV+nCM6O3qZcyParMuLkNtWx9uT7Pcn8oTZAdKPngNhsf8+yl9qmsFkNLDC4
 LKPx0+rdCjtMKn2du3krsHhG3EN9pLDrE6g5U3d6Cz83e69Y7fCuSjl31SjD91H0
 bZDcM/ejYSbid3yKN4TL
 =Gvgb
 -----END PGP SIGNATURE-----

Merge tag 'sound-4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound updates from Takashi Iwai:
 "We've had quite busy weeks in this cycle.  Looking at ALSA core, the
  significant changes are a few fixes wrt timer and sequencer ioctls
  that have been revealed by fuzzer recently.  Other than that, ASoC
  core got a few updates about DAI link handling, but these are rather
  straightforward refactoring.

  In drivers scene, ASoC received quite lots of new drivers in addition
  to bunch of updates for still ongoing Intel Skylake support and
  topology API.  HD-audio gained a new HDMI/DP hotplug notification via
  component.  FireWire got a pile of code refactoring/updates with
  SCS.1x driver integration.

  More highlights are shown below.

  [ NOTE: this contains also many commits for DRM.  This is due to the
    pull of drm stable branch into sound tree, as the base of i915 audio
    component work for HD-audio.  The highlights below don't contain
    these DRM changes, as these are supposed to be pulled via drm tree
    in anyway sooner or later.  ]

  Core:
   - Handful fixes to harden ALSA timer and sequencer ioctls against
     races reported by syzkaller fuzzer
   - Irq description string can be unique to each card; only for
     HD-audio for now

  ASoC:
   - Conversion of the array of DAI links to a list for supporting
     dynamically adding and removing DAI links
   - Topology API enhancements to make everything more component based
     and being able to specify PCM links via topology
   - Some more fixes for the topology code, though it is still not final
     and ready for enabling in production; we really need to get to the
     point where that can be done
   - A pile of changes for Intel SkyLake drivers which hopefully deliver
     some useful initial functionality for systems with this chipset,
     though there is more work still to come
   - Lots of new features and cleanups for the Renesas drivers
   - ANC support for WM5110
   - New drivers: Imagination Technologies IPs, Atmel class D speaker,
     Cirrus CS47L24 and WM1831, Dialog DA7128, Realtek RT5659 and
     RT56156, Rockchip RK3036, TI PC3168A, and AMD ACP
   - Rename PCM1792a driver to be generic pcm179x

  HD-Audio:
   - Use audio component for i915 HDMI/DP hotplug handling
   - On-demand binding with i915 driver
   - bdl_pos_adj parameter adjustment for Baytrail controllers
   - Enable power_save_node for CX20722; this shouldn't lead to
     regression, hopefully
   - Kabylake HDMI/DP codec support
   - Quirks for Lenovo E50-80, Dell Latitude E-series, and other Dell
     machines
   - A few code refactoring

  FireWire:
   - Lots of code cleanup and refactoring
   - Integrate the support of SCS.1x devices into snd-oxfw driver;
     snd-scs1x driver is obsoleted

  USB-audio:
   - Fix possible NULL dereference at disconnection
   - A regression fix for Native Instruments devices

  Misc:
   - A few code cleanups of fm801 driver"

* tag 'sound-4.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (722 commits)
  ALSA: timer: Code cleanup
  ALSA: timer: Harden slave timer list handling
  ALSA: hda - Add fixup for Dell Latitidue E6540
  ALSA: timer: Fix race among timer ioctls
  ALSA: hda - add codec support for Kabylake display audio codec
  ALSA: timer: Fix double unlink of active_list
  ALSA: usb-audio: Fix mixer ctl regression of Native Instrument devices
  ALSA: hda - fix the headset mic detection problem for a Dell laptop
  ALSA: hda - Fix white noise on Dell Latitude E5550
  ALSA: hda_intel: add card number to irq description
  ALSA: seq: Fix race at timer setup and close
  ALSA: seq: Fix missing NULL check at remove_events ioctl
  ALSA: usb-audio: Avoid calling usb_autopm_put_interface() at disconnect
  ASoC: hdac_hdmi: remove unused hdac_hdmi_query_pin_connlist
  ASoC: AMD: Add missing include file
  ALSA: hda - Fixup inverted internal mic for Lenovo E50-80
  ALSA: usb: Add native DSD support for Oppo HA-1
  ASoC: Make aux_dev more like a generic component
  ASoC: bcm2835: cleanup includes by ordering them alphabetically
  ASoC: AMD: Manage ACP 2.x SRAM banks power
  ...
2016-01-17 12:05:31 -08:00
Dan Williams
3565fce3a6 mm, x86: get_user_pages() for dax mappings
A dax mapping establishes a pte with _PAGE_DEVMAP set when the driver
has established a devm_memremap_pages() mapping, i.e.  when the pfn_t
return from ->direct_access() has PFN_DEV and PFN_MAP set.  Later, when
encountering _PAGE_DEVMAP during a page table walk we lookup and pin a
struct dev_pagemap instance to keep the result of pfn_to_page() valid
until put_page().

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Dan Williams
5c7fb56e5e mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd
A dax-huge-page mapping while it uses some thp helpers is ultimately not
a transparent huge page.  The distinction is especially important in the
get_user_pages() path.  pmd_devmap() is used to distinguish dax-pmds
from pmd_huge() and pmd_trans_huge() which have slightly different
semantics.

Explicitly mark the pmd_trans_huge() helpers that dax needs by adding
pmd_devmap() checks.

[kirill.shutemov@linux.intel.com: fix regression in handling mlocked pages in  __split_huge_pmd()]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Dan Williams
f25748e3c3 mm, dax: convert vmf_insert_pfn_pmd() to pfn_t
Similar to the conversion of vm_insert_mixed() use pfn_t in the
vmf_insert_pfn_pmd() to tag the resulting pte with _PAGE_DEVICE when the
pfn is backed by a devm_memremap_pages() mapping.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Dan Williams
01c8f1c44b mm, dax, gpu: convert vm_insert_mixed to pfn_t
Convert the raw unsigned long 'pfn' argument to pfn_t for the purpose of
evaluating the PFN_MAP and PFN_DEV flags.  When both are set it triggers
_PAGE_DEVMAP to be set in the resulting pte.

There are no functional changes to the gpu drivers as a result of this
conversion.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: David Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Dan Williams
69660fd797 x86, mm: introduce _PAGE_DEVMAP
_PAGE_DEVMAP is a hardware-unused pte bit that will later be used in the
get_user_pages() path to identify pfns backed by the dynamic allocation
established by devm_memremap_pages.  Upon seeing that bit the gup path
will lookup and pin the allocation while the pages are in use.

Since the _PAGE_DEVMAP bit is > 32 it must be cast to u64 instead of a
pteval_t to allow pmd_flags() usage in the realmode boot code to build.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Dan Williams
52db400fcd pmem, dax: clean up clear_pmem()
To date, we have implemented two I/O usage models for persistent memory,
PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
to be the target of direct-i/o.  It allows userspace to coordinate
DMA/RDMA from/to persistent memory.

The implementation leverages the ZONE_DEVICE mm-zone that went into
4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
and dynamically mapped by a device driver.  The pmem driver, after
mapping a persistent memory range into the system memmap via
devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
page-backed pmem-pfns via flags in the new pfn_t type.

The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
resulting pte(s) inserted into the process page tables with a new
_PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
off _PAGE_DEVMAP to pin the device hosting the page range active.
Finally, get_page() and put_page() are modified to take references
against the device driver established page mapping.

Finally, this need for "struct page" for persistent memory requires
memory capacity to store the memmap array.  Given the memmap array for a
large pool of persistent may exhaust available DRAM introduce a
mechanism to allocate the memmap from persistent memory.  The new
"struct vmem_altmap *" parameter to devm_memremap_pages() enables
arch_add_memory() to use reserved pmem capacity rather than the page
allocator.

This patch (of 25):

Both __dax_pmd_fault, and clear_pmem() were taking special steps to
clear memory a page at a time to take advantage of non-temporal
clear_page() implementations.  However, x86_64 does not use non-temporal
instructions for clear_page(), and arch_clear_pmem() was always
incurring the cost of __arch_wb_cache_pmem().

Clean up the assumption that doing clear_pmem() a page at a time is more
performant.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Minchan Kim
590a471ce9 arch/x86/include/asm/pgtable.h: add pmd_[dirty|mkclean] for THP
MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
of the contents since MADV_FREE syscall is called for THP page.

This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
support.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Shaohua Li <shli@kernel.org>
Cc: <yalin.wang2010@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jason Evans <je@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttil <mika.penttila@nextfour.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Shaohua Li <shli@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Kirill A. Shutemov
1f19617d77 x86, thp: remove infrastructure for handling splitting PMDs
With new refcounting we don't need to mark PMDs splitting.  Let's drop
code to handle this.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Bjorn Helgaas
3a6384ba10 Merge branch 'pci/host-vmd' into next
* pci/host-vmd:
  x86/PCI: Add driver for Intel Volume Management Device (VMD)
  PCI/AER: Use 32 bit PCI domain numbers
  x86/PCI: Allow DMA ops specific to a PCI domain
  irqdomain: Export irq_domain_set_info() for module use
  genirq/MSI: Relax msi_domain_alloc() to support parentless MSI irqdomains
2016-01-15 16:14:39 -06:00
Keith Busch
185a383ada x86/PCI: Add driver for Intel Volume Management Device (VMD)
The Intel Volume Management Device (VMD) is a Root Complex Integrated
Endpoint that acts as a host bridge to a secondary PCIe domain.  BIOS can
reassign one or more Root Ports to appear within a VMD domain instead of
the primary domain.  The immediate benefit is that additional PCIe domains
allow more than 256 buses in a system by letting bus numbers be reused
across different domains.

VMD domains do not define ACPI _SEG, so to avoid domain clashing with host
bridges defining this segment, VMD domains start at 0x10000, which is
greater than the highest possible 16-bit ACPI defined _SEG.

This driver enumerates and enables the domain using the root bus
configuration interface provided by the PCI subsystem.  The driver provides
configuration space accessor functions (pci_ops), bus and memory resources,
an MSI IRQ domain with irq_chip implementation, and DMA operations
necessary to use devices through the VMD endpoint's interface.

VMD routes I/O as follows:

   1) Configuration Space: BAR 0 ("CFGBAR") of VMD provides the base
   address and size for configuration space register access to VMD-owned
   root ports.  It works similarly to MMCONFIG for extended configuration
   space.  Bus numbering is independent and does not conflict with the
   primary domain.

   2) MMIO Space: BARs 2 and 4 ("MEMBAR1" and "MEMBAR2") of VMD provide the
   base address, size, and type for MMIO register access.  These addresses
   are not translated by VMD hardware; they are simply reservations to be
   distributed to root ports' memory base/limit registers and subdivided
   among devices downstream.

   3) DMA: To interact appropriately with an IOMMU, the source ID DMA read
   and write requests are translated to the bus-device-function of the VMD
   endpoint.  Otherwise, DMA operates normally without VMD-specific address
   translation.

   4) Interrupts: Part of VMD's BAR 4 is reserved for VMD's MSI-X Table and
   PBA.  MSIs from VMD domain devices and ports are remapped to appear as
   if they were issued using one of VMD's MSI-X table entries.  Each MSI
   and MSI-X address of VMD-owned devices and ports has a special format
   where the address refers to specific entries in the VMD's MSI-X table.
   As with DMA, the interrupt source ID is translated to VMD's
   bus-device-function.

   The driver provides its own MSI and MSI-X configuration functions
   specific to how MSI messages are used within the VMD domain, and
   provides an irq_chip for independent IRQ allocation to relay interrupts
   from VMD's interrupt handler to the appropriate device driver's handler.

   5) Errors: PCIe error message are intercepted by the root ports normally
   (e.g., AER), except with VMD, system errors (i.e., firmware first) are
   disabled by default.  AER and hotplug interrupts are translated in the
   same way as endpoint interrupts.

   6) VMD does not support INTx interrupts or IO ports.  Devices or drivers
   requiring these features should either not be placed below VMD-owned
   root ports, or VMD should be disabled by BIOS for such endpoints.

[bhelgaas: add VMD BAR #defines, factor out vmd_cfg_addr(), rework VMD
resource setup, whitespace, changelog]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de> (IRQ-related parts)
2016-01-15 13:54:55 -06:00
Keith Busch
d9c3d6ff22 x86/PCI: Allow DMA ops specific to a PCI domain
The Intel Volume Management Device (VMD) is a PCIe endpoint that acts as a
host bridge to another PCI domain.  When devices below the VMD perform DMA,
the VMD replaces their DMA source IDs with its own source ID.  Therefore,
those devices require special DMA ops.

Add interfaces to allow the VMD driver to set up dma_ops for the devices
below it.

[bhelgaas: remove "extern", add "static", changelog]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-01-15 13:54:55 -06:00
Thomas Gleixner
90a2282e23 x86/irq: Call irq_force_move_complete with irq descriptor
First of all there is no point in looking up the irq descriptor again, but we
also need the descriptor for the final cleanup race fix in the next
patch. Make that change seperate. No functional difference.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Tested-by: Joe Lawrence <joe.lawrence@stratus.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jeremiah Mahler <jmmahler@gmail.com>
Cc: andy.shevchenko@gmail.com
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: stable@vger.kernel.org #4.3+
Link: http://lkml.kernel.org/r/20151231160107.125211743@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-01-15 13:44:01 +01:00
Linus Torvalds
10a0c0f059 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc changes:
   - fix lguest bug
   - fix /proc/meminfo output on certain configs
   - fix pvclock bug
   - fix reboot on certain iMacs by adding new reboot quirk
   - fix bootup crash
   - fix FPU boot line option parsing
   - add more x86 self-tests
   - small cleanups, documentation improvements, etc"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/amd: Remove an unneeded condition in srat_detect_node()
  x86/vdso/pvclock: Protect STABLE check with the seqcount
  x86/mm: Improve switch_mm() barrier comments
  selftests/x86: Test __kernel_sigreturn and __kernel_rt_sigreturn
  x86/reboot/quirks: Add iMac10,1 to pci_reboot_dmi_table[]
  lguest: Map switcher text R/O
  x86/boot: Hide local labels in verify_cpu()
  x86/fpu: Disable AVX when eagerfpu is off
  x86/fpu: Disable MPX when eagerfpu is off
  x86/fpu: Disable XGETBV1 when no XSAVE
  x86/fpu: Fix early FPU command-line parsing
  x86/mm: Use PAGE_ALIGNED instead of IS_ALIGNED
  selftests/x86: Disable the ldt_gdt_64 test for now
  x86/mm/pat: Make split_page_count() check for empty levels to fix /proc/meminfo output
  x86/boot: Double BOOT_HEAP_SIZE to 64KB
  x86/mm: Add barriers and document switch_mm()-vs-flush synchronization
2016-01-14 11:57:22 -08:00
Andy Lutomirski
4eaffdd5a5 x86/mm: Improve switch_mm() barrier comments
My previous comments were still a bit confusing and there was a
typo. Fix it up.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 71b3c126e6 ("x86/mm: Add barriers and document switch_mm()-vs-flush synchronization")
Link: http://lkml.kernel.org/r/0a0b43cdcdd241c5faaaecfbcc91a155ddedc9a1.1452631609.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-13 10:42:49 +01:00
Linus Torvalds
67990608c8 Power management and ACPI updates for v4.5-rc1
- Add a debugfs-based interface for interacting with the ACPICA's
    AML debugger introduced in the previous cycle and a new user
    space tool for that, fix some bugs related to the AML debugger
    and clean up the code in question (Lv Zheng, Dan Carpenter,
    Colin Ian King, Markus Elfring).
 
  - Update ACPICA to upstream revision 20151218 including a number
    of fixes and cleanups in the ACPICA core (Bob Moore, Lv Zheng,
    Labbe Corentin, Prarit Bhargava, Colin Ian King, David E Box,
    Rafael Wysocki).
 
    In particular, the previously added erroneous support for the
    _SUB object is dropped, the concatenate operator will support
    all ACPI objects now, the Debug Object handling is improved,
    the SuperName handling of parameters being control methods is
    fixed, the ObjectType operator handling is updated to follow
    ACPI 5.0A and the handling of CondRefOf and RefOf is updated
    accordingly, module-level code will be executed after loading
    each ACPI table now (instead of being run once after all tables
    containing AML have been loaded), the Operation Region handlers
    management is updated to fix some reported problems and a the
    ACPICA code in the kernel is more in line with the upstream
    now.
 
  - Update the ACPI backlight driver to provide information on
    whether or not it will generate key-presses for brightness
    change hotkeys and update some platform drivers (dell-wmi,
    thinkpad_acpi) to use that information to avoid sending double
    key-events to users pace for these, add new ACPI backlight
    quirks (Hans de Goede, Aaron Lu, Adrien Schildknecht).
 
  - Improve the ACPI handling of interrupt GPIOs (Christophe Ricard).
 
  - Fix the handling of the list of device IDs of device objects
    found in the ACPI namespace and add a helper for checking if
    there is a device object for a given device ID (Lukas Wunner).
 
  - Change the logic in the ACPI namespace scanning code to create
    struct acpi_device objects for all ACPI device objects found in
    the namespace even if _STA fails for them which helps to avoid
    device enumeration problems on Microsoft Surface 3 (Aaron Lu).
 
  - Add support for the APM X-Gene ACPI I2C device to the ACPI
    driver for AMD SoCs (Loc Ho).
 
  - Fix the long-standing issue with the DMA controller on Intel
    SoCs where ACPI tables have no power management support for
    the DMA controller itself, but it can be powered off automatically
    when the last (other) device on the SoC is powered off via ACPI
    and clean up the ACPI driver for Intel SoCs (acpi-lpss) after
    previous attempts to fix that problem (Andy Shevchenko).
 
  - Assorted ACPI fixes and cleanups (Andy Lutomirski, Colin Ian King,
    Javier Martinez Canillas, Ken Xue, Mathias Krause, Rafael Wysocki,
    Sinan Kaya).
 
  - Update the device properties framework for better handling of
    built-in properties, add support for built-in properties to
    the platform bus type, update the MFD subsystem's handling
    of device properties and add support for passing default
    configuration data as device properties to the intel-lpss MFD
    drivers, convert the designware I2C driver to use the unified
    device properties API and add a fallback mechanism for using
    default built-in properties if the platform firmware fails
    to provide the properties as expected by drivers (Andy Shevchenko,
    Mika Westerberg, Heikki Krogerus, Andrew Morton).
 
  - Add new Device Tree bindings to the Operating Performance Points
    (OPP) framework and update the exynos4412 DT binding accordingly,
    introduce debugfs support for the OPP framework (Viresh Kumar,
    Bartlomiej Zolnierkiewicz).
 
  - Migrate the mt8173 cpufreq driver to the new OPP bindings
    (Pi-Cheng Chen).
 
  - Update the cpufreq core to make the handling of governors
    more efficient, especially on systems where policy objects
    are shared between multiple CPUs (Viresh Kumar, Rafael Wysocki).
 
  - Fix cpufreq governor handling on configurations with
    CONFIG_HZ_PERIODIC set (Chen Yu).
 
  - Clean up the cpufreq core code related to the boost sysfs knob
    support and update the ACPI cpufreq driver accordingly (Rafael
    Wysocki).
 
  - Add a new cpufreq driver for ST platforms and corresponding
    Device Tree bindings (Lee Jones).
 
  - Update the intel_pstate driver to allow the P-state selection
    algorithm used by it to depend on the CPU ID of the processor it
    is running on, make it use a special P-state selection algorithm
    (with an IO wait time compensation tweak) on Atom CPUs based on
    the Airmont and Silvermont cores so as to reduce their energy
    consumption and improve intel_pstate documentation (Philippe
    Longepe, Srinivas Pandruvada).
 
  - Update the cpufreq-dt driver to support registering cooling
    devices that use the (P * V^2 * f) dynamic power draw formula
    where V is the voltage, f is the frequency and P is a constant
    coefficient provided by Device Tree and update the arm_big_little
    cpufreq driver to use that support (Punit Agrawal).
 
  - Assorted cpufreq driver (cpufreq-dt, qoriq, pcc-cpufreq,
    blackfin-cpufreq) updates (Andrzej Hajda, Hongtao Jia,
    Jacob Tanenbaum, Markus Elfring).
 
  - cpuidle core tweaks related to polling and measured_us
    calculation (Rik van Riel).
 
  - Removal of modularity from a few cpuidle drivers (clps711x,
    ux500, exynos) that cannot be built as modules in practice
    (Paul Gortmaker).
 
  - PM core update to prevent devices from being probed during
    system suspend/resume which is generally problematic and may
    lead to inconsistent behavior (Grygorii Strashko).
 
  - Assorted updates of the PM core and related code (Julia Lawall,
    Manuel Pégourié-Gonnard, Maruthi Bayyavarapu, Rafael Wysocki,
    Ulf Hansson).
 
  - PNP bus type updates (Christophe Le Roy, Heiner Kallweit).
 
  - PCI PM code cleanups (Jarkko Nikula, Julia Lawall).
 
  - cpupower tool updates (Jacob Tanenbaum, Thomas Renninger).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJWlZOmAAoJEILEb/54YlRxxtEP/ioR0xMOJQcWd5F6Oyj1PZsx
 vJeXsmL3fXFAlr6riaE966QqclhUTDhhex3kbFmNQvM8WukxOmBWy5UMSjRg2UmM
 PHrogc/KrrE+xb8hjGZPgqVr+/L9O3C6lZmM+AUciT0hWZJckYgRh5TpHb1xN/Kx
 MptvtSXRBM62LWytug+EwA4SHt7OFS0yJ/CI1pKvODVtLaYDIPI5k+4ilPU7y6Be
 vfoysvmUozNTEYxgPOPXfoQqW2P5t2df32Re31uKtLenLXbc8KW0wIYm24DXgSK6
 V/TyDVZTNaZk6OpTqWrjqFbedpGvcBpViwYEY7yv33GDCpXGdHQl3ga+Jy6PAUem
 7oGDZtA+5Di/8szhH/wSdpXwSaKEeUdFiaj6Uw2MAwiY4wzv5+WmLRcuIjQFDAxT
 elrTbQhAgaMlMsUkQ9NV4GC7ByUeeQX2NpCielsHngOQgKdYRQHyYUgGXc2Wgjdq
 UnVrIWRHzXSED0RtPI7IT0Y4PSxkM9UoSEiVUwt3srCue2CFzuENs23qaDgAzeDa
 5uwnDl4RhI2BrLVT1WhioIFgFE5Yh5Xx6dSGC+jcU2ss8r2oN6DdUbqOzWAa1iR4
 sFhgwwwizpCCfB6pSqEuDdg8W56HjvE9kQY9kcTPPNPbktL0VImC+iiSN/CgZJv9
 MH9NbQM8uHkfNcpjsN7V
 =OlYA
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-4.5-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull oower management and ACPI updates from Rafael Wysocki:
 "As far as the number of commits goes, ACPICA takes the lead this time,
  followed by cpufreq and the device properties framework changes.

  The most significant new feature is the debugfs-based interface to the
  ACPICA's AML debugger added in the previous cycle and a new user space
  tool for accessing it.

  On the cpufreq front, the core is updated to handle governors more
  efficiently, particularly on systems where a single cpufreq policy
  object is shared between multiple CPUs, and there are quite a few
  changes in drivers (intel_pstate, cpufreq-dt etc).

  The device properties framework is updated to handle built-in (ie
  included in the kernel itself) device properties better, among other
  things by adding a fallback mechanism that will allow drivers to
  provide default properties to be used in case the plaform firmware
  doesn't provide the properties expected by them.

  The Operating Performance Points (OPP) framework gets new DT bindings
  and debugfs support.

  A new cpufreq driver for ST platforms is added and the ACPI driver for
  AMD SoCs will now support the APM X-Gene ACPI I2C device.

  The rest is mostly fixes and cleanups all over.

  Specifics:

   - Add a debugfs-based interface for interacting with the ACPICA's AML
     debugger introduced in the previous cycle and a new user space tool
     for that, fix some bugs related to the AML debugger and clean up
     the code in question (Lv Zheng, Dan Carpenter, Colin Ian King,
     Markus Elfring).

   - Update ACPICA to upstream revision 20151218 including a number of
     fixes and cleanups in the ACPICA core (Bob Moore, Lv Zheng, Labbe
     Corentin, Prarit Bhargava, Colin Ian King, David E Box, Rafael
     Wysocki).

     In particular, the previously added erroneous support for the _SUB
     object is dropped, the concatenate operator will support all ACPI
     objects now, the Debug Object handling is improved, the SuperName
     handling of parameters being control methods is fixed, the
     ObjectType operator handling is updated to follow ACPI 5.0A and the
     handling of CondRefOf and RefOf is updated accordingly, module-
     level code will be executed after loading each ACPI table now
     (instead of being run once after all tables containing AML have
     been loaded), the Operation Region handlers management is updated
     to fix some reported problems and a the ACPICA code in the kernel
     is more in line with the upstream now.

   - Update the ACPI backlight driver to provide information on whether
     or not it will generate key-presses for brightness change hotkeys
     and update some platform drivers (dell-wmi, thinkpad_acpi) to use
     that information to avoid sending double key-events to users pace
     for these, add new ACPI backlight quirks (Hans de Goede, Aaron Lu,
     Adrien Schildknecht).

   - Improve the ACPI handling of interrupt GPIOs (Christophe Ricard).

   - Fix the handling of the list of device IDs of device objects found
     in the ACPI namespace and add a helper for checking if there is a
     device object for a given device ID (Lukas Wunner).

   - Change the logic in the ACPI namespace scanning code to create
     struct acpi_device objects for all ACPI device objects found in the
     namespace even if _STA fails for them which helps to avoid device
     enumeration problems on Microsoft Surface 3 (Aaron Lu).

   - Add support for the APM X-Gene ACPI I2C device to the ACPI driver
     for AMD SoCs (Loc Ho).

   - Fix the long-standing issue with the DMA controller on Intel SoCs
     where ACPI tables have no power management support for the DMA
     controller itself, but it can be powered off automatically when the
     last (other) device on the SoC is powered off via ACPI and clean up
     the ACPI driver for Intel SoCs (acpi-lpss) after previous attempts
     to fix that problem (Andy Shevchenko).

   - Assorted ACPI fixes and cleanups (Andy Lutomirski, Colin Ian King,
     Javier Martinez Canillas, Ken Xue, Mathias Krause, Rafael Wysocki,
     Sinan Kaya).

   - Update the device properties framework for better handling of
     built-in properties, add support for built-in properties to the
     platform bus type, update the MFD subsystem's handling of device
     properties and add support for passing default configuration data
     as device properties to the intel-lpss MFD drivers, convert the
     designware I2C driver to use the unified device properties API and
     add a fallback mechanism for using default built-in properties if
     the platform firmware fails to provide the properties as expected
     by drivers (Andy Shevchenko, Mika Westerberg, Heikki Krogerus,
     Andrew Morton).

   - Add new Device Tree bindings to the Operating Performance Points
     (OPP) framework and update the exynos4412 DT binding accordingly,
     introduce debugfs support for the OPP framework (Viresh Kumar,
     Bartlomiej Zolnierkiewicz).

   - Migrate the mt8173 cpufreq driver to the new OPP bindings (Pi-Cheng
     Chen).

   - Update the cpufreq core to make the handling of governors more
     efficient, especially on systems where policy objects are shared
     between multiple CPUs (Viresh Kumar, Rafael Wysocki).

   - Fix cpufreq governor handling on configurations with
     CONFIG_HZ_PERIODIC set (Chen Yu).

   - Clean up the cpufreq core code related to the boost sysfs knob
     support and update the ACPI cpufreq driver accordingly (Rafael
     Wysocki).

   - Add a new cpufreq driver for ST platforms and corresponding Device
     Tree bindings (Lee Jones).

   - Update the intel_pstate driver to allow the P-state selection
     algorithm used by it to depend on the CPU ID of the processor it is
     running on, make it use a special P-state selection algorithm (with
     an IO wait time compensation tweak) on Atom CPUs based on the
     Airmont and Silvermont cores so as to reduce their energy
     consumption and improve intel_pstate documentation (Philippe
     Longepe, Srinivas Pandruvada).

   - Update the cpufreq-dt driver to support registering cooling devices
     that use the (P * V^2 * f) dynamic power draw formula where V is
     the voltage, f is the frequency and P is a constant coefficient
     provided by Device Tree and update the arm_big_little cpufreq
     driver to use that support (Punit Agrawal).

   - Assorted cpufreq driver (cpufreq-dt, qoriq, pcc-cpufreq,
     blackfin-cpufreq) updates (Andrzej Hajda, Hongtao Jia, Jacob
     Tanenbaum, Markus Elfring).

   - cpuidle core tweaks related to polling and measured_us calculation
     (Rik van Riel).

   - Removal of modularity from a few cpuidle drivers (clps711x, ux500,
     exynos) that cannot be built as modules in practice (Paul
     Gortmaker).

   - PM core update to prevent devices from being probed during system
     suspend/resume which is generally problematic and may lead to
     inconsistent behavior (Grygorii Strashko).

   - Assorted updates of the PM core and related code (Julia Lawall,
     Manuel Pégourié-Gonnard, Maruthi Bayyavarapu, Rafael Wysocki, Ulf
     Hansson).

   - PNP bus type updates (Christophe Le Roy, Heiner Kallweit).

   - PCI PM code cleanups (Jarkko Nikula, Julia Lawall).

   - cpupower tool updates (Jacob Tanenbaum, Thomas Renninger)"

* tag 'pm+acpi-4.5-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (177 commits)
  PM / clk: don't leave clocks enabled when driver not bound
  i2c: dw: Add APM X-Gene ACPI I2C device support
  ACPI / APD: Add APM X-Gene ACPI I2C device support
  ACPI / LPSS: change 'does not have' to 'has' in comment
  Revert "dmaengine: dw: platform: provide platform data for Intel"
  dmaengine: dw: return immediately from IRQ when DMA isn't in use
  dmaengine: dw: platform: power on device on shutdown
  ACPI / LPSS: override power state for LPSS DMA device
  PM / OPP: Use snprintf() instead of sprintf()
  Documentation: cpufreq: intel_pstate: enhance documentation
  ACPI, PCI, irq: remove redundant check for null string pointer
  ACPI / video: driver must be registered before checking for keypresses
  cpufreq-dt: fix handling regulator_get_voltage() result
  cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC
  PM / sleep: Add support for read-only sysfs attributes
  ACPI: Fix white space in a structure definition
  ACPI / SBS: fix inconsistent indenting inside if statement
  PNP: respect PNP_DRIVER_RES_DO_NOT_CHANGE when detaching
  ACPI / PNP: constify device IDs
  ACPI / PCI: Simplify acpi_penalize_isa_irq()
  ...
2016-01-12 20:25:09 -08:00
Linus Torvalds
1baa5efbeb * s390: Support for runtime instrumentation within guests,
support of 248 VCPUs.
 
 * ARM: rewrite of the arm64 world switch in C, support for
 16-bit VM identifiers.  Performance counter virtualization
 missed the boat.
 
 * x86: Support for more Hyper-V features (synthetic interrupt
 controller), MMU cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWlSKwAAoJEL/70l94x66DY0UIAK5vp4zfQoQOJC4KP4Xgxwdu
 kpnK2Boz3/74o1b0y5+eJZoUZCsXCVLtmP5uhmMxUYWDgByFG2X8ZDhPFwB5FYLT
 2dN+Lr4tsolgIfRdHZtrT6Svp9SDL039bWTdscnbR6l37/j9FRWvpKdhI3orloFD
 /i4CSW2dVIq1/9Xctwu/rtcOEesEx4Cad+6YV3/530eVAXFzE908nXfmqJNZTocY
 YCGcmrMVCOu0ng5QM4xSzmmYjKMLUcRs+QzZWkVBzdJtTgwZUr09yj7I2dZ1yj/i
 cxYrJy6shSwE74XkXsmvG+au3C5u3vX4tnXjBFErnPJ99oqzHatVnFWNRhj4dLQ=
 =PIj1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "PPC changes will come next week.

   - s390: Support for runtime instrumentation within guests, support of
     248 VCPUs.

   - ARM: rewrite of the arm64 world switch in C, support for 16-bit VM
     identifiers.  Performance counter virtualization missed the boat.

   - x86: Support for more Hyper-V features (synthetic interrupt
     controller), MMU cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (115 commits)
  kvm: x86: Fix vmwrite to SECONDARY_VM_EXEC_CONTROL
  kvm/x86: Hyper-V SynIC timers tracepoints
  kvm/x86: Hyper-V SynIC tracepoints
  kvm/x86: Update SynIC timers on guest entry only
  kvm/x86: Skip SynIC vector check for QEMU side
  kvm/x86: Hyper-V fix SynIC timer disabling condition
  kvm/x86: Reorg stimer_expiration() to better control timer restart
  kvm/x86: Hyper-V unify stimer_start() and stimer_restart()
  kvm/x86: Drop stimer_stop() function
  kvm/x86: Hyper-V timers fix incorrect logical operation
  KVM: move architecture-dependent requests to arch/
  KVM: renumber vcpu->request bits
  KVM: document which architecture uses each request bit
  KVM: Remove unused KVM_REQ_KICK to save a bit in vcpu->requests
  kvm: x86: Check kvm_write_guest return value in kvm_write_wall_clock
  KVM: s390: implement the RI support of guest
  kvm/s390: drop unpaired smp_mb
  kvm: x86: fix comment about {mmu,nested_mmu}.gva_to_gpa
  KVM: x86: MMU: Use clear_page() instead of init_shadow_page_table()
  arm/arm64: KVM: Detect vGIC presence at runtime
  ...
2016-01-12 13:22:12 -08:00
Linus Torvalds
c9bed1cf51 xen: features and fixes for 4.5-rc0
- Stolen ticks and PV wallclock support for arm/arm64.
 - Add grant copy ioctl to gntdev device.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWk5IUAAoJEFxbo/MsZsTRLxwH/1BDcrbQDRc5hxUOG9JEYSUt
 H/lMjvZRShPkzweijdNon95ywAXhcSbkS9IV2Mp0+CZV7VyeymW7QIW/g4+G6iRg
 +LnoV77PAhPv/cmsr1pENXqRCclvemlxQOf7UyWLezuKhB71LC+oNaEnpk/tPIZS
 et/qef+m/SgSP5R91nO0Esv2KfP7za0UrgJf3Ee4GzjSeDkya0Hko06Cy3yc1/RT
 082kHpQ1/KFcHHh2qhdCQwyzhq/cwFkuDA6ksKYJoxC6YAVC2mvvkuIOZYbloHDL
 c/dzuP9qjjxOZ7Gblv2cmg+RE4UqRfBhxmMycxSCcwW/Mt5LaftCpAxpBQKq2/8=
 =6F/q
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from David Vrabel:
 "Xen features and fixes for 4.5-rc0:

   - Stolen ticks and PV wallclock support for arm/arm64

   - Add grant copy ioctl to gntdev device"

* tag 'for-linus-4.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/gntdev: add ioctl for grant copy
  x86/xen: don't reset vcpu_info on a cancelled suspend
  xen/gntdev: constify mmu_notifier_ops structures
  xen/grant-table: constify gnttab_ops structure
  xen/time: use READ_ONCE
  xen/x86: convert remaining timespec to timespec64 in xen_pvclock_gtod_notify
  xen/x86: support XENPF_settime64
  xen/arm: set the system time in Xen via the XENPF_settime64 hypercall
  xen/arm: introduce xen_read_wallclock
  arm: extend pvclock_wall_clock with sec_hi
  xen: introduce XENPF_settime64
  xen/arm: introduce HYPERVISOR_platform_op on arm and arm64
  xen: rename dom0_op to platform_op
  xen/arm: account for stolen ticks
  arm64: introduce CONFIG_PARAVIRT, PARAVIRT_TIME_ACCOUNTING and pv_time_ops
  arm: introduce CONFIG_PARAVIRT, PARAVIRT_TIME_ACCOUNTING and pv_time_ops
  missing include asm/paravirt.h in cputime.c
  xen: move xen_setup_runstate_info and get_runstate_snapshot to drivers/xen/time.c
2016-01-12 13:05:36 -08:00
Michael S. Tsirkin
1638fb7207 x86: define __smp_xxx
This defines __smp_xxx barriers for x86,
for use by virtualization.

smp_xxx barriers are removed as they are
defined correctly by asm-generic/barriers.h

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2016-01-12 20:46:59 +02:00
Michael S. Tsirkin
300b06d455 x86: reuse asm-generic/barrier.h
As on most architectures, on x86 read_barrier_depends and
smp_read_barrier_depends are empty.  Drop the local definitions and pull
the generic ones from asm-generic/barrier.h instead: they are identical.

This is in preparation to refactoring this code area.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
2016-01-12 20:46:52 +02:00
Rusty Russell
e27d90e8be lguest: Map switcher text R/O
Pavel noted that lguest maps the switcher code executable and
read-write.  This is a bad idea for any kernel text, but
particularly for text mapped at a fixed address.

Create two vmas, one for the text (PAGE_KERNEL_RX) and another
for the stacks (PAGE_KERNEL).  Use VM_NO_GUARD to map them
adjacent (as expected by the rest of the code).

Reported-by: Pavel Machek <pavel@ucw.cz>
Tested-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12 12:17:28 +01:00
Andy Lutomirski
bd902c5362 x86/vdso: Disallow vvar access to vclock IO for never-used vclocks
It makes me uncomfortable that even modern systems grant every
process direct read access to the HPET.

While fixing this for real without regressing anything is a mess
(unmapping the HPET is tricky because we don't adequately track
all the mappings), we can do almost as well by tracking which
vclocks have ever been used and only allowing pages associated
with used vclocks to be faulted in.

This will cause rogue programs that try to peek at the HPET to
get SIGBUS instead on most systems.

We can't restrict faults to vclock pages that are associated
with the currently selected vclock due to a race: a process
could start to access the HPET for the first time and race
against a switch away from the HPET as the current clocksource.
We can't segfault the process trying to peek at the HPET in this
case, even though the process isn't going to do anything useful
with the data.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e79d06295625c02512277737ab55085a498ac5d8.1451446564.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12 11:59:35 +01:00
Andy Lutomirski
05ef76b20f x86/vdso: Use .fault for the vDSO text mapping
The old scheme for mapping the vDSO text is rather complicated.
vdso2c generates a struct vm_special_mapping and a blank .pages
array of the correct size for each vdso image.  Init code in
vdso/vma.c populates the .pages array for each vDSO image, and
the mapping code selects the appropriate struct
vm_special_mapping.

With .fault, we can use a less roundabout approach: vdso_fault()
just returns the appropriate page for the selected vDSO image.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f886954c186bafd74e1b967c8931d852ae199aa2.1451446564.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12 11:59:34 +01:00
Andy Lutomirski
352b78c62f x86/vdso: Track each mm's loaded vDSO image as well as its base
As we start to do more intelligent things with the vDSO at
runtime (as opposed to just at mm initialization time), we'll
need to know which vDSO is in use.

In principle, we could guess based on the mm type, but that's
over-complicated and error-prone.  Instead, just track it in the
mmu context.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/c99ac48681bad709ca7ad5ee899d9042a3af6b00.1451446564.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12 11:59:34 +01:00
yu-cheng yu
394db20ca2 x86/fpu: Disable AVX when eagerfpu is off
When "eagerfpu=off" is given as a command-line input, the kernel
should disable AVX support.

The Task Switched bit used for lazy context switching does not
support AVX. If AVX is enabled without eagerfpu context
switching, one task's AVX state could become corrupted or leak
to other tasks. This is a bug and has bad security implications.

This only affects systems that have AVX/AVX2/AVX512 and this
issue will be found only when one actually uses AVX/AVX2/AVX512
_AND_ does eagerfpu=off.

Reference: Intel Software Developer's Manual Vol. 3A

Sec. 2.5 Control Registers:
TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the
x87 FPU/ MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch
to be delayed until an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4
instruction is actually executed by the new task.

Sec. 13.4.1 Using the TS Flag to Control the Saving of the X87
FPU and SSE State
When the TS flag is set, the processor monitors the instruction
stream for x87 FPU, MMX, SSE instructions. When the processor
detects one of these instructions, it raises a
device-not-available exeception (#NM) prior to executing the
instruction.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/1452119094-7252-5-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12 11:51:21 +01:00
yu-cheng yu
a5fe93a549 x86/fpu: Disable MPX when eagerfpu is off
This issue is a fallout from the command-line parsing move.

When "eagerfpu=off" is given as a command-line input, the kernel
should disable MPX support. The decision for turning off MPX was
made in fpu__init_system_ctx_switch(), which is after the
selection of the XSAVE format. This patch fixes it by getting
that decision done earlier in fpu__init_system_xstate().

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: yu-cheng yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/1452119094-7252-4-git-send-email-yu-cheng.yu@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12 11:51:21 +01:00
Ingo Molnar
c0c57019a6 Merge commit 'linus' into x86/urgent, to pick up recent x86 changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-12 11:08:13 +01:00
Linus Torvalds
ae8a52185e Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
 "Two changes:

   - one to quirk-save/restore certain system MSRs across
     suspend/resume, to make certain Intel systems work better
     (Chen Yu)

   - and also to constify a read only structure (Julia Lawall)"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform/calgary: Constify cal_chipset_ops structures
  x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume
2016-01-11 17:45:32 -08:00
Linus Torvalds
6896d9f7e7 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Ingo Molnar:
 "This cleans up the FPU fault handling methods to be more robust, and
  moves eligible variables to .init.data"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Put a few variables in .init.data
  x86/fpu: Get rid of xstate_fault()
  x86/fpu: Add an XSTATE_OP() macro
2016-01-11 16:56:38 -08:00
Linus Torvalds
671d5532aa Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Improved CPU ID handling code and related enhancements (Borislav
     Petkov)

   - RDRAND fix (Len Brown)"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Replace RDRAND forced-reseed with simple sanity check
  x86/MSR: Chop off lower 32-bit value
  x86/cpu: Fix MSR value truncation issue
  x86/cpu/amd, kvm: Satisfy guest kernel reads of IC_CFG MSR
  kvm: Add accessors for guest CPU's family, model, stepping
  x86/cpu: Unify CPU family, model, stepping calculation
2016-01-11 16:46:20 -08:00
Linus Torvalds
67c707e451 Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
 "The main changes in this cycle were:

   - code patching and cpu_has cleanups (Borislav Petkov)

   - paravirt cleanups (Juergen Gross)

   - TSC cleanup (Thomas Gleixner)

   - ptrace cleanup (Chen Gang)"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arch/x86/kernel/ptrace.c: Remove unused arg_offs_table
  x86/mm: Align macro defines
  x86/cpu: Provide a config option to disable static_cpu_has
  x86/cpufeature: Remove unused and seldomly used cpu_has_xx macros
  x86/cpufeature: Cleanup get_cpu_cap()
  x86/cpufeature: Move some of the scattered feature bits to x86_capability
  x86/paravirt: Remove paravirt ops pmd_update[_defer] and pte_update_defer
  x86/paravirt: Remove unused pv_apic_ops structure
  x86/tsc: Remove unused tsc_pre_init() hook
  x86: Remove unused function cpu_has_ht_siblings()
  x86/paravirt: Kill some unused patching functions
2016-01-11 16:26:03 -08:00
Rafael J. Wysocki
1e3f28a552 Merge branch 'acpi-soc'
* acpi-soc:
  PM / clk: don't leave clocks enabled when driver not bound
  i2c: dw: Add APM X-Gene ACPI I2C device support
  ACPI / APD: Add APM X-Gene ACPI I2C device support
  ACPI / LPSS: change 'does not have' to 'has' in comment
  Revert "dmaengine: dw: platform: provide platform data for Intel"
  dmaengine: dw: return immediately from IRQ when DMA isn't in use
  dmaengine: dw: platform: power on device on shutdown
  ACPI / LPSS: override power state for LPSS DMA device
  ACPI / LPSS: power on when probe() and otherwise when remove()
  ACPI / LPSS: do delay for all LPSS devices when D3->D0
  ACPI / LPSS: allow to use specific PM domain during ->probe()
  Revert "ACPI / LPSS: allow to use specific PM domain during ->probe()"
  device core: add BUS_NOTIFY_DRIVER_NOT_BOUND notification
  x86/platform/iosf_mbi: Remove duplicate definitions

Conflicts:
	drivers/i2c/busses/i2c-designware-platdrv.c
2016-01-12 01:08:47 +01:00
Linus Torvalds
88cbfd0711 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "The main changes in this cycle were:

   - vDSO and asm entry improvements (Andy Lutomirski)

   - Xen paravirt entry enhancements (Boris Ostrovsky)

   - asm entry labels enhancement (Borislav Petkov)

   - and other misc changes (Thomas Gleixner, me)"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n
  Revert "x86/kvm: On KVM re-enable (e.g. after suspend), update clocks"
  x86/entry/64_compat: Make labels local
  x86/platform/uv: Include clocksource.h for clocksource_touch_watchdog()
  x86/vdso: Enable vdso pvclock access on all vdso variants
  x86/vdso: Remove pvclock fixmap machinery
  x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap
  x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
  x86/kvm: On KVM re-enable (e.g. after suspend), update clocks
  x86/entry/64: Bypass enter_from_user_mode on non-context-tracking boots
  x86/asm: Add asm macros for static keys/jump labels
  x86/asm: Error out if asm/jump_label.h is included inappropriately
  context_tracking: Switch to new static_branch API
  x86/entry, x86/paravirt: Remove the unused usergs_sysret32 PV op
  x86/paravirt: Remove the unused irq_enable_sysexit pv op
  x86/xen: Avoid fast syscall path for Xen PV guests
2016-01-11 15:58:16 -08:00
Linus Torvalds
4f19b8803b Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Ingo Molnar:
 "The main changes in this cycle were:

   - introduce optimized single IPI sending methods on modern APICs
     (Linus Torvalds, Thomas Gleixner)

   - kexec/crash APIC handling fixes and enhancements (Hidehiro Kawai)

   - extend lapic vector saving/restoring to the CMCI (MCE) vector as
     well (Juergen Gross)

   - various fixes and enhancements (Jake Oshins, Len Brown)"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/irq: Export functions to allow MSI domains in modules
  Documentation: Document kernel.panic_on_io_nmi sysctl
  x86/nmi: Save regs in crash dump on external NMI
  x86/apic: Introduce apic_extnmi command line parameter
  kexec: Fix race between panic() and crash_kexec()
  panic, x86: Allow CPUs to save registers even if looping in NMI context
  panic, x86: Fix re-entrance problem due to panic on NMI
  x86/apic: Fix the saving and restoring of lapic vectors during suspend/resume
  x86/smpboot: Re-enable init_udelay=0 by default on modern CPUs
  x86/smp: Remove single IPI wrapper
  x86/apic: Use default send single IPI wrapper
  x86/apic: Provide default send single IPI wrapper
  x86/apic: Implement single IPI for apic_noop
  x86/apic: Wire up single IPI for apic_numachip
  x86/apic: Wire up single IPI for x2apic_uv
  x86/apic: Implement single IPI for x2apic_phys
  x86/apic: Wire up single IPI for bigsmp_apic
  x86/apic: Remove pointless indirections from bigsmp_apic
  x86/apic: Wire up single IPI for apic_physflat
  x86/apic: Remove pointless indirections from apic_physflat
  ...
2016-01-11 15:37:06 -08:00
Linus Torvalds
5cb52b5e16 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - Intel Knights Landing support.  (Harish Chegondi)

   - Intel Broadwell-EP uncore PMU support.  (Kan Liang)

   - Core code improvements.  (Peter Zijlstra.)

   - Event filter, LBR and PEBS fixes.  (Stephane Eranian)

   - Enable cycles:pp on Intel Atom.  (Stephane Eranian)

   - Add cycles:ppp support for Skylake.  (Andi Kleen)

   - Various x86 NMI overhead optimizations.  (Andi Kleen)

   - Intel PT enhancements.  (Takao Indoh)

   - AMD cache events fix.  (Vince Weaver)

  Tons of tooling changes:

   - Show random perf tool tips in the 'perf report' bottom line
     (Namhyung Kim)

   - perf report now defaults to --group if the perf.data file has
     grouped events, try it with:

      # perf record -e '{cycles,instructions}' -a sleep 1
      [ perf record: Woken up 1 times to write data ]
      [ perf record: Captured and wrote 1.093 MB perf.data (1247 samples) ]
      # perf report
      # Samples: 1K of event 'anon group { cycles, instructions }'
      # Event count (approx.): 1955219195
      #
      #       Overhead  Command     Shared Object      Symbol

         2.86%   0.22%  swapper     [kernel.kallsyms]  [k] intel_idle
         1.05%   0.33%  firefox     libxul.so          [.] js::SetObjectElement
         1.05%   0.00%  kworker/0:3 [kernel.kallsyms]  [k] gen6_ring_get_seqno
         0.88%   0.17%  chrome      chrome             [.] 0x0000000000ee27ab
         0.65%   0.86%  firefox     libxul.so          [.] js::ValueToId<(js::AllowGC)1>
         0.64%   0.23%  JS Helper   libxul.so          [.] js::SplayTree<js::jit::LiveRange*, js::jit::LiveRange>::splay
         0.62%   1.27%  firefox     libxul.so          [.] js::GetIterator
         0.61%   1.74%  firefox     libxul.so          [.] js::NativeSetProperty
         0.61%   0.31%  firefox     libxul.so          [.] js::SetPropertyByDefining

   - Introduce the 'perf stat record/report' workflow:

     Generate perf.data files from 'perf stat', to tap into the
     scripting capabilities perf has instead of defining a 'perf stat'
     specific scripting support to calculate event ratios, etc.

     Simple example:

        $ perf stat record -e cycles usleep 1

         Performance counter stats for 'usleep 1':

               1,134,996      cycles

             0.000670644 seconds time elapsed

        $ perf stat report

         Performance counter stats for '/home/acme/bin/perf stat record -e cycles usleep 1':

               1,134,996      cycles

             0.000670644 seconds time elapsed

        $

     It generates PERF_RECORD_ userspace records to store the details:

        $ perf report -D | grep PERF_RECORD
        0xf0 [0x28]: PERF_RECORD_THREAD_MAP nr: 1 thread: 27637
        0x118 [0x12]: PERF_RECORD_CPU_MAP nr: 1 cpu: 65535
        0x12a [0x40]: PERF_RECORD_STAT_CONFIG
        0x16a [0x30]: PERF_RECORD_STAT
        -1 -1 0x19a [0x40]: PERF_RECORD_MMAP -1/0: [0xffffffff81000000(0x1f000000) @ 0xffffffff81000000]: x [kernel.kallsyms]_text
        0x1da [0x18]: PERF_RECORD_STAT_ROUND
        [acme@ssdandy linux]$

     An effort was made to make perf.data files generated like this to
     not generate cryptic messages when processed by older tools.

     The 'perf script' bits need rebasing, will go up later.

   - Make command line options always available, even when they depend
     on some feature being enabled, warning the user about use of such
     options (Wang Nan)

   - Support hw breakpoint events (mem:0xAddress) in the default output
     mode in 'perf script' (Wang Nan)

   - Fixes and improvements for supporting annotating ARM binaries,
     support ARM call and jump instructions, more work needed to have
     arch specific stuff separated into tools/perf/arch/*/annotate/
     (Russell King)

   - Add initial 'perf config' command, for now just with a --list
     command to the contents of the configuration file in use and a
     basic man page describing its format, commands for doing edits and
     detailed documentation are being reviewed and proof-read.  (Taeung
     Song)

   - Allows BPF scriptlets specify arguments to be fetched using DWARF
     info, using a prologue generated at compile/build time (He Kuang,
     Wang Nan)

   - Allow attaching BPF scriptlets to module symbols (Wang Nan)

   - Allow attaching BPF scriptlets to userspace code using uprobe (Wang
     Nan)

   - BPF programs now can specify 'perf probe' tunables via its section
     name, separating key=val values using semicolons (Wang Nan)

     Testing some of these new BPF features:

        Use case: get callchains when receiving SSL packets, filter then in the
                  kernel, at arbitrary place.

        # cat ssl.bpf.c
        #define SEC(NAME) __attribute__((section(NAME), used))

        struct pt_regs;

        SEC("func=__inet_lookup_established hnum")
        int func(struct pt_regs *ctx, int err, unsigned short port)
        {
                return err == 0 && port == 443;
        }

        char _license[] SEC("license") = "GPL";
        int  _version   SEC("version") = LINUX_VERSION_CODE;
        #
        # perf record -a -g -e ssl.bpf.c
        ^C[ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.787 MB perf.data (3 samples) ]
        # perf script | head -30
        swapper     0 [000] 58783.268118: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
           8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
           896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
           8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
           855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
           8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
           8572a8 process_backlog (/lib/modules/4.3.0+/build/vmlinux)
           856b11 net_rx_action (/lib/modules/4.3.0+/build/vmlinux)
           2a284b __do_softirq (/lib/modules/4.3.0+/build/vmlinux)
           2a2ba3 irq_exit (/lib/modules/4.3.0+/build/vmlinux)
           96b7a4 do_IRQ (/lib/modules/4.3.0+/build/vmlinux)
           969807 ret_from_intr (/lib/modules/4.3.0+/build/vmlinux)
           2dede5 cpu_startup_entry (/lib/modules/4.3.0+/build/vmlinux)
           95d5bc rest_init (/lib/modules/4.3.0+/build/vmlinux)
          1163ffa start_kernel ([kernel.vmlinux].init.text)
          11634d7 x86_64_start_reservations ([kernel.vmlinux].init.text)
          1163623 x86_64_start_kernel ([kernel.vmlinux].init.text)

        qemu-system-x86  9178 [003] 58785.792417: perf_bpf_probe:func: (ffffffff816a0f60) hnum=0x1bb
           8a0f61 __inet_lookup_established (/lib/modules/4.3.0+/build/vmlinux)
           896def ip_rcv_finish (/lib/modules/4.3.0+/build/vmlinux)
           8976c2 ip_rcv (/lib/modules/4.3.0+/build/vmlinux)
           855eba __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
           8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
           856660 netif_receive_skb_internal (/lib/modules/4.3.0+/build/vmlinux)
           8566ec netif_receive_skb_sk (/lib/modules/4.3.0+/build/vmlinux)
             430a br_handle_frame_finish ([bridge])
             48bc br_handle_frame ([bridge])
           855f44 __netif_receive_skb_core (/lib/modules/4.3.0+/build/vmlinux)
           8565d8 __netif_receive_skb (/lib/modules/4.3.0+/build/vmlinux)
        #

   - Use 'perf probe' various options to list functions, see what
     variables can be collected at any given point, experiment first
     collecting without a filter, then filter, use it together with
     'perf trace', 'perf top', with or without callchains, if it
     explodes, please tell us!

   - Introduce a new callchain mode: "folded", that will list per line
     representations of all callchains for a give histogram entry,
     facilitating 'perf report' output processing by other tools, such
     as Brendan Gregg's flamegraph tools (Namhyung Kim)

     E.g:

        # perf report | grep -v ^# | head
           18.37%     0.00%  swapper  [kernel.kallsyms]   [k] cpu_startup_entry
                           |
                           ---cpu_startup_entry
                              |
                              |--12.07%--start_secondary
                              |
                               --6.30%--rest_init
                                         start_kernel
                                         x86_64_start_reservations
                                         x86_64_start_kernel
         #

     Becomes, in "folded" mode:

        # perf report -g folded | grep -v ^# | head -5
            18.37%     0.00%  swapper [kernel.kallsyms]   [k] cpu_startup_entry
          12.07% cpu_startup_entry;start_secondary
           6.30% cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
            16.90%     0.00%  swapper [kernel.kallsyms]   [k] call_cpuidle
          11.23% call_cpuidle;cpu_startup_entry;start_secondary
           5.67% call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
            16.90%     0.00%  swapper [kernel.kallsyms]   [k] cpuidle_enter
          11.23% cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
           5.67% cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
            15.12%     0.00%  swapper [kernel.kallsyms]   [k] cpuidle_enter_state
         #

     The user can also select one of "count", "period" or "percent" as
     the first column.

  ... and lots of infrastructure enhancements, plus fixes and other
  changes, features I failed to list - see the shortlog and the git log
  for details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (271 commits)
  perf evlist: Add --trace-fields option to show trace fields
  perf record: Store data mmaps for dwarf unwind
  perf libdw: Check for mmaps also in MAP__VARIABLE tree
  perf unwind: Check for mmaps also in MAP__VARIABLE tree
  perf unwind: Use find_map function in access_dso_mem
  perf evlist: Remove perf_evlist__(enable|disable)_event functions
  perf evlist: Make perf_evlist__open() open evsels with their cpus and threads (like perf record does)
  perf report: Show random usage tip on the help line
  perf hists: Export a couple of hist functions
  perf diff: Use perf_hpp__register_sort_field interface
  perf tools: Add overhead/overhead_children keys defaults via string
  perf tools: Remove list entry from struct sort_entry
  perf tools: Include all tools/lib directory for tags/cscope/TAGS targets
  perf script: Align event name properly
  perf tools: Add missing headers in perf's MANIFEST
  perf tools: Do not show trace command if it's not compiled in
  perf report: Change default to use event group view
  perf top: Decay periods in callchains
  tools lib: Move bitmap.[ch] from tools/perf/ to tools/{lib,include}/
  tools lib: Sync tools/lib/find_bit.c with the kernel
  ...
2016-01-11 14:39:17 -08:00
Linus Torvalds
24af98c4cf Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "So we have a laundry list of locking subsystem changes:

   - continuing barrier API and code improvements

   - futex enhancements

   - atomics API improvements

   - pvqspinlock enhancements: in particular lock stealing and adaptive
     spinning

   - qspinlock micro-enhancements"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op
  futex: Cleanup the goto confusion in requeue_pi()
  futex: Remove pointless put_pi_state calls in requeue()
  futex: Document pi_state refcounting in requeue code
  futex: Rename free_pi_state() to put_pi_state()
  futex: Drop refcount if requeue_pi() acquired the rtmutex
  locking/barriers, arch: Remove ambiguous statement in the smp_store_mb() documentation
  lcoking/barriers, arch: Use smp barriers in smp_store_release()
  locking/cmpxchg, arch: Remove tas() definitions
  locking/pvqspinlock: Queue node adaptive spinning
  locking/pvqspinlock: Allow limited lock stealing
  locking/pvqspinlock: Collect slowpath lock statistics
  sched/core, locking: Document Program-Order guarantees
  locking, sched: Introduce smp_cond_acquire() and use it
  locking/pvqspinlock, x86: Optimize the PV unlock code path
  locking/qspinlock: Avoid redundant read of next pointer
  locking/qspinlock: Prefetch the next node cacheline
  locking/qspinlock: Use _acquire/_release() versions of cmpxchg() & xchg()
  atomics: Add test for atomic operations with _relaxed variants
2016-01-11 14:18:38 -08:00
H.J. Lu
8c31902cff x86/boot: Double BOOT_HEAP_SIZE to 64KB
When decompressing kernel image during x86 bootup, malloc memory
for ELF program headers may run out of heap space, which leads
to system halt.  This patch doubles BOOT_HEAP_SIZE to 64KB.

Tested with 32-bit kernel which failed to boot without this patch.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-11 12:30:50 +01:00
Andy Lutomirski
71b3c126e6 x86/mm: Add barriers and document switch_mm()-vs-flush synchronization
When switch_mm() activates a new PGD, it also sets a bit that
tells other CPUs that the PGD is in use so that TLB flush IPIs
will be sent.  In order for that to work correctly, the bit
needs to be visible prior to loading the PGD and therefore
starting to fill the local TLB.

Document all the barriers that make this work correctly and add
a couple that were missing.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-11 12:03:15 +01:00
Paolo Bonzini
2860c4b167 KVM: move architecture-dependent requests to arch/
Since the numbers now overlap, it makes sense to enumerate
them in asm/kvm_host.h rather than linux/kvm_host.h.  Functions
that refer to architecture-specific requests are also moved
to arch/.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-08 19:04:36 +01:00
Andy Shevchenko
eebb3e8d8a ACPI / LPSS: override power state for LPSS DMA device
This is a third approach to workaround long standing issue with LPSS on
BayTrail. First one [1] was reverted since it didn't resolve the issue
comprehensively. Second one [2] was rejected by internal review.

The LPSS DMA controller does not have neither _PS0 nor _PS3 method. Moreover it
can be powered off automatically whenever the last LPSS device goes down. In
case of no power any access to the DMA controller will hang the system. The
behaviour is reproduced on some HP laptops based on Intel BayTrail [3,4] as
well as on ASuS T100TA transformer.

Power on the LPSS island through the registers accessible in a specific way.

[1] http://www.spinics.net/lists/linux-acpi/msg53963.html
[2] https://bugzilla.redhat.com/attachment.cgi?id=1066779&action=diff
[3] https://bugzilla.redhat.com/show_bug.cgi?id=1184273
[4] http://www.spinics.net/lists/dmaengine/msg01514.html

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-01-07 14:11:32 +01:00
Paolo Bonzini
def840ede3 KVM/ARM changes for Linux v4.5
- Complete rewrite of the arm64 world switch in C, hopefully
   paving the way for more sharing with the 32bit code, better
   maintainability and easier integration of new features.
   Also smaller and slightly faster in some cases...
 - Support for 16bit VM identifiers
 - Various cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWe80IAAoJECPQ0LrRPXpDdt8P/ittxzklIT7rsZxdOiIY6vLQ
 i2hWGo1KdZR+8rsNyQEeGyg2Ocdja0Vld9ciBKgXKeKtc6x6AHfq0x6eyGRbF2jJ
 Wdkd2a9lLJVJJIf1LBhOQuwjNiEvAgvqO5nwXL77s/rqx3Ur5OlyohSvRFBy7Pqp
 8cdnCV/43I/y7k0iGhitFVrEC9TL0cfeJmM7YhXkt8IcpkcpCDfgdI7wgIb0ntvv
 dqvrRmfp+Q3/hJ6SVRsy6uzOrjFjRE8hIIG6TiqfRd/FgI5x2xvGkSAE3Wx2YdRM
 myPDiAuY2wOyALZpn9zD7qFMOfI2wX8kaX5S/ctnbvLelkmQblI39/zfYuxJ36xC
 Mo2yMKcvT37AIMz2fxx3mGnIR7NZBNXVQGJmv/1p9vMQ8RbUXUhT0w6hP5SH9S7m
 RDoOkfd37wQugQ7bgI7cqg4hodMRlybGPq8QaKp80y0Ej3cPblM+y0fbR153SSbS
 6nCwYceFLdWJEV+tTFpKD5cvxOGeYfoC/8LcVRYRcWg6nlr4+qo61MHevyCe/Qxw
 Pw+z9wFpoVKumRT72KmzFUxFQqRnTshE3KJqNJdsqMPM8ZuW5TJ/MtUa+JdAWmSH
 dEAqd7Hy5W1CqgGEk+los+QlKw+5uFepcZ7SOuQ2ehME/Ae3xI8s9+UE6yqeHx6Y
 PV2s5lyJRCPfm3qOu7TS
 =wuir
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next

KVM/ARM changes for Linux v4.5

- Complete rewrite of the arm64 world switch in C, hopefully
  paving the way for more sharing with the 32bit code, better
  maintainability and easier integration of new features.
  Also smaller and slightly faster in some cases...
- Support for 16bit VM identifiers
- Various cleanups
2016-01-07 11:00:57 +01:00
Andy Lutomirski
8705d603ed x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n
arch/x86/built-in.o: In function `arch_setup_additional_pages':
 (.text+0x587): undefined reference to `pvclock_pvti_cpu0_va'

KVM_GUEST selects PARAVIRT_CLOCK, so we can make pvclock_pvti_cpu0_va depend
on KVM_GUEST.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Borislav Petkov <bp@alien8.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/444d38a9bcba832685740ea1401b569861d09a72.1451446564.git.luto@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-01-06 10:49:53 +01:00
Mark Brown
b9546d09b1 Merge remote-tracking branches 'asoc/topic/fsl-spdif', 'asoc/topic/img' and 'asoc/topic/intel' into asoc-next 2015-12-23 00:23:43 +00:00
Stefano Stabellini
cfafae9403 xen: rename dom0_op to platform_op
The dom0_op hypercall has been renamed to platform_op since Xen 3.2,
which is ancient, and modern upstream Linux kernels cannot run as dom0
and it anymore anyway.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2015-12-21 14:40:55 +00:00
Jake Oshins
c8f3e518d3 x86/irq: Export functions to allow MSI domains in modules
The Linux kernel already has the concept of IRQ domain, wherein a
component can expose a set of IRQs which are managed by a particular
interrupt controller chip or other subsystem. The PCI driver exposes
the notion of an IRQ domain for Message-Signaled Interrupts (MSI) from
PCI Express devices. This patch exposes the functions which are
necessary for creating a MSI IRQ domain within a module.

[ tglx: Split it into x86 and core irq parts ]

Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Cc: gregkh@linuxfoundation.org
Cc: kys@microsoft.com
Cc: devel@linuxdriverproject.org
Cc: olaf@aepfle.de
Cc: apw@canonical.com
Cc: vkuznets@redhat.com
Cc: haiyangz@microsoft.com
Cc: marc.zyngier@arm.com
Cc: bhelgaas@google.com
Link: http://lkml.kernel.org/r/1449769983-12948-4-git-send-email-jakeo@microsoft.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-20 12:40:49 +01:00
David Vrabel
d8c98a1d14 x86/paravirt: Prevent rtc_cmos platform device init on PV guests
Adding the rtc platform device in non-privileged Xen PV guests causes
an IRQ conflict because these guests do not have legacy PIC and may
allocate irqs in the legacy range.

In a single VCPU Xen PV guest we should have:

/proc/interrupts:
           CPU0
  0:       4934  xen-percpu-virq      timer0
  1:          0  xen-percpu-ipi       spinlock0
  2:          0  xen-percpu-ipi       resched0
  3:          0  xen-percpu-ipi       callfunc0
  4:          0  xen-percpu-virq      debug0
  5:          0  xen-percpu-ipi       callfuncsingle0
  6:          0  xen-percpu-ipi       irqwork0
  7:        321   xen-dyn-event     xenbus
  8:         90   xen-dyn-event     hvc_console
  ...

But hvc_console cannot get its interrupt because it is already in use
by rtc0 and the console does not work.

  genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0)

We can avoid this problem by realizing that unprivileged PV guests (both
Xen and lguests) are not supposed to have rtc_cmos device and so
adding it is not necessary.

Privileged guests (i.e. Xen's dom0) do use it but they should not have
irq conflicts since they allocate irqs above legacy range (above
gsi_top, in fact).

Instead of explicitly testing whether the guest is privileged we can
extend pv_info structure to include information about guest's RTC
support.

Reported-and-tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: vkuznets@redhat.com
Cc: xen-devel@lists.xenproject.org
Cc: konrad.wilk@oracle.com
Cc: stable@vger.kernel.org # 4.2+
Link: http://lkml.kernel.org/r/1449842873-2613-1-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 21:35:13 +01:00
Pierre-Louis Bossart
8788f83929 ASoc: Intel: Atom: add deep buffer definitions for atom platforms
Add definitions for MERR_DPCM_DEEP_BUFFER AND PIPE_MEDIA3_IN
Add relevant cpu-dai and dai link names

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2015-12-19 11:49:56 +00:00
Borislav Petkov
4baf7fe407 x86/mm: Align macro defines
Bring PAGE_{SHIFT,SIZE,MASK} to the same indentation level as the rest
of the header.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1449480268-26583-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 11:53:40 +01:00
Borislav Petkov
6e1315fe82 x86/cpu: Provide a config option to disable static_cpu_has
This brings .text savings of about ~1.6K when building a tinyconfig. It
is off by default so nothing changes for the default.

Kconfig help text from Josh.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Link: http://lkml.kernel.org/r/1449481182-27541-5-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 11:49:55 +01:00
Borislav Petkov
362f924b64 x86/cpufeature: Remove unused and seldomly used cpu_has_xx macros
Those are stupid and code should use static_cpu_has_safe() or
boot_cpu_has() instead. Kill the least used and unused ones.

The remaining ones need more careful inspection before a conversion can
happen. On the TODO.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1449481182-27541-4-git-send-email-bp@alien8.de
Cc: David Sterba <dsterba@suse.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 11:49:55 +01:00
Borislav Petkov
39c06df4dc x86/cpufeature: Cleanup get_cpu_cap()
Add an enum for the ->x86_capability array indices and cleanup
get_cpu_cap() by killing some redundant local vars.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1449481182-27541-3-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 11:49:54 +01:00
Borislav Petkov
2ccd71f1b2 x86/cpufeature: Move some of the scattered feature bits to x86_capability
Turn the CPUID leafs which are proper CPUID feature bit leafs into
separate ->x86_capability words.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1449481182-27541-2-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 11:49:53 +01:00
Thomas Gleixner
0fa85119cd Merge branch 'linus' into x86/cleanups
Pull in upstream changes so we can apply depending patches.
2015-12-19 11:49:13 +01:00
Hidehiro Kawai
b279d67df8 x86/nmi: Save regs in crash dump on external NMI
Now, multiple CPUs can receive an external NMI simultaneously by
specifying the "apic_extnmi=all" command line parameter. When we take
a crash dump by using external NMI with this option, we fail to save
registers into the crash dump. This happens as follows:

  CPU 0                              CPU 1
  ================================   =============================
  receive an external NMI
  default_do_nmi()                   receive an external NMI
    spin_lock(&nmi_reason_lock)      default_do_nmi()
    io_check_error()                   spin_lock(&nmi_reason_lock)
      panic()                            busy loop
      ...
        kdump_nmi_shootdown_cpus()
          issue NMI IPI -----------> blocked until IRET
                                         busy loop...

Here, since CPU 1 is in NMI context, an additional NMI from CPU 0
remains unhandled until CPU 1 IRETs. However, CPU 1 will never execute
IRET so the NMI is not handled and the callback function to save
registers is never called.

To solve this issue, we check if the IPI for crash dumping was issued
while waiting for nmi_reason_lock to be released, and if so, call its
callback function directly. If the IPI is not issued (e.g. kdump is
disabled), the actual behavior doesn't change.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20151210065245.4587.39316.stgit@softrs
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 11:07:01 +01:00
Hidehiro Kawai
b7c4948e98 x86/apic: Introduce apic_extnmi command line parameter
This patch introduces a command line parameter apic_extnmi:

 apic_extnmi=( bsp|all|none )

The default value is "bsp" and this is the current behavior: only the
Boot-Strapping Processor receives an external NMI.

"all" allows external NMIs to be broadcast to all CPUs. This would
raise the success rate of panic on NMI when BSP hangs in NMI context
or the external NMI is swallowed by other NMI handlers on the BSP.

If you specify "none", no CPUs receive external NMIs. This is useful for
the dump capture kernel so that it cannot be shot down by accidentally
pressing the external NMI button (on platforms which have it) while
saving a crash dump.

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Bandan Das <bsd@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: kexec@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/20151210014632.25437.43778.stgit@softrs
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 11:07:01 +01:00
Thomas Gleixner
d267b8d6c6 Merge branch 'linus' into x86/apic
Pull in update changes so we can apply conflicting patches
2015-12-19 11:03:18 +01:00
Boris Ostrovsky
91e2eea98f x86/xen: Avoid fast syscall path for Xen PV guests
After 32-bit syscall rewrite, and specifically after commit:

  5f310f739b ("x86/entry/32: Re-implement SYSENTER using the new C path")

... the stack frame that is passed to xen_sysexit is no longer a
"standard" one (i.e. it's not pt_regs).

Since we end up calling xen_iret from xen_sysexit we don't need
to fix up the stack and instead follow entry_SYSENTER_32's IRET
path directly to xen_iret.

We can do the same thing for compat mode even though stack does
not need to be fixed. This will allow us to drop usergs_sysret32
paravirt op (in the subsequent patch)

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: david.vrabel@citrix.com
Cc: konrad.wilk@oracle.com
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1447970147-1733-2-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-12-19 09:55:52 +01:00
Linus Torvalds
5b24a7a2aa Add 'unsafe' user access functions for batched accesses
The naming is meant to discourage random use: the helper functions are
not really any more "unsafe" than the traditional double-underscore
functions (which need the address range checking), but they do need even
more infrastructure around them, and should not be used willy-nilly.

In addition to checking the access range, these user access functions
require that you wrap the user access with a "user_acess_{begin,end}()"
around it.

That allows architectures that implement kernel user access control
(x86: SMAP, arm64: PAN) to do the user access control in the wrapping
user_access_begin/end part, and then batch up the actual user space
accesses using the new interfaces.

The main (and hopefully only) use for these are for core generic access
helpers, initially just the generic user string functions
(strnlen_user() and strncpy_from_user()).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-17 09:57:27 -08:00
Linus Torvalds
11f1a4b975 x86: reorganize SMAP handling in user space accesses
This reorganizes how we do the stac/clac instructions in the user access
code.  Instead of adding the instructions directly to the same inline
asm that does the actual user level access and exception handling, add
them at a higher level.

This is mainly preparation for the next step, where we will expose an
interface to allow users to mark several accesses together as being user
space accesses, but it does already clean up some code:

 - the inlined trivial cases of copy_in_user() now do stac/clac just
   once over the accesses: they used to do one pair around the user
   space read, and another pair around the write-back.

 - the {get,put}_user_ex() macros that are used with the catch/try
   handling don't do any stac/clac at all, because that happens in the
   try/catch surrounding them.

Other than those two cleanups that happened naturally from the
re-organization, this should not make any difference. Yet.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-12-17 09:45:09 -08:00
Andrey Smetanin
1f4b34f825 kvm/x86: Hyper-V SynIC timers
Per Hyper-V specification (and as required by Hyper-V-aware guests),
SynIC provides 4 per-vCPU timers.  Each timer is programmed via a pair
of MSRs, and signals expiration by delivering a special format message
to the configured SynIC message slot and triggering the corresponding
synthetic interrupt.

Note: as implemented by this patch, all periodic timers are "lazy"
(i.e. if the vCPU wasn't scheduled for more than the timer period the
timer events are lost), regardless of the corresponding configuration
MSR.  If deemed necessary, the "catch up" mode (the timer period is
shortened until the timer catches up) will be implemented later.

Changes v2:
* Use remainder to calculate periodic timer expiration time

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-12-16 18:49:45 +01:00
Ingo Molnar
057032e457 Linux 4.4-rc5
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWbh6rAAoJEHm+PkMAQRiG2L4H/AmgfAPX8vYQTKXAnpxt7cLx
 2W9C+fyKRcHzqBCsUB4wxPwYwDGPmEZHo4Y/evaSdUbPhngRRRfEVZpzbTUqheEm
 A7h/1mCl5gcaGRzDNTAST3vfmNmwOHAWC+Ch3ZuuzxH+brtY0Ynb32CNa1XnmW9K
 7qknzpgyE3ZNQgwKzZ7F/+TscGcslalKRoAxPa7fumb1srW/Z04aGXYZdEQxOhow
 6Oc2op0IijTse5TdfW/MsbpvbH2uBLnQcYHvKXJ0wRmnQGeowguLSgyW356EAOQ9
 7L7xCvXyX5dFakZ9EApT3wsSP+cS7jUInDLDAxf14gyODrnl65KO3LU8vmZbJmI=
 =l5Rq
 -----END PGP SIGNATURE-----

Merge tag 'v4.4-rc5' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-14 09:31:23 +01:00
Andy Lutomirski
cc1e24fdb0 x86/vdso: Remove pvclock fixmap machinery
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/4933029991103ae44672c82b97a20035f5c1fe4f.1449702533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-11 08:56:03 +01:00
Andy Lutomirski
dac16fba6f x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/9d37826fdc7e2d2809efe31d5345f97186859284.1449702533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-11 08:56:03 +01:00
Tomasz Nowicki
21461775f3 x86/PCI: Clarify AMD Fam10h config access restrictions comment
Clarify the comment about AMD Fam10h config access restrictions, fix typos,
and add a reference to the specification.

[bhelgaas: streamline]
Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
2015-12-10 19:38:06 -06:00
Andy Shevchenko
4077a387b7 x86/platform/iosf_mbi: Remove duplicate definitions
The read and write opcodes are global for all units on SoC and even across
Intel SoCs. Remove duplication of corresponding constants. At the same time
convert all current users.

No functional change.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Boon Leong Ong <boon.leong.ong@intel.com>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-12-09 01:18:34 +01:00
Andi Kleen
7f47d8cc03 x86, tracing, perf: Add trace point for MSR accesses
For debugging low level code interacting with the CPU it is often
useful to trace the MSR read/writes. This gives a concise summary of
PMU and other operations.

perf has an ad-hoc way to do this using trace_printk, but it's
somewhat limited (and also now spews ugly boot messages when enabled)

Instead define real trace points for all MSR accesses.

This adds three new trace points: read_msr and write_msr and rdpmc.

They also report if the access faulted (if *_safe is used)

This allows filtering and triggering on specific MSR values, which
allows various more advanced debugging techniques.

All the values are well defined in the CPU documentation.

The trace can be post processed with
Documentation/trace/postprocess/decode_msr.py to add symbolic MSR
names to the trace.

I only added it to native MSR accesses in C, not paravirtualized or in
entry*.S (which is not too interesting)

Originally the patch kit moved the MSRs out of line.  This uses an
alternative approach recommended by Steven Rostedt of only moving the
trace calls out of line, but open coding the access to the jump label.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1449018060-1742-3-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-06 12:56:10 +01:00
Andi Kleen
153a4334c4 x86/headers: Don't include asm/processor.h in asm/atomic.h
asm/atomic.h doesn't really need asm/processor.h anymore. Everything
it uses has moved to other header files. So remove that include.

processor.h is a nasty header that includes lots of
other headers and makes it prone to include loops. Removing the
include here makes asm/atomic.h a "leaf" header that can
be safely included in most other headers.

The only fallout is in the lib/atomic tester which relied on
this implicit include. Give it an explicit include.
(the include is in ifdef because the user is also in ifdef)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/1449018060-1742-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-06 12:56:03 +01:00
Kirill A. Shutemov
70f1528747 x86/mm: Fix regression with huge pages on PAE
Recent PAT patchset has caused issue on 32-bit PAE machines:

  page:eea45000 count:0 mapcount:-128 mapping:  (null) index:0x0 flags: 0x40000000()
  page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) < 0)
  ------------[ cut here ]------------
  kernel BUG at /home/build/linux-boris/mm/huge_memory.c:1485!
  invalid opcode: 0000 [#1] SMP
  [...]
  Call Trace:
   unmap_single_vma
   ? __wake_up
   unmap_vmas
   unmap_region
   do_munmap
   vm_munmap
   SyS_munmap
   do_fast_syscall_32
   ? __do_page_fault
   sysenter_past_esp
  Code: ...
  EIP: [<c11bde80>] zap_huge_pmd+0x240/0x260 SS:ESP 0068:f6459d98

The problem is in pmd_pfn_mask() and pmd_flags_mask(). These
helpers use PMD_PAGE_MASK to calculate resulting mask.
PMD_PAGE_MASK is 'unsigned long', not 'unsigned long long' as
phys_addr_t is on 32-bit PAE (ARCH_PHYS_ADDR_T_64BIT). As a
result, the upper bits of resulting mask get truncated.

pud_pfn_mask() and pud_flags_mask() aren't problematic since we
don't have PUD page table level on 32-bit systems, but it's
reasonable to keep them consistent with PMD counterpart.

Introduce PHYSICAL_PMD_PAGE_MASK and PHYSICAL_PUD_PAGE_MASK in
addition to existing PHYSICAL_PAGE_MASK and reworks helpers to
use them.

Reported-and-Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
[ Fix -Woverflow warnings from the realmode code. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jürgen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: elliott@hpe.com
Cc: konrad.wilk@oracle.com
Cc: linux-mm <linux-mm@kvack.org>
Fixes: f70abb0fc3 ("x86/asm: Fix pud/pmd interfaces to handle large PAT bit")
Link: http://lkml.kernel.org/r/1448878233-11390-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-04 09:14:27 +01:00
Matt Fleming
67a9108ed4 x86/efi: Build our own page table structures
With commit e1a58320a3 ("x86/mm: Warn on W^X mappings") all
users booting on 64-bit UEFI machines see the following warning,

  ------------[ cut here ]------------
  WARNING: CPU: 7 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x5dc/0x780()
  x86/mm: Found insecure W+X mapping at address ffff88000005f000/0xffff88000005f000
  ...
  x86/mm: Checked W+X mappings: FAILED, 165660 W+X pages found.
  ...

This is caused by mapping EFI regions with RWX permissions.
There isn't much we can do to restrict the permissions for these
regions due to the way the firmware toolchains mix code and
data, but we can at least isolate these mappings so that they do
not appear in the regular kernel page tables.

In commit d2f7cbe7b2 ("x86/efi: Runtime services virtual
mapping") we started using 'trampoline_pgd' to map the EFI
regions because there was an existing identity mapping there
which we use during the SetVirtualAddressMap() call and for
broken firmware that accesses those addresses.

But 'trampoline_pgd' shares some PGD entries with
'swapper_pg_dir' and does not provide the isolation we require.
Notably the virtual address for __START_KERNEL_map and
MODULES_START are mapped by the same PGD entry so we need to be
more careful when copying changes over in
efi_sync_low_kernel_mappings().

This patch doesn't go the full mile, we still want to share some
PGD entries with 'swapper_pg_dir'. Having completely separate
page tables brings its own issues such as synchronising new
mappings after memory hotplug and module loading. Sharing also
keeps memory usage down.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1448658575-17029-6-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-29 09:15:42 +01:00
Matt Fleming
c9f2a9a65e x86/efi: Hoist page table switching code into efi_call_virt()
This change is a prerequisite for pending patches that switch to
a dedicated EFI page table, instead of using 'trampoline_pgd'
which shares PGD entries with 'swapper_pg_dir'. The pending
patches make it impossible to dereference the runtime service
function pointer without first switching %cr3.

It's true that we now have duplicated switching code in
efi_call_virt() and efi_call_phys_{prolog,epilog}() but we are
sacrificing code duplication for a little more clarity and the
ease of writing the page table switching code in C instead of
asm.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1448658575-17029-5-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-29 09:15:42 +01:00
Julia Lawall
d6b56b0bc6 x86/platform/calgary: Constify cal_chipset_ops structures
The cal_chipset_ops structures are never modified, so declare
them as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jon D. Mason <jdmason@kudzu.us>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448726295-10959-1-git-send-email-Julia.Lawall@lip6.fr
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-29 08:50:58 +01:00
Chen Yu
7a9c2dd08e x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume
A bug was reported that on certain Broadwell platforms, after
resuming from S3, the CPU is running at an anomalously low
speed.

It turns out that the BIOS has modified the value of the
THERM_CONTROL register during S3, and changed it from 0 to 0x10,
thus enabled clock modulation(bit4), but with undefined CPU Duty
Cycle(bit1:3) - which causes the problem.

Here is a simple scenario to reproduce the issue:

 1. Boot up the system
 2. Get MSR 0x19a, it should be 0
 3. Put the system into sleep, then wake it up
 4. Get MSR 0x19a, it shows 0x10, while it should be 0

Although some BIOSen want to change the CPU Duty Cycle during
S3, in our case we don't want the BIOS to do any modification.

Fix this issue by introducing a more generic x86 framework to
save/restore specified MSR registers(THERM_CONTROL in this case)
for suspend/resume. This allows us to fix similar bugs in a much
simpler way in the future.

When the kernel wants to protect certain MSRs during suspending,
we simply add a quirk entry in msr_save_dmi_table, and customize
the MSR registers inside the quirk callback, for example:

  u32 msr_id_need_to_save[] = {MSR_ID0, MSR_ID1, MSR_ID2...};

and the quirk mechanism ensures that, once resumed from suspend,
the MSRs indicated by these IDs will be restored to their
original, pre-suspend values.

Since both 64-bit and 32-bit kernels are affected, this patch
covers the common 64/32-bit suspend/resume code path. And
because the MSRs specified by the user might not be available or
readable in any situation, we use rdmsrl_safe() to safely save
these MSRs.

Reported-and-tested-by: Marcin Kaszewski <marcin.kaszewski@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@suse.de
Cc: len.brown@intel.com
Cc: linux@horizon.com
Cc: luto@kernel.org
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/c9abdcbc173dd2f57e8990e304376f19287e92ba.1448382971.git.yu.c.chen@intel.com
[ More edits to the naming of data structures. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-26 10:04:53 +01:00
Juergen Gross
d6ccc3ec95 x86/paravirt: Remove paravirt ops pmd_update[_defer] and pte_update_defer
pte_update_defer can be removed as it is always set to the same
function as pte_update. So any usage of pte_update_defer() can be
replaced by pte_update().

pmd_update and pmd_update_defer are always set to paravirt_nop, so they
can just be nuked.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: jeremy@goop.org
Cc: chrisw@sous-sol.org
Cc: akataria@vmware.com
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xen.org
Cc: konrad.wilk@oracle.com
Cc: david.vrabel@citrix.com
Cc: boris.ostrovsky@oracle.com
Link: http://lkml.kernel.org/r/1447771879-1806-1-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-25 23:08:37 +01:00
Takuya Yoshikawa
018aabb56d KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
New struct kvm_rmap_head makes the code type-safe to some extent.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 17:25:44 +01:00
Andrey Smetanin
db3975717a kvm/x86: Hyper-V kvm exit
A new vcpu exit is introduced to notify the userspace of the
changes in Hyper-V SynIC configuration triggered by guest writing to the
corresponding MSRs.

Changes v4:
* exit into userspace only if guest writes into SynIC MSR's

Changes v3:
* added KVM_EXIT_HYPERV types and structs notes into docs

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 17:24:22 +01:00
Andrey Smetanin
5c919412fe kvm/x86: Hyper-V synthetic interrupt controller
SynIC (synthetic interrupt controller) is a lapic extension,
which is controlled via MSRs and maintains for each vCPU
 - 16 synthetic interrupt "lines" (SINT's); each can be configured to
   trigger a specific interrupt vector optionally with auto-EOI
   semantics
 - a message page in the guest memory with 16 256-byte per-SINT message
   slots
 - an event flag page in the guest memory with 16 2048-bit per-SINT
   event flag areas

The host triggers a SINT whenever it delivers a new message to the
corresponding slot or flips an event flag bit in the corresponding area.
The guest informs the host that it can try delivering a message by
explicitly asserting EOI in lapic or writing to End-Of-Message (EOM)
MSR.

The userspace (qemu) triggers interrupts and receives EOM notifications
via irqfd with resampler; for that, a GSI is allocated for each
configured SINT, and irq_routing api is extended to support GSI-SINT
mapping.

Changes v4:
* added activation of SynIC by vcpu KVM_ENABLE_CAP
* added per SynIC active flag
* added deactivation of APICv upon SynIC activation

Changes v3:
* added KVM_CAP_HYPERV_SYNIC and KVM_IRQ_ROUTING_HV_SINT notes into
docs

Changes v2:
* do not use posted interrupts for Hyper-V SynIC AutoEOI vectors
* add Hyper-V SynIC vectors into EOI exit bitmap
* Hyper-V SyniIC SINT msr write logic simplified

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 17:24:22 +01:00
Andrey Smetanin
d62caabb41 kvm/x86: per-vcpu apicv deactivation support
The decision on whether to use hardware APIC virtualization used to be
taken globally, based on the availability of the feature in the CPU
and the value of a module parameter.

However, under certain circumstances we want to control it on per-vcpu
basis.  In particular, when the userspace activates HyperV synthetic
interrupt controller (SynIC), APICv has to be disabled as it's
incompatible with SynIC auto-EOI behavior.

To achieve that, introduce 'apicv_active' flag on struct
kvm_vcpu_arch, and kvm_vcpu_deactivate_apicv() function to turn APICv
off.  The flag is initialized based on the module parameter and CPU
capability, and consulted whenever an APICv-specific action is
performed.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 17:24:21 +01:00
Andrey Smetanin
6308630bd3 kvm/x86: split ioapic-handled and EOI exit bitmaps
The function to determine if the vector is handled by ioapic used to
rely on the fact that only ioapic-handled vectors were set up to
cause vmexits when virtual apic was in use.

We're going to break this assumption when introducing Hyper-V
synthetic interrupts: they may need to cause vmexits too.

To achieve that, introduce a new bitmap dedicated specifically for
ioapic-handled vectors, and populate EOI exit bitmap from it for now.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Gleb Natapov <gleb@kernel.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Roman Kagan <rkagan@virtuozzo.com>
CC: Denis V. Lunev <den@openvz.org>
CC: qemu-devel@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-25 17:24:21 +01:00
Andy Lutomirski
2671c3e4fe x86/asm: Add asm macros for static keys/jump labels
Unfortunately, we can only do this if HAVE_JUMP_LABEL.  In
principle, we could do some serious surgery on the core jump
label infrastructure to keep the patch infrastructure available
on x86 on all builds, but that's probably not worth it.

Implementing the macros using a conditional branch as a fallback
seems like a bad idea: we'd have to clobber flags.

This limitation can't cause silent failures -- trying to include
asm/jump_label.h at all on a non-HAVE_JUMP_LABEL kernel will
error out.  The macro's users are responsible for handling this
issue themselves.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/63aa45c4b692e8469e1876d6ccbb5da707972990.1447361906.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:56:44 +01:00
Andy Lutomirski
c28454332f x86/asm: Error out if asm/jump_label.h is included inappropriately
Rather than potentially generating incorrect code on a
non-HAVE_JUMP_LABEL kernel if someone includes asm/jump_label.h,
error out.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/99407f0ac7fa3ab03a3d31ce076d47b5c2f44795.1447361906.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:56:43 +01:00
Borislav Petkov
b7106fa0f2 x86/fpu: Get rid of xstate_fault()
Add macros for the alternative XSAVE*/XRSTOR* operations which
contain the fault handling and use them. Kill xstate_fault().

Also, copy_xregs_to_kernel() didn't have the extended state as
memory reference in the asm.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1447932326-4371-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:52:52 +01:00
Borislav Petkov
b74a0cf1b3 x86/fpu: Add an XSTATE_OP() macro
Add an XSTATE_OP() macro which contains the XSAVE* fault handling
and replace all non-alternatives users of xstate_fault() with
it.

This fixes also the buglet in copy_xregs_to_user() and
copy_user_to_xregs() where the inline asm didn't have @xstate as
memory reference and thus potentially causing unwanted
reordering of accesses to the extended state.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1447932326-4371-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:52:52 +01:00
Borislav Petkov
679bcea857 x86/MSR: Chop off lower 32-bit value
sparse complains that the cast truncates the high bits. But here
we really do know what we're doing and we need the lower 32 bits
only as the @low argument. So make that explicit.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448273546-2567-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:15:55 +01:00
Borislav Petkov
ae8b787543 x86/cpu/amd, kvm: Satisfy guest kernel reads of IC_CFG MSR
The kernel accesses IC_CFG MSR (0xc0011021) on AMD because it
checks whether the way access filter is enabled on some F15h
models, and, if so, disables it.

kvm doesn't handle that MSR access and complains about it, which
can get really noisy in dmesg when one starts kvm guests all the
time for testing. And it is useless anyway - guest kernel
shouldn't be doing such changes anyway so tell it that that
filter is disabled.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448273546-2567-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:15:54 +01:00
Borislav Petkov
99f925ce92 x86/cpu: Unify CPU family, model, stepping calculation
Add generic functions which calc family, model and stepping from
the CPUID_1.EAX leaf and stick them into the library we have.

Rename those which do call CPUID with the prefix "x86_cpuid" as
suggested by Paolo Bonzini.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1448273546-2567-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-24 09:15:54 +01:00
Boris Ostrovsky
75ef82190d x86/entry, x86/paravirt: Remove the unused usergs_sysret32 PV op
As result of commit "x86/xen: Avoid fast syscall path for Xen PV
guests", usergs_sysret32 pv op is not called by Xen PV guests
anymore and since they were the only ones who used it we can
safely remove it.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: david.vrabel@citrix.com
Cc: konrad.wilk@oracle.com
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1447970147-1733-4-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 10:48:16 +01:00
Boris Ostrovsky
88c15ec90f x86/paravirt: Remove the unused irq_enable_sysexit pv op
As result of commit "x86/xen: Avoid fast syscall path for Xen PV
guests", the irq_enable_sysexit pv op is not called by Xen PV guests
anymore and since they were the only ones who used it we can
safely remove it.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: david.vrabel@citrix.com
Cc: konrad.wilk@oracle.com
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1447970147-1733-3-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 10:48:16 +01:00
Boris Ostrovsky
5fdf5d37f4 x86/xen: Avoid fast syscall path for Xen PV guests
After 32-bit syscall rewrite, and specifically after commit:

  5f310f739b ("x86/entry/32: Re-implement SYSENTER using the new C path")

... the stack frame that is passed to xen_sysexit is no longer a
"standard" one (i.e. it's not pt_regs).

Since we end up calling xen_iret from xen_sysexit we don't need
to fix up the stack and instead follow entry_SYSENTER_32's IRET
path directly to xen_iret.

We can do the same thing for compat mode even though stack does
not need to be fixed. This will allow us to drop usergs_sysret32
paravirt op (in the subsequent patch)

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: david.vrabel@citrix.com
Cc: konrad.wilk@oracle.com
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1447970147-1733-2-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 10:48:16 +01:00
Waiman Long
d78045306c locking/pvqspinlock, x86: Optimize the PV unlock code path
The unlock function in queued spinlocks was optimized for better
performance on bare metal systems at the expense of virtualized guests.

For x86-64 systems, the unlock call needs to go through a
PV_CALLEE_SAVE_REGS_THUNK() which saves and restores 8 64-bit
registers before calling the real __pv_queued_spin_unlock()
function. The thunk code may also be in a separate cacheline from
__pv_queued_spin_unlock().

This patch optimizes the PV unlock code path by:

 1) Moving the unlock slowpath code from the fastpath into a separate
    __pv_queued_spin_unlock_slowpath() function to make the fastpath
    as simple as possible..

 2) For x86-64, hand-coded an assembly function to combine the register
    saving thunk code with the fastpath code. Only registers that
    are used in the fastpath will be saved and restored. If the
    fastpath fails, the slowpath function will be called via another
    PV_CALLEE_SAVE_REGS_THUNK(). For 32-bit, it falls back to the C
    __pv_queued_spin_unlock() code as the thunk saves and restores
    only one 32-bit register.

With a microbenchmark of 5M lock-unlock loop, the table below shows
the execution times before and after the patch with different number
of threads in a VM running on a 32-core Westmere-EX box with x86-64
4.2-rc1 based kernels:

  Threads	Before patch	After patch	% Change
  -------	------------	-----------	--------
     1		   134.1 ms	  119.3 ms	  -11%
     2		   1286  ms	   953  ms	  -26%
     3		   3715  ms	  3480  ms	  -6.3%
     4		   4092  ms	  3764  ms	  -8.0%

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1447114167-47185-5-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 10:02:02 +01:00
Takao Indoh
24cc12b176 perf/x86/intel/pt: Add interface to stop Intel PT logging
This patch add a function for external components to stop Intel PT.
Basically this function is used when kernel panic occurs. When it is
called, the intel_pt driver disables Intel PT and saves its registers
using pt_event_stop(), which is also used by pmu.stop handler.

This function stops Intel PT on the CPU where it is working, therefore
users of it need to call it for each CPU to stop all logging.

Signed-off-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin<alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: H.Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Vivek Goyal <vgoyal@redhat.com>
Link: http://lkml.kernel.org/r/1446614553-6072-2-git-send-email-indou.takao@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:58:26 +01:00
Andi Kleen
10013ebb5d x86: Add an inlined __copy_from_user_nmi() variant
Add a inlined __ variant of copy_from_user_nmi. The inlined variant allows
the user to:

 - batch the access_ok() check for multiple accesses

 - avoid having a pagefault_disable/enable() on every access if the
   caller already ensures disabled page faults due to its context.

 - get all the optimizations in copy_*_user() for small constant sized
   transfers

It is just a define to __copy_from_user_inatomic().

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445551641-13379-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 09:58:24 +01:00
Juergen Gross
4609586592 x86/paravirt: Remove unused pv_apic_ops structure
The only member of that structure is startup_ipi_hook which is always
set to paravirt_nop.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: jeremy@goop.org
Cc: chrisw@sous-sol.org
Cc: akataria@vmware.com
Cc: rusty@rustcorp.com.au
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xen.org
Cc: konrad.wilk@oracle.com
Cc: boris.ostrovsky@oracle.com
Link: http://lkml.kernel.org/r/1447767872-16730-1-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-19 11:03:58 +01:00
Thomas Gleixner
2f7a3f8e87 x86/tsc: Remove unused tsc_pre_init() hook
No more users. Remove it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@suse.de>
2015-11-19 11:03:13 +01:00
Juergen Gross
ed29210cd6 x86: Remove unused function cpu_has_ht_siblings()
It is used nowhere.

Signed-off-by: Juergen Gross <jgross@suse.com>
Link: http://lkml.kernel.org/r/1447761943-770-1-git-send-email-jgross@suse.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-17 15:40:17 +01:00
Rafael J. Wysocki
d9f67dbc0f Merge branch 'pm-tools'
* pm-tools:
  x86: remove unused definition of MSR_NHM_PLATFORM_INFO
  tools/power turbostat: use new name for MSR_PLATFORM_INFO
2015-11-16 22:57:02 +01:00
Linus Torvalds
bba072dfd7 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A couple of fixes and updates related to x86:

   - Fix the W+X check regression on XEN

   - The real fix for the low identity map trainwreck

   - Probe legacy PIC early instead of unconditionally allocating legacy
     irqs

   - Add cpu verification to long mode entry

   - Adjust the cache topology to AMD Fam17H systems

   - Let Merrifield use the TSC across S3"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Call verify_cpu() after having entered long mode too
  x86/setup: Fix low identity map for >= 2GB kernel range
  x86/mm: Skip the hypervisor range when walking PGD
  x86/AMD: Fix last level cache topology for AMD Fam17h systems
  x86/irq: Probe for PIC presence before allocating descs for legacy IRQs
  x86/cpu/intel: Enable X86_FEATURE_NONSTOP_TSC_S3 for Merrifield
2015-11-15 09:32:59 -08:00
Len Brown
5369a21e3f x86: remove unused definition of MSR_NHM_PLATFORM_INFO
MSR_NHM_PLATFORM_INFO has been replaced by...
MSR_PLATFORM_INFO

Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-11-13 23:28:01 +01:00
Linus Torvalds
3370b69eb0 Four changes:
- x86: work around two nasty cases where a benign exception occurs while
 another is being delivered.  The endless stream of exceptions causes an
 infinite loop in the processor, which not even NMIs or SMIs can interrupt;
 in the virt case, there is no possibility to exit to the host either.
 
 - x86: support for Skylake per-guest TSC rate.  Long supported by AMD,
 the patches mostly move things from there to common arch/x86/kvm/ code.
 
 - generic: remove local_irq_save/restore from the guest entry and exit
 paths when context tracking is enabled.  The patches are a few months
 old, but we discussed them again at kernel summit.  Andy will pick up
 from here and, in 4.5, try to remove it from the user entry/exit paths.
 
 - PPC: Two bug fixes, see merge commit 370289756b for details.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWRFb0AAoJEL/70l94x66DjjMH/31jr8d119MW0uv2x+03+wRq
 6dbJ8tjQ8grvBRExKvLsUVjDmHlhCa1BQl5qjCsyYhX9UeAf4NQOmoEFpq+YTLxh
 Ctveyn+yiZWC7qxbQDmauiQ4JCOp+W9ial782iqw5+ouQMajGOffq5WrojCa2ZNF
 jI278JgdHJLrKj/uie//WBu3V7MJY5Apc3p4zatnSYFSQ3MA0sxl4r4zIrwOa5qs
 23ZeeoqbP4sHh4X5wL/30Y6XFSCHj0qoYHHyAgzLi0PCMvBdt4DrAFUPDG/Rhlv6
 o1WB/kcUfcz3DtBX85wfSOMuw0nF6patWhWv07R/3EIbYoz3dKvp9d6ORYgXqlY=
 =Um9M
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull second batch of kvm updates from Paolo Bonzini:
 "Four changes:

   - x86: work around two nasty cases where a benign exception occurs
     while another is being delivered.  The endless stream of exceptions
     causes an infinite loop in the processor, which not even NMIs or
     SMIs can interrupt; in the virt case, there is no possibility to
     exit to the host either.

   - x86: support for Skylake per-guest TSC rate.  Long supported by
     AMD, the patches mostly move things from there to common
     arch/x86/kvm/ code.

   - generic: remove local_irq_save/restore from the guest entry and
     exit paths when context tracking is enabled.  The patches are a few
     months old, but we discussed them again at kernel summit.  Andy
     will pick up from here and, in 4.5, try to remove it from the user
     entry/exit paths.

   - PPC: Two bug fixes, see merge commit 370289756b for details"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
  KVM: x86: rename update_db_bp_intercept to update_bp_intercept
  KVM: svm: unconditionally intercept #DB
  KVM: x86: work around infinite loop in microcode when #AC is delivered
  context_tracking: avoid irq_save/irq_restore on guest entry and exit
  context_tracking: remove duplicate enabled check
  KVM: VMX: Dump TSC multiplier in dump_vmcs()
  KVM: VMX: Use a scaled host TSC for guest readings of MSR_IA32_TSC
  KVM: VMX: Setup TSC scaling ratio when a vcpu is loaded
  KVM: VMX: Enable and initialize VMX TSC scaling
  KVM: x86: Use the correct vcpu's TSC rate to compute time scale
  KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc()
  KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset()
  KVM: x86: Replace call-back compute_tsc_offset() with a common function
  KVM: x86: Replace call-back set_tsc_khz() with a common function
  KVM: x86: Add a common TSC scaling function
  KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_arch
  KVM: x86: Collect information for setting TSC scaling ratio
  KVM: x86: declare a few variables as __read_mostly
  KVM: x86: merge handle_mmio_page_fault and handle_mmio_page_fault_common
  KVM: PPC: Book3S HV: Don't dynamically split core when already split
  ...
2015-11-12 14:34:06 -08:00
Paolo Bonzini
a96036b8ef KVM: x86: rename update_db_bp_intercept to update_bp_intercept
Because #DB is now intercepted unconditionally, this callback
only operates on #BP for both VMX and SVM.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:25 +01:00
Haozhong Zhang
64903d6195 KVM: VMX: Enable and initialize VMX TSC scaling
This patch exhances kvm-intel module to enable VMX TSC scaling and
collects information of TSC scaling ratio during initialization.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:19 +01:00
Haozhong Zhang
4ba76538dd KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc()
Both VMX and SVM scales the host TSC in the same way in call-back
read_l1_tsc(), so this patch moves the scaling logic from call-back
read_l1_tsc() to a common function kvm_read_l1_tsc().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:18 +01:00
Haozhong Zhang
58ea676787 KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset()
For both VMX and SVM, if the 2nd argument of call-back
adjust_tsc_offset() is the host TSC, then adjust_tsc_offset() will scale
it first. This patch moves this common TSC scaling logic to its caller
adjust_tsc_offset_host() and rename the call-back adjust_tsc_offset() to
adjust_tsc_offset_guest().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:17 +01:00
Haozhong Zhang
07c1419a32 KVM: x86: Replace call-back compute_tsc_offset() with a common function
Both VMX and SVM calculate the tsc-offset in the same way, so this
patch removes the call-back compute_tsc_offset() and replaces it with a
common function kvm_compute_tsc_offset().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:16 +01:00
Haozhong Zhang
381d585c80 KVM: x86: Replace call-back set_tsc_khz() with a common function
Both VMX and SVM propagate virtual_tsc_khz in the same way, so this
patch removes the call-back set_tsc_khz() and replaces it with a common
function.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:16 +01:00
Haozhong Zhang
35181e86df KVM: x86: Add a common TSC scaling function
VMX and SVM calculate the TSC scaling ratio in a similar logic, so this
patch generalizes it to a common TSC scaling function.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
[Inline the multiplication and shift steps into mul_u64_u64_shr.  Remove
 BUG_ON.  - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:15 +01:00
Haozhong Zhang
ad721883e9 KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_arch
This patch moves the field of TSC scaling ratio from the architecture
struct vcpu_svm to the common struct kvm_vcpu_arch.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:14 +01:00
Haozhong Zhang
bc9b961b35 KVM: x86: Collect information for setting TSC scaling ratio
The number of bits of the fractional part of the 64-bit TSC scaling
ratio in VMX and SVM is different. This patch makes the architecture
code to collect the number of fractional bits and other related
information into variables that can be accessed in the common code.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:14 +01:00
Paolo Bonzini
893590c734 KVM: x86: declare a few variables as __read_mostly
These include module parameters and variables that are set by
kvm_x86_ops->hardware_setup.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-10 12:06:13 +01:00
Nicolas Pitre
77c5b5da02 kmap_atomic_to_page() has no users, remove it
Removal started in commit 5bbeed12bd ("sparc32: drop unused
kmap_atomic_to_page").  Let's do it across the whole tree.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Borislav Petkov
79f1d83692 x86/paravirt: Kill some unused patching functions
paravirt_patch_ignore() is completely unused and paravirt_patch_nop()
doesn't do a whole lot. Remove them both.

Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1446542329-32037-1-git-send-email-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-07 18:09:35 +01:00
Vitaly Kuznetsov
8c058b0b9c x86/irq: Probe for PIC presence before allocating descs for legacy IRQs
Commit d32932d02e ("x86/irq: Convert IOAPIC to use hierarchical irqdomain
interfaces") brought a regression for Hyper-V Gen2 instances. These
instances don't have i8259 legacy PIC but they use legacy IRQs for serial
port, rtc, and acpi. With this commit included we end up with these IRQs
not initialized. Earlier, there was a special workaround for legacy IRQs
in mp_map_pin_to_irq() doing mp_irqdomain_map() without looking at
nr_legacy_irqs() and now we fail in __irq_domain_alloc_irqs() when
irq_domain_alloc_descs() returns -EEXIST.

The essence of the issue seems to be that early_irq_init() calls
arch_probe_nr_irqs() to figure out the number of legacy IRQs before
we probe for i8259 and gets 16. Later when init_8259A() is called we switch
to NULL legacy PIC and nr_legacy_irqs() starts to return 0 but we already
have 16 descs allocated.

Solve the issue by separating i8259 probe from init and calling it in
arch_probe_nr_irqs() before we actually use nr_legacy_irqs() information.

Fixes: d32932d02e ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1446543614-3621-1-git-send-email-vkuznets@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-07 10:37:37 +01:00
Linus Torvalds
933425fb00 s390: A bunch of fixes and optimizations for interrupt and time
handling.
 
 PPC: Mostly bug fixes.
 
 ARM: No big features, but many small fixes and prerequisites including:
 - a number of fixes for the arch-timer
 - introducing proper level-triggered semantics for the arch-timers
 - a series of patches to synchronously halt a guest (prerequisite for
   IRQ forwarding)
 - some tracepoint improvements
 - a tweak for the EL2 panic handlers
 - some more VGIC cleanups getting rid of redundant state
 
 x86: quite a few changes:
 
 - support for VT-d posted interrupts (i.e. PCI devices can inject
 interrupts directly into vCPUs).  This introduces a new component (in
 virt/lib/) that connects VFIO and KVM together.  The same infrastructure
 will be used for ARM interrupt forwarding as well.
 
 - more Hyper-V features, though the main one Hyper-V synthetic interrupt
 controller will have to wait for 4.5.  These will let KVM expose Hyper-V
 devices.
 
 - nested virtualization now supports VPID (same as PCID but for vCPUs)
 which makes it quite a bit faster
 
 - for future hardware that supports NVDIMM, there is support for clflushopt,
 clwb, pcommit
 
 - support for "split irqchip", i.e. LAPIC in kernel + IOAPIC/PIC/PIT in
 userspace, which reduces the attack surface of the hypervisor
 
 - obligatory smattering of SMM fixes
 
 - on the guest side, stable scheduler clock support was rewritten to not
 require help from the hypervisor.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWO2IQAAoJEL/70l94x66D/K0H/3AovAgYmJQToZlimsktMk6a
 f2xhdIqfU5lIQQh5uNBCfL3o9o8H9Py1ym7aEw3fmztPHHJYc91oTatt2UEKhmEw
 VtZHp/dFHt3hwaIdXmjRPEXiYctraKCyrhaUYdWmUYkoKi7lW5OL5h+S7frG2U6u
 p/hFKnHRZfXHr6NSgIqvYkKqtnc+C0FWY696IZMzgCksOO8jB1xrxoSN3tANW3oJ
 PDV+4og0fN/Fr1capJUFEc/fejREHneANvlKrLaa8ht0qJQutoczNADUiSFLcMPG
 iHljXeDsv5eyjMtUuIL8+MPzcrIt/y4rY41ZPiKggxULrXc6H+JJL/e/zThZpXc=
 =iv2z
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "First batch of KVM changes for 4.4.

  s390:
     A bunch of fixes and optimizations for interrupt and time handling.

  PPC:
     Mostly bug fixes.

  ARM:
     No big features, but many small fixes and prerequisites including:

      - a number of fixes for the arch-timer

      - introducing proper level-triggered semantics for the arch-timers

      - a series of patches to synchronously halt a guest (prerequisite
        for IRQ forwarding)

      - some tracepoint improvements

      - a tweak for the EL2 panic handlers

      - some more VGIC cleanups getting rid of redundant state

  x86:
     Quite a few changes:

      - support for VT-d posted interrupts (i.e. PCI devices can inject
        interrupts directly into vCPUs).  This introduces a new
        component (in virt/lib/) that connects VFIO and KVM together.
        The same infrastructure will be used for ARM interrupt
        forwarding as well.

      - more Hyper-V features, though the main one Hyper-V synthetic
        interrupt controller will have to wait for 4.5.  These will let
        KVM expose Hyper-V devices.

      - nested virtualization now supports VPID (same as PCID but for
        vCPUs) which makes it quite a bit faster

      - for future hardware that supports NVDIMM, there is support for
        clflushopt, clwb, pcommit

      - support for "split irqchip", i.e.  LAPIC in kernel +
        IOAPIC/PIC/PIT in userspace, which reduces the attack surface of
        the hypervisor

      - obligatory smattering of SMM fixes

      - on the guest side, stable scheduler clock support was rewritten
        to not require help from the hypervisor"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits)
  KVM: VMX: Fix commit which broke PML
  KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()
  KVM: x86: allow RSM from 64-bit mode
  KVM: VMX: fix SMEP and SMAP without EPT
  KVM: x86: move kvm_set_irq_inatomic to legacy device assignment
  KVM: device assignment: remove pointless #ifdefs
  KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic
  KVM: x86: zero apic_arb_prio on reset
  drivers/hv: share Hyper-V SynIC constants with userspace
  KVM: x86: handle SMBASE as physical address in RSM
  KVM: x86: add read_phys to x86_emulate_ops
  KVM: x86: removing unused variable
  KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs
  KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
  KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
  KVM: arm/arm64: Optimize away redundant LR tracking
  KVM: s390: use simple switch statement as multiplexer
  KVM: s390: drop useless newline in debugging data
  KVM: s390: SCA must not cross page boundaries
  KVM: arm: Do not indent the arguments of DECLARE_BITMAP
  ...
2015-11-05 16:26:26 -08:00
Thomas Gleixner
7e29393b20 x86/apic: Provide default send single IPI wrapper
Instead of doing the wrapping in the smp code we can provide a default
wrapper for those APICs which insist on cpumasks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220849.631111846@linutronix.de
2015-11-05 13:07:53 +01:00
Thomas Gleixner
53be0fac8b x86/apic: Implement default single target IPI function
apic_physflat and bigsmp_apic can share that implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220848.898543767@linutronix.de
2015-11-05 13:07:52 +01:00
Linus Torvalds
539da78772 x86/apic: Add a single-target IPI function to the apic
We still fall back on the "send mask" versions if an apic definition
doesn't have the single-target version, but at least this allows the
(trivial) case for the common clustered x2apic case.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Travis <travis@sgi.com>
Cc: Daniel J Blueman <daniel@numascale.com>
Link: http://lkml.kernel.org/r/20151104220848.737120838@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-11-05 13:07:51 +01:00
Linus Torvalds
0d51ce9ca1 Power management and ACPI updates for v4.4-rc1
- ACPICA update to upstream revision 20150930 (Bob Moore, Lv Zheng).
 
    The most significant change is to allow the AML debugger to be
    built into the kernel.  On top of that there is an update related
    to the NFIT table (the ACPI persistent memory interface)
    and a few fixes and cleanups.
 
  - ACPI CPPC2 (Collaborative Processor Performance Control v2)
    support along with a cpufreq frontend (Ashwin Chaugule).
 
    This can only be enabled on ARM64 at this point.
 
  - New ACPI infrastructure for the early probing of IRQ chips and
    clock sources (Marc Zyngier).
 
  - Support for a new hierarchical properties extension of the ACPI
    _DSD (Device Specific Data) device configuration object allowing
    the kernel to handle hierarchical properties (provided by the
    platform firmware this way) automatically and make them available
    to device drivers via the generic device properties interface
    (Rafael Wysocki).
 
  - Generic device properties API extension to obtain an index of
    certain string value in an array of strings, along the lines of
    of_property_match_string(), but working for all of the supported
    firmware node types, and support for the "dma-names" device
    property based on it (Mika Westerberg).
 
  - ACPI core fix to parse the MADT (Multiple APIC Description Table)
    entries in the order expected by platform firmware (and mandated
    by the specification) to avoid confusion on systems with more than
    255 logical CPUs (Lukasz Anaczkowski).
 
  - Consolidation of the ACPI-based handling of PCI host bridges
    on x86 and ia64 (Jiang Liu).
 
  - ACPI core fixes to ensure that the correct IRQ number is used to
    represent the SCI (System Control Interrupt) in the cases when
    it has been re-mapped (Chen Yu).
 
  - New ACPI backlight quirk for Lenovo IdeaPad S405 (Hans de Goede).
 
  - ACPI EC driver fixes (Lv Zheng).
 
  - Assorted ACPI fixes and cleanups (Dan Carpenter, Insu Yun, Jiri
    Kosina, Rami Rosen, Rasmus Villemoes).
 
  - New mechanism in the PM core allowing drivers to check if the
    platform firmware is going to be involved in the upcoming system
    suspend or if it has been involved in the suspend the system is
    resuming from at the moment (Rafael Wysocki).
 
    This should allow drivers to optimize their suspend/resume
    handling in some cases and the changes include a couple of users
    of it (the i8042 input driver, PCI PM).
 
  - PCI PM fix to prevent runtime-suspended devices with PME enabled
    from being resumed during system suspend even if they aren't
    configured to wake up the system from sleep (Rafael Wysocki).
 
  - New mechanism to report the number of a wakeup IRQ that woke up
    the system from sleep last time (Alexandra Yates).
 
  - Removal of unused interfaces from the generic power domains
    framework and fixes related to latency measurements in that
    code (Ulf Hansson, Daniel Lezcano).
 
  - cpufreq core sysfs interface rework to make it handle CPUs that
    share performance scaling settings (represented by a common
    cpufreq policy object) more symmetrically (Viresh Kumar).
 
    This should help to simplify the CPU offline/online handling among
    other things.
 
  - cpufreq core fixes and cleanups (Viresh Kumar).
 
  - intel_pstate fixes related to the Turbo Activation Ratio (TAR)
    mechanism on client platforms which causes the turbo P-states
    range to vary depending on platform firmware settings (Srinivas
    Pandruvada).
 
  - intel_pstate sysfs interface fix (Prarit Bhargava).
 
  - Assorted cpufreq driver (imx, tegra20, powernv, integrator) fixes
    and cleanups (Bai Ping, Bartlomiej Zolnierkiewicz, Shilpasri G
    Bhat, Luis de Bethencourt).
 
  - cpuidle mvebu driver cleanups (Russell King).
 
  - OPP (Operating Performance Points) framework code reorganization
    to make it more maintainable (Viresh Kumar).
 
  - Intel Broxton support for the RAPL (Running Average Power Limits)
    power capping driver (Amy Wiles).
 
  - Assorted power management code fixes and cleanups (Dan Carpenter,
    Geert Uytterhoeven, Geliang Tang, Luis de Bethencourt, Rasmus
    Villemoes).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJWOC9oAAoJEILEb/54YlRx/c8P/joflwoFsISwJccG62YTQMuc
 bMQKM4Kw0vl5La8+pkLpe5t6+mW7l81UFtYF6Dzd8LOKlD9sszD34z1lHmCeT/oR
 wn0uZpHagRyLMUfoyiEtlU/VRU6WQNNtS3EgjwUi7xgFz9Q0pjcCZ9OQ6vKov1j5
 +6j40ODif5sgo+2vl+rztJiV0SIMkYdkgNqgfN1FE9bdLA2Zkk+PxxJbtGQORuDu
 O/K+XhQT2xWquVWi/1p+VtQxs5glBS1oKm0kogV5bElCvNTRNIVABUNcjogITQwo
 QSAKgoCKIoaIl5jtDT6u5dc0y67q/dMtqOY9fOCcOz1Z7jbWQzR8D7mpFWIsJUPK
 K2LClI3t85ynpN6Jref246A6+C9nwB8JMAiAR11oBw7WbBlkd6tbRgcT5B+iz8UE
 FuCCif7pha/Fs+Jt1YRazscIqteQ2bAhhxikuIPMfw2M6M67MNfVNeKA1bAoSM34
 dH7JsilblitvV7shrwJHwXPXCOF2jEPoK8I4/q2+TR5qUxEpRJjelQxXGSaQScMZ
 iNnjeTgv8H8q+rY5Yjzsl4pxP0Fvf7IuqkptWOJbgepg4cQc9pS87wOpY3uEeQzr
 H7ruaQJFCnLO4aXbPNClsiJARhrBk+qMlsh4vBEyCJ2T0ucb+nIUcN4BTi8t85yl
 X97BfHHUiDoUrnIsNids
 =1gaH
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-4.4-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI updates from Rafael Wysocki:
 "Quite a new features are included this time.

  First off, the Collaborative Processor Performance Control interface
  (version 2) defined by ACPI will now be supported on ARM64 along with
  a cpufreq frontend for CPU performance scaling.

  Second, ACPI gets a new infrastructure for the early probing of IRQ
  chips and clock sources (along the lines of the existing similar
  mechanism for DT).

  Next, the ACPI core and the generic device properties API will now
  support a recently introduced hierarchical properties extension of the
  _DSD (Device Specific Data) ACPI device configuration object.  If the
  ACPI platform firmware uses that extension to organize device
  properties in a hierarchical way, the kernel will automatically handle
  it and make those properties available to device drivers via the
  generic device properties API.

  It also will be possible to build the ACPICA's AML interpreter
  debugger into the kernel now and use that to diagnose AML-related
  problems more efficiently.  In the future, this should make it
  possible to single-step AML execution and do similar things.
  Interesting stuff, although somewhat experimental at this point.

  Finally, the PM core gets a new mechanism that can be used by device
  drivers to distinguish between suspend-to-RAM (based on platform
  firmware support) and suspend-to-idle (or other variants of system
  suspend the platform firmware is not involved in) and possibly
  optimize their device suspend/resume handling accordingly.

  In addition to that, some existing features are re-organized quite
  substantially.

  First, the ACPI-based handling of PCI host bridges on x86 and ia64 is
  unified and the common code goes into the ACPI core (so as to reduce
  code duplication and eliminate non-essential differences between the
  two architectures in that area).

  Second, the Operating Performance Points (OPP) framework is
  reorganized to make the code easier to find and follow.

  Next, the cpufreq core's sysfs interface is reorganized to get rid of
  the "primary CPU" concept for configurations in which the same
  performance scaling settings are shared between multiple CPUs.

  Finally, some interfaces that aren't necessary any more are dropped
  from the generic power domains framework.

  On top of the above we have some minor extensions, cleanups and bug
  fixes in multiple places, as usual.

  Specifics:

   - ACPICA update to upstream revision 20150930 (Bob Moore, Lv Zheng).

     The most significant change is to allow the AML debugger to be
     built into the kernel.  On top of that there is an update related
     to the NFIT table (the ACPI persistent memory interface) and a few
     fixes and cleanups.

   - ACPI CPPC2 (Collaborative Processor Performance Control v2) support
     along with a cpufreq frontend (Ashwin Chaugule).

     This can only be enabled on ARM64 at this point.

   - New ACPI infrastructure for the early probing of IRQ chips and
     clock sources (Marc Zyngier).

   - Support for a new hierarchical properties extension of the ACPI
     _DSD (Device Specific Data) device configuration object allowing
     the kernel to handle hierarchical properties (provided by the
     platform firmware this way) automatically and make them available
     to device drivers via the generic device properties interface
     (Rafael Wysocki).

   - Generic device properties API extension to obtain an index of
     certain string value in an array of strings, along the lines of
     of_property_match_string(), but working for all of the supported
     firmware node types, and support for the "dma-names" device
     property based on it (Mika Westerberg).

   - ACPI core fix to parse the MADT (Multiple APIC Description Table)
     entries in the order expected by platform firmware (and mandated by
     the specification) to avoid confusion on systems with more than 255
     logical CPUs (Lukasz Anaczkowski).

   - Consolidation of the ACPI-based handling of PCI host bridges on x86
     and ia64 (Jiang Liu).

   - ACPI core fixes to ensure that the correct IRQ number is used to
     represent the SCI (System Control Interrupt) in the cases when it
     has been re-mapped (Chen Yu).

   - New ACPI backlight quirk for Lenovo IdeaPad S405 (Hans de Goede).

   - ACPI EC driver fixes (Lv Zheng).

   - Assorted ACPI fixes and cleanups (Dan Carpenter, Insu Yun, Jiri
     Kosina, Rami Rosen, Rasmus Villemoes).

   - New mechanism in the PM core allowing drivers to check if the
     platform firmware is going to be involved in the upcoming system
     suspend or if it has been involved in the suspend the system is
     resuming from at the moment (Rafael Wysocki).

     This should allow drivers to optimize their suspend/resume handling
     in some cases and the changes include a couple of users of it (the
     i8042 input driver, PCI PM).

   - PCI PM fix to prevent runtime-suspended devices with PME enabled
     from being resumed during system suspend even if they aren't
     configured to wake up the system from sleep (Rafael Wysocki).

   - New mechanism to report the number of a wakeup IRQ that woke up the
     system from sleep last time (Alexandra Yates).

   - Removal of unused interfaces from the generic power domains
     framework and fixes related to latency measurements in that code
     (Ulf Hansson, Daniel Lezcano).

   - cpufreq core sysfs interface rework to make it handle CPUs that
     share performance scaling settings (represented by a common cpufreq
     policy object) more symmetrically (Viresh Kumar).

     This should help to simplify the CPU offline/online handling among
     other things.

   - cpufreq core fixes and cleanups (Viresh Kumar).

   - intel_pstate fixes related to the Turbo Activation Ratio (TAR)
     mechanism on client platforms which causes the turbo P-states range
     to vary depending on platform firmware settings (Srinivas
     Pandruvada).

   - intel_pstate sysfs interface fix (Prarit Bhargava).

   - Assorted cpufreq driver (imx, tegra20, powernv, integrator) fixes
     and cleanups (Bai Ping, Bartlomiej Zolnierkiewicz, Shilpasri G
     Bhat, Luis de Bethencourt).

   - cpuidle mvebu driver cleanups (Russell King).

   - OPP (Operating Performance Points) framework code reorganization to
     make it more maintainable (Viresh Kumar).

   - Intel Broxton support for the RAPL (Running Average Power Limits)
     power capping driver (Amy Wiles).

   - Assorted power management code fixes and cleanups (Dan Carpenter,
     Geert Uytterhoeven, Geliang Tang, Luis de Bethencourt, Rasmus
     Villemoes)"

* tag 'pm+acpi-4.4-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (108 commits)
  cpufreq: postfix policy directory with the first CPU in related_cpus
  cpufreq: create cpu/cpufreq/policyX directories
  cpufreq: remove cpufreq_sysfs_{create|remove}_file()
  cpufreq: create cpu/cpufreq at boot time
  cpufreq: Use cpumask_copy instead of cpumask_or to copy a mask
  cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
  PM / Domains: Merge measurements for PM QoS device latencies
  PM / Domains: Don't measure ->start|stop() latency in system PM callbacks
  PM / clk: Fix broken build due to non-matching code and header #ifdefs
  ACPI / Documentation: add copy_dsdt to ACPI format options
  ACPI / sysfs: correctly check failing memory allocation
  ACPI / video: Add a quirk to force native backlight on Lenovo IdeaPad S405
  ACPI / CPPC: Fix potential memory leak
  ACPI / CPPC: signedness bug in register_pcc_channel()
  ACPI / PAD: power_saving_thread() is not freezable
  ACPI / PM: Fix incorrect wakeup IRQ setting during suspend-to-idle
  ACPI: Using correct irq when waiting for events
  ACPI: Use correct IRQ when uninstalling ACPI interrupt handler
  cpuidle: mvebu: disable the bind/unbind attributes and use builtin_platform_driver
  cpuidle: mvebu: clean up multiple platform drivers
  ...
2015-11-04 18:10:13 -08:00
Linus Torvalds
41ecf1404b xen: features for 4.4-rc0
- Improve balloon driver memory hotplug placement.
 - Use unpopulated hotplugged memory for foreign pages (if
   supported/enabled).
 - Support 64 KiB guest pages on arm64.
 - CPU hotplug support on arm/arm64.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWOeSkAAoJEFxbo/MsZsTRph0H/0nE8Tx0GyGtOyCYfBdInTvI
 WgjvL8VR1XrweZMVis3668MzhLSYg6b5lvJsoi+L3jlzYRyze43iHXsKfvp+8p0o
 TVUhFnlHEHF8ASEtPydAi6HgS7Dn9OQ9LaZ45R1Gk0rHnwJjIQonhTn2jB0yS9Am
 Hf4aZXP2NVZphjYcloqNsLH0G6mGLtgq8cS0uKcVO2YIrR4Dr3sfj9qfq9mflf8n
 sA/5ifoHRfOUD1vJzYs4YmIBUv270jSsprWK/Mi2oXIxUTBpKRAV1RVCAPW6GFci
 HIZjIJkjEPWLsvxWEs0dUFJQGp3jel5h8vFPkDWBYs3+9rILU2DnLWpKGNDHx3k=
 =vUfa
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.4-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from David Vrabel:

 - Improve balloon driver memory hotplug placement.

 - Use unpopulated hotplugged memory for foreign pages (if
   supported/enabled).

 - Support 64 KiB guest pages on arm64.

 - CPU hotplug support on arm/arm64.

* tag 'for-linus-4.4-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (44 commits)
  xen: fix the check of e_pfn in xen_find_pfn_range
  x86/xen: add reschedule point when mapping foreign GFNs
  xen/arm: don't try to re-register vcpu_info on cpu_hotplug.
  xen, cpu_hotplug: call device_offline instead of cpu_down
  xen/arm: Enable cpu_hotplug.c
  xenbus: Support multiple grants ring with 64KB
  xen/grant-table: Add an helper to iterate over a specific number of grants
  xen/xenbus: Rename *RING_PAGE* to *RING_GRANT*
  xen/arm: correct comment in enlighten.c
  xen/gntdev: use types from linux/types.h in userspace headers
  xen/gntalloc: use types from linux/types.h in userspace headers
  xen/balloon: Use the correct sizeof when declaring frame_list
  xen/swiotlb: Add support for 64KB page granularity
  xen/swiotlb: Pass addresses rather than frame numbers to xen_arch_need_swiotlb
  arm/xen: Add support for 64KB page granularity
  xen/privcmd: Add support for Linux 64KB page granularity
  net/xen-netback: Make it running on 64KB page granularity
  net/xen-netfront: Make it running on 64KB page granularity
  block/xen-blkback: Make it running on 64KB page granularity
  block/xen-blkfront: Make it running on 64KB page granularity
  ...
2015-11-04 17:32:42 -08:00
Linus Torvalds
e627078a0c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "There is only one new feature in this pull for the 4.4 merge window,
  most of it is small enhancements, cleanup and bug fixes:

   - Add the s390 backend for the software dirty bit tracking.  This
     adds two new pgtable functions pte_clear_soft_dirty and
     pmd_clear_soft_dirty which is why there is a hit to
     arch/x86/include/asm/pgtable.h in this pull request.

   - A series of cleanup patches for the AP bus, this includes the
     removal of the support for two outdated crypto cards (PCICC and
     PCICA).

   - The irq handling / signaling on buffer full in the runtime
     instrumentation code is dropped.

   - Some micro optimizations: remove unnecessary memory barriers for a
     couple of functions: [smb_]rmb, [smb_]wmb, atomics, bitops, and for
     spin_unlock.  Use the builtin bswap if available and make
     test_and_set_bit_lock more cache friendly.

   - Statistics and a tracepoint for the diagnose calls to the
     hypervisor.

   - The CPU measurement facility support to sample KVM guests is
     improved.

   - The vector instructions are now always enabled for user space
     processes if the hardware has the vector facility.  This simplifies
     the FPU handling code.  The fpu-internal.h header is split into fpu
     internals, api and types just like x86.

   - Cleanup and improvements for the common I/O layer.

   - Rework udelay to solve a problem with kprobe.  udelay has busy loop
     semantics but still uses an idle processor state for the wait"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (66 commits)
  s390: remove runtime instrumentation interrupts
  s390/cio: de-duplicate subchannel validation
  s390/css: unneeded initialization in for_each_subchannel
  s390/Kconfig: use builtin bswap
  s390/dasd: fix disconnected device with valid path mask
  s390/dasd: fix invalid PAV assignment after suspend/resume
  s390/dasd: fix double free in dasd_eckd_read_conf
  s390/kernel: fix ptrace peek/poke for floating point registers
  s390/cio: move ccw_device_stlck functions
  s390/cio: move ccw_device_call_handler
  s390/topology: reduce per_cpu() invocations
  s390/nmi: reduce size of percpu variable
  s390/nmi: fix terminology
  s390/nmi: remove casts
  s390/nmi: remove pointless error strings
  s390: don't store registers on disabled wait anymore
  s390: get rid of __set_psw_mask()
  s390/fpu: split fpu-internal.h into fpu internals, api, and type headers
  s390/dasd: fix list_del corruption after lcu changes
  s390/spinlock: remove unneeded serializations at unlock
  ...
2015-11-04 11:31:31 -08:00
Radim Krčmář
7a036a6f67 KVM: x86: add read_phys to x86_emulate_ops
We want to read the physical memory when emulating RSM.

X86EMUL_IO_NEEDED is returned on all errors for consistency with other
helpers.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-04 16:24:31 +01:00
Paolo Bonzini
197a4f4b06 KVM/ARM Changes for v4.4-rc1
Includes a number of fixes for the arch-timer, introducing proper
 level-triggered semantics for the arch-timers, a series of patches to
 synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
 improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
 getting rid of redundant state, and finally a stylistic change that gets rid of
 some ctags warnings.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWOhgAAAoJEEtpOizt6ddyKS8H/2ZHMTPjo6yChnrusNWy4Qbr
 6laPDlzL+g45oMQRwNL7GnM1deRftaxvT2Wi+X84D/6Y/BD6MPds4HgtBfuWcSZ1
 CyRJ0Ot/zrxenucSuJuOjq+a9gdizdAczkbB1MfYDULJH8fb6D+7RYLo3zgh4Xo4
 pla3L9U6gSWe+YopBjZtZH43m3fwiwSM/v+uHOTIcXrsbR+fEgx/EFSKmA/DUCuo
 P5cFO/ceUGu7nATCexu5V82TgR2hvurrsR7mqfwY8YcF6HRM+NEOoS29xWC77v5S
 u/F08TKuKQLv0YTEFTyLETI/oEeuC0cHtrRQBNf4+9kXEOzKyXaae0wR/I6X2Ss=
 =GMNk
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM Changes for v4.4-rc1

Includes a number of fixes for the arch-timer, introducing proper
level-triggered semantics for the arch-timers, a series of patches to
synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
getting rid of redundant state, and finally a stylistic change that gets rid of
some ctags warnings.

Conflicts:
	arch/x86/include/asm/kvm_host.h
2015-11-04 16:24:17 +01:00
Linus Torvalds
66ef3493d4 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform changes from Ingo Molnar:
 "Misc updates to the Intel MID and SGI UV platforms"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/intel-mid: Make intel_mid_ops static
  arch/x86/intel-mid: Use kmemdup rather than duplicating its implementation
  x86/platform/uv: Implement simple dump failover if kdump fails
  x86/platform/uv: Insert per_cpu accessor function on uv_hub_nmi
2015-11-03 21:33:18 -08:00
Linus Torvalds
639ab3eb38 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm changes from Ingo Molnar:
 "The main changes are: continued PAT work by Toshi Kani, plus a new
  boot time warning about insecure RWX kernel mappings, by Stephen
  Smalley.

  The new CONFIG_DEBUG_WX=y warning is marked default-y if
  CONFIG_DEBUG_RODATA=y is already eanbled, as a special exception, as
  these bugs are hard to notice and this check already found several
  live bugs"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Warn on W^X mappings
  x86/mm: Fix no-change case in try_preserve_large_page()
  x86/mm: Fix __split_large_page() to handle large PAT bit
  x86/mm: Fix try_preserve_large_page() to handle large PAT bit
  x86/mm: Fix gup_huge_p?d() to handle large PAT bit
  x86/mm: Fix slow_virt_to_phys() to handle large PAT bit
  x86/mm: Fix page table dump to show PAT bit
  x86/asm: Add pud_pgprot() and pmd_pgprot()
  x86/asm: Fix pud/pmd interfaces to handle large PAT bit
  x86/asm: Add pud/pmd mask interfaces to handle large PAT bit
  x86/asm: Move PUD_PAGE macros to page_types.h
  x86/vdso32: Define PGTABLE_LEVELS to 32bit VDSO
2015-11-03 21:23:56 -08:00
Linus Torvalds
4302d506d5 Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 sigcontext header cleanups from Ingo Molnar:
 "This series reorganizes and cleans up various aspects of the main
  sigcontext UAPI headers, such as unifying the data structures and
  updating/adding lots of comments to explain all the ABI details and
  quirks.  The headers can now also be built in user-space standalone"

* 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/headers: Clean up too long lines
  x86/headers: Remove <asm/sigcontext.h> references on the kernel side
  x86/headers: Remove direct sigcontext32.h uses
  x86/headers: Convert sigcontext_ia32 uses to sigcontext_32
  x86/headers: Unify 'struct sigcontext_ia32' and 'struct sigcontext_32'
  x86/headers: Make sigcontext pointers bit independent
  x86/headers: Move the 'struct sigcontext' definitions into the UAPI header
  x86/headers: Clean up the kernel's struct sigcontext types to be ABI-clean
  x86/headers: Convert uses of _fpstate_ia32 to _fpstate_32
  x86/headers: Unify 'struct _fpstate_ia32' and i386 struct _fpstate
  x86/headers: Unify register type definitions between 32-bit compat and i386
  x86/headers: Use ABI types consistently in sigcontext*.h
  x86/headers: Separate out legacy user-space structure definitions
  x86/headers: Clean up and better document uapi/asm/sigcontext.h
  x86/headers: Clean up uapi/asm/sigcontext32.h
  x86/headers: Fix (old) header file dependency bug in uapi/asm/sigcontext32.h
2015-11-03 21:05:40 -08:00
Linus Torvalds
ce4d72fac1 Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu changes from Ingo Molnar:
 "There are two main areas of changes:

   - Rework of the extended FPU state code to robustify the kernel's
     usage of cpuid provided xstate sizes - and related changes (Dave
     Hansen)"

   - math emulation enhancements: new modern FPU instructions support,
     with testcases, plus cleanups (Denys Vlasnko)"

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/fpu: Fixup uninitialized feature_name warning
  x86/fpu/math-emu: Add support for FISTTP instructions
  x86/fpu/math-emu, selftests: Add test for FISTTP instructions
  x86/fpu/math-emu: Add support for FCMOVcc insns
  x86/fpu/math-emu: Add support for F[U]COMI[P] insns
  x86/fpu/math-emu: Remove define layer for undocumented opcodes
  x86/fpu/math-emu, selftests: Add tests for FCMOV and FCOMI insns
  x86/fpu/math-emu: Remove !NO_UNDOC_CODE
  x86/fpu: Check CPU-provided sizes against struct declarations
  x86/fpu: Check to ensure increasing-offset xstate offsets
  x86/fpu: Correct and check XSAVE xstate size calculations
  x86/fpu: Add C structures for AVX-512 state components
  x86/fpu: Rework YMM definition
  x86/fpu/mpx: Rework MPX 'xstate' types
  x86/fpu: Add xfeature_enabled() helper instead of test_bit()
  x86/fpu: Remove 'xfeature_nr'
  x86/fpu: Rework XSTATE_* macros to remove magic '2'
  x86/fpu: Rename XFEATURES_NR_MAX
  x86/fpu: Rename XSAVE macros
  x86/fpu: Remove partial LWP support definitions
  ...
2015-11-03 20:50:26 -08:00
Linus Torvalds
f323c49b30 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu changes from Ingo Molnar:
 "Two changes in this cycle: a Kconfig help text enhancement, and an AMD
  CLZERO instruction capability detection and enumeration"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add CLZERO detection
  x86/Kconfig/cpus: Fix/complete CPU type help texts
2015-11-03 19:39:42 -08:00
Linus Torvalds
378e4e9825 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot cleanup from Ingo Molnar:
 "A single commit: remove an obsolete kcrash boot flag"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kexec: Remove obsolete 'in_crash_kexec' flag
2015-11-03 19:28:37 -08:00
Linus Torvalds
a75a3f6fc9 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm changes from Ingo Molnar:
 "The main change in this cycle is another step in the big x86 system
  call interface rework by Andy Lutomirski, which moves most of the low
  level x86 entry code from assembly to C, for all syscall entries
  except native 64-bit system calls:

    arch/x86/entry/entry_32.S        | 182 ++++------
    arch/x86/entry/entry_64_compat.S | 547 ++++++++-----------------------
    194 insertions(+), 535 deletions(-)

  ... our hope is that the final remaining step (converting native
  64-bit system calls) will be less painful as all the previous steps,
  given that most of the legacies and quirks are concentrated around
  native 32-bit and compat environments"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
  x86/entry/32: Fix FS and GS restore in opportunistic SYSEXIT
  x86/entry/32: Fix entry_INT80_32() to expect interrupts to be on
  um/x86: Fix build after x86 syscall changes
  x86/asm: Remove the xyz_cfi macros from dwarf2.h
  selftests/x86: Style fixes for the 'unwind_vdso' test
  x86/entry/64/compat: Document sysenter_fix_flags's reason for existence
  x86/entry: Split and inline syscall_return_slowpath()
  x86/entry: Split and inline prepare_exit_to_usermode()
  x86/entry: Use pt_regs_to_thread_info() in syscall entry tracing
  x86/entry: Hide two syscall entry assertions behind CONFIG_DEBUG_ENTRY
  x86/entry: Micro-optimize compat fast syscall arg fetch
  x86/entry: Force inlining of 32-bit syscall code
  x86/entry: Make irqs_disabled checks in exit code depend on lockdep
  x86/entry: Remove unnecessary IRQ twiddling in fast 32-bit syscalls
  x86/asm: Remove thread_info.sysenter_return
  x86/entry/32: Re-implement SYSENTER using the new C path
  x86/entry/32: Switch INT80 to the new C syscall path
  x86/entry/32: Open-code return tracking from fork and kthreads
  x86/entry/compat: Implement opportunistic SYSRETL for compat syscalls
  x86/vdso/compat: Wire up SYSENTER and SYSCSALL for compat userspace
  ...
2015-11-03 18:59:10 -08:00
Linus Torvalds
d2bea739f8 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic changes from Ingo Molnar:
 "The main changes in this cycle were:

   - Numachip updates: new hardware support, fixes and cleanups.
     (Daniel J Blueman)

   - misc smaller cleanups and fixlets"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/io_apic: Make eoi_ioapic_pin() static
  x86/irq: Drop unlikely before IS_ERR_OR_NULL
  x86/x2apic: Make stub functions available even if !CONFIG_X86_LOCAL_APIC
  x86/apic: Deinline various functions
  x86/numachip: Fix timer build conflict
  x86/numachip: Introduce Numachip2 timer mechanisms
  x86/numachip: Add Numachip IPI optimisations
  x86/numachip: Add Numachip2 APIC support
  x86/numachip: Cleanup Numachip support
2015-11-03 18:33:15 -08:00
Linus Torvalds
53528695ff Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "The main changes in this cycle were:

   - sched/fair load tracking fixes and cleanups (Byungchul Park)

   - Make load tracking frequency scale invariant (Dietmar Eggemann)

   - sched/deadline updates (Juri Lelli)

   - stop machine fixes, cleanups and enhancements for bugs triggered by
     CPU hotplug stress testing (Oleg Nesterov)

   - scheduler preemption code rework: remove PREEMPT_ACTIVE and related
     cleanups (Peter Zijlstra)

   - Rework the sched_info::run_delay code to fix races (Peter Zijlstra)

   - Optimize per entity utilization tracking (Peter Zijlstra)

   - ... misc other fixes, cleanups and smaller updates"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
  sched: Don't scan all-offline ->cpus_allowed twice if !CONFIG_CPUSETS
  sched: Move cpu_active() tests from stop_two_cpus() into migrate_swap_stop()
  sched: Start stopper early
  stop_machine: Kill cpu_stop_threads->setup() and cpu_stop_unpark()
  stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark()
  stop_machine: Change cpu_stop_queue_two_works() to rely on stopper->enabled
  stop_machine: Introduce __cpu_stop_queue_work() and cpu_stop_queue_two_works()
  stop_machine: Ensure that a queued callback will be called before cpu_stop_park()
  sched/x86: Fix typo in __switch_to() comments
  sched/core: Remove a parameter in the migrate_task_rq() function
  sched/core: Drop unlikely behind BUG_ON()
  sched/core: Fix task and run queue sched_info::run_delay inconsistencies
  sched/numa: Fix task_tick_fair() from disabling numa_balancing
  sched/core: Add preempt_count invariant check
  sched/core: More notrace annotations
  sched/core: Kill PREEMPT_ACTIVE
  sched/core, sched/x86: Kill thread_info::saved_preempt_count
  sched/core: Simplify preempt_count tests
  sched/core: Robustify preemption leak checks
  sched/core: Stop setting PREEMPT_ACTIVE
  ...
2015-11-03 18:03:50 -08:00
Linus Torvalds
b831ef2cad Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS changes from Ingo Molnar:
 "The main system reliability related changes were from x86, but also
  some generic RAS changes:

   - AMD MCE error injection subsystem enhancements.  (Aravind
     Gopalakrishnan)

   - Fix MCE and CPU hotplug interaction bug.  (Ashok Raj)

   - kcrash bootup robustness fix.  (Baoquan He)

   - kcrash cleanups.  (Borislav Petkov)

   - x86 microcode driver rework: simplify it by unmodularizing it and
     other cleanups.  (Borislav Petkov)"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/mce: Add a default case to the switch in __mcheck_cpu_ancient_init()
  x86/mce: Add a Scalable MCA vendor flags bit
  MAINTAINERS: Unify the microcode driver section
  x86/microcode/intel: Move #ifdef DEBUG inside the function
  x86/microcode/amd: Remove maintainers from comments
  x86/microcode: Remove modularization leftovers
  x86/microcode: Merge the early microcode loader
  x86/microcode: Unmodularize the microcode driver
  x86/mce: Fix thermal throttling reporting after kexec
  kexec/crash: Say which char is the unrecognized
  x86/setup/crash: Check memblock_reserve() retval
  x86/setup/crash: Cleanup some more
  x86/setup/crash: Remove alignment variable
  x86/setup: Cleanup crashkernel reservation functions
  x86/amd_nb, EDAC: Rename amd_get_node_id()
  x86/setup: Do not reserve crashkernel high memory if low reservation failed
  x86/microcode/amd: Do not overwrite final patch levels
  x86/microcode/amd: Extract current patch level read to a function
  x86/ras/mce_amd_inj: Inject bank 4 errors on the NBC
  x86/ras/mce_amd_inj: Trigger deferred and thresholding errors interrupts
  ...
2015-11-03 17:51:33 -08:00
Linus Torvalds
d63a978865 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking changes from Ingo Molnar:
 "The main changes in this cycle were:

   - More gradual enhancements to atomic ops: new atomic*_read_ctrl()
     ops, synchronize atomic_{read,set}() ordering requirements between
     architectures, add atomic_long_t bitops.  (Peter Zijlstra)

   - Add _{relaxed|acquire|release}() variants for inc/dec atomics and
     use them in various locking primitives: mutex, rtmutex, mcs, rwsem.
     This enables weakly ordered architectures (such as arm64) to make
     use of more locking related optimizations.  (Davidlohr Bueso)

   - Implement atomic[64]_{inc,dec}_relaxed() on ARM.  (Will Deacon)

   - Futex kernel data cache footprint micro-optimization.  (Rasmus
     Villemoes)

   - pvqspinlock runtime overhead micro-optimization.  (Waiman Long)

   - misc smaller fixlets"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ARM, locking/atomics: Implement _relaxed variants of atomic[64]_{inc,dec}
  locking/rwsem: Use acquire/release semantics
  locking/mcs: Use acquire/release semantics
  locking/rtmutex: Use acquire/release semantics
  locking/mutex: Use acquire/release semantics
  locking/asm-generic: Add _{relaxed|acquire|release}() variants for inc/dec atomics
  atomic: Implement atomic_read_ctrl()
  atomic, arch: Audit atomic_{read,set}()
  atomic: Add atomic_long_t bitops
  futex: Force hot variables into a single cache line
  locking/pvqspinlock: Kick the PV CPU unconditionally when _Q_SLOW_VAL
  locking/osq: Relax atomic semantics
  locking/qrwlock: Rename ->lock to ->wait_lock
  locking/Documentation/lockstat: Fix typo - lokcing -> locking
  locking/atomics, cmpxchg: Privatize the inclusion of asm/cmpxchg.h
2015-11-03 16:10:43 -08:00
Linus Torvalds
f5a8160c1e Merge branch 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI changes from Ingo Molnar:
 "The main changes in this cycle were:

   - further EFI code generalization to make it more workable for ARM64
   - various extensions, such as 64-bit framebuffer address support,
     UEFI v2.5 EFI_PROPERTIES_TABLE support
   - code modularization simplifications and cleanups
   - new debugging parameters
   - various fixes and smaller additions"

* 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  efi: Fix warning of int-to-pointer-cast on x86 32-bit builds
  efi: Use correct type for struct efi_memory_map::phys_map
  x86/efi: Fix kernel panic when CONFIG_DEBUG_VIRTUAL is enabled
  efi: Add "efi_fake_mem" boot option
  x86/efi: Rename print_efi_memmap() to efi_print_memmap()
  efi: Auto-load the efi-pstore module
  efi: Introduce EFI_NX_PE_DATA bit and set it from properties table
  efi: Add support for UEFIv2.5 Properties table
  efi: Add EFI_MEMORY_MORE_RELIABLE support to efi_md_typeattr_format()
  efifb: Add support for 64-bit frame buffer addresses
  efi/arm64: Clean up efi_get_fdt_params() interface
  arm64: Use core efi=debug instead of uefi_debug command line parameter
  efi/x86: Move efi=debug option parsing to core
  drivers/firmware: Make efi/esrt.c driver explicitly non-modular
  efi: Use the generic efi.memmap instead of 'memmap'
  acpi/apei: Use appropriate pgprot_t to map GHES memory
  arm64, acpi/apei: Implement arch_apei_get_mem_attributes()
  arm64/mm: Add PROT_DEVICE_nGnRnE and PROT_NORMAL_WT
  acpi, x86: Implement arch_apei_get_mem_attributes()
  efi, x86: Rearrange efi_mem_attributes()
  ...
2015-11-03 15:05:52 -08:00
Linus Torvalds
7b2a4306f9 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "The timer departement provides:

   - More y2038 work in the area of ntp and pps.

   - Optimization of posix cpu timers

   - New time related selftests

   - Some new clocksource drivers

   - The usual pile of fixes, cleanups and improvements"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  timeconst: Update path in comment
  timers/x86/hpet: Type adjustments
  clocksource/drivers/armada-370-xp: Implement ARM delay timer
  clocksource/drivers/tango_xtal: Add new timer for Tango SoCs
  clocksource/drivers/imx: Allow timer irq affinity change
  clocksource/drivers/exynos_mct: Use container_of() instead of this_cpu_ptr()
  clocksource/drivers/h8300_*: Remove unneeded memset()s
  clocksource/drivers/sh_cmt: Remove unneeded memset() in sh_cmt_setup()
  clocksource/drivers/em_sti: Remove unneeded memset()s
  clocksource/drivers/mediatek: Use GPT as sched clock source
  clockevents/drivers/mtk: Fix spurious interrupt leading to crash
  posix_cpu_timer: Reduce unnecessary sighand lock contention
  posix_cpu_timer: Convert cputimer->running to bool
  posix_cpu_timer: Check thread timers only when there are active thread timers
  posix_cpu_timer: Optimize fastpath_timer_check()
  timers, kselftest: Add 'adjtick' test to validate adjtimex() tick adjustments
  timers: Use __fls in apply_slack()
  clocksource: Remove return statement from void functions
  net: sfc: avoid using timespec
  ntp/pps: use y2038 safe types in pps_event_time
  ...
2015-11-03 14:13:41 -08:00
Rafael J. Wysocki
69f8947b8c Merge branches 'pm-cpufreq' and 'pm-cpuidle'
* pm-cpufreq:
  cpufreq: postfix policy directory with the first CPU in related_cpus
  cpufreq: create cpu/cpufreq/policyX directories
  cpufreq: remove cpufreq_sysfs_{create|remove}_file()
  cpufreq: create cpu/cpufreq at boot time
  cpufreq: Use cpumask_copy instead of cpumask_or to copy a mask
  cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
  cpufreq: intel_pstate: Fix intel_pstate powersave min_perf_pct value
  cpufreq: intel_pstate: Avoid calculation for max/min
  Documentation: kernel_parameters for Intel P state driver
  cpufreq: intel_pstate: Use ACPI perf configuration
  cpufreq: intel-pstate: Use separate max pstate for scaling
  cpufreq: intel_pstate: get P1 from TAR when available
  cpufreq: Drop redundant check for inactive policies
  cpufreq : powernv: Report Pmax throttling if capped below nominal frequency
  cpufreq: imx: update the clock switch flow to support imx6ul
  cpufreq: tegra20: remove superfluous CONFIG_PM ifdefs
  cpufreq: conservative: remove 'enable' field
  cpufreq: integrator: Fix module autoload for OF platform driver

* pm-cpuidle:
  cpuidle: mvebu: disable the bind/unbind attributes and use builtin_platform_driver
  cpuidle: mvebu: clean up multiple platform drivers
2015-11-02 00:54:10 +01:00
Wan Zongshun
2167ceabf3 x86/cpu: Add CLZERO detection
AMD Fam17h processors introduce support for the CLZERO
instruction. It zeroes out the 64 byte cache line specified in
RAX.

Add the bit here to allow /proc/cpuinfo to list the feature.

Boris: we're adding this as a separate ->x86_capability leaf
because CPUID_80000008_EBX is going to contain more feature bits
and it will fill out with time.

Signed-off-by: Wan Zongshun <Vincent.Wan@amd.com>
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
[ Wrap code in patch form, fix comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1446207099-24948-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-01 11:26:23 +01:00
Aravind Gopalakrishnan
c7f54d21fb x86/mce: Add a Scalable MCA vendor flags bit
Scalable MCA (SMCA) is a new feature in AMD Fam17h processors
which indicates presence of MCA extensions.

MCA extensions expands existing register space for the MCE banks
and also introduces a new MSR range to accommodate new banks.

Add the detection bit.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Reformat mce_vendor_flags definitions and save indentation levels. Improve comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1446207099-24948-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-01 11:26:13 +01:00
Linus Torvalds
0386729247 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Misc fixes: two KASAN fixes, two EFI boot fixes, two boot-delay
  optimization fixes, and a fix for a IRQ handling hang observed on
  virtual platforms"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm, kasan: Silence KASAN warnings in get_wchan()
  compiler, atomics, kasan: Provide READ_ONCE_NOCHECK()
  x86, kasan: Fix build failure on KASAN=y && KMEMCHECK=y kernels
  x86/smpboot: Fix CPU #1 boot timeout
  x86/smpboot: Fix cpu_init_udelay=10000 corner case boot parameter misbehavior
  x86/ioapic: Disable interrupts when re-routing legacy IRQs
  x86/setup: Extend low identity map to cover whole kernel range
  x86/efi: Fix multiple GOP device support
2015-10-23 22:34:32 +09:00
Stefano Stabellini
a314e3eb84 xen/arm: Enable cpu_hotplug.c
Build cpu_hotplug for ARM and ARM64 guests.

Rename arch_(un)register_cpu to xen_(un)register_cpu and provide an
empty implementation on ARM and ARM64. On x86 just call
arch_(un)register_cpu as we are already doing.

Initialize cpu_hotplug on ARM.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2015-10-23 14:20:47 +01:00
Julien Grall
291be10fd7 xen/swiotlb: Pass addresses rather than frame numbers to xen_arch_need_swiotlb
With 64KB page granularity support, the frame number will be different.

It will be easier to modify the behavior in a single place rather than
in each caller.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23 14:20:43 +01:00
Julien Grall
008c320a96 xen/grant: Introduce helpers to split a page into grant
Currently, a grant is always based on the Xen page granularity (i.e
4KB). When Linux is using a different page granularity, a single page
will be split between multiple grants.

The new helpers will be in charge of splitting the Linux page into grants
and call a function given by the caller on each grant.

Also provide an helper to count the number of grants within a given
contiguous region.

Note that the x86/include/asm/xen/page.h is now including
xen/interface/grant_table.h rather than xen/grant_table.h. It's
necessary because xen/grant_table.h depends on asm/xen/page.h and will
break the compilation. Furthermore, only definition in
interface/grant_table.h is required.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-10-23 14:20:33 +01:00
David Vrabel
8edfcf882e x86/xen: export xen_alloc_p2m_entry()
Rename alloc_p2m() to xen_alloc_p2m_entry() and export it.

This is useful for ensuring that a p2m entry is allocated (i.e., not a
shared missing or identity entry) so that subsequent set_phys_to_machine()
calls will require no further allocations.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
---
v3:
- Make xen_alloc_p2m_entry() a nop on auto-xlate guests.
2015-10-23 14:20:28 +01:00
Christoffer Dall
3217f7c25b KVM: Add kvm_arch_vcpu_{un}blocking callbacks
Some times it is useful for architecture implementations of KVM to know
when the VCPU thread is about to block or when it comes back from
blocking (arm/arm64 needs to know this to properly implement timers, for
example).

Therefore provide a generic architecture callback function in line with
what we do elsewhere for KVM generic-arch interactions.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22 23:01:41 +02:00
Borislav Petkov
6b26e1bf66 x86/microcode: Remove modularization leftovers
Remove the remaining module functionality leftovers. Make
"dis_ucode_ldr" an early_param and make it static again. Drop
module aliases, autoloading table, description, etc.

Bump version number, while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1445334889-300-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-21 11:22:12 +02:00
Borislav Petkov
fe055896c0 x86/microcode: Merge the early microcode loader
Merge the early loader functionality into the driver proper. The
diff is huge but logically, it is simply moving code from the
_early.c files into the main driver.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1445334889-300-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-21 11:22:12 +02:00
Borislav Petkov
9a2bc335f1 x86/microcode: Unmodularize the microcode driver
Make CONFIG_MICROCODE a bool. It was practically a bool already anyway,
since early loader was forcing it to =y.

Regardless, there's no real reason to have something be a module which
gets built-in on the majority of installations out there. And its not
like there's noticeable change in functionality - we still can load late
microcode - just the module glue disappears.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1445334889-300-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-21 11:22:11 +02:00
Jan Beulich
3d45ac4b35 timers/x86/hpet: Type adjustments
Standardize on bool instead of an inconsistent mixture of u8 and plain 'int'.

Also use u32 or 'unsigned int' instead of 'unsigned long' when a 32-bit type
suffices, generating slightly better code on x86-64.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/5624E3A002000078000AC49A@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-21 11:17:32 +02:00
Aravind Gopalakrishnan
1a6775c1a2 x86/amd_nb, EDAC: Rename amd_get_node_id()
This function doesn't give us the "Node ID" as the function name
suggests. Rather, it receives a PCI device as argument, checks
the available F3 PCI device IDs in the system and returns the
index of the matching Bus/Device IDs.

Rename it to amd_pci_dev_to_node_id().

No functional change is introduced.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1445246268-26285-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-21 11:10:55 +02:00
Ingo Molnar
6af597de62 Merge branch 'sched/urgent' into sched/core, to pick up fixes and resolve conflicts
Conflicts:
	kernel/sched/fair.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-20 10:18:16 +02:00
Ingo Molnar
a1a2ab2ff7 Linux 4.3-rc6
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWJCaDAAoJEHm+PkMAQRiGvJQH/jGlxawu+mnNFjIRXmk8Zucq
 cxDVTPnQ4S1DJMyRSQTqe6ccDgDM/TEISJPZ0Gwc2SQLda27zdQiz05upbqHSPSH
 /FUHMKiMRhhIBOzdfEGR4Ry2YZID+DlG5EWCgRKf/GlgY3STZSpjIBPYd3Rl3zsU
 qragNjtbCE6eUpnLwkv40Q2mzzR3/fZxJ0drgE/QiSF8P0VmvVbrcQMF58vnUQgy
 9URp48mLhmKj+IrElSnvXMG0XLrKn1tRYXGXd3w+x1Nlr//SHYsIuQHUcp6SAVQh
 p2HXLL7eoy8rs45MCiOfXG7VgNWF6HFjVMtrNUcNfgBsLCx0qFMVrKIILrGtKh0=
 =ETze
 -----END PGP SIGNATURE-----

Merge tag 'v4.3-rc6' into locking/core, to pick up fixes before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-20 10:16:46 +02:00
Andrey Ryabinin
a75ca545e8 x86, kasan: Fix build failure on KASAN=y && KMEMCHECK=y kernels
Declaration of memcpy() is hidden under #ifndef CONFIG_KMEMCHECK.
In asm/efi.h under #ifdef CONFIG_KASAN we #undef memcpy(), due to
which the following happens:

  In file included from arch/x86/kernel/setup.c:96:0:
  ./arch/x86/include/asm/desc.h: In function ‘native_write_idt_entry’:
  ./arch/x86/include/asm/desc.h:122:2: error: implicit declaration of function ‘memcpy’ [-Werror=implicit-function-declaration]   memcpy(&idt[entry], gate, sizeof(*gate));
    ^
    cc1: some warnings being treated as errors
    make[2]: *** [arch/x86/kernel/setup.o] Error 1

We will get rid of that #undef in asm/efi.h eventually.
But in the meanwhile move memcpy() declaration out of #ifdefs
to fix the build.

Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1444994933-28328-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-19 10:07:23 +02:00
Rafael J. Wysocki
7855e10294 Merge back earlier cpufreq material for v4.4. 2015-10-16 22:12:02 +02:00
Wanpeng Li
99b83ac893 KVM: nVMX: emulate the INVVPID instruction
Add the INVVPID instruction emulation.

Reviewed-by: Wincy Van <fanwenyi0529@gmail.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-16 10:30:24 +02:00
Srinivas Pandruvada
6a35fc2d6c cpufreq: intel_pstate: get P1 from TAR when available
After Ivybridge, the max non turbo ratio obtained from platform info msr
is not always guaranteed P1 on client platforms. The max non turbo
activation ratio (TAR), determines the max for the current level of TDP.
The ratio in platform info is physical max. The TAR MSR can be locked,
so updating this value is not possible on all platforms.
This change gets this ratio from MSR TURBO_ACTIVATION_RATIO if
available,
but also do some sanity checking to make sure that this value is
correct.
The sanity check involves reading the TDP ratio for the current tdp
control value when platform has configurable TDP present and matching
TAC
with this.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-10-15 01:53:18 +02:00
Linus Torvalds
cfed1e3de4 Bug fixes for system management mode emulation. The first two patches
fix SMM emulation on Nehalem processors.  The others fix some cases
 that became apparent as work progressed on the firmware side.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWHms0AAoJEL/70l94x66DfQkIAIpya6c/1UAthxSTqJ1wFOf8
 ZKp3GCMjUjtm9k88kk6JGOlPiAvWz7CG9BVbptpkJGpgoDzquvr6ZKGG2BV88F17
 MnkZCid4IBW6VeKYy7R2otkKw7+Pw8DTHRQks+VI6BN/KkeaZLzh5J8+FAl4ZaWk
 YX/VulRce6SfZPYuUTRkkK8aebsopZNVG8mwWIGuBYwyH54R3KH1k/euX2joUPwm
 oopzmQLgEWW7e3RsO67T36rIRgEorJLZaiiexvj1djI+e0kEEudvhJ9nC6eB52qa
 oZ9nR0nkkmBmrBF8gldKDZBC+Y/ci1cJLAaoi7tdsp0wVCebPxubwbPOXxKwD8g=
 =ij8Q
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Bug fixes for system management mode emulation.

  The first two patches fix SMM emulation on Nehalem processors.  The
  others fix some cases that became apparent as work progressed on the
  firmware side"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: fix RSM into 64-bit protected mode
  KVM: x86: fix previous commit for 32-bit
  KVM: x86: fix SMI to halted VCPU
  KVM: x86: clean up kvm_arch_vcpu_runnable
  KVM: x86: map/unmap private slots in __x86_set_memory_region
  KVM: x86: build kvm_userspace_memory_region in x86_set_memory_region
2015-10-14 10:01:32 -07:00
Andy Lutomirski
af22aa7c76 x86/asm: Remove the xyz_cfi macros from dwarf2.h
They are currently unused, and I don't think that anyone was
ever particularly happy with them.  They had the unfortunate
property that they made it easy to CFI-annotate things without
thinking about them -- when pushing, do you want to just update
the CFA offset, or do you also want to update the saved location
of the register being pushed?

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1447bfbd10bb268b4593b32534ecefa1f4df287e.1444696194.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-14 16:56:28 +02:00
Ingo Molnar
790a2ee242 * Make the EFI System Resource Table (ESRT) driver explicitly
non-modular by ripping out the module_* code since Kconfig doesn't
    allow it to be built as a module anyway - Paul Gortmaker
 
  * Make the x86 efi=debug kernel parameter, which enables EFI debug
    code and output, generic and usable by arm64 - Leif Lindholm
 
  * Add support to the x86 EFI boot stub for 64-bit Graphics Output
    Protocol frame buffer addresses - Matt Fleming
 
  * Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled
    in the firmware and set an efi.flags bit so the kernel knows when
    it can apply more strict runtime mapping attributes - Ard Biesheuvel
 
  * Auto-load the efi-pstore module on EFI systems, just like we
    currently do for the efivars module - Ben Hutchings
 
  * Add "efi_fake_mem" kernel parameter which allows the system's EFI
    memory map to be updated with additional attributes for specific
    memory ranges. This is useful for testing the kernel code that handles
    the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware
    doesn't include support - Taku Izumi
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWG7OwAAoJEC84WcCNIz1VEEEP/0SsdrwJ66B4MfP5YNjqHYWm
 +OTHR6Ovv2i10kc+NjOV/GN8sWPndnkLfIfJ4EqJ9BoQ9PDEYZilV2aleSQ4DrPm
 H7uGwBXQkfd76tZKX9pMToK76mkhg6M7M2LR3Suv3OGfOEzuozAOt3Ez37lpksTN
 2ByhHr/oGbhu99jC2ki5+k0ySH8PMqDBRxqrPbBzTD+FfB7bM11vAJbSNbSMQ21R
 ZwX0acZBLqb9J2Vf7tDsW+fCfz0TFo8JHW8jdLRFm/y2dpquzxswkkBpODgA8+VM
 0F5UbiUdkaIRug75I6N/OJ8+yLwdzuxm7ul+tbS3JrXGLAlK3850+dP2Pr5zQ2Ce
 zaYGRUy+tD5xMXqOKgzpu+Ia8XnDRLhOlHabiRd5fG6ZC9nR8E9uK52g79voSN07
 pADAJnVB03CGV/HdduDOI4C4UykUKubuArbQVkqWJcecV1Jic/tYI0gjeACmU1VF
 v8FzXpBUe3U3A0jauOz8PBz8M+k5qky/GbIrnEvXreBtKdt999LN9fykTN7rBOpo
 dk/6vTR1Jyv3aYc9EXHmRluktI6KmfWCqmRBOIgQveX1VhdRM+1w2LKC0+8co3dF
 v/DBh19KDyfPI8eOvxKykhn164UeAt03EXqDa46wFGr2nVOm/JiShL/d+QuyYU4G
 8xb/rET4JrhCG4gFMUZ7
 =1Oee
 -----END PGP SIGNATURE-----

Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into core/efi

Pull v4.4 EFI updates from Matt Fleming:

  - Make the EFI System Resource Table (ESRT) driver explicitly
    non-modular by ripping out the module_* code since Kconfig doesn't
    allow it to be built as a module anyway. (Paul Gortmaker)

  - Make the x86 efi=debug kernel parameter, which enables EFI debug
    code and output, generic and usable by arm64. (Leif Lindholm)

  - Add support to the x86 EFI boot stub for 64-bit Graphics Output
    Protocol frame buffer addresses. (Matt Fleming)

  - Detect when the UEFI v2.5 EFI_PROPERTIES_TABLE feature is enabled
    in the firmware and set an efi.flags bit so the kernel knows when
    it can apply more strict runtime mapping attributes - Ard Biesheuvel

  - Auto-load the efi-pstore module on EFI systems, just like we
    currently do for the efivars module. (Ben Hutchings)

  - Add "efi_fake_mem" kernel parameter which allows the system's EFI
    memory map to be updated with additional attributes for specific
    memory ranges. This is useful for testing the kernel code that handles
    the EFI_MEMORY_MORE_RELIABLE memmap bit even if your firmware
    doesn't include support. (Taku Izumi)

Note: there is a semantic conflict between the following two commits:

  8a53554e12 ("x86/efi: Fix multiple GOP device support")
  ae2ee627dc ("efifb: Add support for 64-bit frame buffer addresses")

I fixed up the interaction in the merge commit, changing the type of
current_fb_base from u32 to u64.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-14 16:51:34 +02:00
Ingo Molnar
c7d77a7980 Merge branch 'x86/urgent' into core/efi, to pick up a pending EFI fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-14 16:05:18 +02:00
Martin Schwidefsky
a7b7617493 mm: add architecture primitives for software dirty bit clearing
There are primitives to create and query the software dirty bits
in a pte or pmd. But the clearing of the software dirty bits is done
in common code with x86 specific page table functions.

Add the missing architecture primitives to clear the software dirty
bits to allow the feature to be used on non-x86 systems, e.g. the
s390 architecture.

Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:05 +02:00
Paolo Bonzini
58f800d5ac Merge branch 'kvm-master' into HEAD
This merge brings in a couple important SMM fixes, which makes it
easier to test latest KVM with unrestricted_guest=0 and to test
the in-progress work on SMM support in the firmware.

Conflicts:
	arch/x86/kvm/x86.c
2015-10-13 21:32:50 +02:00
Paolo Bonzini
1d8007bdee KVM: x86: build kvm_userspace_memory_region in x86_set_memory_region
The next patch will make x86_set_memory_region fill the
userspace_addr.  Since the struct is not used untouched
anymore, it makes sense to build it in x86_set_memory_region
directly; it also simplifies the callers.

Reported-by: Alexandre DERUMIER <aderumier@odiso.com>
Cc: stable@vger.kernel.org
Fixes: 9da0e4d5ac
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-13 18:28:46 +02:00
Borislav Petkov
0399f73299 x86/microcode/amd: Do not overwrite final patch levels
A certain number of patch levels of applied microcode should not
be overwritten by the microcode loader, otherwise bad things
will happen.

Check those and abort update if the current core has one of
those final patch levels applied by the BIOS. 32-bit needs
special handling, of course.

See https://bugzilla.suse.com/show_bug.cgi?id=913996 for more
info.

Tested-by: Peter Kirchgeßner <pkirchgessner@t-online.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1444641762-9437-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-12 16:15:48 +02:00
Borislav Petkov
2eff73c0a1 x86/microcode/amd: Extract current patch level read to a function
Pave the way for checking the current patch level of the
microcode in a core. We want to be able to do stuff depending on
the patch level - in this case decide whether to update or not.
But that will be added in a later patch.

Drop unused local var uci assignment, while at it.

Integrate a fix for 32-bit and CONFIG_PARAVIRT from Takashi Iwai:

 Use native_rdmsr() in check_current_patch_level() because with
 CONFIG_PARAVIRT enabled and on 32-bit, where we run before
 paging has been enabled, we cannot deref pv_info yet. Or we
 could, but we'd need to access its physical address. This way of
 fixing it is simpler. See:

   https://bugzilla.suse.com/show_bug.cgi?id=943179 for the background.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Takashi Iwai <tiwai@suse.com>:
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1444641762-9437-6-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-12 16:15:48 +02:00
Taku Izumi
0bbea1ce98 x86/efi: Rename print_efi_memmap() to efi_print_memmap()
This patch renames print_efi_memmap() to efi_print_memmap() and
make it global function so that we can invoke it outside of
arch/x86/platform/efi/efi.c

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2015-10-12 14:20:08 +01:00
Minfei Huang
e9c40d257f x86/kexec: Remove obsolete 'in_crash_kexec' flag
Previously, UV NMI used the 'in_crash_kexec' flag to determine whether
we are in a kdump kernel or not:

  5edd19af18 ("x86, UV: Make kdump avoid stack dumps")

But this flags was removed in the following commit:

  9c48f1c629 ("x86, nmi: Wire up NMI handlers to new routines")

Since it isn't used any more, remove it.

Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: cpw@sgi.com
Cc: kexec@lists.infradead.org
Cc: mhuang@redhat.com
Link: http://lkml.kernel.org/r/1444070155-17934-1-git-send-email-mhuang@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-12 09:43:11 +02:00
Andy Lutomirski
487e3bf4f7 x86/asm: Remove thread_info.sysenter_return
It's no longer needed.

We could reinstate something like it as an optimization, which
would remove two cachelines from the fast syscall entry working
set.  I benchmarked it, and it makes no difference whatsoever to
the performance of cache-hot compat syscalls on Sandy Bridge.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/f08cc0cff30201afe9bb565c47134c0a6c1a96a2.1444091585.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:11 +02:00
Andy Lutomirski
eb974c6256 x86/syscalls: Give sys_call_ptr_t a useful type
Syscalls are asmlinkage functions (on 32-bit kernels), take six
args of type unsigned long, and return long.  Note that uml
could probably be slightly cleaned up on top of this patch.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/4d3ecc4a169388d47009175408b2961961744e6f.1444091585.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:08 +02:00
Andy Lutomirski
034042cc1e x86/entry/syscalls: Move syscall table declarations into asm/syscalls.h
The header was missing some compat declarations.

Also make sys_call_ptr_t have a consistent type.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/3166aaff0fb43897998fcb6ef92991533f8c5c6c.1444091585.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:08 +02:00
Andy Lutomirski
8242c6c84a x86/vdso/32: Save extra registers in the INT80 vsyscall path
The goal is to integrate the SYSENTER and SYSCALL32 entry paths
with the INT80 path.  SYSENTER clobbers ESP and EIP.  SYSCALL32
clobbers ECX (and, invisibly, R11).  SYSRETL (long mode to
compat mode) clobbers ECX and, invisibly, R11.  SYSEXIT (which
we only need for native 32-bit) clobbers ECX and EDX.

This means that we'll need to provide ESP to the kernel in a
register (I chose ECX, since it's only needed for SYSENTER) and
we need to provide the args that normally live in ECX and EDX in
memory.

The epilogue needs to restore ECX and EDX, since user code
relies on regs being preserved.

We don't need to do anything special about EIP, since the kernel
already knows where we are.  The kernel will eventually need to
know where int $0x80 lands, so add a vdso_image entry for it.

The only user-visible effect of this code is that ptrace-induced
changes to ECX and EDX during fast syscalls will be lost.  This
is already the case for the SYSENTER path.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/b860925adbee2d2627a0671fbfe23a7fd04127f8.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:06 +02:00
Andy Lutomirski
7bcdea4d05 x86/elf/64: Clear more registers in elf_common_init()
Before we start calling execve in contexts that honor the full
pt_regs, we need to teach it to initialize all registers.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/65a38a9edee61a1158cfd230800c61dbd963dac5.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:06 +02:00
Andy Lutomirski
f24f910884 x86/vdso: Define BUILD_VDSO while building and emit .eh_frame in asm
For the vDSO, user code wants runtime unwind info.  Make sure
that, if we use .cfi directives, we generate it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/16e29ad8855e6508197000d8c41f56adb00d7580.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:05 +02:00
Andy Lutomirski
7b956f035a x86/asm: Re-add parts of the manual CFI infrastructure
Commit:

  131484c8da ("x86/debug: Remove perpetually broken, unmaintainable dwarf annotations")

removed all the manual DWARF annotations outside the vDSO.  It also removed
the macros we used for the manual annotations.

Re-add these macros so that we can clean up the vDSO annotations.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/4c70bb98a8b773c8ccfaabf6745e569ff43e7f65.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:05 +02:00
Andy Lutomirski
0a6d1fa0d2 x86/vdso: Remove runtime 32-bit vDSO selection
32-bit userspace will now always see the same vDSO, which is
exactly what used to be the int80 vDSO.  Subsequent patches will
clean it up and make it support SYSENTER and SYSCALL using
alternatives.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/e7e6b3526fa442502e6125fe69486aab50813c32.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-07 11:34:08 +02:00
Andy Lutomirski
7e0f51cb44 x86/uaccess: Add unlikely() to __chk_range_not_ok() failure paths
This should improve code quality a bit. It also shrinks the kernel text:

 Before:
       text     data      bss       dec    filename
   21828379  5194760  1277952  28301091    vmlinux

 After:
       text     data      bss       dec    filename
   21827997  5194760  1277952  28300709    vmlinux

... by 382 bytes.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/f427b8002d932e5deab9055e0074bb4e7e80ee39.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-07 11:34:06 +02:00
Andy Lutomirski
a76cf66e94 x86/uaccess: Tell the compiler that uaccess is unlikely to fault
GCC doesn't realize that get_user(), put_user(), and their __
variants are unlikely to fail.  Tell it.

I noticed this while playing with the C entry code.

 Before:
       text     data      bss       dec    filename
   21828763  5194760  1277952  28301475    vmlinux.baseline

 After:
      text      data      bss       dec    filename
   21828379  5194760  1277952  28301091    vmlinux.new

The generated code shrunk by 384 bytes.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/dc37bed7024319c3004d950d57151fca6aeacf97.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-07 11:34:06 +02:00
Ingo Molnar
25a9a924c0 Merge branch 'linus' into x86/asm, to pick up fixes before applying new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-07 11:24:24 +02:00
Ingo Molnar
82fc167c39 Linux 4.3-rc4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWEUxnAAoJEHm+PkMAQRiGYCYH/3gtGkFdvSLi+E1PfI8Qk3ZA
 XuYA4Mj09JBVSmaICeueMTDVrdiq0OE0zPib26GWlF/za13kNU8KgMR3+6XCuLSX
 DiCmh6mwDItoNoSIIUERLqrFHABXz8rZ3gb3uu2+kNN74Cl0piNm1YpFclEEWjMr
 9Wk5fkq+ontnDVUQOvWUxPiUXOJTvdLXBWTRDw1yTdE3RMNwRI2d/hme6Hq++WYV
 tRalZZKQaoB33js9WRVAoLVunvtna+i+/y7VGLj8QyS0+d6ec81Hey2r1/fR/oG4
 bs4ul6vtqeb3IR/PjUqxF59pSrCLEO+qrp9KrTlJNYgr1m1QyjRxWUdy/XhyaWo=
 =gIhN
 -----END PGP SIGNATURE-----

Merge tag 'v4.3-rc4' into locking/core, to pick up fixes before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-06 17:10:28 +02:00
Peter Zijlstra
d87b7a3379 sched/core, sched/x86: Kill thread_info::saved_preempt_count
With the introduction of the context switch preempt_count invariant,
and the demise of PREEMPT_ACTIVE, its pointless to save/restore the
per-cpu preemption count, it must always be 2.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-06 17:08:18 +02:00
Peter Zijlstra
609ca06638 sched/core: Create preempt_count invariant
Assuming units of PREEMPT_DISABLE_OFFSET for preempt_count() numbers.

Now that TASK_DEAD no longer results in preempt_count() == 3 during
scheduling, we will always call context_switch() with preempt_count()
== 2.

However, we don't always end up with preempt_count() == 2 in
finish_task_switch() because new tasks get created with
preempt_count() == 1.

Create FORK_PREEMPT_COUNT and set it to 2 and use that in the right
places. Note that we cannot use INIT_PREEMPT_COUNT as that serves
another purpose (boot).

After this, preempt_count() is invariant across the context switch,
with exception of PREEMPT_ACTIVE.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-06 17:08:14 +02:00
Linus Torvalds
f6702681a0 xen: bug fixes for 4.3-rc4
- Fix VM save performance regression with x86 PV guests.
 - Make kexec work in x86 PVHVM guests (if Xen has the soft-reset ABI).
 - Other minor fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWE8wQAAoJEFxbo/MsZsTRVTMH/0eqSg2M78wv4sBl234Y3FE9
 AN8KFUdlkK7VN9v0uuLMDSKIWNUuFJIvo/2rElWGRiX2Q+/pfnQg3ZSFhub9S8uL
 T4LCvmG9viRFb2oUz792ewqncSw3X98Jpto4smA820gJRjndBSWm5HUKUtPAkv1M
 l5DFMEgOeHbu+wCbKD/ZPEt5K9GsIaNviSNoWtYHirZwrd00oLmNbWp+g8lIGQiT
 3vLW0SaZzjL6akKxihb/p3WZ9eNmyz8yk0V7dItUEVUB9qoaDDLJ5qIRSHHWTWQD
 Jza/GE32VallZLuEXGG5/D86MsnyVYHC+lZtwo2IptOGm8v7WuZRv094wI1ev5c=
 =aiDw
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.3b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen bug fixes from David Vrabel:

 - Fix VM save performance regression with x86 PV guests

 - Make kexec work in x86 PVHVM guests (if Xen has the soft-reset ABI)

 - Other minor fixes.

* tag 'for-linus-4.3b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen/p2m: hint at the last populated P2M entry
  x86/xen: Do not clip xen_e820_map to xen_e820_map_entries when sanitizing map
  x86/xen: Support kexec/kdump in HVM guests by doing a soft reset
  xen/x86: Don't try to write syscall-related MSRs for PV guests
  xen: use correct type for HYPERVISOR_memory_op()
2015-10-06 15:05:02 +01:00
Stephen Smalley
e1a58320a3 x86/mm: Warn on W^X mappings
Warn on any residual W+X mappings after setting NX
if DEBUG_WX is enabled.  Introduce a separate
X86_PTDUMP_CORE config that enables the code for
dumping the page tables without enabling the debugfs
interface, so that DEBUG_WX can be enabled without
exposing the debugfs interface.  Switch EFI_PGT_DUMP
to using X86_PTDUMP_CORE so that it also does not require
enabling the debugfs interface.

On success it prints this to the kernel log:

  x86/mm: Checked W+X mappings: passed, no W+X pages found.

On failure it prints a warning and a count of the failed pages:

  ------------[ cut here ]------------
  WARNING: CPU: 1 PID: 1 at arch/x86/mm/dump_pagetables.c:226 note_page+0x610/0x7b0()
  x86/mm: Found insecure W+X mapping at address ffffffff81755000/__stop___ex_table+0xfa8/0xabfa8
  [...]
  Call Trace:
   [<ffffffff81380a5f>] dump_stack+0x44/0x55
   [<ffffffff8109d3f2>] warn_slowpath_common+0x82/0xc0
   [<ffffffff8109d48c>] warn_slowpath_fmt+0x5c/0x80
   [<ffffffff8106cfc9>] ? note_page+0x5c9/0x7b0
   [<ffffffff8106d010>] note_page+0x610/0x7b0
   [<ffffffff8106d409>] ptdump_walk_pgd_level_core+0x259/0x3c0
   [<ffffffff8106d5a7>] ptdump_walk_pgd_level_checkwx+0x17/0x20
   [<ffffffff81063905>] mark_rodata_ro+0xf5/0x100
   [<ffffffff817415a0>] ? rest_init+0x80/0x80
   [<ffffffff817415bd>] kernel_init+0x1d/0xe0
   [<ffffffff8174cd1f>] ret_from_fork+0x3f/0x70
   [<ffffffff817415a0>] ? rest_init+0x80/0x80
  ---[ end trace a1f23a1e42a2ac76 ]---
  x86/mm: Checked W+X mappings: FAILED, 171 W+X pages found.

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1444064120-11450-1-git-send-email-sds@tycho.nsa.gov
[ Improved the Kconfig help text and made the new option default-y
  if CONFIG_DEBUG_RODATA=y, because it already found buggy mappings,
  so we really want people to have this on by default. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-06 11:11:48 +02:00
Ingo Molnar
38a413cbc2 Linux 4.3-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWB9f6AAoJEHm+PkMAQRiGiFMIAJYFLIkF/dXFYMNPGsRGRGYO
 SsQkfYzjy4i/yloyVlGB33e6dqxWdVgCeqYC77TO+1CBq34o6dqM4PACTrhjtS+3
 qQvaP/qn6cSoaGIkdD3v43CCiwMpZZ5+Uj7F7Uz8N4twrpykOZFMM5T7f1lrsG2F
 wJGafmvok9NU2F2wYwaJ8JrzsF6iO6ibFeB8BosRF5Ba4nKqiXVI0xNa0R8PFDm3
 tbh/IkkqokemEqnHyWyszhGFsCQupi+QgsjY/LhWUcCaL7HLEgJmkBX0tXNlgMmK
 TFCq7L8Bigu4nlgZ/iVUB9kh4GTBNVcbdRVN3loJFlczFJlIAa171OVlfRu3lvU=
 =m29x
 -----END PGP SIGNATURE-----

Merge tag 'v4.3-rc3' into x86/mm, to pick up fixes before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-06 10:56:54 +02:00
Linus Torvalds
2cf30826bb Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Fixes all around the map: W+X kernel mapping fix, WCHAN fixes, two
  build failure fixes for corner case configs, x32 header fix and a
  speling fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds
  x86/mm: Set NX on gap between __ex_table and rodata
  x86/kexec: Fix kexec crash in syscall kexec_file_load()
  x86/process: Unify 32bit and 64bit implementations of get_wchan()
  x86/process: Add proper bound checks in 64bit get_wchan()
  x86, efi, kasan: Fix build failure on !KASAN && KMEMCHECK=y kernels
  x86/hyperv: Fix the build in the !CONFIG_KEXEC_CORE case
  x86/cpufeatures: Correct spelling of the HWP_NOTIFY flag
2015-10-03 10:53:05 -04:00
Linus Torvalds
bde17b90dd Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "12 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  dmapool: fix overflow condition in pool_find_page()
  thermal: avoid division by zero in power allocator
  memcg: remove pcp_counter_lock
  kprobes: use _do_fork() in samples to make them work again
  drivers/input/joystick/Kconfig: zhenhua.c needs BITREVERSE
  memcg: make mem_cgroup_read_stat() unsigned
  memcg: fix dirty page migration
  dax: fix NULL pointer in __dax_pmd_fault()
  mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault
  mm/slab: fix unexpected index mapping result of kmalloc_size(INDEX_NODE+1)
  userfaultfd: remove kernel header include from uapi header
  arch/x86/include/asm/efi.h: fix build failure
2015-10-01 22:20:11 -04:00
Andrey Ryabinin
a523841ee4 arch/x86/include/asm/efi.h: fix build failure
With KMEMCHECK=y, KASAN=n:

  arch/x86/platform/efi/efi.c:673:3: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration]
  arch/x86/platform/efi/efi_64.c:139:2: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration]
  arch/x86/include/asm/desc.h:121:2: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration]

Don't #undef memcpy if KASAN=n.

Fixes: 769a8089c1 ("x86, efi, kasan: #undef memset/memcpy/memmove per arch")
Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Reported-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-01 21:42:35 -04:00
Linus Torvalds
ccf70ddcbe (Relatively) a lot of reverts, mostly.
Bugs have trickled in for a new feature in 4.2 (MTRR support in guests)
 so I'm reverting it all; let's not make this -rc period busier for KVM
 than it's been so far.  This covers the four reverts from me.
 
 The fifth patch is being reverted because Radim found a bug in the
 implementation of stable scheduler clock, *but* also managed to implement
 the feature entirely without hypervisor support.  So instead of fixing
 the hypervisor side we can remove it completely; 4.4 will get the new
 implementation.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWDXc/AAoJEL/70l94x66D8GoH/0WXeSYHn8+Ql5oZ5vI0QcCG
 6MiKVixhHTOpkug2QE4DGClYoFSUPuDEB/w6D7YciNn0quDHFZbI3XEMXYtLobHN
 0J9cMv9Vpy5pBVMG/LJOw9pFAJRdhSx/cHU2DW9vUiRG9dO9zuxFzBtUciWLOPAX
 tSQfDumeUV30BsTP5ldi9kaIUJBM9oBD4JhES0JHx6ePBvy+9vCRmHotugzrrGx6
 N+AbCmwUwxnK29PF9i7KMfex6T8l1uQG3fwWVazHoswsqbFEQyF6NpaSTYoZkjM9
 6gaXEE1FQ7tRhuio4bBDos0lLu6iGesveP71p/HpULleq2sbH2ER8TpzR5iSnQA=
 =zAJS
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "(Relatively) a lot of reverts, mostly.

  Bugs have trickled in for a new feature in 4.2 (MTRR support in
  guests) so I'm reverting it all; let's not make this -rc period busier
  for KVM than it's been so far.  This covers the four reverts from me.

  The fifth patch is being reverted because Radim found a bug in the
  implementation of stable scheduler clock, *but* also managed to
  implement the feature entirely without hypervisor support.  So instead
  of fixing the hypervisor side we can remove it completely; 4.4 will
  get the new implementation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS
  Update KVM homepage Url
  Revert "KVM: SVM: use NPT page attributes"
  Revert "KVM: svm: handle KVM_X86_QUIRK_CD_NW_CLEARED in svm_get_mt_mask"
  Revert "KVM: SVM: Sync g_pat with guest-written PAT value"
  Revert "KVM: x86: apply guest MTRR virtualization on host reserved pages"
  Revert "KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR"
2015-10-01 16:43:25 -04:00
Feng Wu
bf9f6ac8d7 KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
This patch updates the Posted-Interrupts Descriptor when vCPU
is blocked.

pre-block:
- Add the vCPU to the blocked per-CPU list
- Set 'NV' to POSTED_INTR_WAKEUP_VECTOR

post-block:
- Remove the vCPU from the per-CPU list

Signed-off-by: Feng Wu <feng.wu@intel.com>
[Concentrate invocation of pre/post-block hooks to vcpu_block. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:53 +02:00
Feng Wu
8727688006 KVM: x86: select IRQ_BYPASS_MANAGER
Select IRQ_BYPASS_MANAGER for x86 when CONFIG_KVM is set

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:52 +02:00
Feng Wu
efc644048e KVM: x86: Update IRTE for posted-interrupts
This patch adds the routine to update IRTE for posted-interrupts
when guest changes the interrupt configuration.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
[Squashed in automatically generated patch from the build robot
 "KVM: x86: vcpu_to_pi_desc() can be static" - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:51 +02:00
Feng Wu
d84f1e0755 KVM: make kvm_set_msi_irq() public
Make kvm_set_msi_irq() public, we can use this function outside.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:50 +02:00
Feng Wu
8feb4a04dc KVM: Define a new interface kvm_intr_is_single_vcpu()
This patch defines a new interface kvm_intr_is_single_vcpu(),
which can returns whether the interrupt is for single-CPU or not.

It is used by VT-d PI, since now we only support single-CPU
interrupts, For lowest-priority interrupts, if user configures
it via /proc/irq or uses irqbalance to make it single-CPU, we
can use PI to deliver the interrupts to it. Full functionality
of lowest-priority support will be added later.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:49 +02:00
Paolo Bonzini
18cd52c4d9 irq_remapping: move structs outside #ifdef
This is friendlier to clients of the code, who are going to prepare
vcpu_data structs unconditionally, even if CONFIG_IRQ_REMAP is not
defined.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:42 +02:00
Xiao Guangrong
8b3e34e46a KVM: x86: add pcommit support
Pass PCOMMIT CPU feature to guest to enable PCOMMIT instruction

Currently we do not catch pcommit instruction for L1 guest and
allow L1 to catch this instruction for L2 if, as required by the spec,
L1 can enumerate the PCOMMIT instruction via CPUID:
| IA32_VMX_PROCBASED_CTLS2[53] (which enumerates support for the
| 1-setting of PCOMMIT exiting) is always the same as
| CPUID.07H:EBX.PCOMMIT[bit 22]. Thus, software can set PCOMMIT exiting
| to 1 if and only if the PCOMMIT instruction is enumerated via CPUID

The spec can be found at
https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:35 +02:00
Andrey Smetanin
9eec50b8bb kvm/x86: Hyper-V HV_X64_MSR_VP_RUNTIME support
HV_X64_MSR_VP_RUNTIME msr used by guest to get
"the time the virtual processor consumes running guest code,
and the time the associated logical processor spends running
hypervisor code on behalf of that guest."

Calculation of this time is performed by task_cputime_adjusted()
for vcpu task.

Necessary to support loading of winhv.sys in guest, which in turn is
required to support Windows VMBus.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Gleb Natapov <gleb@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:33 +02:00
Steve Rutherford
1c1a9ce973 KVM: x86: Add support for local interrupt requests from userspace
In order to enable userspace PIC support, the userspace PIC needs to
be able to inject local interrupts even when the APICs are in the
kernel.

KVM_INTERRUPT now supports sending local interrupts to an APIC when
APICs are in the kernel.

The ready_for_interrupt_request flag is now only set when the CPU/APIC
will immediately accept and inject an interrupt (i.e. APIC has not
masked the PIC).

When the PIC wishes to initiate an INTA cycle with, say, CPU0, it
kicks CPU0 out of the guest, and renedezvous with CPU0 once it arrives
in userspace.

When the CPU/APIC unmasks the PIC, a KVM_EXIT_IRQ_WINDOW_OPEN is
triggered, so that userspace has a chance to inject a PIC interrupt
if it had been pending.

Overall, this design can lead to a small number of spurious userspace
renedezvous. In particular, whenever the PIC transistions from low to
high while it is masked and whenever the PIC becomes unmasked while
it is low.

Note: this does not buffer more than one local interrupt in the
kernel, so the VMM needs to enter the guest in order to complete
interrupt injection before injecting an additional interrupt.

Compiles for x86.

Can pass the KVM Unit Tests.

Signed-off-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:29 +02:00
Steve Rutherford
b053b2aef2 KVM: x86: Add EOI exit bitmap inference
In order to support a userspace IOAPIC interacting with an in kernel
APIC, the EOI exit bitmaps need to be configurable.

If the IOAPIC is in userspace (i.e. the irqchip has been split), the
EOI exit bitmaps will be set whenever the GSI Routes are configured.
In particular, for the low MSI routes are reservable for userspace
IOAPICs. For these MSI routes, the EOI Exit bit corresponding to the
destination vector of the route will be set for the destination VCPU.

The intention is for the userspace IOAPICs to use the reservable MSI
routes to inject interrupts into the guest.

This is a slight abuse of the notion of an MSI Route, given that MSIs
classically bypass the IOAPIC. It might be worthwhile to add an
additional route type to improve clarity.

Compile tested for Intel x86.

Signed-off-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:28 +02:00
Steve Rutherford
7543a635aa KVM: x86: Add KVM exit for IOAPIC EOIs
Adds KVM_EXIT_IOAPIC_EOI which allows the kernel to EOI
level-triggered IOAPIC interrupts.

Uses a per VCPU exit bitmap to decide whether or not the IOAPIC needs
to be informed (which is identical to the EOI_EXIT_BITMAP field used
by modern x86 processors, but can also be used to elide kvm IOAPIC EOI
exits on older processors).

[Note: A prototype using ResampleFDs found that decoupling the EOI
from the VCPU's thread made it possible for the VCPU to not see a
recent EOI after reentering the guest. This does not match real
hardware.]

Compile tested for Intel x86.

Signed-off-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:27 +02:00
Steve Rutherford
49df6397ed KVM: x86: Split the APIC from the rest of IRQCHIP.
First patch in a series which enables the relocation of the
PIC/IOAPIC to userspace.

Adds capability KVM_CAP_SPLIT_IRQCHIP;

KVM_CAP_SPLIT_IRQCHIP enables the construction of LAPICs without the
rest of the irqchip.

Compile tested for x86.

Signed-off-by: Steve Rutherford <srutherford@google.com>
Suggested-by: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:26 +02:00
Paolo Bonzini
d50ab6c1a2 KVM: x86: replace vm_has_apicv hook with cpu_uses_apicv
This will avoid an unnecessary trip to ->kvm and from there to the VPIC.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:24 +02:00
Paolo Bonzini
3bb345f387 KVM: x86: store IOAPIC-handled vectors in each VCPU
We can reuse the algorithm that computes the EOI exit bitmap to figure
out which vectors are handled by the IOAPIC.  The only difference
between the two is for edge-triggered interrupts other than IRQ8
that have no notifiers active; however, the IOAPIC does not have to
do anything special for these interrupts anyway.

This again limits the interactions between the IOAPIC and the LAPIC,
making it easier to move the former to userspace.

Inspired by a patch from Steve Rutherford.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:06:23 +02:00
Paolo Bonzini
82f6c9cd90 Merge branch 'x86/for-kvm' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD
This merges a cleanup of asm/apic.h, which is needed by the KVM patches
to support VT-d posted interrupts.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 15:02:45 +02:00
Thomas Gleixner
09907ca630 Merge branch 'x86/for-kvm' into x86/apic
Pull in the apic change which is provided for kvm folks to pull into
their tree.
2015-09-30 21:20:39 +02:00
Paolo Bonzini
e02ae38713 x86/x2apic: Make stub functions available even if !CONFIG_X86_LOCAL_APIC
Some CONFIG_X86_X2APIC functions, especially x2apic_enabled(), are not
declared if !CONFIG_X86_LOCAL_APIC.  However, the same stubs that work
for !CONFIG_X86_X2APIC are okay even if there is no local APIC support
at all.

Avoid the introduction of #ifdefs by moving the x2apic declarations
completely outside the CONFIG_X86_LOCAL_APIC block.  (Unfortunately,
diff generation messes up the actual change that this patch makes).
There is no semantic change because CONFIG_X86_X2APIC depends on
CONFIG_X86_LOCAL_APIC.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Feng Wu <feng.wu@intel.com>
Link: http://lkml.kernel.org/r/1443435991-35750-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-30 21:17:36 +02:00
Andrey Ryabinin
4ac86a6dce x86, efi, kasan: Fix build failure on !KASAN && KMEMCHECK=y kernels
With KMEMCHECK=y, KASAN=n we get this build failure:

  arch/x86/platform/efi/efi.c:673:3:    error: implicit declaration of function ‘memcpy’ [-Werror=implicit-function-declaration]
  arch/x86/platform/efi/efi_64.c:139:2: error: implicit declaration of function ‘memcpy’ [-Werror=implicit-function-declaration]
  arch/x86/include/asm/desc.h:121:2:    error: implicit declaration of function ‘memcpy’ [-Werror=implicit-function-declaration]

Don't #undef memcpy if KASAN=n.

Reported-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 769a8089c1 ("x86, efi, kasan: #undef memset/memcpy/memmove per arch")
Link: http://lkml.kernel.org/r/1443544814-20122-1-git-send-email-ryabinin.a.a@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-30 09:29:53 +02:00
Ingo Molnar
ccf79c238f Linux 4.3-rc3
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWB9f6AAoJEHm+PkMAQRiGiFMIAJYFLIkF/dXFYMNPGsRGRGYO
 SsQkfYzjy4i/yloyVlGB33e6dqxWdVgCeqYC77TO+1CBq34o6dqM4PACTrhjtS+3
 qQvaP/qn6cSoaGIkdD3v43CCiwMpZZ5+Uj7F7Uz8N4twrpykOZFMM5T7f1lrsG2F
 wJGafmvok9NU2F2wYwaJ8JrzsF6iO6ibFeB8BosRF5Ba4nKqiXVI0xNa0R8PFDm3
 tbh/IkkqokemEqnHyWyszhGFsCQupi+QgsjY/LhWUcCaL7HLEgJmkBX0tXNlgMmK
 TFCq7L8Bigu4nlgZ/iVUB9kh4GTBNVcbdRVN3loJFlczFJlIAa171OVlfRu3lvU=
 =m29x
 -----END PGP SIGNATURE-----

Merge tag 'v4.3-rc3' into x86/urgent, before applying dependent fix

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-30 09:29:27 +02:00
Juergen Gross
24f775a660 xen: use correct type for HYPERVISOR_memory_op()
HYPERVISOR_memory_op() is defined to return an "int" value. This is
wrong, as the Xen hypervisor will return "long".

The sub-function XENMEM_maximum_reservation returns the maximum
number of pages for the current domain. An int will overflow for a
domain configured with 8TB of memory or more.

Correct this by using the correct type.

Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-09-28 14:48:52 +01:00
Radim Krčmář
9bac175d8e Revert "KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR"
Shifting pvclock_vcpu_time_info.system_time on write to KVM system time
MSR is a change of ABI.  Probably only 2.6.16 based SLES 10 breaks due
to its custom enhancements to kvmclock, but KVM never declared the MSR
only for one-shot initialization.  (Doc says that only one write is
needed.)

This reverts commit b7e60c5aed.
And adds a note to the definition of PVCLOCK_COUNTS_FROM_ZERO.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-28 13:06:37 +02:00
Linus Torvalds
e3be4266d3 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "Another pile of fixes for perf:

   - Plug overflows and races in the core code

   - Sanitize the flow of the perf syscall so we error out before
     handling the more complex and hard to undo setups

   - Improve and fix Broadwell and Skylake hardware support

   - Revert a fix which broke what it tried to fix in perf tools

   - A couple of smaller fixes in various places of perf tools"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Fix copying of /proc/kcore
  perf intel-pt: Remove no_force_psb from documentation
  perf probe: Use existing routine to look for a kernel module by dso->short_name
  perf/x86: Change test_aperfmperf() and test_intel() to static
  tools lib traceevent: Fix string handling in heterogeneous arch environments
  perf record: Avoid infinite loop at buildid processing with no samples
  perf: Fix races in computing the header sizes
  perf: Fix u16 overflows
  perf: Restructure perf syscall point of no return
  perf/x86/intel: Fix Skylake FRONTEND MSR extrareg mask
  perf/x86/intel/pebs: Add PEBS frontend profiling for Skylake
  perf/x86/intel: Make the CYCLE_ACTIVITY.* constraint on Broadwell more specific
  perf tools: Bool functions shouldn't return -1
  tools build: Add test for presence of __get_cpuid() gcc builtin
  tools build: Add test for presence of numa_num_possible_cpus() in libnuma
  Revert "perf symbols: Fix mismatched declarations for elf_getphdrnum"
  perf stat: Fix per-pkg event reporting bug
2015-09-27 12:51:39 -04:00
Linus Torvalds
b6d980f493 AMD fixes for bugs introduced in the 4.2 merge window,
and a few PPC bug fixes too.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWBSn7AAoJEL/70l94x66Dxd4H/RT6kWWj9x4grEYUkcJUDyK2
 AXm7XcKQm04auwAic8Otr+ts/Qix/50kWmBe/TU0QLgqb8rj5Dj3yGFK6Z1y6mAz
 KvaxqMJd4tZGTqN0DDvC2ItEdzjfAdeJZo/FHXqPHVspG0G14T7STLna02LTBBEJ
 tNzY9qor8nFhg2fT2szqKaudUNgTqkCTpo57o2BrHE96SHG+m0WdpQCV1F5hPVpg
 Te0Pb7qX9xng5n3sQ7IV/t3QYbrza1ACwNQS9XJa0Yu6iEz7JdmVmzHQASK9ynn6
 hUHhsNYGx4IsPjPtfJk2GroNaRDZL+VMzw07tfcOvPx8xkS9hS63pwzmSBqfLrM=
 =Ywqn
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "AMD fixes for bugs introduced in the 4.2 merge window, and a few PPC
  bug fixes too"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: disable halt_poll_ns as default for s390x
  KVM: x86: fix off-by-one in reserved bits check
  KVM: x86: use correct page table format to check nested page table reserved bits
  KVM: svm: do not call kvm_set_cr0 from init_vmcb
  KVM: x86: trap AMD MSRs for the TSeg base and mask
  KVM: PPC: Book3S: Take the kvm->srcu lock in kvmppc_h_logical_ci_load/store()
  KVM: PPC: Book3S HV: Pass the correct trap argument to kvmhv_commence_exit
  KVM: PPC: Book3S HV: Fix handling of interrupted VCPUs
  kvm: svm: reset mmu on VCPU reset
2015-09-25 10:51:40 -07:00
David Hildenbrand
920552b213 KVM: disable halt_poll_ns as default for s390x
We observed some performance degradation on s390x with dynamic
halt polling. Until we can provide a proper fix, let's enable
halt_poll_ns as default only for supported architectures.

Architectures are now free to set their own halt_poll_ns
default value.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-25 10:31:30 +02:00
Denys Vlasenko
0b101e62af x86/asm: Force inlining of cpu_relax()
On x86, cpu_relax() simply calls rep_nop(), which generates one
instruction, PAUSE (aka REP NOP).

With this config:

  http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os

gcc-4.7.2 does not always inline rep_nop(): it generates several
copies of this:

  <rep_nop> (16 copies, 194 calls):
       55                      push   %rbp
       48 89 e5                mov    %rsp,%rbp
       f3 90                   pause
       5d                      pop    %rbp
       c3                      retq

See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

This patch fixes this via s/inline/__always_inline/
on rep_nop() and cpu_relax().

( Forcing inlining only on rep_nop() causes GCC to
  deinline cpu_relax(), with almost no change in generated code).

      text     data      bss       dec     hex filename
  88118971 19905208 36421632 144445811 89c1173 vmlinux.before
  88118139 19905208 36421632 144444979 89c0e33 vmlinux

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1443096149-27291-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-25 09:44:34 +02:00
Andy Lutomirski
3f2c5085ed x86/sched/64: Don't save flags on context switch (reinstated)
This reinstates the following commit:

  2c7577a758 ("sched/x86_64: Don't save flags on context switch")

which was reverted in:

  512255a2ad ("Revert 'sched/x86_64: Don't save flags on context switch'")

Historically, Linux has always saved and restored EFLAGS across
context switches.  As far as I know, the only reason to do this
is because of the NT flag.  In particular, if something calls
switch_to() with the NT flag set, then we don't want to leak the
NT flag into a different task that might try to IRET and fail
because NT is set.

Before this commit:

  8c7aa698ba ("x86_64, entry: Filter RFLAGS.NT on entry from userspace")

we could run system call bodies with NT set.  This would be a DoS or possibly
privilege escalation hole if scheduling in such a system call would leak
NT into a different task.

Importantly, we don't need to worry about NT being set while
preemptible or across page faults.  The only way we can schedule
due to preemption or a page fault is in an interrupt entry that
nests inside the SYSENTER prologue.  The CPU will clear NT when
entering through an interrupt gate, so we won't schedule with NT
set.

The only other interesting flags are IOPL and AC.  Allowing
switch_to() to change IOPL has no effect, as the value loaded
during kernel execution doesn't matter at all except between a
SYSENTER entry and the subsequent PUSHF, and anythign that
interrupts in that window will restore IOPL on return.

If we call __switch_to() with AC set, we have bigger problems.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/d4440fdc2a89247bffb7c003d2a9a2952bd46827.1441146105.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-25 09:29:17 +02:00
Kristen Carlson Accardi
a7adb91b13 x86/cpufeatures: Correct spelling of the HWP_NOTIFY flag
Because noitification just isn't right.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/1442944296-11737-1-git-send-email-kristen@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-23 09:57:24 +02:00
Peter Zijlstra
62e8a3258b atomic, arch: Audit atomic_{read,set}()
This patch makes sure that atomic_{read,set}() are at least
{READ,WRITE}_ONCE().

We already had the 'requirement' that atomic_read() should use
ACCESS_ONCE(), and most archs had this, but a few were lacking.
All are now converted to use READ_ONCE().

And, by a symmetry and general paranoia argument, upgrade atomic_set()
to use WRITE_ONCE().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: james.hogan@imgtec.com
Cc: linux-kernel@vger.kernel.org
Cc: oleg@redhat.com
Cc: will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-23 09:54:28 +02:00
Andrey Ryabinin
769a8089c1 x86, efi, kasan: #undef memset/memcpy/memmove per arch
In not-instrumented code KASAN replaces instrumented memset/memcpy/memmove
with not-instrumented analogues __memset/__memcpy/__memove.

However, on x86 the EFI stub is not linked with the kernel.  It uses
not-instrumented mem*() functions from arch/x86/boot/compressed/string.c

So we don't replace them with __mem*() variants in EFI stub.

On ARM64 the EFI stub is linked with the kernel, so we should replace
mem*() functions with __mem*(), because the EFI stub runs before KASAN
sets up early shadow.

So let's move these #undef mem* into arch's asm/efi.h which is also
included by the EFI stub.

Also, this will fix the warning in 32-bit build reported by kbuild test
robot:

	efi-stub-helper.c:599:2: warning: implicit declaration of function 'memcpy'

[akpm@linux-foundation.org: use 80 cols in comment]
Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Reported-by: Fengguang Wu <fengguang.wu@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-22 15:09:53 -07:00
Daniel J Blueman
ce2e572cfe x86/numachip: Introduce Numachip2 timer mechanisms
Add 1GHz 64-bit Numachip2 clocksource timer support for accurate
system-wide timekeeping, as core TSCs are unsynchronised.

Additionally, add a per-core clockevent mechanism that interrupts via the
platform IPI vector after a programmed period.

[ tglx: Taking it through x86 due to dependencies ]

Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Steffen Persvold <sp@numascale.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: http://lkml.kernel.org/r/1442829745-29311-1-git-send-email-daniel@numascale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-22 22:25:33 +02:00
Daniel J Blueman
ad03a9c25d x86/numachip: Add Numachip IPI optimisations
When sending IPIs, first check if the non-local part of the source and
destination APIC IDs match; if so, send via the local APIC for efficiency.

Secondly, since the AMD BIOS-kernel developer guide states IPI delivery
will occur invarient of prior deliver status, avoid polling the delivery
status bit for efficiency.

Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Steffen Persvold <sp@numascale.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: http://lkml.kernel.org/r/1442768522-19217-3-git-send-email-daniel@numascale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-22 22:25:33 +02:00
Daniel J Blueman
d9d4dee6ce x86/numachip: Add Numachip2 APIC support
Introduce support for Numachip2 remote interrupts via detecting the right
ACPI SRAT signature.

Access is performed via a fixed mapping in the x86 physical address space.

Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Steffen Persvold <sp@numascale.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: http://lkml.kernel.org/r/1442768522-19217-2-git-send-email-daniel@numascale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-22 22:25:33 +02:00
Daniel J Blueman
db1003a719 x86/numachip: Cleanup Numachip support
Drop unused code and includes in Numachip header files and APIC driver.

Additionally, use the 'numachip1' prefix on Numachip1-specific functions;
this prepares for adding Numachip2 support in later patches.

Signed-off-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Steffen Persvold <sp@numascale.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: http://lkml.kernel.org/r/1442768522-19217-1-git-send-email-daniel@numascale.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-22 22:25:32 +02:00
Toshi Kani
bbac8c6dea x86/asm: Add pud_pgprot() and pmd_pgprot()
pte_pgprot() returns a pgprot_t value by calling pte_flags().  Now
that pud_flags() and pmd_flags() work specifically for the pud/pmd
levels, define pud_pgprot() and pmd_pgprot() for PUD/PMD.

Also update pte_pgprot() to remove the unnecessary mask with
PTE_FLAGS_MASK as pte_flags() takes care of it.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Robert Elliot <elliott@hpe.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1442514264-12475-6-git-send-email-toshi.kani@hpe.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-22 21:27:32 +02:00
Toshi Kani
f70abb0fc3 x86/asm: Fix pud/pmd interfaces to handle large PAT bit
Now that we have pud/pmd mask interfaces, which handle pfn & flags
mask properly for the large PAT bit.

Fix pud/pmd pfn & flags interfaces by replacing PTE_PFN_MASK and
PTE_FLAGS_MASK with the pud/pmd mask interfaces.

Suggested-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Robert Elliot <elliott@hpe.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1442514264-12475-5-git-send-email-toshi.kani@hpe.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-22 21:27:32 +02:00
Toshi Kani
4be4c1fb9a x86/asm: Add pud/pmd mask interfaces to handle large PAT bit
The PAT bit gets relocated to bit 12 when PUD and PMD mappings are
used.  This bit 12, however, is not covered by PTE_FLAGS_MASK, which
is used for masking pfn and flags for all levels.

Add pud/pmd mask interfaces to handle pfn and flags properly by using
P?D_PAGE_MASK when PUD/PMD mappings are used, i.e. PSE bit is set.

Suggested-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Robert Elliot <elliott@hpe.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1442514264-12475-4-git-send-email-toshi.kani@hpe.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-22 21:27:32 +02:00
Toshi Kani
8321026718 x86/asm: Move PUD_PAGE macros to page_types.h
PUD_SHIFT is defined according to a given kernel configuration, which
allows it be commonly used by any x86 kernels.  However, PUD_PAGE_SIZE
and PUD_PAGE_MASK, which are set from PUD_SHIFT, are defined in
page_64_types.h, which can be used by 64-bit kernel only.

Move PUD_PAGE_SIZE and PUD_PAGE_MASK to page_types.h so that they can
be used by any x86 kernels as well.

Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Konrad Wilk <konrad.wilk@oracle.com>
Cc: Robert Elliot <elliott@hpe.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1442514264-12475-3-git-send-email-toshi.kani@hpe.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-22 21:27:32 +02:00
Paolo Bonzini
3afb112180 KVM: x86: trap AMD MSRs for the TSeg base and mask
These have roughly the same purpose as the SMRR, which we do not need
to implement in KVM.  However, Linux accesses MSR_K8_TSEG_ADDR at
boot, which causes problems when running a Xen dom0 under KVM.
Just return 0, meaning that processor protection of SMRAM is not
in effect.

Reported-by: M A Young <m.a.young@durham.ac.uk>
Cc: stable@vger.kernel.org
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-21 07:41:22 +02:00
Linus Torvalds
3ae839454e Mostly stable material, a lot of ARM fixes.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJV+/ucAAoJEL/70l94x66DV8YH/1KDym/1GJ+/Br/YkHZnM53l
 3Q0PwSLu9cNcIL9lUuDLwGTaVj+y8ud1Hjr/uzvKwivktmUYVZhkdtnZmnanvGOM
 qKB9K3nFXCPx8uqy8Dn7fOwEKcg9FmDOTTkWy13HDnXO+V4crSVVt+rPw+6FUMld
 NV5tYdw9Lu7y3XrveDebPWaPtyDL7OAagzmeK47eMffxG7X9Hf1H2aT7HueRi7x/
 SkLIe3gmiOWmHVJDPE9TOmFYIj19gywDFysKes1gdVJLVUIXiELMT7SrvAYnToVB
 zISIEj7Zx4SINPxpf2dUn8REm7NsmJY+PffLIl/Nv+ozGggFQGFH0SMZ08p0bxw=
 =tfmn
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Mostly stable material, a lot of ARM fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
  sched: access local runqueue directly in single_task_running
  arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS'
  arm64: KVM: Remove all traces of the ThumbEE registers
  arm: KVM: Disable virtual timer even if the guest is not using it
  arm64: KVM: Disable virtual timer even if the guest is not using it
  arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resources
  KVM: s390: Replace incorrect atomic_or with atomic_andnot
  arm: KVM: Fix incorrect device to IPA mapping
  arm64: KVM: Fix user access for debug registers
  KVM: vmx: fix VPID is 0000H in non-root operation
  KVM: add halt_attempted_poll to VCPU stats
  kvm: fix zero length mmio searching
  kvm: fix double free for fast mmio eventfd
  kvm: factor out core eventfd assign/deassign logic
  kvm: don't try to register to KVM_FAST_MMIO_BUS for non mmio eventfd
  KVM: make the declaration of functions within 80 characters
  KVM: arm64: add workaround for Cortex-A57 erratum #852523
  KVM: fix polling for guest halt continued even if disable it
  arm/arm64: KVM: Fix PSCI affinity info return value for non valid cores
  arm64: KVM: set {v,}TCR_EL2 RES1 bits
  ...
2015-09-18 09:23:08 -07:00
Andi Kleen
d0dc8494cd perf/x86/intel/pebs: Add PEBS frontend profiling for Skylake
Skylake has a new FRONTEND_LATENCY PEBS event to accurately profile
frontend problems (like ITLB or decoding issues).

The new event is configured through a separate MSR, which selects
a range of sub events.

Define the extra MSR as a extra reg and export support for it
through sysfs.  To avoid duplicating the existing
tables use a new function to add new entries to existing tables.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1435707205-6676-4-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-18 09:20:22 +02:00
Linus Torvalds
42dc2a3048 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 - misc fixes all around the map
 - block non-root vm86(old) if mmap_min_addr != 0
 - two small debuggability improvements
 - removal of obsolete paravirt op

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform: Fix Geode LX timekeeping in the generic x86 build
  x86/apic: Serialize LVTT and TSC_DEADLINE writes
  x86/ioapic: Force affinity setting in setup_ioapic_dest()
  x86/paravirt: Remove the unused pv_time_ops::get_tsc_khz method
  x86/ldt: Fix small LDT allocation for Xen
  x86/vm86: Fix the misleading CONFIG_VM86 Kconfig help text
  x86/cpu: Print family/model/stepping in hex
  x86/vm86: Block non-root vm86(old) if mmap_min_addr != 0
  x86/alternatives: Make optimize_nops() interrupt safe and synced
  x86/mm/srat: Print non-volatile flag in SRAT
  x86/cpufeatures: Enable cpuid for Intel SHA extensions
2015-09-17 11:01:34 -07:00
Linus Torvalds
9786cff38a Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
 "Spinlock performance regression fix, plus documentation fixes"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/static_keys: Fix up the static keys documentation
  locking/qspinlock/x86: Only emit the test-and-set fallback when building guest support
  locking/qspinlock/x86: Fix performance regression under unaccelerated VMs
  locking/static_keys: Fix a silly typo
2015-09-17 08:45:23 -07:00
Paolo Bonzini
62bea5bff4 KVM: add halt_attempted_poll to VCPU stats
This new statistic can help diagnosing VCPUs that, for any reason,
trigger bad behavior of halt_poll_ns autotuning.

For example, say halt_poll_ns = 480000, and wakeups are spaced exactly
like 479us, 481us, 479us, 481us. Then KVM always fails polling and wastes
10+20+40+80+160+320+480 = 1110 microseconds out of every
479+481+479+481+479+481+479 = 3359 microseconds. The VCPU then
is consuming about 30% more CPU than it would use without
polling.  This would show as an abnormally high number of
attempted polling compared to the successful polls.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com<
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-09-16 12:17:00 +02:00
Juergen Gross
cda34fc774 x86/paravirt: Remove the unused pv_time_ops::get_tsc_khz method
It's not used anywhere.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: chrisw@sous-sol.org
Cc: jeremy@goop.org
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/1442227343-403-1-git-send-email-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 14:15:22 +02:00
Dave Hansen
060dd0f712 x86/fpu: Add C structures for AVX-512 state components
AVX-512 has 3 separate state components:

  1. opmask registers
  2. zmm upper half of registers 0-15
  3. new zmm registers (16-31)

This patch adds C structures for the three components along with
a few comments mostly lifted from the SDM to explain what they
do.  This will allow us to check our structures against what the
hardware tells us about the sizes of the components.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233129.F2433B98@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:22:01 +02:00
Dave Hansen
83aa3c4530 x86/fpu: Rework YMM definition
We are about to rework all of the "extended state" definitions.
This makes the 'ymm' naming consistent with the AVX-512 types
we will introduce later.

We also add a convenience type: "reg_128_bit" so that we do
not have to spell out our arithmetic.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233129.B4EB045F@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:22:00 +02:00
Dave Hansen
1126cb4535 x86/fpu/mpx: Rework MPX 'xstate' types
MPX includes two separate "extended state components".  There is
no real need to have an 'mpx_struct' because we never really
manage the states together.

We also separate out the actual data in 'mpx_bndcsr_state' from
the padding.  We will shortly be checking the state sizes
against our structures and need them to match.  For consistency,
we also ensure to prefix these types with 'mpx_'.

Lastly, we add some comments to mirror some of the descriptions
in the Intel documents (SDM) of the various state components.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233129.384B73EB@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:22:00 +02:00
Dave Hansen
8a93c9e0dc x86/fpu: Rework XSTATE_* macros to remove magic '2'
The 'xstate.c' code has a bunch of references to '2'.  This
is because we have a lot more work to do for the "extended"
xstates than the "legacy" ones and state component 2 is the
first "extended" state.

This patch replaces all of the instances of '2' with
FIRST_EXTENDED_XFEATURE, which clearly explains what is
going on.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233128.A8C0BF51@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:21:58 +02:00
Dave Hansen
dad8c4fe85 x86/fpu: Rename XFEATURES_NR_MAX
This is a logcal followon to the last patch.  It makes the
XFEATURE_MAX naming consistent with the other enum values.
This is what Ingo suggested.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233127.A541448F@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:21:57 +02:00
Dave Hansen
d91cab7813 x86/fpu: Rename XSAVE macros
There are two concepts that have some confusing naming:
 1. Extended State Component numbers (currently called
    XFEATURE_BIT_*)
 2. Extended State Component masks (currently called XSTATE_*)

The numbers are (currently) from 0-9.  State component 3 is the
bounds registers for MPX, for instance.

But when we want to enable "state component 3", we go set a bit
in XCR0.  The bit we set is 1<<3.  We can check to see if a
state component feature is enabled by looking at its bit.

The current 'xfeature_bit's are at best xfeature bit _numbers_.
Calling them bits is at best inconsistent with ending the enum
list with 'XFEATURES_NR_MAX'.

This patch renames the enum to be 'xfeature'.  These also
happen to be what the Intel documentation calls a "state
component".

We also want to differentiate these from the "XSTATE_*" macros.
The "XSTATE_*" macros are a mask, and we rename them to match.

These macros are reasonably widely used so this patch is a
wee bit big, but this really is just a rename.

The only non-mechanical part of this is the

	s/XSTATE_EXTEND_MASK/XFEATURE_MASK_EXTEND/

We need a better name for it, but that's another patch.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233126.38653250@viggo.jf.intel.com
[ Ported to v4.3-rc1. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:21:46 +02:00
Dave Hansen
75933433d6 x86/fpu: Remove partial LWP support definitions
LightWeight Profiling was evidently an AMD profiling feature
that we never got around to implementing.  Remove the references
to it.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233125.7E602284@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:07:56 +02:00
Dave Hansen
4109ca066b x86/fpu: Remove XSTATE_RESERVE
The original purpose of XSTATE_RESERVE was to carve out space
to store all of the possible extended state components that
get saved with the XSAVE instruction(s).

However, we are now almost entirely dynamically allocating
the buffers we use for XSAVE by placing them at the end of
the task_struct and them sizing them at boot.  The one
exception for that is the init_task.

The maximum extended state component size that we have today
is on systems with space for AVX-512 and Memory Protection
Keys: 2696 bytes.  We have reserved a PAGE_SIZE buffer in
the init_task via fpregs_state->__padding.

This check ensures that even if the component sizes or
layout were changed (which we do not expect), that we will
still not overflow the init_task's buffer.

In the case that we detect we might overflow the buffer,
we completely disable XSAVE support in the kernel and try
to boot as if we had 'legacy x87 FPU' support in place.
This is a crippled state without any of the XSAVE-enabled
features (MPX, AVX, etc...).  But, it at least let us
boot safely.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233125.D948D475@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:07:56 +02:00
Dave Hansen
0a26537502 x86/fpu: Move XSAVE-disabling code to a helper
When we want to _completely_ disable XSAVE support as far as
the kernel is concerned, we have a big set of feature flags
to clear.  We currently only do this in cases where the user
asks for it to be disabled, but we are about to expand the
places where we do it to handle errors too.

Move the code in to xstate.c, and put it in the xstate.h
header.  We will use it in the next patch too.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: dave@sr71.net
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150902233124.EA9A70E5@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-14 12:07:56 +02:00
George Beshers
7c52198b9c x86/platform/uv: Insert per_cpu accessor function on uv_hub_nmi
UV: NMI: insert this_cpu_read accessor function on uv_hub_nmi.

On SGI UV systems a 'power nmi' command from the CMC causes
all processors to drop into uv_handle_nmi().  With the 4.0
kernel this results in

    BUG: unable to handle kernel paging request

The bug is caused by the current code trying to use the PER_CPU
variable uv_cpu_nmi.hub without an appropriate accessor
function. That oversight occurred in

    commit e16321709c ("uv: Replace __get_cpu_var")
    Author: Christoph Lameter <cl@linux.com>
    Date:   Sun Aug 17 12:30:41 2014 -0500

This patch inserts this_cpu_read() in the uv_hub_nmi macro
restoring the intended functionality.

Signed-off-by: George Beshers <gbeshers@sgi.com>
Acked-by: Mike Travis <travis@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Hedi Berriche <hedi@sgi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <rja@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-13 09:27:53 +02:00
Peter Zijlstra
a6b277857f locking/qspinlock/x86: Only emit the test-and-set fallback when building guest support
Only emit the test-and-set fallback for Hypervisors lacking
PARAVIRT_SPINLOCKS support when building for guests.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org # 4.2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-11 07:50:12 +02:00
Peter Zijlstra
43b3f02899 locking/qspinlock/x86: Fix performance regression under unaccelerated VMs
Dave ran into horrible performance on a VM without PARAVIRT_SPINLOCKS
set and Linus noted that the test-and-set implementation was retarded.

One should spin on the variable with a load, not a RMW.

While there, remove 'queued' from the name, as the lock isn't queued
at all, but a simple test-and-set.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: stable@vger.kernel.org # v4.2+
Link: http://lkml.kernel.org/r/20150904152523.GR18673@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-11 07:49:42 +02:00
Linus Torvalds
33e247c7e5 Merge branch 'akpm' (patches from Andrew)
Merge third patch-bomb from Andrew Morton:

 - even more of the rest of MM

 - lib/ updates

 - checkpatch updates

 - small changes to a few scruffy filesystems

 - kmod fixes/cleanups

 - kexec updates

 - a dma-mapping cleanup series from hch

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits)
  dma-mapping: consolidate dma_set_mask
  dma-mapping: consolidate dma_supported
  dma-mapping: cosolidate dma_mapping_error
  dma-mapping: consolidate dma_{alloc,free}_noncoherent
  dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent}
  mm: use vma_is_anonymous() in create_huge_pmd() and wp_huge_pmd()
  mm: make sure all file VMAs have ->vm_ops set
  mm, mpx: add "vm_flags_t vm_flags" arg to do_mmap_pgoff()
  mm: mark most vm_operations_struct const
  namei: fix warning while make xmldocs caused by namei.c
  ipc: convert invalid scenarios to use WARN_ON
  zlib_deflate/deftree: remove bi_reverse()
  lib/decompress_unlzma: Do a NULL check for pointer
  lib/decompressors: use real out buf size for gunzip with kernel
  fs/affs: make root lookup from blkdev logical size
  sysctl: fix int -> unsigned long assignments in INT_MIN case
  kexec: export KERNEL_IMAGE_SIZE to vmcoreinfo
  kexec: align crash_notes allocation to make it be inside one physical page
  kexec: remove unnecessary test in kimage_alloc_crash_control_pages()
  kexec: split kexec_load syscall from kexec core code
  ...
2015-09-10 18:19:42 -07:00
Linus Torvalds
06ab838c20 xen: MFN/GFN/BFN terminology changes for 4.3-rc0
- Use the correct GFN/BFN terms more consistently.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJV8VRMAAoJEFxbo/MsZsTRiGQH/i/jrAJUJfrFC2PINaA2gDwe
 O0dlrkCiSgAYChGmxxxXZQSPM5Po5+EbT/dLjZ/uvSooeorM9RYY/mFo7ut/qLep
 4pyQUuwGtebWGBZTrj9sygUVXVhgJnyoZxskNUbhj9zvP7hb9++IiI78mzne6cpj
 lCh/7Z2dgpfRcKlNRu+qpzP79Uc7OqIfDK+IZLrQKlXa7IQDJTQYoRjbKpfCtmMV
 BEG3kN9ESx5tLzYiAfxvaxVXl9WQFEoktqe9V8IgOQlVRLgJ2DQWS6vmraGrokWM
 3HDOCHtRCXlPhu1Vnrp0R9OgqWbz8FJnmVAndXT8r3Nsjjmd0aLwhJx7YAReO/4=
 =JDia
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.3-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen terminology fixes from David Vrabel:
 "Use the correct GFN/BFN terms more consistently"

* tag 'for-linus-4.3-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/xenbus: Rename the variable xen_store_mfn to xen_store_gfn
  xen/privcmd: Further s/MFN/GFN/ clean-up
  hvc/xen: Further s/MFN/GFN clean-up
  video/xen-fbfront: Further s/MFN/GFN clean-up
  xen/tmem: Use xen_page_to_gfn rather than pfn_to_gfn
  xen: Use correctly the Xen memory terminologies
  arm/xen: implement correctly pfn_to_mfn
  xen: Make clear that swiotlb and biomerge are dealing with DMA address
2015-09-10 16:21:11 -07:00
Christoph Hellwig
452e06af1f dma-mapping: consolidate dma_set_mask
Almost everyone implements dma_set_mask the same way, although some time
that's hidden in ->set_dma_mask methods.

This patch consolidates those into a common implementation that either
calls ->set_dma_mask if present or otherwise uses the default
implementation.  Some architectures used to only call ->set_dma_mask
after the initial checks, and those instance have been fixed to do the
full work.  h8300 implemented dma_set_mask bogusly as a no-ops and has
been fixed.

Unfortunately some architectures overload unrelated semantics like changing
the dma_ops into it so we still need to allow for an architecture override
for now.

[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Christoph Hellwig
ee196371d5 dma-mapping: consolidate dma_supported
Most architectures just call into ->dma_supported, but some also return 1
if the method is not present, or 0 if no dma ops are present (although
that should never happeb). Consolidate this more broad version into
common code.

Also fix h8300 which inorrectly always returned 0, which would have been
a problem if it's dma_set_mask implementation wasn't a similarly buggy
noop.

As a few architectures have much more elaborate implementations, we
still allow for arch overrides.

[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Christoph Hellwig
efa21e432c dma-mapping: cosolidate dma_mapping_error
Currently there are three valid implementations of dma_mapping_error:

 (1) call ->mapping_error
 (2) check for a hardcoded error code
 (3) always return 0

This patch provides a common implementation that calls ->mapping_error
if present, then checks for DMA_ERROR_CODE if defined or otherwise
returns 0.

[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Christoph Hellwig
1e8937526e dma-mapping: consolidate dma_{alloc,free}_noncoherent
Most architectures do not support non-coherent allocations and either
define dma_{alloc,free}_noncoherent to their coherent versions or stub
them out.

Openrisc uses dma_{alloc,free}_attrs to implement them, and only Mips
implements them directly.

This patch moves the Openrisc version to common code, and handles the
DMA_ATTR_NON_CONSISTENT case in the mips dma_map_ops instance.

Note that actual non-coherent allocations require a dma_cache_sync
implementation, so if non-coherent allocations didn't work on
an architecture before this patch they still won't work after it.

[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Christoph Hellwig
6894258eda dma-mapping: consolidate dma_{alloc,free}_{attrs,coherent}
Since 2009 we have a nice asm-generic header implementing lots of DMA API
functions for architectures using struct dma_map_ops, but unfortunately
it's still missing a lot of APIs that all architectures still have to
duplicate.

This series consolidates the remaining functions, although we still need
arch opt outs for two of them as a few architectures have very
non-standard implementations.

This patch (of 5):

The coherent DMA allocator works the same over all architectures supporting
dma_map operations.

This patch consolidates them and converges the minor differences:

 - the debug_dma helpers are now called from all architectures, including
   those that were previously missing them
 - dma_alloc_from_coherent and dma_release_from_coherent are now always
   called from the generic alloc/free routines instead of the ops
   dma-mapping-common.h always includes dma-coherent.h to get the defintions
   for them, or the stubs if the architecture doesn't support this feature
 - checks for ->alloc / ->free presence are removed.  There is only one
   magic instead of dma_map_ops without them (mic_dma_ops) and that one
   is x86 only anyway.

Besides that only x86 needs special treatment to replace a default devices
if none is passed and tweak the gfp_flags.  An optional arch hook is provided
for that.

[linux@roeck-us.net: fix build]
[jcmvbkbc@gmail.com: fix xtensa]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Dave Young
2965faa5e0 kexec: split kexec_load syscall from kexec core code
There are two kexec load syscalls, kexec_load another and kexec_file_load.
 kexec_file_load has been splited as kernel/kexec_file.c.  In this patch I
split kexec_load syscall code to kernel/kexec.c.

And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and
use kexec_file_load only, or vice verse.

The original requirement is from Ted Ts'o, he want kexec kernel signature
being checked with CONFIG_KEXEC_VERIFY_SIG enabled.  But kexec-tools use
kexec_load syscall can bypass the checking.

Vivek Goyal proposed to create a common kconfig option so user can compile
in only one syscall for loading kexec kernel.  KEXEC/KEXEC_FILE selects
KEXEC_CORE so that old config files still work.

Because there's general code need CONFIG_KEXEC_CORE, so I updated all the
architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects
KEXEC_CORE in arch Kconfig.  Also updated general kernel code with to
kexec_load syscall.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Young <dyoung@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Linus Torvalds
12f03ee606 libnvdimm for 4.3:
1/ Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
    mechanism for adding device-driver-discovered memory regions to the
    kernel's direct map.  This facility is used by the pmem driver to
    enable pfn_to_page() operations on the page frames returned by DAX
    ('direct_access' in 'struct block_device_operations'). For now, the
    'memmap' allocation for these "device" pages comes from "System
    RAM".  Support for allocating the memmap from device memory will
    arrive in a later kernel.
 
 2/ Introduce memremap() to replace usages of ioremap_cache() and
    ioremap_wt().  memremap() drops the __iomem annotation for these
    mappings to memory that do not have i/o side effects.  The
    replacement of ioremap_cache() with memremap() is limited to the
    pmem driver to ease merging the api change in v4.3.  Completion of
    the conversion is targeted for v4.4.
 
 3/ Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
    driver, update the VFS DAX implementation and PMEM api to provide
    persistence guarantees for kernel operations on a DAX mapping.
 
 4/ Convert the ACPI NFIT 'BLK' driver to map the block apertures as
    cacheable to improve performance.
 
 5/ Miscellaneous updates and fixes to libnvdimm including support
    for issuing "address range scrub" commands, clarifying the optimal
    'sector size' of pmem devices, a clarification of the usage of the
    ACPI '_STA' (status) property for DIMM devices, and other minor
    fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJV6Nx7AAoJEB7SkWpmfYgCWyYQAI5ju6Gvw27RNFtPovHcZUf5
 JGnxXejI6/AqeTQ+IulgprxtEUCrXOHjCDA5dkjr1qvsoqK1qxug+vJHOZLgeW0R
 OwDtmdW4Qrgeqm+CPoxETkorJ8wDOc8mol81kTiMgeV3UqbYeeHIiTAmwe7VzZ0C
 nNdCRDm5g8dHCjTKcvK3rvozgyoNoWeBiHkPe76EbnxDICxCB5dak7XsVKNMIVFQ
 NuYlnw6IYN7+rMHgpgpRux38NtIW8VlYPWTmHExejc2mlioWMNBG/bmtwLyJ6M3e
 zliz4/cnonTMUaizZaVozyinTa65m7wcnpjK+vlyGV2deDZPJpDRvSOtB0lH30bR
 1gy+qrKzuGKpaN6thOISxFLLjmEeYwzYd7SvC9n118r32qShz+opN9XX0WmWSFlA
 sajE1ehm4M7s5pkMoa/dRnAyR8RUPu4RNINdQ/Z9jFfAOx+Q26rLdQXwf9+uqbEb
 bIeSQwOteK5vYYCstvpAcHSMlJAglzIX5UfZBvtEIJN7rlb0VhmGWfxAnTu+ktG1
 o9cqAt+J4146xHaFwj5duTsyKhWb8BL9+xqbKPNpXEp+PbLsrnE/+WkDLFD67jxz
 dgIoK60mGnVXp+16I2uMqYYDgAyO5zUdmM4OygOMnZNa1mxesjbDJC6Wat1Wsndn
 slsw6DkrWT60CRE42nbK
 =o57/
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "This update has successfully completed a 0day-kbuild run and has
  appeared in a linux-next release.  The changes outside of the typical
  drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
  removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
  the introduction of ZONE_DEVICE + devm_memremap_pages().

  Summary:

   - Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
     mechanism for adding device-driver-discovered memory regions to the
     kernel's direct map.

     This facility is used by the pmem driver to enable pfn_to_page()
     operations on the page frames returned by DAX ('direct_access' in
     'struct block_device_operations').

     For now, the 'memmap' allocation for these "device" pages comes
     from "System RAM".  Support for allocating the memmap from device
     memory will arrive in a later kernel.

   - Introduce memremap() to replace usages of ioremap_cache() and
     ioremap_wt().  memremap() drops the __iomem annotation for these
     mappings to memory that do not have i/o side effects.  The
     replacement of ioremap_cache() with memremap() is limited to the
     pmem driver to ease merging the api change in v4.3.

     Completion of the conversion is targeted for v4.4.

   - Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
     driver, update the VFS DAX implementation and PMEM api to provide
     persistence guarantees for kernel operations on a DAX mapping.

   - Convert the ACPI NFIT 'BLK' driver to map the block apertures as
     cacheable to improve performance.

   - Miscellaneous updates and fixes to libnvdimm including support for
     issuing "address range scrub" commands, clarifying the optimal
     'sector size' of pmem devices, a clarification of the usage of the
     ACPI '_STA' (status) property for DIMM devices, and other minor
     fixes"

* tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
  libnvdimm, pmem: direct map legacy pmem by default
  libnvdimm, pmem: 'struct page' for pmem
  libnvdimm, pfn: 'struct page' provider infrastructure
  x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
  add devm_memremap_pages
  mm: ZONE_DEVICE for "device memory"
  mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
  dax: drop size parameter to ->direct_access()
  nd_blk: change aperture mapping from WC to WB
  nvdimm: change to use generic kvfree()
  pmem, dax: have direct_access use __pmem annotation
  dax: update I/O path to do proper PMEM flushing
  pmem: add copy_from_iter_pmem() and clear_pmem()
  pmem, x86: clean up conditional pmem includes
  pmem: remove layer when calling arch_has_wmb_pmem()
  pmem, x86: move x86 PMEM API to new pmem.h header
  libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
  pmem: switch to devm_ allocations
  devres: add devm_memremap
  libnvdimm, btt: write and validate parent_uuid
  ...
2015-09-08 14:35:59 -07:00
Linus Torvalds
59a47fff02 Mostly this is just clean ups and micro optimizations.
The changes with more meat are:
 
  o Allowing the trace event filters to filter on CPU number and process ids
 
  o Two new markers for trace output latency were added
     (10 and 100 msec latencies)
 
  o Have tracing_thresh filter function profiling time
 
 I also worked on modifying the ring buffer code for some future
 work, and moved the adding of the timestamp around. One of my changes
 caused a regression, and since other changes were built on top of it
 and already tested, I had to operate a revert of that change. Instead
 of rebasing, this change set has the code that caused a regression
 as well as the code to revert that change without touching the other
 changes that were made on top of it.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJV6aZEAAoJEEjnJuOKh9ldrR4H/A1RcQf1prLLoUibPP4w3lat
 dmQcdpS1NY+cqyiKuKPAOkFDGQL7qWzRqZ8whcPSJIsHq57ufqNSLf+0bbQYPzg9
 g3CgGL7OApmGi5ulj0sNxhadvc9TFm/SAN0nVJlNuUWdm8e1UWHLsrJZaMfopu2r
 RDEtkOhg619mhDL4rktNdS6rk0B92Fhu2o2PwLZPVlUl1NNEt4WJU+ejitXUVO1A
 Nb70/rTGGJKtyHbW+74on4LnEN5Uu0Viu6rMwGfYyIgRmC2otdBDvE4xfKMiTUKr
 SzBjzrhIoMIRn4Vl0vElfulkpYaw7pcC2BdpZ4d9VpIOiLSlZs0x/TgCtpFEv5M=
 =baZ3
 -----END PGP SIGNATURE-----

Merge tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing update from Steven Rostedt:
 "Mostly this is just clean ups and micro optimizations.

  The changes with more meat are:

   - Allowing the trace event filters to filter on CPU number and
     process ids

   - Two new markers for trace output latency were added (10 and 100
     msec latencies)

   - Have tracing_thresh filter function profiling time

  I also worked on modifying the ring buffer code for some future work,
  and moved the adding of the timestamp around.  One of my changes
  caused a regression, and since other changes were built on top of it
  and already tested, I had to operate a revert of that change.  Instead
  of rebasing, this change set has the code that caused a regression as
  well as the code to revert that change without touching the other
  changes that were made on top of it"

* tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ring-buffer: Revert "ring-buffer: Get timestamp after event is allocated"
  tracing: Don't make assumptions about length of string on task rename
  tracing: Allow triggers to filter for CPU ids and process names
  ftrace: Format MCOUNT_ADDR address as type unsigned long
  tracing: Introduce two additional marks for delay
  ftrace: Fix function_graph duration spacing with 7-digits
  ftrace: add tracing_thresh to function profile
  tracing: Clean up stack tracing and fix fentry updates
  ring-buffer: Reorganize function locations
  ring-buffer: Make sure event has enough room for extend and padding
  ring-buffer: Get timestamp after event is allocated
  ring-buffer: Move the adding of the extended timestamp out of line
  ring-buffer: Add event descriptor to simplify passing data
  ftrace: correct the counter increment for trace_buffer data
  tracing: Fix for non-continuous cpu ids
  tracing: Prefer kcalloc over kzalloc with multiply
2015-09-08 14:04:14 -07:00
Linus Torvalds
752240e74d xen: features and fixes for 4.3-rc0
- Convert xen-blkfront to the multiqueue API
 - [arm] Support binding event channels to different VCPUs.
 - [x86] Support > 512 GiB in a PV guests (off by default as such a
   guest cannot be migrated with the current toolstack).
 - [x86] PMU support for PV dom0 (limited support for using perf with
   Xen and other guests).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJV7wIdAAoJEFxbo/MsZsTR0hEH/04HTKLKGnSJpZ5WbMPxqZxE
 UqGlvhvVWNAmFocZmbPcEi9T1qtcFrX5pM55JQr6UmAp3ovYsT2q1Q1kKaOaawks
 pSfc/YEH3oQW5VUQ9Lm9Ru5Z8Btox0WrzRREO92OF36UOgUOBOLkGsUfOwDinNIM
 lSk2djbYwDYAsoeC3PHB32wwMI//Lz6B/9ZVXcyL6ULynt1ULdspETjGnptRPZa7
 JTB5L4/soioKOn18HDwwOhKmvaFUPQv9Odnv7dc85XwZreajhM/KMu3qFbMDaF/d
 WVB1NMeCBdQYgjOrUjrmpyr5uTMySiQEG54cplrEKinfeZgKlEyjKvjcAfJfiac=
 =Ktjl
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.3-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from David Vrabel:
 "Xen features and fixes for 4.3:

   - Convert xen-blkfront to the multiqueue API
   - [arm] Support binding event channels to different VCPUs.
   - [x86] Support > 512 GiB in a PV guests (off by default as such a
     guest cannot be migrated with the current toolstack).
   - [x86] PMU support for PV dom0 (limited support for using perf with
     Xen and other guests)"

* tag 'for-linus-4.3-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (33 commits)
  xen: switch extra memory accounting to use pfns
  xen: limit memory to architectural maximum
  xen: avoid another early crash of memory limited dom0
  xen: avoid early crash of memory limited dom0
  arm/xen: Remove helpers which are PV specific
  xen/x86: Don't try to set PCE bit in CR4
  xen/PMU: PMU emulation code
  xen/PMU: Intercept PMU-related MSR and APIC accesses
  xen/PMU: Describe vendor-specific PMU registers
  xen/PMU: Initialization code for Xen PMU
  xen/PMU: Sysfs interface for setting Xen PMU mode
  xen: xensyms support
  xen: remove no longer needed p2m.h
  xen: allow more than 512 GB of RAM for 64 bit pv-domains
  xen: move p2m list if conflicting with e820 map
  xen: add explicit memblock_reserve() calls for special pages
  mm: provide early_memremap_ro to establish read-only mapping
  xen: check for initrd conflicting with e820 map
  xen: check pre-allocated page tables for conflict with memory map
  xen: check for kernel memory conflicting with memory layout
  ...
2015-09-08 11:46:48 -07:00
Julien Grall
0df4f266b3 xen: Use correctly the Xen memory terminologies
Based on include/xen/mm.h [1], Linux is mistakenly using MFN when GFN
is meant, I suspect this is because the first support for Xen was for
PV. This resulted in some misimplementation of helpers on ARM and
confused developers about the expected behavior.

For instance, with pfn_to_mfn, we expect to get an MFN based on the name.
Although, if we look at the implementation on x86, it's returning a GFN.

For clarity and avoid new confusion, replace any reference to mfn with
gfn in any helpers used by PV drivers. The x86 code will still keep some
reference of pfn_to_mfn which may be used by all kind of guests
No changes as been made in the hypercall field, even
though they may be invalid, in order to keep the same as the defintion
in xen repo.

Note that page_to_mfn has been renamed to xen_page_to_gfn to avoid a
name to close to the KVM function gfn_to_page.

Take also the opportunity to simplify simple construction such
as pfn_to_mfn(page_to_pfn(page)) into xen_page_to_gfn. More complex clean up
will come in follow-up patches.

[1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=e758ed14f390342513405dd766e874934573e6cb

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-09-08 18:03:49 +01:00
Julien Grall
32e09870ee xen: Make clear that swiotlb and biomerge are dealing with DMA address
The swiotlb is required when programming a DMA address on ARM when a
device is not protected by an IOMMU.

In this case, the DMA address should always be equal to the machine address.
For DOM0 memory, Xen ensure it by have an identity mapping between the
guest address and host address. However, when mapping a foreign grant
reference, the 1:1 model doesn't work.

For ARM guest, most of the callers of pfn_to_mfn expects to get a GFN
(Guest Frame Number), i.e a PFN (Page Frame Number) from the Linux point
of view given that all ARM guest are auto-translated.

Even though the name pfn_to_mfn is misleading, we need to ensure that
those caller get a GFN and not by mistake a MFN. In pratical, I haven't
seen error related to this but we should fix it for the sake of
correctness.

In order to fix the implementation of pfn_to_mfn on ARM in a follow-up
patch, we have to introduce new helpers to return the DMA from a PFN and
the invert.

On x86, the new helpers will be an alias of pfn_to_mfn and mfn_to_pfn.

The helpers will be used in swiotlb and xen_biovec_phys_mergeable.

This is necessary in the latter because we have to ensure that the
biovec code will not try to merge a biovec using foreign page and
another using Linux memory.

Lastly, the helper mfn_to_local_pfn has been renamed to bfn_to_local_pfn
given that the only usage was in swiotlb.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
2015-09-08 17:10:52 +01:00
Ingo Molnar
decb4c4115 x86/headers: Remove <asm/sigcontext.h> references on the kernel side
Now that all type definitions are in the UAPI header, include it
directly, instead of through <asm/sigcontext.h>.

[ We still keep asm/sigcontext.h, so that uapi/asm/sigcontext32.h
  can include <asm/sigcontext.h>. ]

Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1441438363-9999-16-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08 10:06:05 +02:00
Ingo Molnar
711531f4f3 x86/headers: Remove direct sigcontext32.h uses
Now that all sigcontext types are defined in asm/sigcontext.h,
remove the various sigcontext32.h uses in the kernel.

We still keep the header itself, which includes sigcontext.h, in
case user-space relies on it.

Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1441438363-9999-15-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08 10:03:59 +02:00
Ingo Molnar
8fcb346b91 x86/headers: Convert sigcontext_ia32 uses to sigcontext_32
Use the new name in kernel code, and move the old name to the
user-space-only legacy section of the UAPI header.

Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1441438363-9999-14-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08 10:03:59 +02:00
Ingo Molnar
db1e031401 x86/headers: Unify 'struct sigcontext_ia32' and 'struct sigcontext_32'
The two structures are identical - merge them and keep the
legacy name as a define.

Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1441438363-9999-13-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08 10:03:58 +02:00
Ingo Molnar
f2c609bca0 x86/headers: Move the 'struct sigcontext' definitions into the UAPI header
Our goal is to eliminate the duplicate struct sigcontext_ia32
definition, so move the kernel's primary sigcontext type into
the UAPI header, defining these two variants:

	struct sigcontext_32
	struct sigcontext_64

... and map them to 'struct sigcontext'.

Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1441438363-9999-11-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08 10:03:58 +02:00
Ingo Molnar
2d057c69e7 x86/headers: Clean up the kernel's struct sigcontext types to be ABI-clean
Use the __u16/32/64 types we standardized on in ABI definitions
and which other sigcontext related types are already using.

This will help unify struct sigcontext types between native
32-bit, compat and 64-bit kernels.

Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1441438363-9999-10-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08 10:03:57 +02:00
Ingo Molnar
86e9fc3a0e x86/headers: Convert uses of _fpstate_ia32 to _fpstate_32
Remove uses of _fpstate_ia32 from the kernel, and move the
legacy _fpstate_ia32 definition to the user-space only portion
of the header.

Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1441438363-9999-9-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08 10:03:57 +02:00
Ingo Molnar
7bb0dc2222 x86/headers: Unify 'struct _fpstate_ia32' and i386 struct _fpstate
'struct _fpstate_ia32' and 'struct _fpstate' on i386 are
identical in all fields, except 'padding1' being named
'padding'.

We unify the two structures and add a union that is both named
'padding1' and 'padding', in the (unlikely) case there's
user-space code that relies on the padding field name.

We rename the two main types to be:

  struct _fpstate_32
  struct _fpstate_64

for the 32-bit and 64-bit frame, and map them to the main and
compat structure names (_fpstate) depending on whether we are on
32-bit or on 64-bit kernels.

We also keep the old _fpstate_ia32 name as a legacy name.

Acked-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1441438363-9999-8-git-send-email-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-08 10:03:57 +02:00
Ingo Molnar
95cd2ea7d5 Merge branch 'linus' into x86/urgent, to be able to merge a dependent fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-05 09:00:47 +02:00
Mel Gorman
72b252aed5 mm: send one IPI per CPU to TLB flush all entries after unmapping pages
An IPI is sent to flush remote TLBs when a page is unmapped that was
potentially accesssed by other CPUs.  There are many circumstances where
this happens but the obvious one is kswapd reclaiming pages belonging to a
running process as kswapd and the task are likely running on separate
CPUs.

On small machines, this is not a significant problem but as machine gets
larger with more cores and more memory, the cost of these IPIs can be
high.  This patch uses a simple structure that tracks CPUs that
potentially have TLB entries for pages being unmapped.  When the unmapping
is complete, the full TLB is flushed on the assumption that a refill cost
is lower than flushing individual entries.

Architectures wishing to do this must give the following guarantee.

        If a clean page is unmapped and not immediately flushed, the
        architecture must guarantee that a write to that linear address
        from a CPU with a cached TLB entry will trap a page fault.

This is essentially what the kernel already depends on but the window is
much larger with this patch applied and is worth highlighting.  The
architecture should consider whether the cost of the full TLB flush is
higher than sending an IPI to flush each individual entry.  An additional
architecture helper called flush_tlb_local is required.  It's a trivial
wrapper with some accounting in the x86 case.

The impact of this patch depends on the workload as measuring any benefit
requires both mapped pages co-located on the LRU and memory pressure.  The
case with the biggest impact is multiple processes reading mapped pages
taken from the vm-scalability test suite.  The test case uses NR_CPU
readers of mapped files that consume 10*RAM.

Linear mapped reader on a 4-node machine with 64G RAM and 48 CPUs

                                           4.2.0-rc1          4.2.0-rc1
                                             vanilla       flushfull-v7
Ops lru-file-mmap-read-elapsed      159.62 (  0.00%)   120.68 ( 24.40%)
Ops lru-file-mmap-read-time_range    30.59 (  0.00%)     2.80 ( 90.85%)
Ops lru-file-mmap-read-time_stddv     6.70 (  0.00%)     0.64 ( 90.38%)

           4.2.0-rc1    4.2.0-rc1
             vanilla flushfull-v7
User          581.00       611.43
System       5804.93      4111.76
Elapsed       161.03       122.12

This is showing that the readers completed 24.40% faster with 29% less
system CPU time.  From vmstats, it is known that the vanilla kernel was
interrupted roughly 900K times per second during the steady phase of the
test and the patched kernel was interrupts 180K times per second.

The impact is lower on a single socket machine.

                                           4.2.0-rc1          4.2.0-rc1
                                             vanilla       flushfull-v7
Ops lru-file-mmap-read-elapsed       25.33 (  0.00%)    20.38 ( 19.54%)
Ops lru-file-mmap-read-time_range     0.91 (  0.00%)     1.44 (-58.24%)
Ops lru-file-mmap-read-time_stddv     0.28 (  0.00%)     0.47 (-65.34%)

           4.2.0-rc1    4.2.0-rc1
             vanilla flushfull-v7
User           58.09        57.64
System        111.82        76.56
Elapsed        27.29        22.55

It's still a noticeable improvement with vmstat showing interrupts went
from roughly 500K per second to 45K per second.

The patch will have no impact on workloads with no memory pressure or have
relatively few mapped pages.  It will have an unpredictable impact on the
workload running on the CPU being flushed as it'll depend on how many TLB
entries need to be refilled and how long that takes.  Worst case, the TLB
will be completely cleared of active entries when the target PFNs were not
resident at all.

[sasha.levin@oracle.com: trace tlb flush after disabling preemption in try_to_unmap_flush]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Linus Torvalds
ca520cab25 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking and atomic updates from Ingo Molnar:
 "Main changes in this cycle are:

   - Extend atomic primitives with coherent logic op primitives
     (atomic_{or,and,xor}()) and deprecate the old partial APIs
     (atomic_{set,clear}_mask())

     The old ops were incoherent with incompatible signatures across
     architectures and with incomplete support.  Now every architecture
     supports the primitives consistently (by Peter Zijlstra)

   - Generic support for 'relaxed atomics':

       - _acquire/release/relaxed() flavours of xchg(), cmpxchg() and {add,sub}_return()
       - atomic_read_acquire()
       - atomic_set_release()

     This came out of porting qwrlock code to arm64 (by Will Deacon)

   - Clean up the fragile static_key APIs that were causing repeat bugs,
     by introducing a new one:

       DEFINE_STATIC_KEY_TRUE(name);
       DEFINE_STATIC_KEY_FALSE(name);

     which define a key of different types with an initial true/false
     value.

     Then allow:

       static_branch_likely()
       static_branch_unlikely()

     to take a key of either type and emit the right instruction for the
     case.  To be able to know the 'type' of the static key we encode it
     in the jump entry (by Peter Zijlstra)

   - Static key self-tests (by Jason Baron)

   - qrwlock optimizations (by Waiman Long)

   - small futex enhancements (by Davidlohr Bueso)

   - ... and misc other changes"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
  jump_label/x86: Work around asm build bug on older/backported GCCs
  locking, ARM, atomics: Define our SMP atomics in terms of _relaxed() operations
  locking, include/llist: Use linux/atomic.h instead of asm/cmpxchg.h
  locking/qrwlock: Make use of _{acquire|release|relaxed}() atomics
  locking/qrwlock: Implement queue_write_unlock() using smp_store_release()
  locking/lockref: Remove homebrew cmpxchg64_relaxed() macro definition
  locking, asm-generic: Add _{relaxed|acquire|release}() variants for 'atomic_long_t'
  locking, asm-generic: Rework atomic-long.h to avoid bulk code duplication
  locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations
  locking, compiler.h: Cast away attributes in the WRITE_ONCE() magic
  locking/static_keys: Make verify_keys() static
  jump label, locking/static_keys: Update docs
  locking/static_keys: Provide a selftest
  jump_label: Provide a self-test
  s390/uaccess, locking/static_keys: employ static_branch_likely()
  x86, tsc, locking/static_keys: Employ static_branch_likely()
  locking/static_keys: Add selftest
  locking/static_keys: Add a new static_key interface
  locking/static_keys: Rework update logic
  locking/static_keys: Add static_key_{en,dis}able() helpers
  ...
2015-09-03 15:46:07 -07:00
Linus Torvalds
ae98207309 Power management and ACPI material for v4.3-rc1
- ACPICA update to upstream revision 20150818 including method
    tracing extensions to allow more in-depth AML debugging in the
    kernel and a number of assorted fixes and cleanups (Bob Moore,
    Lv Zheng, Markus Elfring).
 
  - ACPI sysfs code updates and a documentation update related to
    AML method tracing (Lv Zheng).
 
  - ACPI EC driver fix related to serialized evaluations of _Qxx
    methods and ACPI tools updates allowing the EC userspace tool
    to be built from the kernel source (Lv Zheng).
 
  - ACPI processor driver updates preparing it for future
    introduction of CPPC support and ACPI PCC mailbox driver
    updates (Ashwin Chaugule).
 
  - ACPI interrupts enumeration fix for a regression related
    to the handling of IRQ attribute conflicts between MADT
    and the ACPI namespace (Jiang Liu).
 
  - Fixes related to ACPI device PM (Mika Westerberg, Srinidhi Kasagar).
 
  - ACPI device registration code reorganization to separate the
    sysfs-related code and bus type operations from the rest (Rafael
    J Wysocki).
 
  - Assorted cleanups in the ACPI core (Jarkko Nikula, Mathias Krause,
    Andy Shevchenko, Rafael J Wysocki, Nicolas Iooss).
 
  - ACPI cpufreq driver and ia64 cpufreq driver fixes and cleanups
    (Pan Xinhui, Rafael J Wysocki).
 
  - cpufreq core cleanups on top of the previous changes allowing it
    to preseve its sysfs directories over system suspend/resume (Viresh
    Kumar, Rafael J Wysocki, Sebastian Andrzej Siewior).
 
  - cpufreq fixes and cleanups related to governors (Viresh Kumar).
 
  - cpufreq updates (core and the cpufreq-dt driver) related to the
    turbo/boost mode support (Viresh Kumar, Bartlomiej Zolnierkiewicz).
 
  - New DT bindings for Operating Performance Points (OPP), support
    for them in the OPP framework and in the cpufreq-dt driver plus
    related OPP framework fixes and cleanups (Viresh Kumar).
 
  - cpufreq powernv driver updates (Shilpasri G Bhat).
 
  - New cpufreq driver for Mediatek MT8173 (Pi-Cheng Chen).
 
  - Assorted cpufreq driver (speedstep-lib, sfi, integrator) cleanups
    and fixes (Abhilash Jindal, Andrzej Hajda, Cristian Ardelean).
 
  - intel_pstate driver updates including Skylake-S support, support
    for enabling HW P-states per CPU and an additional vendor bypass
    list entry (Kristen Carlson Accardi, Chen Yu, Ethan Zhao).
 
  - cpuidle core fixes related to the handling of coupled idle states
    (Xunlei Pang).
 
  - intel_idle driver updates including Skylake Client support and
    support for freeze-mode-specific idle states (Len Brown).
 
  - Driver core updates related to power management (Andy Shevchenko,
    Rafael J Wysocki).
 
  - Generic power domains framework fixes and cleanups (Jon Hunter,
    Geert Uytterhoeven, Rajendra Nayak, Ulf Hansson).
 
  - Device PM QoS framework update to allow the latency tolerance
    setting to be exposed to user space via sysfs (Mika Westerberg).
 
  - devfreq support for PPMUv2 in Exynos5433 and a fix for an incorrect
    exynos-ppmu DT binding (Chanwoo Choi, Javier Martinez Canillas).
 
  - System sleep support updates (Alan Stern, Len Brown, SungEun Kim).
 
  - rockchip-io AVS support updates (Heiko Stuebner).
 
  - PM core clocks support fixup (Colin Ian King).
 
  - Power capping RAPL driver update including support for Skylake H/S
    and Broadwell-H (Radivoje Jovanovic, Seiichi Ikarashi).
 
  - Generic device properties framework fixes related to the handling
    of static (driver-provided) property sets (Andy Shevchenko).
 
  - turbostat and cpupower updates (Len Brown, Shilpasri G Bhat,
    Shreyas B Prabhu).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJV5hhGAAoJEILEb/54YlRxs+EQAK51iFk48+IbpHYaZZ50Yo4m
 ZZc2zBcbwRcBlU9vKERrhG+jieSl8J/JJNxT8vBjKqyvNw038mCjewQh02ol0HuC
 R7nlDiVJkmZ50sLO4xwE/1UBZr/XqbddwCUnYzvFMkMTA0ePzFtf8BrJ1FXpT8S/
 fkwSXQty6hvJDwxkfrbMSaA730wMju9lahx8D6MlmUAedWYZOJDMQKB4WKa/St5X
 9uckBPHUBB2KiKlXxdbFPwKLNxHvLROq5SpDLc6cM/7XZB+QfNFy85CUjCUtYo1O
 1W8k0qnztvZ6UEv27qz5dejGyAGOarMWGGNsmL9evoeGeHRpQL+dom7HcTnbAfUZ
 walyhYSm/zKkdy7Vl3xWUUQkMG48+PviMI6K0YhHXb3Rm5wlR/yBNZTwNIty9SX/
 fKCHEa8QynWwLxgm53c3xRkiitJxMsHNK03moLD9zQMjshTyTNvpNbZoahyKQzk6
 H+9M1DBRHhkkREDWSwGutukxfEMtWe2vcZcyERrFiY7l5k1j58DwDBMPqjPhRv6q
 P/1NlCzr0XYf83Y86J18LbDuPGDhTjjIEn6CqbtI2mmWqTg3+rF7zvS2ux+FzMnA
 gisv8l6GT9JiWhxKFqqL/rrVpwtyHebWLYE/RpNUW6fEzLziRNj1qyYO9dqI/GGi
 I3rfxlXoc/5xJWCgNB8f
 =fTgI
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI updates from Rafael Wysocki:
 "From the number of commits perspective, the biggest items are ACPICA
  and cpufreq changes with the latter taking the lead (over 50 commits).

  On the cpufreq front, there are many cleanups and minor fixes in the
  core and governors, driver updates etc.  We also have a new cpufreq
  driver for Mediatek MT8173 chips.

  ACPICA mostly updates its debug infrastructure and adds a number of
  fixes and cleanups for a good measure.

  The Operating Performance Points (OPP) framework is updated with new
  DT bindings and support for them among other things.

  We have a few updates of the generic power domains framework and a
  reorganization of the ACPI device enumeration code and bus type
  operations.

  And a lot of fixes and cleanups all over.

  Included is one branch from the MFD tree as it contains some
  PM-related driver core and ACPI PM changes a few other commits are
  based on.

  Specifics:

   - ACPICA update to upstream revision 20150818 including method
     tracing extensions to allow more in-depth AML debugging in the
     kernel and a number of assorted fixes and cleanups (Bob Moore, Lv
     Zheng, Markus Elfring).

   - ACPI sysfs code updates and a documentation update related to AML
     method tracing (Lv Zheng).

   - ACPI EC driver fix related to serialized evaluations of _Qxx
     methods and ACPI tools updates allowing the EC userspace tool to be
     built from the kernel source (Lv Zheng).

   - ACPI processor driver updates preparing it for future introduction
     of CPPC support and ACPI PCC mailbox driver updates (Ashwin
     Chaugule).

   - ACPI interrupts enumeration fix for a regression related to the
     handling of IRQ attribute conflicts between MADT and the ACPI
     namespace (Jiang Liu).

   - Fixes related to ACPI device PM (Mika Westerberg, Srinidhi
     Kasagar).

   - ACPI device registration code reorganization to separate the
     sysfs-related code and bus type operations from the rest (Rafael J
     Wysocki).

   - Assorted cleanups in the ACPI core (Jarkko Nikula, Mathias Krause,
     Andy Shevchenko, Rafael J Wysocki, Nicolas Iooss).

   - ACPI cpufreq driver and ia64 cpufreq driver fixes and cleanups (Pan
     Xinhui, Rafael J Wysocki).

   - cpufreq core cleanups on top of the previous changes allowing it to
     preseve its sysfs directories over system suspend/resume (Viresh
     Kumar, Rafael J Wysocki, Sebastian Andrzej Siewior).

   - cpufreq fixes and cleanups related to governors (Viresh Kumar).

   - cpufreq updates (core and the cpufreq-dt driver) related to the
     turbo/boost mode support (Viresh Kumar, Bartlomiej Zolnierkiewicz).

   - New DT bindings for Operating Performance Points (OPP), support for
     them in the OPP framework and in the cpufreq-dt driver plus related
     OPP framework fixes and cleanups (Viresh Kumar).

   - cpufreq powernv driver updates (Shilpasri G Bhat).

   - New cpufreq driver for Mediatek MT8173 (Pi-Cheng Chen).

   - Assorted cpufreq driver (speedstep-lib, sfi, integrator) cleanups
     and fixes (Abhilash Jindal, Andrzej Hajda, Cristian Ardelean).

   - intel_pstate driver updates including Skylake-S support, support
     for enabling HW P-states per CPU and an additional vendor bypass
     list entry (Kristen Carlson Accardi, Chen Yu, Ethan Zhao).

   - cpuidle core fixes related to the handling of coupled idle states
     (Xunlei Pang).

   - intel_idle driver updates including Skylake Client support and
     support for freeze-mode-specific idle states (Len Brown).

   - Driver core updates related to power management (Andy Shevchenko,
     Rafael J Wysocki).

   - Generic power domains framework fixes and cleanups (Jon Hunter,
     Geert Uytterhoeven, Rajendra Nayak, Ulf Hansson).

   - Device PM QoS framework update to allow the latency tolerance
     setting to be exposed to user space via sysfs (Mika Westerberg).

   - devfreq support for PPMUv2 in Exynos5433 and a fix for an incorrect
     exynos-ppmu DT binding (Chanwoo Choi, Javier Martinez Canillas).

   - System sleep support updates (Alan Stern, Len Brown, SungEun Kim).

   - rockchip-io AVS support updates (Heiko Stuebner).

   - PM core clocks support fixup (Colin Ian King).

   - Power capping RAPL driver update including support for Skylake H/S
     and Broadwell-H (Radivoje Jovanovic, Seiichi Ikarashi).

   - Generic device properties framework fixes related to the handling
     of static (driver-provided) property sets (Andy Shevchenko).

   - turbostat and cpupower updates (Len Brown, Shilpasri G Bhat,
     Shreyas B Prabhu)"

* tag 'pm+acpi-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (180 commits)
  cpufreq: speedstep-lib: Use monotonic clock
  cpufreq: powernv: Increase the verbosity of OCC console messages
  cpufreq: sfi: use kmemdup rather than duplicating its implementation
  cpufreq: drop !cpufreq_driver check from cpufreq_parse_governor()
  cpufreq: rename cpufreq_real_policy as cpufreq_user_policy
  cpufreq: remove redundant 'policy' field from user_policy
  cpufreq: remove redundant 'governor' field from user_policy
  cpufreq: update user_policy.* on success
  cpufreq: use memcpy() to copy policy
  cpufreq: remove redundant CPUFREQ_INCOMPATIBLE notifier event
  cpufreq: mediatek: Add MT8173 cpufreq driver
  dt-bindings: mediatek: Add MT8173 CPU DVFS clock bindings
  PM / Domains: Fix typo in description of genpd_dev_pm_detach()
  PM / Domains: Remove unusable governor dummies
  PM / Domains: Make pm_genpd_init() available to modules
  PM / domains: Align column headers and data in pm_genpd_summary output
  powercap / RAPL: disable the 2nd power limit properly
  tools: cpupower: Fix error when running cpupower monitor
  PM / OPP: Drop unlikely before IS_ERR(_OR_NULL)
  PM / OPP: Fix static checker warning (broken 64bit big endian systems)
  ...
2015-09-01 19:45:46 -07:00
Linus Torvalds
43af9872f5 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic updates from Thomas Gleixner:
 "This udpate contains:

   - rework the irq vector array to store a pointer to the irq
     descriptor instead of the irq number to avoid a lookup of the irq
     descriptor in the irq entry path

   - lguest interrupt handling cleanups

   - conversion of the local apic timer to the new clockevent callbacks

   - preparatory changes for the irq argument removal of interrupt flow
     handlers"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Do not dereference irq descriptor before checking it
  tools/lguest: Clean up include dir
  tools/lguest: Fix redefinition of struct virtio_pci_cfg_cap
  x86/irq: Store irq descriptor in vector array
  genirq: Provide irq_desc_has_action
  x86/irq: Get rid of an indentation level
  x86/irq: Rename VECTOR_UNDEFINED to VECTOR_UNUSED
  x86/irq: Replace numeric constant
  x86/irq: Protect smp_cleanup_move
  x86/lguest: Do not setup unused irq vectors
  x86/lguest: Clean up lguest_setup_irq
  x86/apic: Drop local_irq_save/restore in timer callbacks
  x86/apic: Migrate apic timer to new set_state interface
  x86/irq: Use access helper irq_data_get_affinity_mask()
  x86/irq: Use accessor irq_data_get_irq_handler_data()
  x86/irq: Use accessor irq_data_get_node()
2015-09-01 15:20:51 -07:00
Linus Torvalds
361f7d1757 Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core platform updates from Ingo Molnar:
 "The main changes are:

   - Intel Atom platform updates.  (Andy Shevchenko)

   - modularity fixlets.  (Paul Gortmaker)

   - x86 platform clockevents driver updates for lguest, uv and Xen.
     (Viresh Kumar)

   - Microsoft Hyper-V TSC fixlet.  (Vitaly Kuznetsov)"

* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/platform: Make atom/pmc_atom.c explicitly non-modular
  x86/hyperv: Mark the Hyper-V TSC as unstable
  x86/xen/time: Migrate to new set-state interface
  x86/uv/time: Migrate to new set-state interface
  x86/lguest/timer: Migrate to new set-state interface
  x86/pci/intel_mid_pci: Use proper constants for irq polarity
  x86/pci/intel_mid_pci: Make intel_mid_pci_ops static
  x86/pci/intel_mid_pci: Propagate actual return code
  x86/pci/intel_mid_pci: Work around for IRQ0 assignment
  x86/platform/iosf_mbi: Add Intel Tangier PCI id
  x86/platform/iosf_mbi: Source cleanup
  x86/platform/iosf_mbi: Remove NULL pointer checks for pci_dev_put()
  x86/platform/iosf_mbi: Check return value of debugfs_create properly
  x86/platform/iosf_mbi: Move to dedicated folder
  x86/platform/intel/pmc_atom: Move the PMC-Atom code to arch/x86/platform/atom
  x86/platform/intel/pmc_atom: Add Cherrytrail PMC interface
  x86/platform/intel/pmc_atom: Supply register mappings via PMC object
  x86/platform/intel/pmc_atom: Print index of device in loop
  x86/platform/intel/pmc_atom: Export accessors to PMC registers
2015-09-01 10:33:31 -07:00
Linus Torvalds
25525bea46 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "The dominant change in this cycle was the continued work to isolate
  kernel drivers from MTRR legacies: this tree gets rid of all kernel
  internal driver interfaces to MTRRs (mostly by rewriting it to proper
  PAT interfaces), the only access left is the /proc/mtrr ABI.

  This work was done by Luis R Rodriguez.

  There's also some related PCI interface additions for which I've
  Cc:-ed Bjorn"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86/mm/mtrr: Remove kernel internal MTRR interfaces: unexport mtrr_add() and mtrr_del()
  s390/io: Add pci_iomap_wc() and pci_iomap_wc_range()
  drivers/dma/iop-adma: Use dma_alloc_writecombine() kernel-style
  drivers/video/fbdev/vt8623fb: Use arch_phys_wc_add() and pci_iomap_wc()
  drivers/video/fbdev/s3fb: Use arch_phys_wc_add() and pci_iomap_wc()
  drivers/video/fbdev/arkfb.c: Use arch_phys_wc_add() and pci_iomap_wc()
  PCI: Add pci_iomap_wc() variants
  drivers/video/fbdev/gxt4500: Use pci_ioremap_wc_bar() to map framebuffer
  drivers/video/fbdev/kyrofb: Use arch_phys_wc_add() and pci_ioremap_wc_bar()
  drivers/video/fbdev/i740fb: Use arch_phys_wc_add() and pci_ioremap_wc_bar()
  PCI: Add pci_ioremap_wc_bar()
  x86/mm: Make kernel/check.c explicitly non-modular
  x86/mm/pat: Make mm/pageattr[-test].c explicitly non-modular
  x86/mm/pat: Add comments to cachemode translation tables
  arch/*/io.h: Add ioremap_uc() to all architectures
  drivers/video/fbdev/atyfb: Use arch_phys_wc_add() and ioremap_wc()
  drivers/video/fbdev/atyfb: Replace MTRR UC hole with strong UC
  drivers/video/fbdev/atyfb: Clarify ioremap() base and length used
  drivers/video/fbdev/atyfb: Carve out framebuffer length fudging into a helper
  x86/mm, asm-generic: Add IOMMU ioremap_uc() variant default
  ...
2015-09-01 10:07:40 -07:00
Linus Torvalds
6b2282aa37 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Ingo Molnar:
 "Two changes: a suspend/resume quirk and a new CPUID bit definition"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpufeature: Add feature bit for Intel's Silicon Debug CPUID bit
  x86/cpu: Restore MSR_IA32_ENERGY_PERF_BIAS after resume
2015-09-01 09:41:03 -07:00
Linus Torvalds
11e612ddb4 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
 "The main x86 bootup related changes in this cycle were:

   - more boot time optimizations.  (Len Brown)

   - implement hex output to allow the debugging of early bootup
     parameters.  (Kees Cook)

   - remove obsolete MCA leftovers.  (Paolo Pisati)"

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/smpboot: Remove APIC.wait_for_init_deassert and atomic init_deasserted
  x86/smpboot: Remove SIPI delays from cpu_up()
  x86/smpboot: Remove udelay(100) when polling cpu_callin_map
  x86/smpboot: Remove udelay(100) when polling cpu_initialized_map
  x86/boot: Obsolete the MCA sys_desc_table
  x86/boot: Add hex output for debugging
2015-09-01 09:04:31 -07:00
Linus Torvalds
5778077d03 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm changes from Ingo Molnar:
 "The biggest changes in this cycle were:

   - Revamp, simplify (and in some cases fix) Time Stamp Counter (TSC)
     primitives.  (Andy Lutomirski)

   - Add new, comprehensible entry and exit handlers written in C.
     (Andy Lutomirski)

   - vm86 mode cleanups and fixes.  (Brian Gerst)

   - 32-bit compat code cleanups.  (Brian Gerst)

  The amount of simplification in low level assembly code is already
  palpable:

     arch/x86/entry/entry_32.S                          | 130 +----
     arch/x86/entry/entry_64.S                          | 197 ++-----

  but more simplifications are planned.

  There's also the usual laudry mix of low level changes - see the
  changelog for details"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (83 commits)
  x86/asm: Drop repeated macro of X86_EFLAGS_AC definition
  x86/asm/msr: Make wrmsrl() a function
  x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer
  x86/asm: Add MONITORX/MWAITX instruction support
  x86/traps: Weaken context tracking entry assertions
  x86/asm/tsc: Add rdtscll() merge helper
  selftests/x86: Add syscall_nt selftest
  selftests/x86: Disable sigreturn_64
  x86/vdso: Emit a GNU hash
  x86/entry: Remove do_notify_resume(), syscall_trace_leave(), and their TIF masks
  x86/entry/32: Migrate to C exit path
  x86/entry/32: Remove 32-bit syscall audit optimizations
  x86/vm86: Rename vm86->v86flags and v86mask
  x86/vm86: Rename vm86->vm86_info to user_vm86
  x86/vm86: Clean up vm86.h includes
  x86/vm86: Move the vm86 IRQ definitions to vm86.h
  x86/vm86: Use the normal pt_regs area for vm86
  x86/vm86: Eliminate 'struct kernel_vm86_struct'
  x86/vm86: Move fields from 'struct kernel_vm86_struct' to 'struct vm86'
  x86/vm86: Move vm86 fields out of 'thread_struct'
  ...
2015-09-01 08:40:25 -07:00
Rafael J. Wysocki
e625ccec1f Merge branches 'pm-tools' and 'powercap'
* pm-tools:
  tools: cpupower: Fix error when running cpupower monitor
  tools/power turbostat: fix typo on DRAM column in Joules-mode
  cpupower: Do not change the frequency of offline cpu
  tools/power turbostat: fix parameter passing for forked command
  tools/power turbostat: dump CONFIG_TDP
  tools/power turbostat: cpu0 is no longer hard-coded, so  update output
  tools/power turbostat: update turbostat(8)

* powercap:
  powercap / RAPL: disable the 2nd power limit properly
  powercap / RAPL: Add support for Broadwell-H
  powercap / RAPL: Add support for Skylake H/S
2015-09-01 15:54:30 +02:00
Linus Torvalds
a1d8561172 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The biggest change in this cycle is the rewrite of the main SMP load
  balancing metric: the CPU load/utilization.  The main goal was to make
  the metric more precise and more representative - see the changelog of
  this commit for the gory details:

    9d89c257df ("sched/fair: Rewrite runnable load and utilization average tracking")

  It is done in a way that significantly reduces complexity of the code:

    5 files changed, 249 insertions(+), 494 deletions(-)

  and the performance testing results are encouraging.  Nevertheless we
  need to keep an eye on potential regressions, since this potentially
  affects every SMP workload in existence.

  This work comes from Yuyang Du.

  Other changes:

   - SCHED_DL updates.  (Andrea Parri)

   - Simplify architecture callbacks by removing finish_arch_switch().
     (Peter Zijlstra et al)

   - cputime accounting: guarantee stime + utime == rtime.  (Peter
     Zijlstra)

   - optimize idle CPU wakeups some more - inspired by Facebook server
     loads.  (Mike Galbraith)

   - stop_machine fixes and updates.  (Oleg Nesterov)

   - Introduce the 'trace_sched_waking' tracepoint.  (Peter Zijlstra)

   - sched/numa tweaks.  (Srikar Dronamraju)

   - misc fixes and small cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  sched/deadline: Fix comment in enqueue_task_dl()
  sched/deadline: Fix comment in push_dl_tasks()
  sched: Change the sched_class::set_cpus_allowed() calling context
  sched: Make sched_class::set_cpus_allowed() unconditional
  sched: Fix a race between __kthread_bind() and sched_setaffinity()
  sched: Ensure a task has a non-normalized vruntime when returning back to CFS
  sched/numa: Fix NUMA_DIRECT topology identification
  tile: Reorganize _switch_to()
  sched, sparc32: Update scheduler comments in copy_thread()
  sched: Remove finish_arch_switch()
  sched, tile: Remove finish_arch_switch
  sched, sh: Fold finish_arch_switch() into switch_to()
  sched, score: Remove finish_arch_switch()
  sched, avr32: Remove finish_arch_switch()
  sched, MIPS: Get rid of finish_arch_switch()
  sched, arm: Remove finish_arch_switch()
  sched/fair: Clean up load average references
  sched/fair: Provide runnable_load_avg back to cfs_rq
  sched/fair: Remove task and group entity load when they are dead
  sched/fair: Init cfs_rq's sched_entity load average
  ...
2015-08-31 20:26:22 -07:00
Linus Torvalds
3959df1dfb Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Ingo Molnar:
 "MCE handling updates, but also some generic drivers/edac/ changes to
  better organize the Kconfig space"

* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/ras: Move AMD MCE injector to arch/x86/ras/
  x86/mce: Add a wrapper around mce_log() for injection
  x86/mce: Rename rcu_dereference_check_mce() to mce_log_get_idx_check()
  RAS: Add a menuconfig option with descriptive text
  x86/mce: Reenable CMCI banks when swiching back to interrupt mode
  x86/mce: Clear Local MCE opt-in before kexec
  x86/mce: Remove unused function declarations
  x86/mce: Kill drain_mcelog_buffer()
  x86/mce: Avoid potential deadlock due to printk() in MCE context
  x86/mce: Remove the MCE ring for Action Optional errors
  x86/mce: Don't use percpu workqueues
  x86/mce: Provide a lockless memory pool to save error records
  x86/mce: Reuse one of the u16 padding fields in 'struct mce'
2015-08-31 20:20:30 -07:00
Linus Torvalds
41d859a83c Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Main perf kernel side changes:

   - uprobes updates/fixes.  (Oleg Nesterov)

   - Add PERF_RECORD_SWITCH to indicate context switches and use it in
     tooling.  (Adrian Hunter)

   - Support BPF programs attached to uprobes and first steps for BPF
     tooling support.  (Wang Nan)

   - x86 generic x86 MSR-to-perf PMU driver.  (Andy Lutomirski)

   - x86 Intel PT, LBR and BTS updates.  (Alexander Shishkin)

   - x86 Intel Skylake support.  (Andi Kleen)

   - x86 Intel Knights Landing (KNL) RAPL support.  (Dasaratharaman
     Chandramouli)

   - x86 Intel Broadwell-DE uncore support.  (Kan Liang)

   - x86 hw breakpoints robustization (Andy Lutomirski)

  Main perf tooling side changes:

   - Support Intel PT in several tools, enabling the use of the
     processor trace feature introduced in Intel Broadwell processors:
     (Adrian Hunter)

       # dmesg | grep Performance
       # [0.188477] Performance Events: PEBS fmt2+, 16-deep LBR, Broadwell events, full-width counters, Intel PMU driver.
       # perf record -e intel_pt//u -a sleep 1
       [ perf record: Woken up 1 times to write data ]
       [ perf record: Captured and wrote 0.216 MB perf.data ]
       # perf script # then navigate in the tool output to some area, like this one:
       184 1030 dl_main (/usr/lib64/ld-2.17.so) => 7f21ba661440 dl_main (/usr/lib64/ld-2.17.so)
       185 1457 dl_main (/usr/lib64/ld-2.17.so) => 7f21ba669f10 _dl_new_object (/usr/lib64/ld-2.17.so)
       186 9f37 _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba677b90 strlen (/usr/lib64/ld-2.17.so)
       187 7ba3 strlen (/usr/lib64/ld-2.17.so) => 7f21ba677c75 strlen (/usr/lib64/ld-2.17.so)
       188 7c78 strlen (/usr/lib64/ld-2.17.so) => 7f21ba669f3c _dl_new_object (/usr/lib64/ld-2.17.so)
       189 9f8a _dl_new_object (/usr/lib64/ld-2.17.so) => 7f21ba65fab0 calloc@plt (/usr/lib64/ld-2.17.so)
       190 fab0 calloc@plt (/usr/lib64/ld-2.17.so) => 7f21ba675e70 calloc (/usr/lib64/ld-2.17.so)
       191 5e87 calloc (/usr/lib64/ld-2.17.so) => 7f21ba65fa90 malloc@plt (/usr/lib64/ld-2.17.so)
       192 fa90 malloc@plt (/usr/lib64/ld-2.17.so) => 7f21ba675e60 malloc (/usr/lib64/ld-2.17.so)
       193 5e68 malloc (/usr/lib64/ld-2.17.so) => 7f21ba65fa80 __libc_memalign@plt (/usr/lib64/ld-2.17.so)
       194 fa80 __libc_memalign@plt (/usr/lib64/ld-2.17.so) => 7f21ba675d50 __libc_memalign (/usr/lib64/ld-2.17.so)
       195 5d63 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675e20 __libc_memalign (/usr/lib64/ld-2.17.so)
       196 5e40 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675d73 __libc_memalign (/usr/lib64/ld-2.17.so)
       197 5d97 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675e18 __libc_memalign (/usr/lib64/ld-2.17.so)
       198 5e1e __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba675df9 __libc_memalign (/usr/lib64/ld-2.17.so)
       199 5e10 __libc_memalign (/usr/lib64/ld-2.17.so) => 7f21ba669f8f _dl_new_object (/usr/lib64/ld-2.17.so)
       200 9fc2 _dl_new_object (/usr/lib64/ld-2.17.so) =>  7f21ba678e70 memcpy (/usr/lib64/ld-2.17.so)
       201 8e8c memcpy (/usr/lib64/ld-2.17.so) => 7f21ba678ea0 memcpy (/usr/lib64/ld-2.17.so)

   - Add support for using several Intel PT features (CYC, MTC packets),
     the relevant documentation was updated in:
         tools/perf/Documentation/intel-pt.txt
     briefly describing those packets, its purposes, how to configure
     them in the event config terms and relevant external documentation
     for further reading.  (Adrian Hunter)

   - Introduce support for probing at an absolute address, for user and
     kernel 'perf probe's, useful when one have the symbol maps on a
     developer machine but not on an embedded system.  (Wang Nan)

   - Add Intel BTS support, with a call-graph script to show it and PT
     in use in a GUI using 'perf script' python scripting with
     postgresql and Qt.  (Adrian Hunter)

   - Allow selecting the type of callchains per event, including
     disabling callchains in all but one entry in an event list, to save
     space, and also to ask for the callchains collected in one event to
     be used in other events.  (Kan Liang)

   - Beautify more syscall arguments in 'perf trace': (Arnaldo Carvalho
     de Melo)
       * A bunch more translate file/pathnames from pointers to strings.
       * Convert numbers to strings for the 'keyctl' syscall 'option'
         arg.
       * Add missing 'clockid' entries.

   - Introduce 'srcfile' sort key: (Andi Kleen)

       # perf record -F 10000 usleep 1
       # perf report --stdio --dsos '[kernel.vmlinux]' -s srcfile
       <SNIP>
       # Overhead  Source File
          26.49%  copy_page_64.S
           5.49%  signal.c
           0.51%  msr.h
       #

     It can be combined with other fields, for instance, experiment with
     '-s srcfile,symbol'.

     There are some oddities in some distros and with some specific
     DSOs, being investigated, so your mileage may vary.

   - Support per-event 'freq' term: (Namhyung Kim)

       $ perf record -e 'cpu/instructions,freq=1234/',cycles -c 1000 sleep 1
       $ perf evlist -F
       cpu/instructions,freq=1234/: sample_freq=1234
       cycles: sample_period=1000
       $

   - Deref sys_enter pointer args with contents from probe:vfs_getname,
     showing pathnames instead of pointers in many syscalls in 'perf
     trace'.  (Arnaldo Carvalho de Melo)

   - Stop collecting /proc/kallsyms in perf.data files, saving about
     4.5MB on a typical x86-64 system, use the the symbol resolution
     routines used in all the other tools (report, top, etc) now that we
     can ask libtraceevent to use perf's symbol resolution code.
     (Arnaldo Carvalho de Melo)

   - Allow filtering out of perf's PID via 'perf record --exclude-perf'.
     (Wang Nan)

   - 'perf trace' now supports syscall groups, like strace, i.e:

       $ trace -e file touch file

     Will expand 'file' into multiple, file related, syscalls.  More
     work needed to add extra groups for other syscall groups, and also
     to complement what was added for the 'file' group, included as a
     proof of concept.  (Arnaldo Carvalho de Melo)

   - Add lock_pi stresser to 'perf bench futex', to test the kernel code
     related to FUTEX_(UN)LOCK_PI.  (Davidlohr Bueso)

   - Let user have timestamps with per-thread recording in 'perf record'
     (Adrian Hunter)

   - ... and tons of other changes, see the shortlog and the Git log for
     details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (240 commits)
  perf evlist: Add backpointer for perf_env to evlist
  perf tools: Rename perf_session_env to perf_env
  perf tools: Do not change lib/api/fs/debugfs directly
  perf tools: Add tracing_path and remove unneeded functions
  perf buildid: Introduce sysfs/filename__sprintf_build_id
  perf evsel: Add a backpointer to the evlist a evsel is in
  perf trace: Add header with copyright and background info
  perf scripts python: Add new compaction-times script
  perf stat: Get correct cpu id for print_aggr
  tools lib traceeveent: Allow for negative numbers in print format
  perf script: Add --[no-]-demangle/--[no-]-demangle-kernel
  tracing/uprobes: Do not print '0x (null)' when offset is 0
  perf probe: Support probing at absolute address
  perf probe: Fix error reported when offset without function
  perf probe: Fix list result when address is zero
  perf probe: Fix list result when symbol can't be found
  tools build: Allow duplicate objects in the object list
  perf tools: Remove export.h from MANIFEST
  perf probe: Prevent segfault when reading probe point with absolute address
  perf tools: Update Intel PT documentation
  ...
2015-08-31 19:49:05 -07:00
Linus Torvalds
4658000955 Merge branch 'mm-kasan-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/kasan changes from Ingo Molnar:
 "These are two KASAN changes that factor out (and generalize) x86
  specific KASAN code from x86 to mm"

* 'mm-kasan-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kasan, mm: Introduce generic kasan_populate_zero_shadow()
  x86/kasan: Define KASAN_SHADOW_OFFSET per architecture
2015-08-31 19:16:36 -07:00
Linus Torvalds
5757bd6157 Merge branch 'core-types-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull inlining tuning from Ingo Molnar:
 "A handful of inlining optimizations inspired by x86 work but
  applicable in general"

* 'core-types-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  jiffies: Force inlining of {m,u}msecs_to_jiffies()
  x86/hweight: Force inlining of __arch_hweight{32,64}()
  linux/bitmap: Force inlining of bitmap weight functions
2015-08-31 18:37:33 -07:00
Linus Torvalds
26f8b7edc9 PCI changes for the v4.3 merge window:
Enumeration
     Allocate ATS struct during enumeration (Bjorn Helgaas)
     Embed ATS info directly into struct pci_dev (Bjorn Helgaas)
     Reduce size of ATS structure elements (Bjorn Helgaas)
     Stop caching ATS Invalidate Queue Depth (Bjorn Helgaas)
     iommu/vt-d: Cache PCI ATS state and Invalidate Queue Depth (Bjorn Helgaas)
     Move MPS configuration check to pci_configure_device() (Bjorn Helgaas)
     Set MPS to match upstream bridge (Keith Busch)
     ARM/PCI: Set MPS before pci_bus_add_devices() (Murali Karicheri)
     Add pci_scan_root_bus_msi() (Lorenzo Pieralisi)
     ARM/PCI, designware, xilinx: Use pci_scan_root_bus_msi() (Lorenzo Pieralisi)
 
   Resource management
     Call pci_read_bridge_bases() from core instead of arch code (Lorenzo Pieralisi)
 
   PCI device hotplug
     pciehp: Remove unused interrupt events (Bjorn Helgaas)
     pciehp: Remove ignored MRL sensor interrupt events (Bjorn Helgaas)
     pciehp: Handle invalid data when reading from non-existent devices (Jarod Wilson)
     pciehp: Simplify pcie_poll_cmd() (Yijing Wang)
     Use "slot" and "pci_slot" for struct hotplug_slot and struct pci_slot (Yijing Wang)
     Protect pci_bus->slots with pci_slot_mutex, not pci_bus_sem (Yijing Wang)
     Hold pci_slot_mutex while searching bus->slots list (Yijing Wang)
 
   Power management
     Disable async suspend/resume for JMicron multi-function SATA/AHCI (Zhang Rui)
 
   Virtualization
     Add ACS quirks for Intel I219-LM/V (Alex Williamson)
     Restore ACS configuration as part of pci_restore_state() (Alexander Duyck)
 
   MSI
     Add pcibios_alloc_irq() and pcibios_free_irq() (Jiang Liu)
     x86: Implement pcibios_alloc_irq() and pcibios_free_irq() (Jiang Liu)
     Add helpers to manage pci_dev->irq and pci_dev->irq_managed (Jiang Liu)
     Free legacy IRQ when enabling MSI/MSI-X (Jiang Liu)
     ARM/PCI: Remove msi_controller from struct pci_sys_data (Lorenzo Pieralisi)
     Remove unused pcibios_msi_controller() hook (Lorenzo Pieralisi)
 
   Generic host bridge driver
     Remove dependency on ARM-specific struct hw_pci (Jayachandran C)
     Build setup-irq.o for arm64 (Jayachandran C)
     Add arm64 support (Jayachandran C)
 
   APM X-Gene host bridge driver
     Add APM X-Gene PCIe 64-bit prefetchable window (Duc Dang)
     Add support for a 64-bit prefetchable memory window (Duc Dang)
     Drop owner assignment from platform_driver (Krzysztof Kozlowski)
 
   Broadcom iProc host bridge driver
     Allow BCMA bus driver to be built as module (Hauke Mehrtens)
     Delete unnecessary checks before phy calls (Markus Elfring)
     Add arm64 support (Ray Jui)
 
   Synopsys DesignWare host bridge driver
     Don't complain missing *config* reg space if va_cfg0 is set (Murali Karicheri)
 
   TI DRA7xx host bridge driver
     Disable pm_runtime on get_sync failure (Kishon Vijay Abraham I)
     Add PM support (Kishon Vijay Abraham I)
     Clear MSE bit during suspend so clocks will idle (Kishon Vijay Abraham I)
     Add support to make GPIO drive PERST# line (Kishon Vijay Abraham I)
 
   Xilinx AXI host bridge driver
     Check for MSI interrupt flag before handling as INTx (Russell Joyce)
 
   Miscellaneous
     Fix Intersil/Techwell TW686[4589] AV capture class code (Krzysztof Hałasa)
     Use PCI_CLASS_SERIAL_USB instead of bare number (Bjorn Helgaas)
     Fix generic NCR 53c810 class code quirk (Bjorn Helgaas)
     Fix TI816X class code quirk (Bjorn Helgaas)
     Remove unused "pci_probe" flags (Bjorn Helgaas)
     Host bridge driver code simplifications (Fabio Estevam)
     Add dev_flags bit to access VPD through function 0 (Mark Rustad)
     Add VPD function 0 quirk for Intel Ethernet devices (Mark Rustad)
     Kill off set_irq_flags() usage (Rob Herring)
     Remove Intel Cherrytrail D3 delays (Srinidhi Kasagar)
     Clean up pci_find_capability() (Wei Yang)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJV5FE/AAoJEFmIoMA60/r8I2QP/R9b9MrvH2i9tN98/lTDl7g3
 czE58ZM1d4kMYtW3Pm/DrYI6y6RprAaB4ZEp5rHxlFLqBPZEQwWodA19NkjECcb6
 g5qKWOdIWA4T6Jaab6a/yCmAFa0jni7iAmmTYqca9o3Xj7tFovxDxqPSYkh+rer0
 v+1sAr/4HXSiN339KR6teEF3VZqLFp6ewMydQlVS+R7kAOHHYQDqoo9WF6JnIoL5
 PO3Kbmr1WN3fZY3s98yLq1x6XmLrLlmGdJI+2r+KewO4r/05CL6wTVP/oTMi+Eti
 dueseeISlOTcTAUhk87Vap23uJPeB/rJbYoFdCr7+0AkZGe/U/E2dpZm2wyMcCvq
 OrATuFymgzIuJm5uUPsdH4lzsX97U9BcDccracfC38rYnP5u3bqHCjw8HJzANR7p
 VYbFBzc5ZCCUYtQAjyrKt2820AvTFo+Bu+z75IsJO8LQQgv/zGtQQ8grIQeAjH+l
 sAe3xOTwzZnq6Obl4qb/GElHmIGUbQ1X4Dx1mliiijKMKkhYHOA0iFnB/OBILmEZ
 wHzKU8chWcI9lip0aaX8q9i/qovdVUt2+rdo/N40l7YY66x4jkNgQQXZX+FSKk6H
 stTvEBQgK28EKCHDxMsgzTGIqllSyk4DnRMA7ij1hRWqdUbGk7wOPTvm9QSwNDWe
 SokuWzAQD9YeMRGdsYjZ
 =DX1r
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
 "PCI changes for the v4.3 merge window:

  Enumeration:
   - Allocate ATS struct during enumeration (Bjorn Helgaas)
   - Embed ATS info directly into struct pci_dev (Bjorn Helgaas)
   - Reduce size of ATS structure elements (Bjorn Helgaas)
   - Stop caching ATS Invalidate Queue Depth (Bjorn Helgaas)
   - iommu/vt-d: Cache PCI ATS state and Invalidate Queue Depth (Bjorn Helgaas)
   - Move MPS configuration check to pci_configure_device() (Bjorn Helgaas)
   - Set MPS to match upstream bridge (Keith Busch)
   - ARM/PCI: Set MPS before pci_bus_add_devices() (Murali Karicheri)
   - Add pci_scan_root_bus_msi() (Lorenzo Pieralisi)
   - ARM/PCI, designware, xilinx: Use pci_scan_root_bus_msi() (Lorenzo Pieralisi)

  Resource management:
   - Call pci_read_bridge_bases() from core instead of arch code (Lorenzo Pieralisi)

  PCI device hotplug:
   - pciehp: Remove unused interrupt events (Bjorn Helgaas)
   - pciehp: Remove ignored MRL sensor interrupt events (Bjorn Helgaas)
   - pciehp: Handle invalid data when reading from non-existent devices (Jarod Wilson)
   - pciehp: Simplify pcie_poll_cmd() (Yijing Wang)
   - Use "slot" and "pci_slot" for struct hotplug_slot and struct pci_slot (Yijing Wang)
   - Protect pci_bus->slots with pci_slot_mutex, not pci_bus_sem (Yijing Wang)
   - Hold pci_slot_mutex while searching bus->slots list (Yijing Wang)

  Power management:
   - Disable async suspend/resume for JMicron multi-function SATA/AHCI (Zhang Rui)

  Virtualization:
   - Add ACS quirks for Intel I219-LM/V (Alex Williamson)
   - Restore ACS configuration as part of pci_restore_state() (Alexander Duyck)

  MSI:
   - Add pcibios_alloc_irq() and pcibios_free_irq() (Jiang Liu)
   - x86: Implement pcibios_alloc_irq() and pcibios_free_irq() (Jiang Liu)
   - Add helpers to manage pci_dev->irq and pci_dev->irq_managed (Jiang Liu)
   - Free legacy IRQ when enabling MSI/MSI-X (Jiang Liu)
   - ARM/PCI: Remove msi_controller from struct pci_sys_data (Lorenzo Pieralisi)
   - Remove unused pcibios_msi_controller() hook (Lorenzo Pieralisi)

  Generic host bridge driver:
   - Remove dependency on ARM-specific struct hw_pci (Jayachandran C)
   - Build setup-irq.o for arm64 (Jayachandran C)
   - Add arm64 support (Jayachandran C)

  APM X-Gene host bridge driver:
   - Add APM X-Gene PCIe 64-bit prefetchable window (Duc Dang)
   - Add support for a 64-bit prefetchable memory window (Duc Dang)
   - Drop owner assignment from platform_driver (Krzysztof Kozlowski)

  Broadcom iProc host bridge driver:
   - Allow BCMA bus driver to be built as module (Hauke Mehrtens)
   - Delete unnecessary checks before phy calls (Markus Elfring)
   - Add arm64 support (Ray Jui)

  Synopsys DesignWare host bridge driver:
   - Don't complain missing *config* reg space if va_cfg0 is set (Murali Karicheri)

  TI DRA7xx host bridge driver:
   - Disable pm_runtime on get_sync failure (Kishon Vijay Abraham I)
   - Add PM support (Kishon Vijay Abraham I)
   - Clear MSE bit during suspend so clocks will idle (Kishon Vijay Abraham I)
   - Add support to make GPIO drive PERST# line (Kishon Vijay Abraham I)

  Xilinx AXI host bridge driver:
   - Check for MSI interrupt flag before handling as INTx (Russell Joyce)

  Miscellaneous:
   - Fix Intersil/Techwell TW686[4589] AV capture class code (Krzysztof Hałasa)
   - Use PCI_CLASS_SERIAL_USB instead of bare number (Bjorn Helgaas)
   - Fix generic NCR 53c810 class code quirk (Bjorn Helgaas)
   - Fix TI816X class code quirk (Bjorn Helgaas)
   - Remove unused "pci_probe" flags (Bjorn Helgaas)
   - Host bridge driver code simplifications (Fabio Estevam)
   - Add dev_flags bit to access VPD through function 0 (Mark Rustad)
   - Add VPD function 0 quirk for Intel Ethernet devices (Mark Rustad)
   - Kill off set_irq_flags() usage (Rob Herring)
   - Remove Intel Cherrytrail D3 delays (Srinidhi Kasagar)
   - Clean up pci_find_capability() (Wei Yang)"

* tag 'pci-v4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (72 commits)
  PCI: Disable async suspend/resume for JMicron multi-function SATA/AHCI
  PCI: Set MPS to match upstream bridge
  PCI: Move MPS configuration check to pci_configure_device()
  PCI: Drop references acquired by of_parse_phandle()
  PCI/MSI: Remove unused pcibios_msi_controller() hook
  ARM/PCI: Remove msi_controller from struct pci_sys_data
  ARM/PCI, designware, xilinx: Use pci_scan_root_bus_msi()
  PCI: Add pci_scan_root_bus_msi()
  ARM/PCI: Replace panic with WARN messages on failures
  PCI: generic: Add arm64 support
  PCI: Build setup-irq.o for arm64
  PCI: generic: Remove dependency on ARM-specific struct hw_pci
  PCI: imx6: Simplify a trivial if-return sequence
  PCI: spear: Use BUG_ON() instead of condition followed by BUG()
  PCI: dra7xx: Remove unneeded use of IS_ERR_VALUE()
  PCI: Remove pci_ats_enabled()
  PCI: Stop caching ATS Invalidate Queue Depth
  PCI: Move ATS declarations to linux/pci.h so they're all together
  PCI: Clean up ATS error handling
  PCI: Use pci_physfn() rather than looking up physfn by hand
  ...
2015-08-31 17:14:39 -07:00
Ingo Molnar
0050ae57cd Merge branch 'x86/cpufeature' into x86/urgent, because it's ready
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-31 19:47:03 +02:00
Linus Torvalds
1c00038c76 Char/Misc driver patches for 4.3-rc1
Here's the "big" char/misc driver update for 4.3-rc1.
 
 Not much really interesting here, just a number of little changes all
 over the place, and some nice consolidation of the nvmem drivers to a
 common framework.  As usual, the mei drivers stand out as the largest
 "churn" to handle new devices and features in their hardware.
 
 All have been in linux-next for a while with no issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlXV844ACgkQMUfUDdst+ymYfQCgmDKjq3fsVHCxNZPxnukFYzvb
 xZkAnRb8fuub5gVQFP29A+rhyiuWD13v
 =Bq9K
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc driver patches from Greg KH:
 "Here's the "big" char/misc driver update for 4.3-rc1.

  Not much really interesting here, just a number of little changes all
  over the place, and some nice consolidation of the nvmem drivers to a
  common framework.  As usual, the mei drivers stand out as the largest
  "churn" to handle new devices and features in their hardware.

  All have been in linux-next for a while with no issues"

* tag 'char-misc-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits)
  auxdisplay: ks0108: initialize local parport variable
  extcon: palmas: Fix build break due to devm_gpiod_get_optional API change
  extcon: palmas: Support GPIO based USB ID detection
  extcon: Fix signedness bugs about break error handling
  extcon: Drop owner assignment from i2c_driver
  extcon: arizona: Simplify pdata symantics for micd_dbtime
  extcon: arizona: Declare 3-pole jack if we detect open circuit on mic
  extcon: Add exception handling to prevent the NULL pointer access
  extcon: arizona: Ensure variables are set for headphone detection
  extcon: arizona: Use gpiod inteface to handle micd_pol_gpio gpio
  extcon: arizona: Add basic microphone detection DT/ACPI bindings
  extcon: arizona: Update to use the new device properties API
  extcon: palmas: Remove the mutually_exclusive array
  extcon: Remove optional print_state() function pointer of struct extcon_dev
  extcon: Remove duplicate header file in extcon.h
  extcon: max77843: Clear IRQ bits state before request IRQ
  toshiba laptop: replace ioremap_cache with ioremap
  misc: eeprom: max6875: clean up max6875_read()
  misc: eeprom: clean up eeprom_read()
  misc: eeprom: 93xx46: clean up eeprom_93xx46_bin_read/write
  ...
2015-08-31 08:34:13 -07:00
Linus Torvalds
44e98edcd1 A very small release for x86 and s390 KVM.
s390: timekeeping changes, cleanups and fixes
 
 x86: support for Hyper-V MSRs to report crashes, and a bunch of cleanups.
 
 One interesting feature that was planned for 4.3 (emulating the local
 APIC in kernel while keeping the IOAPIC and 8254 in userspace) had to
 be delayed because Intel complained about my reading of the manual.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJVznW4AAoJEL/70l94x66Dt+gH/3vydhh6kv+mKhnR+kADaGfM
 gaunw0CUpJLU6gkOkYOm5M32WGhsT9Hd3WtRTJO6PhSo7cQ88hMx24u4XAffoewo
 Os5tDwAaHeV2enVSTri6xX8e2F2mgPDghGcYJPUBwnmMjRzZ8tj2VHUcbxqVT6Pb
 pX3V8ZxOZ81+ACZU2tdNRzLUd2H1v4d74gtVS7ove1Vb0CvPOBdHf1KQuUCUa2Pi
 73fvnaEuSaFYtSWZIP1PYxLnsQHpApH3Kco/5kHeqUPpYaGa/g2bnfncHRw20Svr
 gb3opwbfyiq91xfGbRVR3+E63Cw4G6aTl5MDNv9UFJ+xFKuj8WJ72xXXTSwzUi4=
 =HgT+
 -----END PGP SIGNATURE-----

Merge tag 'kvm-4.3-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "A very small release for x86 and s390 KVM.

   - s390: timekeeping changes, cleanups and fixes

   - x86: support for Hyper-V MSRs to report crashes, and a bunch of
     cleanups.

  One interesting feature that was planned for 4.3 (emulating the local
  APIC in kernel while keeping the IOAPIC and 8254 in userspace) had to
  be delayed because Intel complained about my reading of the manual"

* tag 'kvm-4.3-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
  x86/kvm: Rename VMX's segment access rights defines
  KVM: x86/vPMU: Fix unnecessary signed extension for AMD PERFCTRn
  kvm: x86: Fix error handling in the function kvm_lapic_sync_from_vapic
  KVM: s390: Fix assumption that kvm_set_irq_routing is always run successfully
  KVM: VMX: drop ept misconfig check
  KVM: MMU: fully check zero bits for sptes
  KVM: MMU: introduce is_shadow_zero_bits_set()
  KVM: MMU: introduce the framework to check zero bits on sptes
  KVM: MMU: split reset_rsvds_bits_mask_ept
  KVM: MMU: split reset_rsvds_bits_mask
  KVM: MMU: introduce rsvd_bits_validate
  KVM: MMU: move FNAME(is_rsvd_bits_set) to mmu.c
  KVM: MMU: fix validation of mmio page fault
  KVM: MTRR: Use default type for non-MTRR-covered gfn before WARN_ON
  KVM: s390: host STP toleration for VMs
  KVM: x86: clean/fix memory barriers in irqchip_in_kernel
  KVM: document memory barriers for kvm->vcpus/kvm->online_vcpus
  KVM: x86: remove unnecessary memory barriers for shared MSRs
  KVM: move code related to KVM_SET_BOOT_CPU_ID to x86
  KVM: s390: log capability enablement and vm attribute changes
  ...
2015-08-31 08:27:44 -07:00
Ingo Molnar
02b643b643 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-31 10:25:26 +02:00
Dan Williams
96601adb74 x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
Given that a write-back (WB) mapping plus non-temporal stores is
expected to be the most efficient way to access PMEM, update the
definition of ARCH_HAS_PMEM_API to imply arch support for
WB-mapped-PMEM.  This is needed as a pre-requisite for adding PMEM to
the direct map and mapping it with struct page.

The above clarification for X86_64 means that memcpy_to_pmem() is
permitted to use the non-temporal arch_memcpy_to_pmem() rather than
needlessly fall back to default_memcpy_to_pmem() when the pcommit
instruction is not available.  When arch_memcpy_to_pmem() is not
guaranteed to flush writes out of cache, i.e. on older X86_32
implementations where non-temporal stores may just dirty cache,
ARCH_HAS_PMEM_API is simply disabled.

The default fall back for persistent memory handling remains.  Namely,
map it with the WT (write-through) cache-type and hope for the best.

arch_has_pmem_api() is updated to only indicate whether the arch
provides the proper helpers to meet the minimum "writes are visible
outside the cache hierarchy after memcpy_to_pmem() + wmb_pmem()".  Code
that cares whether wmb_pmem() actually flushes writes to pmem must now
call arch_has_wmb_pmem() directly.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
[hch: set ARCH_HAS_PMEM_API=n on x86_32]
Reviewed-by: Christoph Hellwig <hch@lst.de>
[toshi: x86_32 compile fixes]
Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-27 19:40:59 -04:00
Ross Zwisler
67a3e8fe90 nd_blk: change aperture mapping from WC to WB
This should result in a pretty sizeable performance gain for reads.  For
rough comparison I did some simple read testing using PMEM to compare
reads of write combining (WC) mappings vs write-back (WB).  This was
done on a random lab machine.

PMEM reads from a write combining mapping:
	# dd of=/dev/null if=/dev/pmem0 bs=4096 count=100000
	100000+0 records in
	100000+0 records out
	409600000 bytes (410 MB) copied, 9.2855 s, 44.1 MB/s

PMEM reads from a write-back mapping:
	# dd of=/dev/null if=/dev/pmem0 bs=4096 count=1000000
	1000000+0 records in
	1000000+0 records out
	4096000000 bytes (4.1 GB) copied, 3.44034 s, 1.2 GB/s

To be able to safely support a write-back aperture I needed to add
support for the "read flush" _DSM flag, as outlined in the DSM spec:

http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

This flag tells the ND BLK driver that it needs to flush the cache lines
associated with the aperture after the aperture is moved but before any
new data is read.  This ensures that any stale cache lines from the
previous contents of the aperture will be discarded from the processor
cache, and the new data will be read properly from the DIMM.  We know
that the cache lines are clean and will be discarded without any
writeback because either a) the previous aperture operation was a read,
and we never modified the contents of the aperture, or b) the previous
aperture operation was a write and we must have written back the dirtied
contents of the aperture to the DIMM before the I/O was completed.

In order to add support for the "read flush" flag I needed to add a
generic routine to invalidate cache lines, mmio_flush_range().  This is
protected by the ARCH_HAS_MMIO_FLUSH Kconfig variable, and is currently
only supported on x86.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2015-08-27 19:38:28 -04:00
Ingo Molnar
8d58b66ed2 Linux 4.2-rc8
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJV2pUkAAoJEHm+PkMAQRiGCIoH/Rb29ZjdCoZJp9OtnjAG+qRc
 bG3YuomIdib86x7xHRKKaLWBa7din7IYjuwT/X4S4duO5a1R5Lp1sRG3IlGfhT0W
 nBNbjFl4q4bOyiTPtTRTYyh4g5UQv4IuyCnCmZyCTJyVi/O6HVM9TWKUzm68P2dJ
 30LwLUcQJ+mHueGJwFBAXe2BaojEpvYCdSX6tvbrQ/8X3FrVExZXuJl4uMYNFYNK
 ZwG/v5t7tYOiAe76JGbrEuVFPZWLPEW7amHOWR0T4Ye4nWTlBgx7fENiNRlfgcvI
 CM16l/xkyrZQ3Q5jZy1qYDfdHYF++dyEDysX4w1ae/X0aaLZn7l+u5VQD6WpkQQ=
 =IF6I
 -----END PGP SIGNATURE-----

Merge tag 'v4.2-rc8' into x86/mm, before applying new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-25 09:59:19 +02:00
Rafael J. Wysocki
82bb70c599 Merge branch 'turbostat' of https://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux into pm-tools
Pull turbostat changes for v4.3 from Len Brown.

* 'turbostat' of https://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: fix typo on DRAM column in Joules-mode
  tools/power turbostat: fix parameter passing for forked command
  tools/power turbostat: dump CONFIG_TDP
  tools/power turbostat: cpu0 is no longer hard-coded, so  update output
  tools/power turbostat: update turbostat(8)
2015-08-24 23:10:02 +02:00
Andy Lutomirski
47edb65178 x86/asm/msr: Make wrmsrl() a function
As of cf991de2f6 ("x86/asm/msr: Make wrmsrl_safe() a
function"), wrmsrl_safe is a function, but wrmsrl is still a
macro.  The wrmsrl macro performs invalid shifts if the value
argument is 32 bits. This makes it unnecessarily awkward to
write code that puts an unsigned long into an MSR.

To make this work, syscall_init needs tweaking to stop passing
a function pointer to wrmsrl.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Willy Tarreau <w@1wt.eu>
Link: http://lkml.kernel.org/r/690f0c629a1085d054e2d1ef3da073cfb3f7db92.1437678821.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-23 13:25:38 +02:00
Andrey Ryabinin
920e277e17 x86/kasan: Define KASAN_SHADOW_OFFSET per architecture
Current definition of  KASAN_SHADOW_OFFSET in
include/linux/kasan.h will not work for upcomming arm64, so move
it to the arch header.

Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexey Klimov <klimov.linux@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Keitel <dkeitel@codeaurora.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yury <yury.norov@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1439444244-26057-2-git-send-email-ryabinin.a.a@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-22 14:54:55 +02:00
Huang Rui
b466bdb614 x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer
MWAITX can enable a timer and a corresponding timer value
specified in SW P0 clocks. The SW P0 frequency is the same as
TSC. The timer provides an upper bound on how long the
instruction waits before exiting.

This way, a delay function in the kernel can leverage that
MWAITX timer of MWAITX.

When a CPU core executes MWAITX, it will be quiesced in a
waiting phase, diminishing its power consumption. This way, we
can save power in comparison to our default TSC-based delays.

A simple test shows that:

	$ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc
	$ sleep 10000s
	$ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc

Results:

	* TSC-based default delay:      485115 uWatts average power
	* MWAITX-based delay:           252738 uWatts average power

Thus, that's about 240 milliWatts less power consumption. The
test method relies on the support of AMD CPU accumulated power
algorithm in fam15h_power for which patches are forthcoming.

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Suggested-by: Borislav Petkov <bp@suse.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Huang Rui <ray.huang@amd.com>
[ Fix delay truncation. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Andreas Herrmann <herrmann.der.user@gmail.com>
Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hector Marco-Gisbert <hecmargi@upv.es>
Cc: Jacob Shin <jacob.w.shin@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Li <tony.li@amd.com>
Link: http://lkml.kernel.org/r/1438744732-1459-3-git-send-email-ray.huang@amd.com
Link: http://lkml.kernel.org/r/1439201994-28067-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-22 14:52:16 +02:00
Huang Rui
f96756746c x86/asm: Add MONITORX/MWAITX instruction support
AMD Carrizo processors (Family 15h, Models 60h-6fh) added a new
feature called MWAITX (MWAIT with extensions) as an extension to
MONITOR/MWAIT.

This new instruction controls a configurable timer which causes
the core to exit wait state on timer expiration, in addition to
"normal" MWAIT condition of reading from a monitored VA.

Compared to MONITOR/MWAIT, there are minor differences in opcode
and input parameters:

MWAITX ECX[1]: enable timer if set
MWAITX EBX[31:0]: max wait time expressed in SW P0 clocks ==
TSC. The software P0 frequency is the same as the TSC frequency.

                MWAIT                           MWAITX
opcode          0f 01 c9           |            0f 01 fb
ECX[0]                  value of RFLAGS.IF seen by instruction
ECX[1]          unused/#GP if set  |            enable timer if set
ECX[31:2]                     unused/#GP if set
EAX                           unused (reserve for hint)
EBX[31:0]       unused             |            max wait time (SW P0 == TSC)

                MONITOR                         MONITORX
opcode          0f 01 c8           |            0f 01 fa
EAX                     (logical) address to monitor
ECX                     #GP if not zero

Max timeout = EBX/(TSC frequency)

Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andreas Herrmann <herrmann.der.user@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dirk Brandewie <dirk.j.brandewie@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Li <tony.li@amd.com>
Link: http://lkml.kernel.org/r/1439201994-28067-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-22 14:52:16 +02:00
Tim Chen
488ca7d72d x86/cpufeatures: Enable cpuid for Intel SHA extensions
Add Intel CPUID for Intel Secure Hash Algorithm Extensions. This feature
provides new instructions for accelerated computation of SHA-1 and SHA-256.
This allows the feature to be shown in the /proc/cpuinfo for cpus that
support it.

Refer to SHA extension programming guide in chapter 8.2 of the Intel
Architecture Instruction Set Extensions Programming reference
for definition of this feature's cpuid: CPUID.(EAX=07H, ECX=0):EBX.SHA [bit 29] = 1
https://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf

Originally-by: Chandramouli Narayanan <mouli_7982@yahoo.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Link: http://lkml.kernel.org/r/1440194206.3940.6.camel@schen9-mobl2
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-08-22 11:17:31 +02:00