linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-10 05:46:40 +07:00

Author	SHA1	Message	Date
Linus Torvalds	b91e1302ad	mm: optimize PageWaiters bit use for unlock_page() In commit `6290602709` ("mm: add PageWaiters indicating tasks are waiting for a page bit") Nick Piggin made our page locking no longer unconditionally touch the hashed page waitqueue, which not only helps performance in general, but is particularly helpful on NUMA machines where the hashed wait queues can bounce around a lot. However, the "clear lock bit atomically and then test the waiters bit" sequence turns out to be much more expensive than it needs to be, because you get a nasty stall when trying to access the same word that just got updated atomically. On architectures where locking is done with LL/SC, this would be trivial to fix with a new primitive that clears one bit and tests another atomically, but that ends up not working on x86, where the only atomic operations that return the result end up being cmpxchg and xadd. The atomic bit operations return the old value of the same bit we changed, not the value of an unrelated bit. On x86, we could put the lock bit in the high bit of the byte, and use "xadd" with that bit (where the overflow ends up not touching other bits), and look at the other bits of the result. However, an even simpler model is to just use a regular atomic "and" to clear the lock bit, and then the sign bit in eflags will indicate the resulting state of the unrelated bit #7. So by moving the PageWaiters bit up to bit #7, we can atomically clear the lock bit and test the waiters bit on x86 too. And architectures with LL/SC (which is all the usual RISC suspects), the particular bit doesn't matter, so they are fine with this approach too. This avoids the extra access to the same atomic word, and thus avoids the costly stall at page unlock time. The only downside is that the interface ends up being a bit odd and specialized: clear a bit in a byte, and test the sign bit. Nick doesn't love the resulting name of the new primitive, but I'd rather make the name be descriptive and very clear about the limitation imposed by trying to work across all relevant architectures than make it be some generic thing that doesn't make the odd semantics explicit. So this introduces the new architecture primitive clear_bit_unlock_is_negative_byte(); and adds the trivial implementation for x86. We have a generic non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)" combination) which can be overridden by any architecture that can do better. According to Nick, Power has the same hickup x86 has, for example, but some other architectures may not even care. All these optimizations mean that my page locking stress-test (which is just executing a lot of small short-lived shell scripts: "make test" in the git source tree) no longer makes our page locking look horribly bad. Before all these optimizations, just the unlock_page() costs were just over 3% of all CPU overhead on "make test". After this, it's down to 0.66%, so just a quarter of the cost it used to be. (The difference on NUMA is bigger, but there this micro-optimization is likely less noticeable, since the big issue on NUMA was not the accesses to 'struct page', but the waitqueue accesses that were already removed by Nick's earlier commit). Acked-by: Nick Piggin <npiggin@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-29 11:03:15 -08:00
Thomas Gleixner	0dad3a3014	x86/mce/AMD: Make the init code more robust If mce_device_init() fails then the mce device pointer is NULL and the AMD mce code happily dereferences it. Add a sanity check. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-26 17:30:24 -08:00
Al Viro	b4b8664d29	arm64: don't pull uaccess.h into .S Split asm-only parts of arm64 uaccess.h into a new header and use that from .S. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-12-26 13:05:17 -05:00
Larry Finger	8ae679c4bc	powerpc: Fix build warning on 32-bit PPC I am getting the following warning when I build kernel 4.9-git on my PowerBook G4 with a 32-bit PPC processor: AS arch/powerpc/kernel/misc_32.o arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef] This problem is evident after commit `989cea5c14` ("kbuild: prevent lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an error that has been in the code since 2005 when this source file was created. That was with commit `9994a33865` ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S"). The offending line does not make a lot of sense. This error does not seem to cause any errors in the executable, thus I am not recommending that it be applied to any stable versions. Thanks to Nicholas Piggin for suggesting this solution. Fixes: `9994a33865` ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S") Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-25 16:12:20 -08:00
Linus Torvalds	3ddc76dfc7	Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer type cleanups from Thomas Gleixner: "This series does a tree wide cleanup of types related to timers/timekeeping. - Get rid of cycles_t and use a plain u64. The type is not really helpful and caused more confusion than clarity - Get rid of the ktime union. The union has become useless as we use the scalar nanoseconds storage unconditionally now. The 32bit timespec alike storage got removed due to the Y2038 limitations some time ago. That leaves the odd union access around for no reason. Clean it up. Both changes have been done with coccinelle and a small amount of manual mopping up" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ktime: Get rid of ktime_equal() ktime: Cleanup ktime_set() usage ktime: Get rid of the union clocksource: Use a plain u64 instead of cycle_t	2016-12-25 14:30:04 -08:00
Linus Torvalds	b272f732f8	Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP hotplug notifier removal from Thomas Gleixner: "This is the final cleanup of the hotplug notifier infrastructure. The series has been reintgrated in the last two days because there came a new driver using the old infrastructure via the SCSI tree. Summary: - convert the last leftover drivers utilizing notifiers - fixup for a completely broken hotplug user - prevent setup of already used states - removal of the notifiers - treewide cleanup of hotplug state names - consolidation of state space There is a sphinx based documentation pending, but that needs review from the documentation folks" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/armada-xp: Consolidate hotplug state space irqchip/gic: Consolidate hotplug state space coresight/etm3/4x: Consolidate hotplug state space cpu/hotplug: Cleanup state names cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions staging/lustre/libcfs: Convert to hotplug state machine scsi/bnx2i: Convert to hotplug state machine scsi/bnx2fc: Convert to hotplug state machine cpu/hotplug: Prevent overwriting of callbacks x86/msr: Remove bogus cleanup from the error path bus: arm-ccn: Prevent hotplug callback leak perf/x86/intel/cstate: Prevent hotplug callback leak ARM/imx/mmcd: Fix broken cpu hotplug handling scsi: qedi: Convert to hotplug state machine	2016-12-25 14:05:56 -08:00
Thomas Gleixner	8b0e195314	ktime: Cleanup ktime_set() usage ktime_set(S,N) was required for the timespec storage type and is still useful for situations where a Seconds and Nanoseconds part of a time value needs to be converted. For anything where the Seconds argument is 0, this is pointless and can be replaced with a simple assignment. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>	2016-12-25 17:21:22 +01:00
Thomas Gleixner	a5a1d1c291	clocksource: Use a plain u64 instead of cycle_t There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>	2016-12-25 11:04:12 +01:00
Thomas Gleixner	73c1b41e63	cpu/hotplug: Cleanup state names When the state names got added a script was used to add the extra argument to the calls. The script basically converted the state constant to a string, but the cleanup to convert these strings into meaningful ones did not happen. Replace all the useless strings with 'subsys/xxx/yyy:state' strings which are used in all the other places already. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-25 10:47:44 +01:00
Thomas Gleixner	59fefd0890	x86/msr: Remove bogus cleanup from the error path The error cleanup which is invoked when the hotplug state setup failed tries to remove the failed state, which is broken. Fixes: `8fba38c937` ("x86/msr: Convert to hotplug state machine") Reported-by: kernel test robot <fengguang.wu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Siewior <bigeasy@linutronix.de>	2016-12-25 10:47:41 +01:00
Thomas Gleixner	834fcd2980	perf/x86/intel/cstate: Prevent hotplug callback leak If the pmu registration fails the registered hotplug callbacks are not removed. Wrong in any case, but fatal in case of a modular driver. Replace the nonsensical state names with proper ones while at it. Fixes: `77c34ef1c3` ("perf/x86/intel/cstate: Convert Intel CSTATE to hotplug state machine") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org	2016-12-25 10:47:40 +01:00
Thomas Gleixner	a051f220d6	ARM/imx/mmcd: Fix broken cpu hotplug handling The cpu hotplug support of this perf driver is broken in several ways: 1) It adds a instance before setting up the state. 2) The state for the instance is different from the state of the callback. It's just a randomly chosen state. 3) The instance registration is not error checked so nobody noticed that the call can never succeed. 4) The state for the multi install callbacks is chosen randomly and overwrites existing state. This is now prevented by the core code so the call is guaranteed to fail. 5) The error exit path in the init function leaves the instance registered and then frees the memory which contains the enqueued hlist node. 6) The remove function is removing the state and not the instance. Fix it by: - Setting up the state before adding instances. Use a dynamically allocated state for it. - Installing instances after the state has been set up - Removing the instance in the error path before freeing memory - Removing the instance not the state in the driver remove callback While at is use raw_cpu_processor_id(), because cpu_processor_id() cannot be used in preemptible context, and set the driver data after successful registration of the pmu. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Shawn Guo <shawnguo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Frank Li <frank.li@nxp.com> Cc: Zhengyu Shen <zhengyu.shen@nxp.com> Link: http://lkml.kernel.org/r/20161221192111.596204211@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-25 10:47:40 +01:00
Linus Torvalds	7c0f6ba682	Replace <asm/uaccess.h> with <linux/uaccess.h> globally This was entirely automated, using the script by Al: PATT='^[[:blank:]]#[[:blank:]]include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"\|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-24 11:46:01 -08:00
Linus Torvalds	6ac3bb167f	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "There's a number of fixes: - a round of fixes for CPUID-less legacy CPUs - a number of microcode loader fixes - i8042 detection robustization fixes - stack dump/unwinder fixes - x86 SoC platform driver fixes - a GCC 7 warning fix - virtualization related fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) Revert "x86/unwind: Detect bad stack return address" x86/paravirt: Mark unused patch_default label x86/microcode/AMD: Reload proper initrd start address x86/platform/intel/quark: Add printf attribute to imr_self_test_result() x86/platform/intel-mid: Switch MPU3050 driver to IIO x86/alternatives: Do not use sync_core() to serialize I$ x86/topology: Document cpu_llc_id x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic x86/asm: Rewrite sync_core() to use IRET-to-self x86/microcode/intel: Replace sync_core() with native_cpuid() Revert "x86/boot: Fail the boot if !M486 and CPUID is missing" x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6 x86/tools: Fix gcc-7 warning in relocs.c x86/unwind: Dump stack data on warnings x86/unwind: Adjust last frame check for aligned function stacks x86/init: Fix a couple of comment typos x86/init: Remove i8042_detect() from platform ops Input: i8042 - Trust firmware a bit more when probing on X86 x86/init: Add i8042 state to the platform data ...	2016-12-23 16:54:46 -08:00
Linus Torvalds	00198dab3b	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "On the kernel side there's two x86 PMU driver fixes and a uprobes fix, plus on the tooling side there's a number of fixes and some late updates" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) perf sched timehist: Fix invalid period calculation perf sched timehist: Remove hardcoded 'comm_width' check at print_summary perf sched timehist: Enlarge default 'comm_width' perf sched timehist: Honour 'comm_width' when aligning the headers perf/x86: Fix overlap counter scheduling bug perf/x86/pebs: Fix handling of PEBS buffer overflows samples/bpf: Move open_raw_sock to separate header samples/bpf: Remove perf_event_open() declaration samples/bpf: Be consistent with bpf_load_program bpf_insn parameter tools lib bpf: Add bpf_prog_{attach,detach} samples/bpf: Switch over to libbpf perf diff: Do not overwrite valid build id perf annotate: Don't throw error for zero length symbols perf bench futex: Fix lock-pi help string perf trace: Check if MAP_32BIT is defined (again) samples/bpf: Make perf_event_read() static uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation samples/bpf: Make samples more libbpf-centric tools lib bpf: Add flags to bpf_create_map() tools lib bpf: use __u32 from linux/types.h ...	2016-12-23 16:49:12 -08:00
Josh Poimboeuf	c280f7736a	Revert "x86/unwind: Detect bad stack return address" Revert the following commit: `b6959a3621` ("x86/unwind: Detect bad stack return address") ... because Andrey Konovalov reported an unwinder warning: WARNING: unrecognized kernel stack return address ffffffffa0000001 at ffff88006377fa18 in a.out:4467 The unwind was initiated from an interrupt which occurred while running in the generated code for a kprobe. The unwinder printed the warning because it expected regs->ip to point to a valid text address, but instead it pointed to the generated code. Eventually we may want come up with a way to identify generated kprobe code so the unwinder can know that it's a valid return address. Until then, just remove the warning. Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/02f296848fbf49fb72dfeea706413ecbd9d4caf6.1482418739.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-23 20:32:30 +01:00
Linus Torvalds	42e0372c0e	2nd round of ARC udpates for 4.10rc1 - Fix for aliasing VIPT dcache in old ARC700 cores - micro-optimization in ARC700 ProtV handler - Enable SG_CHAIN [Vladimir] - ARC HS38 core intc default to prio 1 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJYXHQ1AAoJEGnX8d3iisJeGpMP/2IIx1JxYklDRluSHkA3ekr3 OJBQrMBJbvRW2s4kD9brHgzZwMAJTjO4b7DGOLcd7w12fCRc0fuzRysyuW5x5Sky ovubNwneGBb2T4SDjI6VKibWBfOZARiyE5ROfHF4vmgAAALFUtGlztm4ZjyPy6Mw jN/rAQLyvNSv29gYmmOiq/lH68qJSmVzR6otBVpil4veleXIv5f093ujao2Egjb2 DPnzUBjTJ/QvmtWhinxQqdepq4G0Oeo//+lbDeWYFHIdYdwT0o0v6XzeTjr7whKI KKPPDRMMSzsNAuqGU1ufD+5E/oVruUNcnlepfbvfTgN+g3kXB0e6S+nFlkAKHcCG 3VcifR+2M73ZIS0Jhdx5Gb9uSvXA0sYg/GEhiV0PdCi86ntcp+SlpVs7POdyKmb8 zDvIZyFnLofWaOc8Oiugtb0kOfENFzrc8xXVRsLeECngAXQq8Eaz23yVqRLQaCwY uIFl7k0IVpIYLZrDOKOsN+eJrE2JW3eftZFfwktOwdA1JL6AqEvlXDWsjHTmIpcB pGJOYvIGFdlT74HSrKEnDKMlcP3dIqN888CBeNrck+wrqlGpvZWD4kytCiHpR95c xiXDQNZassmT9ispCRA/dxpPjUksxfhZgPiin93GiUSXk55WX5EomNbAiJz/oCqT pAv9Y10C3o+hifgV9r9F =1qMs -----END PGP SIGNATURE----- Merge tag 'arc-4.10-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull more ARC updates from Vineet Gupta: - Fix for aliasing VIPT dcache in old ARC700 cores - micro-optimization in ARC700 ProtV handler - Enable SG_CHAIN [Vladimir] - ARC HS38 core intc default to prio 1 * tag 'arc-4.10-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: mm: arc700: Don't assume 2 colours for aliasing VIPT dcache ARC: mm: No need to save cache version in @cpuinfo ARC: enable SG chaining ARCv2: intc: default all interrupts to priority 1 ARCv2: entry: document intr disable in hard isr ARC: ARCompact entry: elide re-reading ECR in ProtV handler	2016-12-23 10:22:47 -08:00
Linus Torvalds	9be962d525	More ACPI updates for v4.10-rc1 - Move some Linux-specific functionality to upstream ACPICA and update the in-kernel users of it accordingly (Lv Zheng). - Drop a useless warning (triggered by the lack of an optional object) from the ACPI namespace scanning code (Zhang Rui). -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJYW90kAAoJEILEb/54YlRxS78P/RZwSOwkgdowIMGOlO7Hz5Xz 6Psjfc3QdGhVGOau4F3Irg7HIDsPaHE5+xijwujWFK0wh89tknrChrFDrOueND6K rZ4w72lYROQhtQl4bu9yXZ2TswP3I5aYERDPyqKfDAi+SaQBpZUtmWdoh0I/JN2M svyuzxoRlNglgp2GWPnnnKHFe6plEeZgKjgoGjPAufDHakqYvP0cQwY3i4+PpuYh prXb4Xr4fFvAG2MhM9ciT02ewrQJcYUgVj5bnuSBmMu0y0zUBJ1kK7abiQlf+lTy /BCqQbllhHrrXQD0zkqi2oBFqWL4Bnj9r+5zj03KmBsfoz3u+zXr5CApDei8Yc06 gGIZXX9aCqBirahkpsMFYPEQVPoPFCek0slXSyF5xKjCWYyocwgUtHnB87uIiSFO jCeSsWwT5IO9HnbsTf6x4xDWBdY6LOJgJyDirIGcNSUX+q4tfgn/LFrgN9Ck4M2g xkZn8e8H7u09ACX9l6dHZ1PCHlg7bWKeH6gcqdo5R4NOVeZxt9YDIz13ERsG3D17 JZvdrAbBdC1fXfHnOLBj/BRy+TFvieWOC+kHBTAnPQlnkr7NjXXv/vgWU8flVL/z kxlR2U2lD2XarG/b8OHvYxO2k/TDLRenN0IqqFmYbmyhH0dHYKGGBLY4TetAXKNL N4EouS+xUBulCP2XOI8N =55TG -----END PGP SIGNATURE----- Merge tag 'acpi-extra-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI updates from Rafael Wysocki: "Here are new versions of two ACPICA changes that were deferred previously due to a problem they had introduced, two cleanups on top of them and the removal of a useless warning message from the ACPI core. Specifics: - Move some Linux-specific functionality to upstream ACPICA and update the in-kernel users of it accordingly (Lv Zheng) - Drop a useless warning (triggered by the lack of an optional object) from the ACPI namespace scanning code (Zhang Rui)" * tag 'acpi-extra-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / osl: Remove deprecated acpi_get_table_with_size()/early_acpi_os_unmap_memory() ACPI / osl: Remove acpi_get_table_with_size()/early_acpi_os_unmap_memory() users ACPICA: Tables: Allow FADT to be customized with virtual address ACPICA: Tables: Back port acpi_get_table_with_size() and early_acpi_os_unmap_memory() from Linux kernel ACPI: do not warn if _BQC does not exist	2016-12-22 10:19:32 -08:00
Linus Torvalds	eb254f323b	Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cache allocation interface from Thomas Gleixner: "This provides support for Intel's Cache Allocation Technology, a cache partitioning mechanism. The interface is odd, but the hardware interface of that CAT stuff is odd as well. We tried hard to come up with an abstraction, but that only allows rather simple partitioning, but no way of sharing and dealing with the per package nature of this mechanism. In the end we decided to expose the allocation bitmaps directly so all combinations of the hardware can be utilized. There are two ways of associating a cache partition: - Task A task can be added to a resource group. It uses the cache partition associated to the group. - CPU All tasks which are not member of a resource group use the group to which the CPU they are running on is associated with. That allows for simple CPU based partitioning schemes. The main expected user sare: - Virtualization so a VM can only trash only the associated part of the cash w/o disturbing others - Real-Time systems to seperate RT and general workloads. - Latency sensitive enterprise workloads - In theory this also can be used to protect against cache side channel attacks" [ Intel RDT is "Resource Director Technology". The interface really is rather odd and very specific, which delayed this pull request while I was thinking about it. The pull request itself came in early during the merge window, I just delayed it until things had calmed down and I had more time. But people tell me they'll use this, and the good news is that it is _so_ specific that it's rather independent of anything else, and no user is going to depend on the interface since it's pretty rare. So if push comes to shove, we can just remove the interface and nothing will break ] * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) x86/intel_rdt: Implement show_options() for resctrlfs x86/intel_rdt: Call intel_rdt_sched_in() with preemption disabled x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount x86/intel_rdt: Fix setting of closid when adding CPUs to a group x86/intel_rdt: Update percpu closid immeditately on CPUs affected by changee x86/intel_rdt: Reset per cpu closids on unmount x86/intel_rdt: Select KERNFS when enabling INTEL_RDT_A x86/intel_rdt: Prevent deadlock against hotplug lock x86/intel_rdt: Protect info directory from removal x86/intel_rdt: Add info files to Documentation x86/intel_rdt: Export the minimum number of set mask bits in sysfs x86/intel_rdt: Propagate error in rdt_mount() properly x86/intel_rdt: Add a missing #include MAINTAINERS: Add maintainer for Intel RDT resource allocation x86/intel_rdt: Add scheduler hook x86/intel_rdt: Add schemata file x86/intel_rdt: Add tasks files x86/intel_rdt: Add cpus file x86/intel_rdt: Add mkdir to resctrl file system x86/intel_rdt: Add "info" files to resctrl file system ...	2016-12-22 09:25:45 -08:00
Peter Zijlstra	1134c2b5cb	perf/x86: Fix overlap counter scheduling bug Jiri reported the overlap scheduling exceeding its max stack. Looking at the constraint that triggered this, it turns out the overlap marker isn't needed. The comment with EVENT_CONSTRAINT_OVERLAP states: "This is the case if the counter mask of such an event is not a subset of any other counter mask of a constraint with an equal or higher weight". Esp. that latter part is of interest here I think, our overlapping mask is 0x0e, that has 3 bits set and is the highest weight mask in on the PMU, therefore it will be placed last. Can we still create a scenario where we would need to rewind that? The scenario for AMD Fam15h is we're having masks like: 0x3F -- 111111 0x38 -- 111000 0x07 -- 000111 0x09 -- 001001 And we mark 0x09 as overlapping, because it is not a direct subset of 0x38 or 0x07 and has less weight than either of those. This means we'll first try and place the 0x09 event, then try and place 0x38/0x07 events. Now imagine we have: 3 * 0x07 + 0x09 and the initial pick for the 0x09 event is counter 0, then we'll fail to place all 0x07 events. So we'll pop back, try counter 4 for the 0x09 event, and then re-try all 0x07 events, which will now work. The masks on the PMU in question are: 0x01 - 0001 0x03 - 0011 0x0e - 1110 0x0c - 1100 But since all the masks that have overlap (0xe -> {0xc,0x3}) and (0x3 -> 0x1) are of heavier weight, it should all work out. Reported-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Liang Kan <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vince@deater.net> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20161109155153.GQ3142@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-22 17:45:43 +01:00
Stephane Eranian	daa864b8f8	perf/x86/pebs: Fix handling of PEBS buffer overflows This patch solves a race condition between PEBS and the PMU handler. In case multiple PEBS events are sampled at the same time, it is possible to have GLOBAL_STATUS bit 62 set indicating PEBS buffer overflow and also seeing at most 3 PEBS counters having their bits set in the status register. This is a sign that there was at least one PEBS record pending at the time of the PMU interrupt. PEBS counters must only be processed via the drain_pebs() calls, and not via the regular sample processing loop coming after that the function, otherwise phony regular samples may be generated in the sampling buffer not marked with the EXACT tag. Another possibility is to have one PEBS event and at least one non-PEBS event whic hoverflows while PEBS has armed. In this case, bit 62 of GLOBAL_STATUS will not be set, yet the overflow status bit for the PEBS counter will be on Skylake. To avoid this problem, we systematically ignore the PEBS-enabled counters from the GLOBAL_STATUS mask and we always process PEBS events via drain_pebs(). The problem manifested itself by having non-exact samples when sampling only PEBS events, i.e., the PERF_SAMPLE_RECORD would not have the EXACT flag set. Note that this problem is only present on Skylake processor. This fix is harmless on older processors. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1482395366-8992-1-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-22 17:45:36 +01:00
Peter Zijlstra	cef4402d76	x86/paravirt: Mark unused patch_default label A bugfix commit: `45dbea5f55` ("x86/paravirt: Fix native_patch()") ... introduced a harmless warning: arch/x86/kernel/paravirt_patch_32.c: In function 'native_patch': arch/x86/kernel/paravirt_patch_32.c:71:1: error: label 'patch_default' defined but not used [-Werror=unused-label] Fix it by annotating the label as __maybe_unused. Reported-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Piotr Gregor <piotrgregor@rsyncme.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `45dbea5f55` ("x86/paravirt: Fix native_patch()") Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-22 17:43:35 +01:00
Rafael J. Wysocki	c8e008e2a6	Merge branches 'acpica' and 'acpi-scan' * acpica: ACPI / osl: Remove deprecated acpi_get_table_with_size()/early_acpi_os_unmap_memory() ACPI / osl: Remove acpi_get_table_with_size()/early_acpi_os_unmap_memory() users ACPICA: Tables: Allow FADT to be customized with virtual address ACPICA: Tables: Back port acpi_get_table_with_size() and early_acpi_os_unmap_memory() from Linux kernel * acpi-scan: ACPI: do not warn if _BQC does not exist	2016-12-22 14:34:24 +01:00
Linus Torvalds	0c961c5511	Merge branch 'parisc-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: - add Kernel address space layout randomization support - re-enable interrupts earlier now that we have a working IRQ stack - optimize the timer interrupt function to better cope with missed timer irqs - fix error return code in parisc perf code (by Dan Carpenter) - fix PAT debug code * 'parisc-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Optimize timer interrupt function parisc: perf: return -EFAULT on error parisc: Enhance CPU detection code on PAT machines parisc: Re-enable interrupts early parisc: Enable KASLR	2016-12-21 10:47:13 -08:00
Borislav Petkov	8877ebdd3f	x86/microcode/AMD: Reload proper initrd start address When we switch to virtual addresses and, especially after reserve_initrd()->relocate_initrd() have run, we have the updated initrd address in initrd_start. Use initrd_start then instead of the address which has been passed to us through boot params. (That still gets used when we're running the very early routines on the BSP). Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20161220144012.lc4cwrg6dphqbyqu@pd.tnic Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-21 10:50:04 +01:00
Lv Zheng	8d3523fb3b	ACPI / osl: Remove deprecated acpi_get_table_with_size()/early_acpi_os_unmap_memory() Since all users are cleaned up, remove the 2 deprecated APIs due to no users. As a Linux variable rather than an ACPICA variable, acpi_gbl_permanent_mmap is renamed to acpi_permanent_mmap to have a consistent coding style across entire Linux ACPI subsystem. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-12-21 02:36:38 +01:00
Lv Zheng	6b11d1d677	ACPI / osl: Remove acpi_get_table_with_size()/early_acpi_os_unmap_memory() users This patch removes the users of the deprectated APIs: acpi_get_table_with_size() early_acpi_os_unmap_memory() The following APIs should be used instead of: acpi_get_table() acpi_put_table() The deprecated APIs are invented to be a replacement of acpi_get_table() during the early stage so that the early mapped pointer will not be stored in ACPICA core and thus the late stage acpi_get_table() won't return a wrong pointer. The mapping size is returned just because it is required by early_acpi_os_unmap_memory() to unmap the pointer during early stage. But as the mapping size equals to the acpi_table_header.length (see acpi_tb_init_table_descriptor() and acpi_tb_validate_table()), when such a convenient result is returned, driver code will start to use it instead of accessing acpi_table_header to obtain the length. Thus this patch cleans up the drivers by replacing returned table size with acpi_table_header.length, and should be a no-op. Reported-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-12-21 02:36:38 +01:00
Linus Torvalds	ba6d973f78	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes and cleanups from David Miller: 1) Use rb_entry() instead of hardcoded container_of(), from Geliang Tang. 2) Use correct memory barriers in stammac driver, from Pavel Machek. 3) Fix assoc bind address handling in SCTP, from Xin Long. 4) Make the length check for UFO handling consistent between __ip_append_data() and ip_finish_output(), from Zheng Li. 5) HSI driver compatible strings were busted fro hix5hd2, from Dongpo Li. 6) Handle devm_ioremap() errors properly in cavium driver, from Arvind Yadav. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits) RDS: use rb_entry() net_sched: sch_netem: use rb_entry() net_sched: sch_fq: use rb_entry() net/mlx5: use rb_entry() ethernet: sfc: Add Kconfig entry for vendor Solarflare sctp: not copying duplicate addrs to the assoc's bind address list sctp: reduce indent level in sctp_copy_local_addr_list ARM: dts: hix5hd2: don't change the existing compatible string net: hix5hd2_gmac: fix compatible strings name openvswitch: Add a missing break statement. net: netcp: ethss: fix 10gbe host port tx pri map configuration net: netcp: ethss: fix errors in ethtool ops fsl/fman: enable compilation on ARM64 fsl/fman: A007273 only applies to PPC SoCs powerpc: fsl/fman: remove fsl,fman from of_device_ids[] fsl/fman: fix 1G support for QSGMII interfaces dt: bindings: net: use boolean dt properties for eee broken modes net: phy: use boolean dt properties for eee broken modes net: phy: fix sign type error in genphy_config_eee_advert ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output ...	2016-12-20 15:48:34 -08:00
Linus Torvalds	3eb86259ec	Merge branch 'akpm' (patches from Andrew) Merge final set of updates from Andrew Morton: - a series to make IMA play better across kexec - a handful of random fixes * emailed patches from Andrew Morton <akpm@linux-foundation.org>: printk: fix typo in CONSOLE_LOGLEVEL_DEFAULT help text ratelimit: fix WARN_ON_RATELIMIT return value kcov: make kcov work properly with KASLR enabled arm64: setup: introduce kaslr_offset() mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED ima: platform-independent hash value ima: define a canonical binary_runtime_measurements list format ima: support restoring multiple template formats ima: store the builtin/custom template definitions in a list ima: on soft reboot, save the measurement list powerpc: ima: send the kexec buffer to the next kernel ima: maintain memory size needed for serializing the measurement list ima: permit duplicate measurement list entries ima: on soft reboot, restore the measurement list powerpc: ima: get the kexec buffer passed by the previous kernel	2016-12-20 15:24:32 -08:00
Linus Torvalds	d5379e5edd	Microblaze patches for 4.10-rc1 - Wire-up new syscalls - Add new codes and fpga families - Fix return value -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iEYEABECAAYFAlhZKicACgkQykllyylKDCFe5wCfYFkwbBhSO34IykNdK5m2Iipc u4EAn3+7AAp4k4nTf92vtQI6VCWw6245 =5PYH -----END PGP SIGNATURE----- Merge tag 'microblaze-4.10-rc1' of git://git.monstr.eu/linux-2.6-microblaze Pull arch/microblaze updates from Michal Simek: - wire-up new syscalls - add new codes and fpga families - fix a return value * tag 'microblaze-4.10-rc1' of git://git.monstr.eu/linux-2.6-microblaze: microblaze: Add new fpga families microblaze: Add missing release version code v9.6 and v10 microblaze: Add missing syscalls microblaze: Fix return value from xilinx_timer_init	2016-12-20 15:16:00 -08:00
Linus Torvalds	ec92b88c3c	Xtensa improvements for 4.10: - enable HAVE_DMA_CONTIGUOUS, configure shared DMA pool reservation in kc705 DTS; - update xtensa DMA-related Documentation/features entries; - clean up arch/xtensa/kernel/setup.c: move S32C1I self-test out of it, remove unused declarations, fix screen_info definition. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJYWCNXAAoJEFH5zJH4P6BEpMIP/1eW0U1oCXGU/xMb1h+YEGnb YKPeGLxnxH9C9H7RaGoG40YQTHNc1tpeI6R/1+WzNzUx3d08qVHKPrTxoM/DWOM5 erkd0xq0k9/9uZVCuwC7bAMvHlAYGXegiFUy27SVV8WE0P2EXnvP4PS0jGQfnIWZ PLkGfTjYte321Px25LQnmVW8vwY9Y/vS0UUBKbFEeXd3UWJ2M8coAzi4TfEc5m/U 8LVeypSEnNED/IBIArtFFV5RWRBxz0vtbUYWWnNWHQnZ2g6YA9Vx/64n2f4mZJp6 rxyTrsDMJ+OnXUXsCsxVR+O3RjBiDJarVbfvm+Ug3B++8NSFkUf41nZYMkYoA845 Akj5o5W/BqCFAmOetkQDmqEuJQ0HH+C9rqJ5hN0utz5cRzYf08h5t21YJ64HarTf 3K3f5G4WQiKB1i+ZIF3mPR6Oi7qIoGi3y/9UVYWvBIlDdB6i6mx8SyIujLxnxCwN tm1jospeDzMSCk7wwrNxvqXNzJ98D2zo1PNB+6bZToOAEXAxNLUsD3lO2q41GWJU +S1YHiMJSyBnMdGSh1drx72slxBHZI6UvHeb9zBqcqZJxlLnwM6COlNePuyYUOOk 1w5Z8V4liH4RO9DBAieVeB1pYNLg3317TbzN1YjqgUrevYzKN9R5T6GbdirMO+d1 h3jWKmKDlleCy+qWYkZ5 =eMKb -----END PGP SIGNATURE----- Merge tag 'xtensa-20161219' of git://github.com/jcmvbkbc/linux-xtensa Pull Xtensa updates from Max Filippov: - enable HAVE_DMA_CONTIGUOUS, configure shared DMA pool reservation in kc705 DTS - update xtensa DMA-related Documentation/features entries - clean up arch/xtensa/kernel/setup.c: move S32C1I self-test out of it, remove unused declarations, fix screen_info definition * tag 'xtensa-20161219' of git://github.com/jcmvbkbc/linux-xtensa: xtensa: update DMA-related Documentation/features entries xtensa: configure shared DMA pool reservation in kc705 DTS xtensa: enable HAVE_DMA_CONTIGUOUS xtensa: move S32C1I self-test to a separate file xtensa: fix screen_info, clean up unused declarations in setup.c	2016-12-20 14:48:53 -08:00
Helge Deller	160494d381	parisc: Optimize timer interrupt function Restructure the timer interrupt function to better cope with missed timer irqs. Optimize the calculation when the next interrupt should happen and skip irqs if they would happen too shortly after exit of the irq function. The update_process_times() call is done anyway at every timer irq, so we can safely drop the prof_counter and prof_multiplier variables from the per_cpu structure. Signed-off-by: Helge Deller <deller@gmx.de>	2016-12-20 21:39:40 +01:00
Dongpo Li	48fed73ab6	ARM: dts: hix5hd2: don't change the existing compatible string The SoC hix5hd2 compatible string has the suffix "-gmac" and we should not change it. We should only add the generic compatible string "hisi-gmac-v1". Fixes: `0855950ba5` ("ARM: dts: hix5hd2: add gmac generic compatible and clock names") Signed-off-by: Dongpo Li <lidongpo@hisilicon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-20 14:12:29 -05:00
Madalin Bucur	ae6021d4fc	powerpc: fsl/fman: remove fsl,fman from of_device_ids[] The fsl/fman drivers will use of_platform_populate() on all supported platforms. Call of_platform_populate() to probe the FMan sub-nodes. Signed-off-by: Igal Liberman <igal.liberman@freescale.com> Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com> Acked-by: Scott Wood <oss@buserror.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-12-20 13:55:34 -05:00
Alexander Popov	7ede8665f2	arm64: setup: introduce kaslr_offset() Introduce kaslr_offset() similar to x86_64 to fix kcov. [ Updated by Will Deacon ] Link: http://lkml.kernel.org/r/1481417456-28826-2-git-send-email-alex.popov@linux.com Signed-off-by: Alexander Popov <alex.popov@linux.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Rob Herring <robh@kernel.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Jon Masters <jcm@redhat.com> Cc: David Daney <david.daney@cavium.com> Cc: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Nicolai Stange <nicstange@gmail.com> Cc: James Morse <james.morse@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-20 09:48:46 -08:00
Thiago Jung Bauermann	ab6b1d1fc4	powerpc: ima: send the kexec buffer to the next kernel The IMA kexec buffer allows the currently running kernel to pass the measurement list via a kexec segment to the kernel that will be kexec'd. This is the architecture-specific part of setting up the IMA kexec buffer for the next kernel. It will be used in the next patch. Link: http://lkml.kernel.org/r/1480554346-29071-6-git-send-email-zohar@linux.vnet.ibm.com Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andreas Steffen <andreas.steffen@strongswan.org> Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> Cc: Josh Sklar <sklar@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-20 09:48:44 -08:00
Thiago Jung Bauermann	467d278249	powerpc: ima: get the kexec buffer passed by the previous kernel Patch series "ima: carry the measurement list across kexec", v8. The TPM PCRs are only reset on a hard reboot. In order to validate a TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement list of the running kernel must be saved and then restored on the subsequent boot, possibly of a different architecture. The existing securityfs binary_runtime_measurements file conveniently provides a serialized format of the IMA measurement list. This patch set serializes the measurement list in this format and restores it. Up to now, the binary_runtime_measurements was defined as architecture native format. The assumption being that userspace could and would handle any architecture conversions. With the ability of carrying the measurement list across kexec, possibly from one architecture to a different one, the per boot architecture information is lost and with it the ability of recalculating the template digest hash. To resolve this problem, without breaking the existing ABI, this patch set introduces the boot command line option "ima_canonical_fmt", which is arbitrarily defined as little endian. The need for this boot command line option will be limited to the existing version 1 format of the binary_runtime_measurements. Subsequent formats will be defined as canonical format (eg. TPM 2.0 support for larger digests). A simplified method of Thiago Bauermann's "kexec buffer handover" patch series for carrying the IMA measurement list across kexec is included in this patch set. The simplified method requires all file measurements be taken prior to executing the kexec load, as subsequent measurements will not be carried across the kexec and restored. This patch (of 10): The IMA kexec buffer allows the currently running kernel to pass the measurement list via a kexec segment to the kernel that will be kexec'd. The second kernel can check whether the previous kernel sent the buffer and retrieve it. This is the architecture-specific part which enables IMA to receive the measurement list passed by the previous kernel. It will be used in the next patch. The change in machine_kexec_64.c is to factor out the logic of removing an FDT memory reservation so that it can be used by remove_ima_buffer. Link: http://lkml.kernel.org/r/1480554346-29071-2-git-send-email-zohar@linux.vnet.ibm.com Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andreas Steffen <andreas.steffen@strongswan.org> Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> Cc: Josh Sklar <sklar@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-12-20 09:48:40 -08:00
Nicolas Iooss	9120cf4fd9	x86/platform/intel/quark: Add printf attribute to imr_self_test_result() __printf() attributes help detecting issues in printf() format strings at compile time. Even though imr_selftest.c is only compiled with CONFIG_DEBUG_IMR_SELFTEST=y, GCC complains about a missing format attribute when compiling allmodconfig with -Wmissing-format-attribute. Silence this warning by adding the attribute. Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Acked-by: Bryan O'Donoghue <pure.logic@nexus-software.ie> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161219132144.4108-1-nicolas.iooss_linux@m4x.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-20 09:37:24 +01:00
Linus Walleij	634b847b6d	x86/platform/intel-mid: Switch MPU3050 driver to IIO The Intel Mid goes in and creates a I2C device for the MPU3050 if the input driver for MPU-3050 is activated. As of commit: `3904b28efb` ("iio: gyro: Add driver for the MPU-3050 gyroscope") .. there is a proper and fully featured IIO driver for this device, so deprecate the use of the incomplete input driver by augmenting the device population code to react to the presence of the IIO driver's Kconfig symbol instead. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Jonathan Cameron <jic23@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1481722794-4348-1-git-send-email-linus.walleij@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-20 09:37:15 +01:00
Borislav Petkov	34bfab0eaf	x86/alternatives: Do not use sync_core() to serialize I$ We use sync_core() in the alternatives code to stop speculative execution of prefetched instructions because we are potentially changing them and don't want to execute stale bytes. What it does on most machines is call CPUID which is a serializing instruction. And that's expensive. However, the instruction cache is serialized when we're on the local CPU and are changing the data through the same virtual address. So then, we don't need the serializing CPUID but a simple control flow change. Last being accomplished with a CALL/RET which the noinline causes. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Matthew Whitehead <tedheadster@gmail.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161203150258.vwr5zzco7ctgc4pe@pd.tnic Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-20 09:36:42 +01:00
Vitaly Kuznetsov	59107e2f48	x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic There is a feature in Hyper-V ('Debug-VM --InjectNonMaskableInterrupt') which injects NMI to the guest. We may want to crash the guest and do kdump on this NMI by enabling unknown_nmi_panic. To make kdump succeed we need to allow the kdump kernel to re-establish VMBus connection so it will see VMBus devices (storage, network,..). To properly unload VMBus making it possible to start over during kdump we need to do the following: - Send an 'unload' message to the hypervisor. This can be done on any CPU so we do this the crashing CPU. - Receive the 'unload finished' reply message. WS2012R2 delivers this message to the CPU which was used to establish VMBus connection during module load and this CPU may differ from the CPU sending 'unload'. Receiving a VMBus message means the following: - There is a per-CPU slot in memory for one message. This slot can in theory be accessed by any CPU. - We get an interrupt on the CPU when a message was placed into the slot. - When we read the message we need to clear the slot and signal the fact to the hypervisor. In case there are more messages to this CPU pending the hypervisor will deliver the next message. The signaling is done by writing to an MSR so this can only be done on the appropriate CPU. To avoid doing cross-CPU work on crash we have vmbus_wait_for_unload() function which checks message slots for all CPUs in a loop waiting for the 'unload finished' messages. However, there is an issue which arises when these conditions are met: - We're crashing on a CPU which is different from the one which was used to initially contact the hypervisor. - The CPU which was used for the initial contact is blocked with interrupts disabled and there is a message pending in the message slot. In this case we won't be able to read the 'unload finished' message on the crashing CPU. This is reproducible when we receive unknown NMIs on all CPUs simultaneously: the first CPU entering panic() will proceed to crash and all other CPUs will stop themselves with interrupts disabled. The suggested solution is to handle unknown NMIs for Hyper-V guests on the first CPU which gets them only. This will allow us to rely on VMBus interrupt handler being able to receive the 'unload finish' message in case it is delivered to a different CPU. The issue is not reproducible on WS2016 as Debug-VM delivers NMI to the boot CPU only, WS2012R2 and earlier Hyper-V versions are affected. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: K. Y. Srinivasan <kys@microsoft.com> Cc: devel@linuxdriverproject.org Cc: Haiyang Zhang <haiyangz@microsoft.com> Link: http://lkml.kernel.org/r/20161202100720.28121-1-vkuznets@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-12-20 09:31:48 +01:00
Vineet Gupta	08fe007968	ARC: mm: arc700: Don't assume 2 colours for aliasing VIPT dcache An ARC700 customer reported linux boot crashes when upgrading to bigger L1 dcache (64K from 32K). Turns out they had an aliasing VIPT config and current code only assumed 2 colours, while theirs had 4. So default to 4 colours and complain if there are fewer. Ideally this needs to be a Kconfig option, but heck that's too much of hassle for a single user. Cc: stable@vger.kernel.org Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2016-12-19 11:55:17 -08:00
Vineet Gupta	f64915be2d	ARC: mm: No need to save cache version in @cpuinfo Historical MMU revisions have been paired with Cache revision updates which are captured in MMU and Cache Build Configuration Registers respectively. This was used in boot code to check for configurations mismatches, speically in simulations (such as running with non existent caches, non pairing MMU and Cache version etc). This can instead be inferred from other cache params such as line size. So remove @ver from post processed @cpuinfo which could be used later to save soem other interesting info. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>	2016-12-19 11:54:41 -08:00
Linus Torvalds	45d36906e2	Early fixes for x86. Instead of the (botched) revert, the lockdep/might_sleep splat has a real fix provided by Andrea. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJYV/xYAAoJEL/70l94x66DalgH/0dMqN9nGDJfCR3tmEJEs4AH WiFFsu7rHfXac2QeO5Eq30XLFghQjlcj9GkBYgF33wU/VMQ7xcDje+lHKQZ8e4b/ gc4BSwDGPpWW6IXfVVeMz9Zxfi7WE6J03kQQ+uxDbwaF65AsrSQQq5SCgcmo8Gof SS138Ws6v5/Us/UAJNI8x5xhcJnE4p4YV2yzTIJkKmZDEAYY9Y9CuKd3na4lRjoc Seuigldo1zY6oYF1Xqwb1efik1xTH4oJdmG8dnf1fvBw0P70Gs4s1eFXm6TFQhTK mdZxfxMw0YtzxroAMwXNjPcTy+Sn4Zx47dpHMspnN2cPePw53YfPS25en9/W09g= =jQbY -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "Early fixes for x86. Instead of the (botched) revert, the lockdep/might_sleep splat has a real fix provided by Andrea" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF) kvm: take srcu lock around kvm_steal_time_set_preempted() kvm: fix schedule in atomic in kvm_steal_time_set_preempted() KVM: hyperv: fix locking of struct kvm_hv fields KVM: x86: Expose Intel AVX512IFMA/AVX512VBMI/SHA features to guest. kvm: nVMX: Correct a VMX instruction error code for VMPTRLD	2016-12-19 08:21:29 -08:00
Jim Mattson	ef85b67385	kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF) When L2 exits to L0 due to "exception or NMI", software exceptions (#BP and #OF) for which L1 has requested an intercept should be handled by L1 rather than L0. Previously, only hardware exceptions were forwarded to L1. Signed-off-by: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-12-19 16:05:31 +01:00
Andrea Arcangeli	cc0d907c09	kvm: take srcu lock around kvm_steal_time_set_preempted() kvm_memslots() will be called by kvm_write_guest_offset_cached() so take the srcu lock. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-12-19 15:45:15 +01:00
Andrea Arcangeli	931f261b42	kvm: fix schedule in atomic in kvm_steal_time_set_preempted() kvm_steal_time_set_preempted() isn't disabling the pagefaults before calling __copy_to_user and the kernel debug notices. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-12-19 15:45:14 +01:00
Andy Lutomirski	c198b121b1	x86/asm: Rewrite sync_core() to use IRET-to-self Aside from being excessively slow, CPUID is problematic: Linux runs on a handful of CPUs that don't have CPUID. Use IRET-to-self instead. IRET-to-self works everywhere, so it makes testing easy. For reference, On my laptop, IRET-to-self is ~110ns, CPUID(eax=1, ecx=0) is ~83ns on native and very very slow under KVM, and MOV-to-CR2 is ~42ns. While we're at it: sync_core() serves a very specific purpose. Document it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Matthew Whitehead <tedheadster@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: xen-devel <Xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/5c79f0225f68bc8c40335612bf624511abb78941.1481307769.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-19 11:54:21 +01:00
Andy Lutomirski	484d0e5c79	x86/microcode/intel: Replace sync_core() with native_cpuid() The Intel microcode driver is using sync_core() to mean "do CPUID with EAX=1". I want to rework sync_core(), but first the Intel microcode driver needs to stop depending on its current behavior. Reported-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Signed-off-by: Andy Lutomirski <luto@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Juergen Gross <jgross@suse.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Matthew Whitehead <tedheadster@gmail.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: xen-devel <Xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/535a025bb91fed1a019c5412b036337ad239e5bb.1481307769.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-19 11:54:21 +01:00
Andy Lutomirski	426d1aff31	Revert "x86/boot: Fail the boot if !M486 and CPUID is missing" This reverts commit `ed68d7e9b9`. The patch wasn't quite correct -- there are non-Intel (and hence non-486) CPUs that we support that don't have CPUID. Since we no longer require CPUID for sync_core(), just revert the patch. I think the relevant CPUs are Geode and Elan, but I'm not sure. In principle, we should try to do better at identifying CPUID-less CPUs in early boot, but that's more complicated. Reported-by: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Matthew Whitehead <tedheadster@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: xen-devel <Xen-devel@lists.xen.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/82acde18a108b8e353180dd6febcc2876df33f24.1481307769.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2016-12-19 11:54:20 +01:00

1 2 3 4 5 ...

129771 Commits