The following commit:
8196dab4fc ("x86/cpu: Get rid of compute_unit_id")
... broke the initial strategy for Bulldozer-based cores' topology,
where we consider each thread of a compute unit a standalone core
and not a HT or SMT thread.
Revert to the firmware-supplied core_id numbering and do not make
them thread siblings as we don't consider them for such even if they
technically are, more or less.
Reported-and-tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org> # v4.6+
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 8196dab4fc ("x86/cpu: Get rid of compute_unit_id")
Link: http://lkml.kernel.org/r/20170105092638.5247-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The current implementation supports only Intel Merrifield platforms. Don't mess
with the rest of the Intel MID family by not registering device with wrong
properties.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170102092450.87229-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The proper spelling of Anniedale SoC with 'e' in the middle. Fix typo in the
comment line in intel-family.h header.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170102092229.87036-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A negative number can be specified in the cmdline which will be used as
setup_clear_cpu_cap() argument. With that we can clear/set some bit in
memory predceeding boot_cpu_data/cpu_caps_cleared which may cause kernel
to misbehave. This patch adds lower bound check to setup_disablecpuid().
Boris Petkov reproduced a crash:
[ 1.234575] BUG: unable to handle kernel paging request at ffffffff858bd540
[ 1.236535] IP: memcpy_erms+0x6/0x10
Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andi.kleen@intel.com
Cc: bp@alien8.de
Cc: dave.hansen@linux.intel.com
Cc: luto@kernel.org
Cc: slaoub@gmail.com
Fixes: ac72e7888a ("x86: add generic clearcpuid=... option")
Link: http://lkml.kernel.org/r/1482933340-11857-1-git-send-email-lukasz.odzioba@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull parisc updates from Helge Deller:
- limit usage of processor-internal cr16 clocksource to UP systems only
- segfault info lines in syslog were too long, split those up
- drop own TIF_RESTORE_SIGMASK flag and switch to generic code
* 'parisc-4.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Add line-break when printing segfault info
parisc: Drop TIF_RESTORE_SIGMASK and switch to generic code
parisc: Mark cr16 clocksource unstable on SMP systems
Pull s390 fixes from Martin Schwidefsky:
"Two bug fixes for 4.10-rc3"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/kbuild: enable modversions for symbols exported from asm
s390/vtime: correct system time accounting
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYabV5AAoJEMOzHC1eZifkNkoP+QHJFQfHUknblNLJ2uA9qGQB
7q0d79XA4b6R+keaJ8Qw/Vywiq2KtfUU9R2Fa/7kAlexx51Bu65i233v4mTb2K3d
rm2o/BgCVFL1ELrCKp9z2ClmUE0CDwtt84vKahare1efywkxnmzg9gY/DYGDHwMk
dB/i133CDos09WMdSQuxm9pm8Xp/FJjR8LGrvw05Pp0cxVQEIzh5WaJ4BrsJf0uh
T8xeyERchphID1uEMzDBLXV+PI29ad4inr2+PzRqHhqDj3GfUIUZLPJ/8AZ4/AVP
qsKge6T8A5s5O7YRRVwpwzXUSqGlafLHpFXIyOeb9utJzEnxyQ9/r+DyyDJitH+H
yGoj5MD34e4CLlyLJjON1xmb0Px6tt7vl4JJprdgavVT8fUQsh5mqTYCpmvpyXYc
b+jnx68POFpX/ZnpOMXi3W+6mbzWxAmPHo2TNoIYWd6K2X2geubbY+/T/EBDdsA6
H9C2ZODsKlaPFMIDzhBAiY8wU6/TQ7kd1/8YZK1uMWC0rym6EDSSCqQie1gUMt4P
D1ngamTDklgLmrBqP7C5n+6ASRxie9VU0ZrM5UBIyI/XhU4t9zxtXM0MOAKLGpB9
CLTkxN3T04yyw0SbFANsunS3DbQ3/IUs+oZDCMzDxKfJpoWpcGI4YcLF/RjaOUN9
XxsDL3pBVVzGrnzUQw4R
=iHr1
-----END PGP SIGNATURE-----
Merge tag 'openrisc-for-linus' of git://github.com/openrisc/linux
Pull Openrisc fix from Stafford Horne:
"There was nothing much interesting here except a build fix pointed out
by the test robots. Highlight:
- Defined _text symbol to fix build error"
* tag 'openrisc-for-linus' of git://github.com/openrisc/linux:
openrisc: Add _text symbol to fix ksym build error
The build robot reports:
.tmp_kallsyms1.o: In function `kallsyms_relative_base':
>> (.rodata+0x8a18): undefined reference to `_text'
This is when using 'make alldefconfig'. Adding this _text symbol to mark
the start of the kernel as in other architecture fixes this.
Signed-off-by: Stafford Horne <shorne@gmail.com>
Acked-by: Jonas Bonn <jonas@southpole.se>
Commit 7e7814180b ("signal: consolidate {TS,TLF}_RESTORE_SIGMASK code")
introduced code with which the "restore sigmask" flag lives in task_struct
instead of ti->flags. Let's use this optimization on parisc too.
Signed-off-by: Helge Deller <deller@gmx.de>
The cr16 interval timer of each CPU is not syncronized to other cr16
timers in other CPUs in a SMP system. So, delay the registration of the
cr16 clocksource until all CPUs have been detected and then - if we are
on a SMP machine - mark the cr16 clocksource as unstable and lower it's
rating before registering it at the clocksource framework.
This patch fixes the stalled CPU warnings which we have seen since
introduction of the cr16 clocksource.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v4.8+
In commit 6290602709 ("mm: add PageWaiters indicating tasks are
waiting for a page bit") Nick Piggin made our page locking no longer
unconditionally touch the hashed page waitqueue, which not only helps
performance in general, but is particularly helpful on NUMA machines
where the hashed wait queues can bounce around a lot.
However, the "clear lock bit atomically and then test the waiters bit"
sequence turns out to be much more expensive than it needs to be,
because you get a nasty stall when trying to access the same word that
just got updated atomically.
On architectures where locking is done with LL/SC, this would be trivial
to fix with a new primitive that clears one bit and tests another
atomically, but that ends up not working on x86, where the only atomic
operations that return the result end up being cmpxchg and xadd. The
atomic bit operations return the old value of the same bit we changed,
not the value of an unrelated bit.
On x86, we could put the lock bit in the high bit of the byte, and use
"xadd" with that bit (where the overflow ends up not touching other
bits), and look at the other bits of the result. However, an even
simpler model is to just use a regular atomic "and" to clear the lock
bit, and then the sign bit in eflags will indicate the resulting state
of the unrelated bit #7.
So by moving the PageWaiters bit up to bit #7, we can atomically clear
the lock bit and test the waiters bit on x86 too. And architectures
with LL/SC (which is all the usual RISC suspects), the particular bit
doesn't matter, so they are fine with this approach too.
This avoids the extra access to the same atomic word, and thus avoids
the costly stall at page unlock time.
The only downside is that the interface ends up being a bit odd and
specialized: clear a bit in a byte, and test the sign bit. Nick doesn't
love the resulting name of the new primitive, but I'd rather make the
name be descriptive and very clear about the limitation imposed by
trying to work across all relevant architectures than make it be some
generic thing that doesn't make the odd semantics explicit.
So this introduces the new architecture primitive
clear_bit_unlock_is_negative_byte();
and adds the trivial implementation for x86. We have a generic
non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)"
combination) which can be overridden by any architecture that can do
better. According to Nick, Power has the same hickup x86 has, for
example, but some other architectures may not even care.
All these optimizations mean that my page locking stress-test (which is
just executing a lot of small short-lived shell scripts: "make test" in
the git source tree) no longer makes our page locking look horribly bad.
Before all these optimizations, just the unlock_page() costs were just
over 3% of all CPU overhead on "make test". After this, it's down to
0.66%, so just a quarter of the cost it used to be.
(The difference on NUMA is bigger, but there this micro-optimization is
likely less noticeable, since the big issue on NUMA was not the accesses
to 'struct page', but the waitqueue accesses that were already removed
by Nick's earlier commit).
Acked-by: Nick Piggin <npiggin@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Bob Peterson <rpeterso@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If mce_device_init() fails then the mce device pointer is NULL and the
AMD mce code happily dereferences it.
Add a sanity check.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I am getting the following warning when I build kernel 4.9-git on my
PowerBook G4 with a 32-bit PPC processor:
AS arch/powerpc/kernel/misc_32.o
arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef]
This problem is evident after commit 989cea5c14 ("kbuild: prevent
lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an
error that has been in the code since 2005 when this source file was
created. That was with commit 9994a33865 ("powerpc: Introduce
entry_{32,64}.S, misc_{32,64}.S, systbl.S").
The offending line does not make a lot of sense. This error does not
seem to cause any errors in the executable, thus I am not recommending
that it be applied to any stable versions.
Thanks to Nicholas Piggin for suggesting this solution.
Fixes: 9994a33865 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S")
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull timer type cleanups from Thomas Gleixner:
"This series does a tree wide cleanup of types related to
timers/timekeeping.
- Get rid of cycles_t and use a plain u64. The type is not really
helpful and caused more confusion than clarity
- Get rid of the ktime union. The union has become useless as we use
the scalar nanoseconds storage unconditionally now. The 32bit
timespec alike storage got removed due to the Y2038 limitations
some time ago.
That leaves the odd union access around for no reason. Clean it up.
Both changes have been done with coccinelle and a small amount of
manual mopping up"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ktime: Get rid of ktime_equal()
ktime: Cleanup ktime_set() usage
ktime: Get rid of the union
clocksource: Use a plain u64 instead of cycle_t
Pull SMP hotplug notifier removal from Thomas Gleixner:
"This is the final cleanup of the hotplug notifier infrastructure. The
series has been reintgrated in the last two days because there came a
new driver using the old infrastructure via the SCSI tree.
Summary:
- convert the last leftover drivers utilizing notifiers
- fixup for a completely broken hotplug user
- prevent setup of already used states
- removal of the notifiers
- treewide cleanup of hotplug state names
- consolidation of state space
There is a sphinx based documentation pending, but that needs review
from the documentation folks"
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/armada-xp: Consolidate hotplug state space
irqchip/gic: Consolidate hotplug state space
coresight/etm3/4x: Consolidate hotplug state space
cpu/hotplug: Cleanup state names
cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions
staging/lustre/libcfs: Convert to hotplug state machine
scsi/bnx2i: Convert to hotplug state machine
scsi/bnx2fc: Convert to hotplug state machine
cpu/hotplug: Prevent overwriting of callbacks
x86/msr: Remove bogus cleanup from the error path
bus: arm-ccn: Prevent hotplug callback leak
perf/x86/intel/cstate: Prevent hotplug callback leak
ARM/imx/mmcd: Fix broken cpu hotplug handling
scsi: qedi: Convert to hotplug state machine
ktime_set(S,N) was required for the timespec storage type and is still
useful for situations where a Seconds and Nanoseconds part of a time value
needs to be converted. For anything where the Seconds argument is 0, this
is pointless and can be replaced with a simple assignment.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
There is no point in having an extra type for extra confusion. u64 is
unambiguous.
Conversion was done with the following coccinelle script:
@rem@
@@
-typedef u64 cycle_t;
@fix@
typedef cycle_t;
@@
-cycle_t
+u64
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
When the state names got added a script was used to add the extra argument
to the calls. The script basically converted the state constant to a
string, but the cleanup to convert these strings into meaningful ones did
not happen.
Replace all the useless strings with 'subsys/xxx/yyy:state' strings which
are used in all the other places already.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The error cleanup which is invoked when the hotplug state setup failed
tries to remove the failed state, which is broken.
Fixes: 8fba38c937 ("x86/msr: Convert to hotplug state machine")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
If the pmu registration fails the registered hotplug callbacks are not
removed. Wrong in any case, but fatal in case of a modular driver.
Replace the nonsensical state names with proper ones while at it.
Fixes: 77c34ef1c3 ("perf/x86/intel/cstate: Convert Intel CSTATE to hotplug state machine")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
The cpu hotplug support of this perf driver is broken in several ways:
1) It adds a instance before setting up the state.
2) The state for the instance is different from the state of the
callback. It's just a randomly chosen state.
3) The instance registration is not error checked so nobody noticed that
the call can never succeed.
4) The state for the multi install callbacks is chosen randomly and
overwrites existing state. This is now prevented by the core code so the
call is guaranteed to fail.
5) The error exit path in the init function leaves the instance registered
and then frees the memory which contains the enqueued hlist node.
6) The remove function is removing the state and not the instance.
Fix it by:
- Setting up the state before adding instances. Use a dynamically allocated
state for it.
- Installing instances after the state has been set up
- Removing the instance in the error path before freeing memory
- Removing the instance not the state in the driver remove callback
While at is use raw_cpu_processor_id(), because cpu_processor_id() cannot
be used in preemptible context, and set the driver data after successful
registration of the pmu.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Shawn Guo <shawnguo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Frank Li <frank.li@nxp.com>
Cc: Zhengyu Shen <zhengyu.shen@nxp.com>
Link: http://lkml.kernel.org/r/20161221192111.596204211@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 fixes from Ingo Molnar:
"There's a number of fixes:
- a round of fixes for CPUID-less legacy CPUs
- a number of microcode loader fixes
- i8042 detection robustization fixes
- stack dump/unwinder fixes
- x86 SoC platform driver fixes
- a GCC 7 warning fix
- virtualization related fixes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
Revert "x86/unwind: Detect bad stack return address"
x86/paravirt: Mark unused patch_default label
x86/microcode/AMD: Reload proper initrd start address
x86/platform/intel/quark: Add printf attribute to imr_self_test_result()
x86/platform/intel-mid: Switch MPU3050 driver to IIO
x86/alternatives: Do not use sync_core() to serialize I$
x86/topology: Document cpu_llc_id
x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
x86/asm: Rewrite sync_core() to use IRET-to-self
x86/microcode/intel: Replace sync_core() with native_cpuid()
Revert "x86/boot: Fail the boot if !M486 and CPUID is missing"
x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6
x86/tools: Fix gcc-7 warning in relocs.c
x86/unwind: Dump stack data on warnings
x86/unwind: Adjust last frame check for aligned function stacks
x86/init: Fix a couple of comment typos
x86/init: Remove i8042_detect() from platform ops
Input: i8042 - Trust firmware a bit more when probing on X86
x86/init: Add i8042 state to the platform data
...
Pull perf fixes from Ingo Molnar:
"On the kernel side there's two x86 PMU driver fixes and a uprobes fix,
plus on the tooling side there's a number of fixes and some late
updates"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
perf sched timehist: Fix invalid period calculation
perf sched timehist: Remove hardcoded 'comm_width' check at print_summary
perf sched timehist: Enlarge default 'comm_width'
perf sched timehist: Honour 'comm_width' when aligning the headers
perf/x86: Fix overlap counter scheduling bug
perf/x86/pebs: Fix handling of PEBS buffer overflows
samples/bpf: Move open_raw_sock to separate header
samples/bpf: Remove perf_event_open() declaration
samples/bpf: Be consistent with bpf_load_program bpf_insn parameter
tools lib bpf: Add bpf_prog_{attach,detach}
samples/bpf: Switch over to libbpf
perf diff: Do not overwrite valid build id
perf annotate: Don't throw error for zero length symbols
perf bench futex: Fix lock-pi help string
perf trace: Check if MAP_32BIT is defined (again)
samples/bpf: Make perf_event_read() static
uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation
samples/bpf: Make samples more libbpf-centric
tools lib bpf: Add flags to bpf_create_map()
tools lib bpf: use __u32 from linux/types.h
...
Revert the following commit:
b6959a3621 ("x86/unwind: Detect bad stack return address")
... because Andrey Konovalov reported an unwinder warning:
WARNING: unrecognized kernel stack return address ffffffffa0000001 at ffff88006377fa18 in a.out:4467
The unwind was initiated from an interrupt which occurred while running in the
generated code for a kprobe. The unwinder printed the warning because it
expected regs->ip to point to a valid text address, but instead it pointed to
the generated code.
Eventually we may want come up with a way to identify generated kprobe
code so the unwinder can know that it's a valid return address. Until
then, just remove the warning.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/02f296848fbf49fb72dfeea706413ecbd9d4caf6.1482418739.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- Move some Linux-specific functionality to upstream ACPICA and
update the in-kernel users of it accordingly (Lv Zheng).
- Drop a useless warning (triggered by the lack of an optional
object) from the ACPI namespace scanning code (Zhang Rui).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJYW90kAAoJEILEb/54YlRxS78P/RZwSOwkgdowIMGOlO7Hz5Xz
6Psjfc3QdGhVGOau4F3Irg7HIDsPaHE5+xijwujWFK0wh89tknrChrFDrOueND6K
rZ4w72lYROQhtQl4bu9yXZ2TswP3I5aYERDPyqKfDAi+SaQBpZUtmWdoh0I/JN2M
svyuzxoRlNglgp2GWPnnnKHFe6plEeZgKjgoGjPAufDHakqYvP0cQwY3i4+PpuYh
prXb4Xr4fFvAG2MhM9ciT02ewrQJcYUgVj5bnuSBmMu0y0zUBJ1kK7abiQlf+lTy
/BCqQbllhHrrXQD0zkqi2oBFqWL4Bnj9r+5zj03KmBsfoz3u+zXr5CApDei8Yc06
gGIZXX9aCqBirahkpsMFYPEQVPoPFCek0slXSyF5xKjCWYyocwgUtHnB87uIiSFO
jCeSsWwT5IO9HnbsTf6x4xDWBdY6LOJgJyDirIGcNSUX+q4tfgn/LFrgN9Ck4M2g
xkZn8e8H7u09ACX9l6dHZ1PCHlg7bWKeH6gcqdo5R4NOVeZxt9YDIz13ERsG3D17
JZvdrAbBdC1fXfHnOLBj/BRy+TFvieWOC+kHBTAnPQlnkr7NjXXv/vgWU8flVL/z
kxlR2U2lD2XarG/b8OHvYxO2k/TDLRenN0IqqFmYbmyhH0dHYKGGBLY4TetAXKNL
N4EouS+xUBulCP2XOI8N
=55TG
-----END PGP SIGNATURE-----
Merge tag 'acpi-extra-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"Here are new versions of two ACPICA changes that were deferred
previously due to a problem they had introduced, two cleanups on top
of them and the removal of a useless warning message from the ACPI
core.
Specifics:
- Move some Linux-specific functionality to upstream ACPICA and
update the in-kernel users of it accordingly (Lv Zheng)
- Drop a useless warning (triggered by the lack of an optional
object) from the ACPI namespace scanning code (Zhang Rui)"
* tag 'acpi-extra-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / osl: Remove deprecated acpi_get_table_with_size()/early_acpi_os_unmap_memory()
ACPI / osl: Remove acpi_get_table_with_size()/early_acpi_os_unmap_memory() users
ACPICA: Tables: Allow FADT to be customized with virtual address
ACPICA: Tables: Back port acpi_get_table_with_size() and early_acpi_os_unmap_memory() from Linux kernel
ACPI: do not warn if _BQC does not exist
Pull x86 cache allocation interface from Thomas Gleixner:
"This provides support for Intel's Cache Allocation Technology, a cache
partitioning mechanism.
The interface is odd, but the hardware interface of that CAT stuff is
odd as well.
We tried hard to come up with an abstraction, but that only allows
rather simple partitioning, but no way of sharing and dealing with the
per package nature of this mechanism.
In the end we decided to expose the allocation bitmaps directly so all
combinations of the hardware can be utilized.
There are two ways of associating a cache partition:
- Task
A task can be added to a resource group. It uses the cache
partition associated to the group.
- CPU
All tasks which are not member of a resource group use the group to
which the CPU they are running on is associated with.
That allows for simple CPU based partitioning schemes.
The main expected user sare:
- Virtualization so a VM can only trash only the associated part of
the cash w/o disturbing others
- Real-Time systems to seperate RT and general workloads.
- Latency sensitive enterprise workloads
- In theory this also can be used to protect against cache side
channel attacks"
[ Intel RDT is "Resource Director Technology". The interface really is
rather odd and very specific, which delayed this pull request while I
was thinking about it. The pull request itself came in early during
the merge window, I just delayed it until things had calmed down and I
had more time.
But people tell me they'll use this, and the good news is that it is
_so_ specific that it's rather independent of anything else, and no
user is going to depend on the interface since it's pretty rare. So if
push comes to shove, we can just remove the interface and nothing will
break ]
* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
x86/intel_rdt: Implement show_options() for resctrlfs
x86/intel_rdt: Call intel_rdt_sched_in() with preemption disabled
x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount
x86/intel_rdt: Fix setting of closid when adding CPUs to a group
x86/intel_rdt: Update percpu closid immeditately on CPUs affected by changee
x86/intel_rdt: Reset per cpu closids on unmount
x86/intel_rdt: Select KERNFS when enabling INTEL_RDT_A
x86/intel_rdt: Prevent deadlock against hotplug lock
x86/intel_rdt: Protect info directory from removal
x86/intel_rdt: Add info files to Documentation
x86/intel_rdt: Export the minimum number of set mask bits in sysfs
x86/intel_rdt: Propagate error in rdt_mount() properly
x86/intel_rdt: Add a missing #include
MAINTAINERS: Add maintainer for Intel RDT resource allocation
x86/intel_rdt: Add scheduler hook
x86/intel_rdt: Add schemata file
x86/intel_rdt: Add tasks files
x86/intel_rdt: Add cpus file
x86/intel_rdt: Add mkdir to resctrl file system
x86/intel_rdt: Add "info" files to resctrl file system
...
Jiri reported the overlap scheduling exceeding its max stack.
Looking at the constraint that triggered this, it turns out the
overlap marker isn't needed.
The comment with EVENT_CONSTRAINT_OVERLAP states: "This is the case if
the counter mask of such an event is not a subset of any other counter
mask of a constraint with an equal or higher weight".
Esp. that latter part is of interest here I think, our overlapping mask
is 0x0e, that has 3 bits set and is the highest weight mask in on the
PMU, therefore it will be placed last. Can we still create a scenario
where we would need to rewind that?
The scenario for AMD Fam15h is we're having masks like:
0x3F -- 111111
0x38 -- 111000
0x07 -- 000111
0x09 -- 001001
And we mark 0x09 as overlapping, because it is not a direct subset of
0x38 or 0x07 and has less weight than either of those. This means we'll
first try and place the 0x09 event, then try and place 0x38/0x07 events.
Now imagine we have:
3 * 0x07 + 0x09
and the initial pick for the 0x09 event is counter 0, then we'll fail to
place all 0x07 events. So we'll pop back, try counter 4 for the 0x09
event, and then re-try all 0x07 events, which will now work.
The masks on the PMU in question are:
0x01 - 0001
0x03 - 0011
0x0e - 1110
0x0c - 1100
But since all the masks that have overlap (0xe -> {0xc,0x3}) and (0x3 ->
0x1) are of heavier weight, it should all work out.
Reported-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Liang Kan <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20161109155153.GQ3142@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch solves a race condition between PEBS and the PMU handler.
In case multiple PEBS events are sampled at the same time,
it is possible to have GLOBAL_STATUS bit 62 set indicating
PEBS buffer overflow and also seeing at most 3 PEBS counters
having their bits set in the status register. This is a sign
that there was at least one PEBS record pending at the time
of the PMU interrupt. PEBS counters must only be processed
via the drain_pebs() calls, and not via the regular sample
processing loop coming after that the function, otherwise
phony regular samples may be generated in the sampling buffer
not marked with the EXACT tag.
Another possibility is to have one PEBS event and at least
one non-PEBS event whic hoverflows while PEBS has armed. In this
case, bit 62 of GLOBAL_STATUS will not be set, yet the overflow
status bit for the PEBS counter will be on Skylake.
To avoid this problem, we systematically ignore the PEBS-enabled
counters from the GLOBAL_STATUS mask and we always process PEBS
events via drain_pebs().
The problem manifested itself by having non-exact samples when
sampling only PEBS events, i.e., the PERF_SAMPLE_RECORD would
not have the EXACT flag set.
Note that this problem is only present on Skylake processor.
This fix is harmless on older processors.
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1482395366-8992-1-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A bugfix commit:
45dbea5f55 ("x86/paravirt: Fix native_patch()")
... introduced a harmless warning:
arch/x86/kernel/paravirt_patch_32.c: In function 'native_patch':
arch/x86/kernel/paravirt_patch_32.c:71:1: error: label 'patch_default' defined but not used [-Werror=unused-label]
Fix it by annotating the label as __maybe_unused.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Piotr Gregor <piotrgregor@rsyncme.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 45dbea5f55 ("x86/paravirt: Fix native_patch()")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* acpica:
ACPI / osl: Remove deprecated acpi_get_table_with_size()/early_acpi_os_unmap_memory()
ACPI / osl: Remove acpi_get_table_with_size()/early_acpi_os_unmap_memory() users
ACPICA: Tables: Allow FADT to be customized with virtual address
ACPICA: Tables: Back port acpi_get_table_with_size() and early_acpi_os_unmap_memory() from Linux kernel
* acpi-scan:
ACPI: do not warn if _BQC does not exist
Pull parisc updates from Helge Deller:
- add Kernel address space layout randomization support
- re-enable interrupts earlier now that we have a working IRQ stack
- optimize the timer interrupt function to better cope with missed
timer irqs
- fix error return code in parisc perf code (by Dan Carpenter)
- fix PAT debug code
* 'parisc-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Optimize timer interrupt function
parisc: perf: return -EFAULT on error
parisc: Enhance CPU detection code on PAT machines
parisc: Re-enable interrupts early
parisc: Enable KASLR
When we switch to virtual addresses and, especially after
reserve_initrd()->relocate_initrd() have run, we have the updated initrd
address in initrd_start. Use initrd_start then instead of the address
which has been passed to us through boot params. (That still gets used
when we're running the very early routines on the BSP).
Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/20161220144012.lc4cwrg6dphqbyqu@pd.tnic
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Since all users are cleaned up, remove the 2 deprecated APIs due to no
users.
As a Linux variable rather than an ACPICA variable, acpi_gbl_permanent_mmap
is renamed to acpi_permanent_mmap to have a consistent coding style across
entire Linux ACPI subsystem.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch removes the users of the deprectated APIs:
acpi_get_table_with_size()
early_acpi_os_unmap_memory()
The following APIs should be used instead of:
acpi_get_table()
acpi_put_table()
The deprecated APIs are invented to be a replacement of acpi_get_table()
during the early stage so that the early mapped pointer will not be stored
in ACPICA core and thus the late stage acpi_get_table() won't return a
wrong pointer. The mapping size is returned just because it is required by
early_acpi_os_unmap_memory() to unmap the pointer during early stage.
But as the mapping size equals to the acpi_table_header.length
(see acpi_tb_init_table_descriptor() and acpi_tb_validate_table()), when
such a convenient result is returned, driver code will start to use it
instead of accessing acpi_table_header to obtain the length.
Thus this patch cleans up the drivers by replacing returned table size with
acpi_table_header.length, and should be a no-op.
Reported-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull networking fixes and cleanups from David Miller:
1) Use rb_entry() instead of hardcoded container_of(), from Geliang
Tang.
2) Use correct memory barriers in stammac driver, from Pavel Machek.
3) Fix assoc bind address handling in SCTP, from Xin Long.
4) Make the length check for UFO handling consistent between
__ip_append_data() and ip_finish_output(), from Zheng Li.
5) HSI driver compatible strings were busted fro hix5hd2, from Dongpo
Li.
6) Handle devm_ioremap() errors properly in cavium driver, from Arvind
Yadav.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits)
RDS: use rb_entry()
net_sched: sch_netem: use rb_entry()
net_sched: sch_fq: use rb_entry()
net/mlx5: use rb_entry()
ethernet: sfc: Add Kconfig entry for vendor Solarflare
sctp: not copying duplicate addrs to the assoc's bind address list
sctp: reduce indent level in sctp_copy_local_addr_list
ARM: dts: hix5hd2: don't change the existing compatible string
net: hix5hd2_gmac: fix compatible strings name
openvswitch: Add a missing break statement.
net: netcp: ethss: fix 10gbe host port tx pri map configuration
net: netcp: ethss: fix errors in ethtool ops
fsl/fman: enable compilation on ARM64
fsl/fman: A007273 only applies to PPC SoCs
powerpc: fsl/fman: remove fsl,fman from of_device_ids[]
fsl/fman: fix 1G support for QSGMII interfaces
dt: bindings: net: use boolean dt properties for eee broken modes
net: phy: use boolean dt properties for eee broken modes
net: phy: fix sign type error in genphy_config_eee_advert
ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output
...
Merge final set of updates from Andrew Morton:
- a series to make IMA play better across kexec
- a handful of random fixes
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
printk: fix typo in CONSOLE_LOGLEVEL_DEFAULT help text
ratelimit: fix WARN_ON_RATELIMIT return value
kcov: make kcov work properly with KASLR enabled
arm64: setup: introduce kaslr_offset()
mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED
ima: platform-independent hash value
ima: define a canonical binary_runtime_measurements list format
ima: support restoring multiple template formats
ima: store the builtin/custom template definitions in a list
ima: on soft reboot, save the measurement list
powerpc: ima: send the kexec buffer to the next kernel
ima: maintain memory size needed for serializing the measurement list
ima: permit duplicate measurement list entries
ima: on soft reboot, restore the measurement list
powerpc: ima: get the kexec buffer passed by the previous kernel
- Wire-up new syscalls
- Add new codes and fpga families
- Fix return value
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iEYEABECAAYFAlhZKicACgkQykllyylKDCFe5wCfYFkwbBhSO34IykNdK5m2Iipc
u4EAn3+7AAp4k4nTf92vtQI6VCWw6245
=5PYH
-----END PGP SIGNATURE-----
Merge tag 'microblaze-4.10-rc1' of git://git.monstr.eu/linux-2.6-microblaze
Pull arch/microblaze updates from Michal Simek:
- wire-up new syscalls
- add new codes and fpga families
- fix a return value
* tag 'microblaze-4.10-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Add new fpga families
microblaze: Add missing release version code v9.6 and v10
microblaze: Add missing syscalls
microblaze: Fix return value from xilinx_timer_init
Restructure the timer interrupt function to better cope with missed timer irqs.
Optimize the calculation when the next interrupt should happen and skip irqs if
they would happen too shortly after exit of the irq function.
The update_process_times() call is done anyway at every timer irq, so we can
safely drop the prof_counter and prof_multiplier variables from the per_cpu
structure.
Signed-off-by: Helge Deller <deller@gmx.de>
The SoC hix5hd2 compatible string has the suffix "-gmac" and
we should not change it.
We should only add the generic compatible string "hisi-gmac-v1".
Fixes: 0855950ba5 ("ARM: dts: hix5hd2: add gmac generic compatible and clock names")
Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fsl/fman drivers will use of_platform_populate() on all
supported platforms. Call of_platform_populate() to probe the
FMan sub-nodes.
Signed-off-by: Igal Liberman <igal.liberman@freescale.com>
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
Acked-by: Scott Wood <oss@buserror.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IMA kexec buffer allows the currently running kernel to pass the
measurement list via a kexec segment to the kernel that will be kexec'd.
This is the architecture-specific part of setting up the IMA kexec
buffer for the next kernel. It will be used in the next patch.
Link: http://lkml.kernel.org/r/1480554346-29071-6-git-send-email-zohar@linux.vnet.ibm.com
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andreas Steffen <andreas.steffen@strongswan.org>
Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Josh Sklar <sklar@linux.vnet.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "ima: carry the measurement list across kexec", v8.
The TPM PCRs are only reset on a hard reboot. In order to validate a
TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement
list of the running kernel must be saved and then restored on the
subsequent boot, possibly of a different architecture.
The existing securityfs binary_runtime_measurements file conveniently
provides a serialized format of the IMA measurement list. This patch
set serializes the measurement list in this format and restores it.
Up to now, the binary_runtime_measurements was defined as architecture
native format. The assumption being that userspace could and would
handle any architecture conversions. With the ability of carrying the
measurement list across kexec, possibly from one architecture to a
different one, the per boot architecture information is lost and with it
the ability of recalculating the template digest hash. To resolve this
problem, without breaking the existing ABI, this patch set introduces
the boot command line option "ima_canonical_fmt", which is arbitrarily
defined as little endian.
The need for this boot command line option will be limited to the
existing version 1 format of the binary_runtime_measurements.
Subsequent formats will be defined as canonical format (eg. TPM 2.0
support for larger digests).
A simplified method of Thiago Bauermann's "kexec buffer handover" patch
series for carrying the IMA measurement list across kexec is included in
this patch set. The simplified method requires all file measurements be
taken prior to executing the kexec load, as subsequent measurements will
not be carried across the kexec and restored.
This patch (of 10):
The IMA kexec buffer allows the currently running kernel to pass the
measurement list via a kexec segment to the kernel that will be kexec'd.
The second kernel can check whether the previous kernel sent the buffer
and retrieve it.
This is the architecture-specific part which enables IMA to receive the
measurement list passed by the previous kernel. It will be used in the
next patch.
The change in machine_kexec_64.c is to factor out the logic of removing
an FDT memory reservation so that it can be used by remove_ima_buffer.
Link: http://lkml.kernel.org/r/1480554346-29071-2-git-send-email-zohar@linux.vnet.ibm.com
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andreas Steffen <andreas.steffen@strongswan.org>
Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Josh Sklar <sklar@linux.vnet.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
s390 version of commit 334bb77387 ("x86/kbuild: enable modversions
for symbols exported from asm") so we get also rid of all these
warnings:
WARNING: EXPORT symbol "_mcount" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "memcpy" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "memmove" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "memset" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "save_fpu_regs" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "sie64a" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: EXPORT symbol "sie_exit" [vmlinux] version generation failed, symbol will not be versioned.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There is a slight misaccounting of system time in vtime_account_user.
This function is called once per HZ tick in interrupt context.
The irq_enter function already accounted the system time up to the
point of the irq_enter call. The system time from irq_enter until
vtime_account_user/do_account_vtime is reached is irq time but it
is accounted to the previous context.
Just drop the hardirq offset from arch/s390/kernel/vtime.c.
Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>