Before the introduction of XSAVES supervisor states, 'xfeatures_mask' is
used at various places to determine XSAVE buffer components and XCR0 bits.
It contains only user xstates. To support supervisor xstates, it is
necessary to separate user and supervisor xstates:
- First, change 'xfeatures_mask' to 'xfeatures_mask_all', which represents
the full set of bits that should ever be set in a kernel XSAVE buffer.
- Introduce xfeatures_mask_supervisor() and xfeatures_mask_user() to
extract relevant xfeatures from xfeatures_mask_all.
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200512145444.15483-4-yu-cheng.yu@intel.com
XCNTXT_MASK is 'all supported xfeatures' before introducing supervisor
xstates. Rename it to XFEATURE_MASK_USER_SUPPORTED to make clear that
these are user xstates.
Replace XFEATURE_MASK_SUPERVISOR with the following:
- XFEATURE_MASK_SUPERVISOR_SUPPORTED: Currently nothing. ENQCMD and
Control-flow Enforcement Technology (CET) will be introduced in separate
series.
- XFEATURE_MASK_SUPERVISOR_UNSUPPORTED: Currently only Processor Trace.
- XFEATURE_MASK_SUPERVISOR_ALL: the combination of above.
Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lkml.kernel.org/r/20200512145444.15483-3-yu-cheng.yu@intel.com
The function validate_xstate_header() validates an xstate header coming
from userspace (PTRACE or sigreturn). To make it clear, rename it to
validate_user_xstate_header().
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200512145444.15483-2-yu-cheng.yu@intel.com
- Ensure that direct mapping alias is always flushed when changing page
attributes. The optimization for small ranges failed to do so when
the virtual address was in the vmalloc or module space.
- Unbreak the trace event registration for syscalls without arguments
caused by the refactoring of the SYSCALL_DEFINE0() macro.
- Move the printk in the TSC deadline timer code to a place where it is
guaranteed to only be called once during boot and cannot be rearmed by
clearing warn_once after boot. If it's invoked post boot then lockdep
rightfully complains about a potential deadlock as the calling context
is different.
- A series of fixes for objtool and the ORC unwinder addressing variety
of small issues:
Stack offset tracking for indirect CFAs in objtool ignored subsequent
pushs and pops
Repair the unwind hints in the register clearing entry ASM code
Make the unwinding in the low level exit to usermode code stop after
switching to the trampoline stack. The unwind hint is not longer valid
and the ORC unwinder emits a warning as it can't find the registers
anymore.
Fix the unwind hints in switch_to_asm() and rewind_stack_do_exit()
which caused objtool to generate bogus ORC data.
Prevent unwinder warnings when dumping the stack of a non-current
task as there is no way to be sure about the validity because the
dumped stack can be a moving target.
Make the ORC unwinder behave the same way as the frame pointer
unwinder when dumping an inactive tasks stack and do not skip the
first frame.
Prevent ORC unwinding before ORC data has been initialized
Immediately terminate unwinding when a unknown ORC entry type is
found.
Prevent premature stop of the unwinder caused by IRET frames.
Fix another infinite loop in objtool caused by a negative offset which
was not catched.
Address a few build warnings in the ORC unwinder and add missing
static/ro_after_init annotations
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6363QTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoRJHD/4hWjzJLsUZ9xq2NrzhevoeJtxj+wVM
66x9NM3mlFQ30BN4Aye4EnNEhR0iIvNPWWdfEmaJYfPHPwnUjjcOa426HYxP/WXA
DWd5F20wGaaPOJ65LJpy/+pfcxAeQynt4I2cDEWHAplswfOWV/Hv8mSeKAKuq400
lCWaTMkWcO/toexSNn8PVyWi9rHlm+76E1bHkVwuoekGBGt1VloKGlK6OPyElzL2
w9VtrjSLlYQ0MdfCJKQeg44XQPMbf4hZRfc88x9SwDWB01q7aSvb0pWNl9AJKNXA
7fFu5T4F4PABPgRM7eJ5yNk0De9jM1y+6eCp66f9UXoNOeSr7Boz9Xc4xWqAraIi
9Dtx3WliO9CAxwUiD+Cj2iJO5o83AdRK/xhCth2VRnYMS6imfSidEqTC+LhEtkzw
Yplu7sbrWQDa5JTh8vk60clDvbkU+pfdxJisY+KClRguWfQfR6MJNuQnE0NHr7cH
H4VXFFHEE6tDdJneQ9RxA4iF20RTgSlJGK0YlsH6QsxPsRgoHVkGUao8fQhrNvRc
MIdpm9YasWStjJ7ZXbDeStmnLFN3DCj1RC8wmvJ4i/R1sPnBvPvRUt4Lm988a951
Vyr23VIcVrE7zykiqQZVH7bvIv6ULORqTJbIOF1rO/aIut4W8z0ojoVXC0Z7CiwF
S5SGj+hlWciIew==
=0rCi
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of fixes for x86:
- Ensure that direct mapping alias is always flushed when changing
page attributes. The optimization for small ranges failed to do so
when the virtual address was in the vmalloc or module space.
- Unbreak the trace event registration for syscalls without arguments
caused by the refactoring of the SYSCALL_DEFINE0() macro.
- Move the printk in the TSC deadline timer code to a place where it
is guaranteed to only be called once during boot and cannot be
rearmed by clearing warn_once after boot. If it's invoked post boot
then lockdep rightfully complains about a potential deadlock as the
calling context is different.
- A series of fixes for objtool and the ORC unwinder addressing
variety of small issues:
- Stack offset tracking for indirect CFAs in objtool ignored
subsequent pushs and pops
- Repair the unwind hints in the register clearing entry ASM code
- Make the unwinding in the low level exit to usermode code stop
after switching to the trampoline stack. The unwind hint is no
longer valid and the ORC unwinder emits a warning as it can't
find the registers anymore.
- Fix unwind hints in switch_to_asm() and rewind_stack_do_exit()
which caused objtool to generate bogus ORC data.
- Prevent unwinder warnings when dumping the stack of a
non-current task as there is no way to be sure about the
validity because the dumped stack can be a moving target.
- Make the ORC unwinder behave the same way as the frame pointer
unwinder when dumping an inactive tasks stack and do not skip
the first frame.
- Prevent ORC unwinding before ORC data has been initialized
- Immediately terminate unwinding when a unknown ORC entry type
is found.
- Prevent premature stop of the unwinder caused by IRET frames.
- Fix another infinite loop in objtool caused by a negative
offset which was not catched.
- Address a few build warnings in the ORC unwinder and add
missing static/ro_after_init annotations"
* tag 'x86-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/unwind/orc: Move ORC sorting variables under !CONFIG_MODULES
x86/apic: Move TSC deadline timer debug printk
ftrace/x86: Fix trace event registration for syscalls without arguments
x86/mm/cpa: Flush direct map alias during cpa
objtool: Fix infinite loop in for_offset_range()
x86/unwind/orc: Fix premature unwind stoppage due to IRET frames
x86/unwind/orc: Fix error path for bad ORC entry type
x86/unwind/orc: Prevent unwinding before ORC initialization
x86/unwind/orc: Don't skip the first frame for inactive tasks
x86/unwind: Prevent false warnings for non-current tasks
x86/unwind/orc: Convert global variables to static
x86/entry/64: Fix unwind hints in rewind_stack_do_exit()
x86/entry/64: Fix unwind hints in __switch_to_asm()
x86/entry/64: Fix unwind hints in kernel exit path
x86/entry/64: Fix unwind hints in register clearing code
objtool: Fix stack offset tracking for indirect CFAs
Fix the following warnings seen with !CONFIG_MODULES:
arch/x86/kernel/unwind_orc.c:29:26: warning: 'cur_orc_table' defined but not used [-Wunused-variable]
29 | static struct orc_entry *cur_orc_table = __start_orc_unwind;
| ^~~~~~~~~~~~~
arch/x86/kernel/unwind_orc.c:28:13: warning: 'cur_orc_ip_table' defined but not used [-Wunused-variable]
28 | static int *cur_orc_ip_table = __start_orc_unwind_ip;
| ^~~~~~~~~~~~~~~~
Fixes: 153eb2223c ("x86/unwind/orc: Convert global variables to static")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linux Next Mailing List <linux-next@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200428071640.psn5m7eh3zt2in4v@treble
Leon reported that the printk_once() in __setup_APIC_LVTT() triggers a
lockdep splat due to a lock order violation between hrtimer_base::lock and
console_sem, when the 'once' condition is reset via
/sys/kernel/debug/clear_warn_once after boot.
The initial printk cannot trigger this because that happens during boot
when the local APIC timer is set up on the boot CPU.
Prevent it by moving the printk to a place which is guaranteed to be only
called once during boot.
Mark the deadline timer check related functions and data __init while at
it.
Reported-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/87y2qhoshi.fsf@nanos.tec.linutronix.de
The following execution path is possible:
fsnotify()
[ realign the stack and store previous SP in R10 ]
<IRQ>
[ only IRET regs saved ]
common_interrupt()
interrupt_entry()
<NMI>
[ full pt_regs saved ]
...
[ unwind stack ]
When the unwinder goes through the NMI and the IRQ on the stack, and
then sees fsnotify(), it doesn't have access to the value of R10,
because it only has the five IRET registers. So the unwind stops
prematurely.
However, because the interrupt_entry() code is careful not to clobber
R10 before saving the full regs, the unwinder should be able to read R10
from the previously saved full pt_regs associated with the NMI.
Handle this case properly. When encountering an IRET regs frame
immediately after a full pt_regs frame, use the pt_regs as a backup
which can be used to get the C register values.
Also, note that a call frame resets the 'prev_regs' value, because a
function is free to clobber the registers. For this fix to work, the
IRET and full regs frames must be adjacent, with no FUNC frames in
between. So replace the FUNC hint in interrupt_entry() with an
IRET_REGS hint.
Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/97a408167cc09f1cfa0de31a7b70dd88868d743f.1587808742.git.jpoimboe@redhat.com
If the ORC entry type is unknown, nothing else can be done other than
reporting an error. Exit the function instead of breaking out of the
switch statement.
Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/a7fa668ca6eabbe81ab18b2424f15adbbfdc810a.1587808742.git.jpoimboe@redhat.com
If the unwinder is called before the ORC data has been initialized,
orc_find() returns NULL, and it tries to fall back to using frame
pointers. This can cause some unexpected warnings during boot.
Move the 'orc_init' check from orc_find() to __unwind_init(), so that it
doesn't even try to unwind from an uninitialized state.
Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/069d1499ad606d85532eb32ce39b2441679667d5.1587808742.git.jpoimboe@redhat.com
When unwinding an inactive task, the ORC unwinder skips the first frame
by default. If both the 'regs' and 'first_frame' parameters of
unwind_start() are NULL, 'state->sp' and 'first_frame' are later
initialized to the same value for an inactive task. Given there is a
"less than or equal to" comparison used at the end of __unwind_start()
for skipping stack frames, the first frame is skipped.
Drop the equal part of the comparison and make the behavior equivalent
to the frame pointer unwinder.
Fixes: ee9f8fce99 ("x86/unwind: Add the ORC unwinder")
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/7f08db872ab59e807016910acdbe82f744de7065.1587808742.git.jpoimboe@redhat.com
There's some daring kernel code out there which dumps the stack of
another task without first making sure the task is inactive. If the
task happens to be running while the unwinder is reading the stack,
unusual unwinder warnings can result.
There's no race-free way for the unwinder to know whether such a warning
is legitimate, so just disable unwinder warnings for all non-current
tasks.
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/ec424a2aea1d461eb30cab48a28c6433de2ab784.1587808742.git.jpoimboe@redhat.com
These variables aren't used outside of unwind_orc.c, make them static.
Also annotate some of them with '__ro_after_init', as applicable.
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Jones <dsj@fb.com>
Cc: Jann Horn <jannh@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: https://lore.kernel.org/r/43ae310bf7822b9862e571f36ae3474cfde8f301.1587808742.git.jpoimboe@redhat.com
Improve readability of the function intel_set_max_freq_ratio() by moving
the check for KNL CPUs there, together with checks for GLM and SKX.
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20200416054745.740-5-ggherdovich@suse.cz
The static key arch_scale_freq_key only needs to be enabled once (at
boot). This change fixes a bug by which the key was enabled every time cpu0
is started, even as a secondary CPU during cpu hotplug. Secondary CPUs are
started from the idle thread: setting a static key from there means
acquiring a lock and may result in sleeping in the idle task, causing CPU
lockup.
Another consequence of this change is that init_counter_refs() is now
called on each CPU correctly; previously the function on_each_cpu() was
used, but it was called at boot when the only online cpu is cpu0.
[ggherdovich@suse.cz: Tested and wrote changelog]
Fixes: 1567c3e346 ("x86, sched: Add support for frequency invariance")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20200416054745.740-4-ggherdovich@suse.cz
If a CPU has less than 4 physical cores, MSR_TURBO_RATIO_LIMIT will
rightfully report that the 4C turbo ratio is zero. In such cases, use the
1C turbo ratio instead for frequency invariance calculations.
Fixes: 1567c3e346 ("x86, sched: Add support for frequency invariance")
Reported-by: Like Xu <like.xu@linux.intel.com>
Reported-by: Neil Rickert <nwr10cst-oslnx@yahoo.com>
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Link: https://lkml.kernel.org/r/20200416054745.740-3-ggherdovich@suse.cz
Some hypervisors such as VMWare ESXi 5.5 advertise support for
X86_FEATURE_APERFMPERF but then fill all MSR's with zeroes. In particular,
MSR_PLATFORM_INFO set to zero tricks the code that wants to know the base
clock frequency of the CPU (highest non-turbo frequency), producing a
division by zero when computing the ratio turbo_freq/base_freq necessary
for frequency invariant accounting.
It is to be noted that even if MSR_PLATFORM_INFO contained the appropriate
data, APERF and MPERF are constantly zero on ESXi 5.5, thus freq-invariance
couldn't be done in principle (not that it would make a lot of sense in a
VM anyway). The real problem is advertising X86_FEATURE_APERFMPERF. This
appears to be fixed in more recent versions: ESXi 6.7 doesn't advertise
that feature.
Fixes: 1567c3e346 ("x86, sched: Add support for frequency invariance")
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20200416054745.740-2-ggherdovich@suse.cz
objtool:
- Ignore the double UD2 which is emitted in BUG() when CONFIG_UBSAN_TRAP
is enabled.
- Support clang non-section symbols in objtool ORC dump
- Fix switch table detection in .text.unlikely
- Make the BP scratch register warning more robust.
x86:
- Increase microcode maximum patch size for AMD to cope with new CPUs
which have a larger patch size.
- Fix a crash in the resource control filesystem when the removal of the
default resource group is attempted.
- Preserve Code and Data Prioritization enabled state accross CPU
hotplug.
- Update split lock cpu matching to use the new X86_MATCH macros.
- Change the split lock enumeration as Intel finaly decided that the
IA32_CORE_CAPABILITIES bits are not architectural contrary to what
the SDM claims. !@#%$^!
- Add Tremont CPU models to the split lock detection cpu match.
- Add a missing static attribute to make sparse happy.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6cWGsTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYod2jD/4kZqz+nEzAvx8RC/7zfLr1S6mDYcLb
kqWEblLRfPofFNO3W/1Ri7xUs2VCyBcOJeG9JIugI8YV/b/5LY9j2nW30unXi84y
8DHLWgM7OG+EiNDMvdQwgnjNb9Pdl4F1e9yTTD6IRg0bHOjvtHVyq9bNg7f3iaED
ZE4X5Hh5u4qFK/jmcsTF5HA/wIjELdmT32F4RxceAlmvpa5SUGlOfVVo1cSZpCbx
XkrvUvEzyZhbzY+Gy1q3SHTt+fvzx1++LsnJD0Dyfe5Q47PA1Iy6Zo2+Epn3FnCu
XuQKLaiDhidpkPzTGULZUsubavXbrSEu5/yhFJHyUqMy5WNOmvXBN8eVC4j1I9Ga
tnt43s3AS8noz4qIb7bpoVgETFtoCfWfqwhtZmALPzrfutwxe2Ujtsi9FUca6HtA
T5dKuNwc8G+Q5ZiNi+rPjcV/QGGncZFwtwwRwUl/YKgQ2VgrTgfsPc431tfSl3Q8
hVQIOhQNHCKqe3uGhiCsI29pNMDXVijZcI8w2SSmxnPyrMRXD7bTfLWnPav7SGFO
aSSi9HWtghkU/MsmRgRcZc9PI5bNs6w5IkfQqfXjd/lJwea2yQg1cn1KdmGi3Q33
BNj9FudNMe4K8ITaNWiLdt5rYCDIvWEzmbwawAhevstbKrjVtrAYgNAjvgJEnXAt
mZwTu+Hpd6d+JA==
=raUm
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 and objtool fixes from Thomas Gleixner:
"A set of fixes for x86 and objtool:
objtool:
- Ignore the double UD2 which is emitted in BUG() when
CONFIG_UBSAN_TRAP is enabled.
- Support clang non-section symbols in objtool ORC dump
- Fix switch table detection in .text.unlikely
- Make the BP scratch register warning more robust.
x86:
- Increase microcode maximum patch size for AMD to cope with new CPUs
which have a larger patch size.
- Fix a crash in the resource control filesystem when the removal of
the default resource group is attempted.
- Preserve Code and Data Prioritization enabled state accross CPU
hotplug.
- Update split lock cpu matching to use the new X86_MATCH macros.
- Change the split lock enumeration as Intel finaly decided that the
IA32_CORE_CAPABILITIES bits are not architectural contrary to what
the SDM claims. !@#%$^!
- Add Tremont CPU models to the split lock detection cpu match.
- Add a missing static attribute to make sparse happy"
* tag 'x86-urgent-2020-04-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/split_lock: Add Tremont family CPU models
x86/split_lock: Bits in IA32_CORE_CAPABILITIES are not architectural
x86/resctrl: Preserve CDP enable over CPU hotplug
x86/resctrl: Fix invalid attempt at removing the default resource group
x86/split_lock: Update to use X86_MATCH_INTEL_FAM6_MODEL()
x86/umip: Make umip_insns static
x86/microcode/AMD: Increase microcode PATCH_MAX_SIZE
objtool: Make BP scratch register warning more robust
objtool: Fix switch table detection in .text.unlikely
objtool: Support Clang non-section symbols in ORC generation
objtool: Support Clang non-section symbols in ORC dump
objtool: Fix CONFIG_UBSAN_TRAP unreachable warnings
Tremont CPUs support IA32_CORE_CAPABILITIES bits to indicate whether
specific SKUs have support for split lock detection.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200416205754.21177-4-tony.luck@intel.com
The Intel Software Developers' Manual erroneously listed bit 5 of the
IA32_CORE_CAPABILITIES register as an architectural feature. It is not.
Features enumerated by IA32_CORE_CAPABILITIES are model specific and
implementation details may vary in different cpu models. Thus it is only
safe to trust features after checking the CPU model.
Icelake client and server models are known to implement the split lock
detect feature even though they don't enumerate IA32_CORE_CAPABILITIES
[ tglx: Use switch() for readability and massage comments ]
Fixes: 6650cdd9a8 ("x86/split_lock: Enable split lock detection by kernel")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200416205754.21177-3-tony.luck@intel.com
Resctrl assumes that all CPUs are online when the filesystem is mounted,
and that CPUs remember their CDP-enabled state over CPU hotplug.
This goes wrong when resctrl's CDP-enabled state changes while all the
CPUs in a domain are offline.
When a domain comes online, enable (or disable!) CDP to match resctrl's
current setting.
Fixes: 5ff193fbde ("x86/intel_rdt: Add basic resctrl filesystem support")
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200221162105.154163-1-james.morse@arm.com
The default resource group ("rdtgroup_default") is associated with the
root of the resctrl filesystem and should never be removed. New resource
groups can be created as subdirectories of the resctrl filesystem and
they can be removed from user space.
There exists a safeguard in the directory removal code
(rdtgroup_rmdir()) that ensures that only subdirectories can be removed
by testing that the directory to be removed has to be a child of the
root directory.
A possible deadlock was recently fixed with
334b0f4e9b ("x86/resctrl: Fix a deadlock due to inaccurate reference").
This fix involved associating the private data of the "mon_groups"
and "mon_data" directories to the resource group to which they belong
instead of NULL as before. A consequence of this change was that
the original safeguard code preventing removal of "mon_groups" and
"mon_data" found in the root directory failed resulting in attempts to
remove the default resource group that ends in a BUG:
kernel BUG at mm/slub.c:3969!
invalid opcode: 0000 [#1] SMP PTI
Call Trace:
rdtgroup_rmdir+0x16b/0x2c0
kernfs_iop_rmdir+0x5c/0x90
vfs_rmdir+0x7a/0x160
do_rmdir+0x17d/0x1e0
do_syscall_64+0x55/0x1d0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fix this by improving the directory removal safeguard to ensure that
subdirectories of the resctrl root directory can only be removed if they
are a child of the resctrl filesystem's root _and_ not associated with
the default resource group.
Fixes: 334b0f4e9b ("x86/resctrl: Fix a deadlock due to inaccurate reference")
Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/884cbe1773496b5dbec1b6bd11bb50cffa83603d.1584461853.git.reinette.chatre@intel.com
The SPLIT_LOCK_CPU() macro escaped the tree-wide sweep for old-style
initialization. Update to use X86_MATCH_INTEL_FAM6_MODEL().
Fixes: 6650cdd9a8 ("x86/split_lock: Enable split lock detection by kernel")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200416205754.21177-2-tony.luck@intel.com
Fix the following sparse warning:
arch/x86/kernel/umip.c:84:12: warning: symbol 'umip_insns' was not declared.
Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Link: https://lkml.kernel.org/r/20200413082213.22934-1-yanaijie@huawei.com
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAl6ViNsTHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXuXIB/4nuYRCt4d/XaeHF6dCWU45ThG+tNs7
p/OnBPZmknI0SnZ4uR/XW5caHEFj7g9ndYh+M1afZ/zKdsc+syMSDT5XhuhC/GKV
fQRW0qO8N+IAqXbLzJxyBg6fH2anwfe3w2uy2cKDEZk6d4FD5atTWhRY6R4ISq0l
g7pUyvQN1q+G6KH2snmOaZL8mybFkbHrmwtAZzcjzdzqasdLFiQB8EEFkONG66t9
HeNTyUF0mnbGBIePQLSZSHLj5p4yHG/9pa3jgqO5dsmIdsBvoaVNqEi3pCm1s/5n
BH9FWn6fTwpcKvtF385yzBiFFlzBVgXbetxuSmxxOkWW4P+db5B/GL2Y
=fjSF
-----END PGP SIGNATURE-----
Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- a series from Tianyu Lan to fix crash reporting on Hyper-V
- three miscellaneous cleanup patches
* tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/Hyper-V: Report crash data in die() when panic_on_oops is set
x86/Hyper-V: Report crash register data when sysctl_record_panic_msg is not set
x86/Hyper-V: Report crash register data or kmsg before running crash kernel
x86/Hyper-V: Trigger crash enlightenment only once during system crash.
x86/Hyper-V: Free hv_panic_page when fail to register kmsg dump
x86/Hyper-V: Unload vmbus channel in hv panic callback
x86: hyperv: report value of misc_features
hv_debugfs: Make hv_debug_root static
hv: hyperv_vmbus.h: Replace zero-length array with flexible-array member
We want to notify Hyper-V when a Linux guest VM crash occurs, so
there is a record of the crash even when kdump is enabled. But
crash_kexec_post_notifiers defaults to "false", so the kdump kernel
runs before the notifiers and Hyper-V never gets notified. Fix this by
always setting crash_kexec_post_notifiers to be true for Hyper-V VMs.
Fixes: 81b18bce48 ("Drivers: HV: Send one page worth of kmsg dump over Hyper-V during panic")
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
Link: https://lore.kernel.org/r/20200406155331.2105-5-Tianyu.Lan@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Without at least minimal handling for split lock detection induced #AC,
VMX will just run into the same problem as the VMWare hypervisor, which
was reported by Kenneth.
It will inject the #AC blindly into the guest whether the guest is
prepared or not.
Provide a function for guest mode which acts depending on the host
SLD mode. If mode == sld_warn, treat it like user space, i.e. emit a
warning, disable SLD and mark the task accordingly. Otherwise force
SIGBUS.
[ bp: Add a !CPU_SUP_INTEL stub for handle_guest_split_lock(). ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/r/20200410115516.978037132@linutronix.de
Link: https://lkml.kernel.org/r/20200402123258.895628824@linutronix.de
Merge yet more updates from Andrew Morton:
- Almost all of the rest of MM (memcg, slab-generic, slab, pagealloc,
gup, hugetlb, pagemap, memremap)
- Various other things (hfs, ocfs2, kmod, misc, seqfile)
* akpm: (34 commits)
ipc/util.c: sysvipc_find_ipc() should increase position index
kernel/gcov/fs.c: gcov_seq_next() should increase position index
fs/seq_file.c: seq_read(): add info message about buggy .next functions
drivers/dma/tegra20-apb-dma.c: fix platform_get_irq.cocci warnings
change email address for Pali Rohár
selftests: kmod: test disabling module autoloading
selftests: kmod: fix handling test numbers above 9
docs: admin-guide: document the kernel.modprobe sysctl
fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()
kmod: make request_module() return an error when autoloading is disabled
mm/memremap: set caching mode for PCI P2PDMA memory to WC
mm/memory_hotplug: add pgprot_t to mhp_params
powerpc/mm: thread pgprot_t through create_section_mapping()
x86/mm: introduce __set_memory_prot()
x86/mm: thread pgprot_t through init_memory_mapping()
mm/memory_hotplug: rename mhp_restrictions to mhp_params
mm/memory_hotplug: drop the flags field from struct mhp_restrictions
mm/special: create generic fallbacks for pte_special() and pte_mkspecial()
mm/vma: introduce VM_ACCESS_FLAGS
mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS
...
In preparation to support a pgprot_t argument for arch_add_memory().
It's required to move the prototype of init_memory_mapping() seeing the
original location came before the definition of pgprot_t.
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Badger <ebadger@gigaio.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/20200306170846.9333-4-logang@deltatee.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 944d9fec8d ("hugetlb: add support for gigantic page allocation
at runtime") has added the run-time allocation of gigantic pages.
However it actually works only at early stages of the system loading,
when the majority of memory is free. After some time the memory gets
fragmented by non-movable pages, so the chances to find a contiguous 1GB
block are getting close to zero. Even dropping caches manually doesn't
help a lot.
At large scale rebooting servers in order to allocate gigantic hugepages
is quite expensive and complex. At the same time keeping some constant
percentage of memory in reserved hugepages even if the workload isn't
using it is a big waste: not all workloads can benefit from using 1 GB
pages.
The following solution can solve the problem:
1) On boot time a dedicated cma area* is reserved. The size is passed
as a kernel argument.
2) Run-time allocations of gigantic hugepages are performed using the
cma allocator and the dedicated cma area
In this case gigantic hugepages can be allocated successfully with a
high probability, however the memory isn't completely wasted if nobody
is using 1GB hugepages: it can be used for pagecache, anon memory, THPs,
etc.
* On a multi-node machine a per-node cma area is allocated on each node.
Following gigantic hugetlb allocation are using the first available
numa node if the mask isn't specified by a user.
Usage:
1) configure the kernel to allocate a cma area for hugetlb allocations:
pass hugetlb_cma=10G as a kernel argument
2) allocate hugetlb pages as usual, e.g.
echo 10 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
If the option isn't enabled or the allocation of the cma area failed,
the current behavior of the system is preserved.
x86 and arm-64 are covered by this patch, other architectures can be
trivially added later.
The patch contains clean-ups and fixes proposed and implemented by Aslan
Bakirov and Randy Dunlap. It also contains ideas and suggestions
proposed by Rik van Riel, Michal Hocko and Mike Kravetz. Thanks!
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Andreas Schaufler <andreas.schaufler@gmx.de>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@kernel.org>
Cc: Aslan Bakirov <aslan@fb.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Link: http://lkml.kernel.org/r/20200407163840.92263-3-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A few kernel features depend on ms_hyperv.misc_features, but unlike its
siblings ->features and ->hints, the value was never reported during boot.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Link: https://lore.kernel.org/r/20200407172739.31371-1-olaf@aepfle.de
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Fix the following sparse warning:
arch/x86/kernel/acpi/boot.c:48:5: warning: symbol 'acpi_nobgrt' was not
declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Update the ACPICA code in the kernel to upstream revision 20200326
including:
* Fix for a typo in a comment field (Bob Moore).
* acpiExec namespace init file fixes (Bob Moore).
* Addition of NHLT to the known tables list (Cezary Rojewski).
* Conversion of PlatformCommChannel ASL keyword to PCC (Erik
Kaneda).
* acpiexec cleanup (Erik Kaneda).
* WSMT-related typo fix (Erik Kaneda).
* sprintf() utility function fix (John Levon).
* IVRS IVHD type 11h parsing implementation (Michał Żygowski).
* IVRS IVHD type 10h reserved field name fix (Michał Żygowski).
- Fix ACPI-related CPU hotplug deadlock on x86 (Qian Cai).
- Fix Intel Tiger Lake ACPI device IDs in several places (Gayatri
Kammela).
- Add ACPI backlight blacklist entry for Acer Aspire 5783z (Hans
de Goede).
- Fix documentation of the "acpi_backlight" kernel command line
switch (Randy Dunlap).
- Clean up the acpi_get_psd_map() CPPC library routine (Liguang
Zhang).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl6LQEASHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxqfoQAI/GPq7xhb8jOofmTfLxa4ahO2NDxK1E
Ye4Tcm8JLv78hro7iMUlbPsRXm15lyDxMldGRfxsiLFTF2xQtYhdTnPx+KZ439j+
QokMHUT6gFEMAV7OPFvXd2r58ShJJHezobbn241zTILx1c3ai66dCQrqyhYjlZ28
0hUCyY4ilgXWuYInlckGW3Rp/Qxc9IVOxzFUV90EW9pTb4vKzoqznjNm+dpY8rHm
QFNb2BkTJygOPmJiumi/yJX+74YSZrzW5fS1PDQS4Lr46j0imvWVVataMd1qbQ0+
fDhvhL7IimHiM/qZg67hKpsAt6AcQPhaZ6JyoEGUoafxpBQN0a7b5rMwmL0P/HWV
pL5mKM+jc7zh0HTb+xkpNotJxT+KBFo1jTRxGyVAnK8SThzlyFhKhetiOwaHCIDv
dNYao6bCNsuGLh3T/09xbAmEeCSt7k+ok892N4o9wzqNfoDg6fX/c0M5ZD1F+Awb
l9agU7XChziyDJwAqTbqndx71DK4ALrhZa1tNKA5PGTY8b5XrojoKsOyYk6PYA1x
CqU20muRV4VAzB0pvdiwBc2Yrtfiv32mv5jMNrqrrv3D6S6R8vBUNhHlWKu/75a9
9muIoEHWnK0/a9kmVJG8CUSXTTTPQpvOovesznruTxvGx3Mp9gw+d3/1tjuA/QNM
ZoOOru5AEyi+
=b3Dp
-----END PGP SIGNATURE-----
Merge tag 'acpi-5.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more ACPI updates from Rafael Wysocki:
"Additional ACPI updates.
These update the ACPICA code in the kernel to the 20200326 upstream
revision, fix an ACPI-related CPU hotplug deadlock on x86, update
Intel Tiger Lake device IDs in some places, add a new ACPI backlight
blacklist entry, update the "acpi_backlight" kernel command line
switch documentation and clean up a CPPC library routine.
Specifics:
- Update the ACPICA code in the kernel to upstream revision 20200326
including:
* Fix for a typo in a comment field (Bob Moore)
* acpiExec namespace init file fixes (Bob Moore)
* Addition of NHLT to the known tables list (Cezary Rojewski)
* Conversion of PlatformCommChannel ASL keyword to PCC (Erik
Kaneda)
* acpiexec cleanup (Erik Kaneda)
* WSMT-related typo fix (Erik Kaneda)
* sprintf() utility function fix (John Levon)
* IVRS IVHD type 11h parsing implementation (Michał Żygowski)
* IVRS IVHD type 10h reserved field name fix (Michał Żygowski)
- Fix ACPI-related CPU hotplug deadlock on x86 (Qian Cai)
- Fix Intel Tiger Lake ACPI device IDs in several places (Gayatri
Kammela)
- Add ACPI backlight blacklist entry for Acer Aspire 5783z (Hans de
Goede)
- Fix documentation of the "acpi_backlight" kernel command line
switch (Randy Dunlap)
- Clean up the acpi_get_psd_map() CPPC library routine (Liguang
Zhang)"
* tag 'acpi-5.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
x86: ACPI: fix CPU hotplug deadlock
thermal: int340x_thermal: fix: Update Tiger Lake ACPI device IDs
platform/x86: intel-hid: fix: Update Tiger Lake ACPI device ID
ACPI: Update Tiger Lake ACPI device IDs
ACPI: video: Use native backlight on Acer Aspire 5783z
ACPI: video: Docs update for "acpi_backlight" kernel parameter options
ACPICA: Update version 20200326
ACPICA: Fixes for acpiExec namespace init file
ACPICA: Add NHLT table signature
ACPICA: WSMT: Fix typo, no functional change
ACPICA: utilities: fix sprintf()
ACPICA: acpiexec: remove redeclaration of acpi_gbl_db_opt_no_region_support
ACPICA: Change PlatformCommChannel ASL keyword to PCC
ACPICA: Fix IVRS IVHD type 10h reserved field name
ACPICA: Implement IVRS IVHD type 11h parsing
ACPICA: Fix a typo in a comment field
ACPI: CPPC: clean up acpi_get_psd_map()
Similar to commit 0266d81e9b ("acpi/processor: Prevent cpu hotplug
deadlock") except this is for acpi_processor_ffh_cstate_probe():
"The problem is that the work is scheduled on the current CPU from the
hotplug thread associated with that CPU.
It's not required to invoke these functions via the workqueue because
the hotplug thread runs on the target CPU already.
Check whether current is a per cpu thread pinned on the target CPU and
invoke the function directly to avoid the workqueue."
WARNING: possible circular locking dependency detected
------------------------------------------------------
cpuhp/1/15 is trying to acquire lock:
ffffc90003447a28 ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: __flush_work+0x4c6/0x630
but task is already holding lock:
ffffffffafa1c0e8 (cpuidle_lock){+.+.}-{3:3}, at: cpuidle_pause_and_lock+0x17/0x20
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (cpu_hotplug_lock){++++}-{0:0}:
cpus_read_lock+0x3e/0xc0
irq_calc_affinity_vectors+0x5f/0x91
__pci_enable_msix_range+0x10f/0x9a0
pci_alloc_irq_vectors_affinity+0x13e/0x1f0
pci_alloc_irq_vectors_affinity at drivers/pci/msi.c:1208
pqi_ctrl_init+0x72f/0x1618 [smartpqi]
pqi_pci_probe.cold.63+0x882/0x892 [smartpqi]
local_pci_probe+0x7a/0xc0
work_for_cpu_fn+0x2e/0x50
process_one_work+0x57e/0xb90
worker_thread+0x363/0x5b0
kthread+0x1f4/0x220
ret_from_fork+0x27/0x50
-> #0 ((work_completion)(&wfc.work)){+.+.}-{0:0}:
__lock_acquire+0x2244/0x32a0
lock_acquire+0x1a2/0x680
__flush_work+0x4e6/0x630
work_on_cpu+0x114/0x160
acpi_processor_ffh_cstate_probe+0x129/0x250
acpi_processor_evaluate_cst+0x4c8/0x580
acpi_processor_get_power_info+0x86/0x740
acpi_processor_hotplug+0xc3/0x140
acpi_soft_cpu_online+0x102/0x1d0
cpuhp_invoke_callback+0x197/0x1120
cpuhp_thread_fun+0x252/0x2f0
smpboot_thread_fn+0x255/0x440
kthread+0x1f4/0x220
ret_from_fork+0x27/0x50
other info that might help us debug this:
Chain exists of:
(work_completion)(&wfc.work) --> cpuhp_state-up --> cpuidle_lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(cpuidle_lock);
lock(cpuhp_state-up);
lock(cpuidle_lock);
lock((work_completion)(&wfc.work));
*** DEADLOCK ***
3 locks held by cpuhp/1/15:
#0: ffffffffaf51ab10 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x69/0x2f0
#1: ffffffffaf51ad40 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x69/0x2f0
#2: ffffffffafa1c0e8 (cpuidle_lock){+.+.}-{3:3}, at: cpuidle_pause_and_lock+0x17/0x20
Call Trace:
dump_stack+0xa0/0xea
print_circular_bug.cold.52+0x147/0x14c
check_noncircular+0x295/0x2d0
__lock_acquire+0x2244/0x32a0
lock_acquire+0x1a2/0x680
__flush_work+0x4e6/0x630
work_on_cpu+0x114/0x160
acpi_processor_ffh_cstate_probe+0x129/0x250
acpi_processor_evaluate_cst+0x4c8/0x580
acpi_processor_get_power_info+0x86/0x740
acpi_processor_hotplug+0xc3/0x140
acpi_soft_cpu_online+0x102/0x1d0
cpuhp_invoke_callback+0x197/0x1120
cpuhp_thread_fun+0x252/0x2f0
smpboot_thread_fn+0x255/0x440
kthread+0x1f4/0x220
ret_from_fork+0x27/0x50
Signed-off-by: Qian Cai <cai@lca.pw>
Tested-by: Borislav Petkov <bp@suse.de>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Here are 3 SPDX patches for 5.7-rc1.
One fixes up the SPDX tag for a single driver, while the other two go
through the tree and add SPDX tags for all of the .gitignore files as
needed.
Nothing too complex, but you will get a merge conflict with your current
tree, that should be trivial to handle (one file modified by two things,
one file deleted.)
All 3 of these have been in linux-next for a while, with no reported
issues other than the merge conflict.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXodg5A8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykySQCgy9YDrkz7nWq6v3Gohl6+lW/L+rMAnRM4uTZm
m5AuCzO3Azt9KBi7NL+L
=2Lm5
-----END PGP SIGNATURE-----
Merge tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx
Pull SPDX updates from Greg KH:
"Here are three SPDX patches for 5.7-rc1.
One fixes up the SPDX tag for a single driver, while the other two go
through the tree and add SPDX tags for all of the .gitignore files as
needed.
Nothing too complex, but you will get a merge conflict with your
current tree, that should be trivial to handle (one file modified by
two things, one file deleted.)
All three of these have been in linux-next for a while, with no
reported issues other than the merge conflict"
* tag 'spdx-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
ASoC: MT6660: make spdxcheck.py happy
.gitignore: add SPDX License Identifier
.gitignore: remove too obvious comments
Pull integrity updates from Mimi Zohar:
"Just a couple of updates for linux-5.7:
- A new Kconfig option to enable IMA architecture specific runtime
policy rules needed for secure and/or trusted boot, as requested.
- Some message cleanup (eg. pr_fmt, additional error messages)"
* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: add a new CONFIG for loading arch-specific policies
integrity: Remove duplicate pr_fmt definitions
IMA: Add log statements for failure conditions
IMA: Update KBUILD_MODNAME for IMA files to ima
Pull x86 vmware updates from Ingo Molnar:
"The main change in this tree is the addition of 'steal time clock
support' for VMware guests"
* 'x86-vmware-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vmware: Use bool type for vmw_sched_clock
x86/vmware: Enable steal time accounting
x86/vmware: Add steal time clock support for VMware guests
x86/vmware: Remove vmware_sched_clock_setup()
x86/vmware: Make vmware_select_hypercall() __init
Pull x86 fpu updates from Ingo Molnar:
"Misc changes:
- add a pkey sanity check
- three commits to improve and future-proof xstate/xfeature handling
some more"
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/pkeys: Add check for pkey "overflow"
x86/fpu/xstate: Warn when checking alignment of disabled xfeatures
x86/fpu/xstate: Fix XSAVES offsets in setup_xstate_comp()
x86/fpu/xstate: Fix last_good_offset in setup_xstate_features()
Pull x86 cleanups from Ingo Molnar:
"This topic tree contains more commits than usual:
- most of it are uaccess cleanups/reorganization by Al
- there's a bunch of prototype declaration (--Wmissing-prototypes)
cleanups
- misc other cleanups all around the map"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
x86/mm/set_memory: Fix -Wmissing-prototypes warnings
x86/efi: Add a prototype for efi_arch_mem_reserve()
x86/mm: Mark setup_emu2phys_nid() static
x86/jump_label: Move 'inline' keyword placement
x86/platform/uv: Add a missing prototype for uv_bau_message_interrupt()
kill uaccess_try()
x86: unsafe_put-style macro for sigmask
x86: x32_setup_rt_frame(): consolidate uaccess areas
x86: __setup_rt_frame(): consolidate uaccess areas
x86: __setup_frame(): consolidate uaccess areas
x86: setup_sigcontext(): list user_access_{begin,end}() into callers
x86: get rid of put_user_try in __setup_rt_frame() (both 32bit and 64bit)
x86: ia32_setup_rt_frame(): consolidate uaccess areas
x86: ia32_setup_frame(): consolidate uaccess areas
x86: ia32_setup_sigcontext(): lift user_access_{begin,end}() into the callers
x86/alternatives: Mark text_poke_loc_init() static
x86/cpu: Fix a -Wmissing-prototypes warning for init_ia32_feat_ctl()
x86/mm: Drop pud_mknotpresent()
x86: Replace setup_irq() by request_irq()
x86/configs: Slightly reduce defconfigs
...
Pull x86 build updates from Ingo Molnar:
"A handful of updates: two linker script cleanups and a stock
defconfig+allmodconfig bootability fix"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Discard .note.gnu.property sections in vDSO
x86, vmlinux.lds: Add RUNTIME_DISCARD_EXIT to generic DISCARDS
x86/Kconfig: Make CMDLINE_OVERRIDE depend on non-empty CMDLINE
Pull x86 boot updates from Ingo Molnar:
"Misc cleanups and small enhancements all around the map"
* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/compressed: Fix debug_puthex() parameter type
x86/setup: Fix static memory detection
x86/vmlinux: Drop unneeded linker script discard of .eh_frame
x86/*/Makefile: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections
x86/boot/compressed: Remove .eh_frame section from bzImage
x86/boot/compressed/64: Remove .bss/.pgtable from bzImage
x86/boot/compressed/64: Use 32-bit (zero-extended) MOV for z_output_len
x86/boot/compressed/64: Use LEA to initialize boot stack pointer
- A series of commits to make the MSR derived CPU and TSC frequency more
accurate.
It turned out that the frequency tables which have been taken from the
SDM are inaccurate because the SDM provides truncated and rounded
values, e.g. 83.3Mhz (83.3333...) or 116.7Mhz (116.6666...).
This causes time drift in the range of ~1 second per hour
(20-30 seconds per day). On some of these SoCs it's not possible to
recalibrate the TSC because there is no reference (PIT, HPET) available.
With some reverse engineering it was established that the possible
frequencies are derived from the base clock with fixed multiplier /
divider pairs.
For the CPU models which have a known crystal frequency the kernel now
uses multiplier / divider pairs which bring the frequencies closer to
reality and fix the observed time drift issues.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6CApYTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoRM4D/9/lgBQQQ+xilpYHLv4lk5ukmkrLEjt
NqL0dZKthd2v4VoAViCZqCYUSuxmo9uGPCxC0Ol7MMB2mUHXrwPn5q2wwcHSE830
KQv8Dk9tCVeJMMTMk2s5t4QBYEHD95+ueObKK1sofz0NkQW3ea+cpRCh4jt2lrnw
X7uT5rSHk87B1VYMPWzELsBEeqan9kUbvbe9se7My5utesOZumn4gj9rmO/5y9Vc
rNuwGEZX8RpQAZZmfEJ00r5iA+VTdWyQ4rhktlQeeIdb4y4axjxMsWQIuaggjdyn
oRA2vZnoc4+IqNUUBvj1q1D3RETwyf3WT+nxiYUdb3VuSh1o7he5MTzgznKzThnU
s+ViOPXbfzrfUUW8dlk6zd5yovmIuQNb0Xk05USqAB3gVQS1fYPnyy+pb9dFDnnB
0zEq3RAQVCb/bkyWQ0JemgHXda3WTABZRCR812L2e+WZD6KjlqySkdeJJ+kxzQwN
6FRNrdtl+8ULy6SlWIC8y0yuVdSIFfgNSm+5HZMrw8VbqJp1ZVpTvKQ+xczjOunn
z9y24IC1IlhtDsTMzIU0LHhgwhVGcohdTNbu3yX4hVQ7EgQOQPVE/XGM5RdjiXzq
bD5j+PCntjoE7hnnxsPnuhDs9ZqNptTo4UevMwTL6rKAgJaPMAc9PB5LqdUynAEx
hpkkMFGU7BSBDw==
=9eX1
-----END PGP SIGNATURE-----
Merge tag 'x86-timers-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 timer updates from Thomas Gleixner:
"A series of commits to make the MSR derived CPU and TSC frequency more
accurate.
It turned out that the frequency tables which have been taken from the
SDM are inaccurate because the SDM provides truncated and rounded
values, e.g. 83.3Mhz (83.3333...) or 116.7Mhz (116.6666...).
This causes time drift in the range of ~1 second per hour (20-30
seconds per day). On some of these SoCs it's not possible to
recalibrate the TSC because there is no reference (PIT, HPET)
available.
With some reverse engineering it was established that the possible
frequencies are derived from the base clock with fixed multiplier /
divider pairs.
For the CPU models which have a known crystal frequency the kernel now
uses multiplier / divider pairs which bring the frequencies closer to
reality and fix the observed time drift issues"
* tag 'x86-timers-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsc_msr: Make MSR derived TSC frequency more accurate
x86/tsc_msr: Fix MSR_FSB_FREQ mask for Cherry Trail devices
x86/tsc_msr: Use named struct initializers
- Atomic operations (lock prefixed instructions) which span two cache
lines have to acquire the global bus lock. This is at least 1k cycles
slower than an atomic operation within a cache line and disrupts
performance on other cores. Aside of performance disruption this is
a unpriviledged form of DoS.
Some newer CPUs have the capability to raise an #AC trap when such an
operation is attempted. The detection is by default enabled in warning
mode which will warn once when a user space application is caught. A
command line option allows to disable the detection or to select fatal
mode which will terminate offending applications with SIGBUS.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B/uMTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYocsAD/9yqpw+XlPKNPsfbm9sbirBDfTrENcL
F44iwn4WnrjoW/gnnZCYmPxJFsTtGVPqxHdUf4eyGemg9r9ZEO0DQftmUHC5Z6KX
aa/b5JoeM61wp9HlpVlD4D1jVt4pWyQODQeZnUXE4DEzmRc3cD/5lSU+/VeaIwwz
lxwUemqmXK7ucH2KA7smOGsl2nU6ED84q3mdOB1b4Cw+gWYMUnPJnuS/ipriBRx4
BYbMItcxsFvtdO9Hx8PvGd5LUK0wW8JOWrYQICD2kLpZtHtGeaHpBzFzL0+nMU7d
1epyDqJQDmX+PAzvj+EYyn3HTfobZlckn+tbxMQkkS+oDk1ywOZd+BancClvn5/5
jMfPIQJF5bGASVnzGMWhzVdwthTZiMG4d1iKsUWOA/hN0ch0+rm1BqraToabsEFg
Sv7/rvl9KtSOtMJTeAmMhlZUMBj9m8BtPFjniDwp6nw/upGgJdST5mrKFNYZvqOj
JnXsEMr/nJVW6bnUvT6LF66xbHlzHdxtodkQWqF+IEsyRaOz1zAGpQamP98KxNLc
dq/XYoEe1KqIFbg4BkNP+GeDL3FQDxjFNwPQnnjQEzWRbjkHlfmq1uKCsR2r8mBO
fYNJ1X8lTyGV0kx/ERpWGazzabpzh+8Lr1yMhnoA3EWvlzUjmpN2PFI4oTpTrtzT
c/q16SCxim3NWA==
=D9x8
-----END PGP SIGNATURE-----
Merge tag 'x86-splitlock-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 splitlock updates from Thomas Gleixner:
"Support for 'split lock' detection:
Atomic operations (lock prefixed instructions) which span two cache
lines have to acquire the global bus lock. This is at least 1k cycles
slower than an atomic operation within a cache line and disrupts
performance on other cores. Aside of performance disruption this is a
unpriviledged form of DoS.
Some newer CPUs have the capability to raise an #AC trap when such an
operation is attempted. The detection is by default enabled in warning
mode which will warn once when a user space application is caught. A
command line option allows to disable the detection or to select fatal
mode which will terminate offending applications with SIGBUS"
* tag 'x86-splitlock-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/split_lock: Avoid runtime reads of the TEST_CTRL MSR
x86/split_lock: Rework the initialization flow of split lock detection
x86/split_lock: Enable split lock detection by kernel
- Convert the 32bit syscalls to be pt_regs based which removes the
requirement to push all 6 potential arguments onto the stack and
consolidates the interface with the 64bit variant
- The first small portion of the exception and syscall related entry
code consolidation which aims to address the recently discovered
issues vs. RCU, int3, NMI and some other exceptions which can
interrupt any context. The bulk of the changes is still work in
progress and aimed for 5.8.
- A few lockdep namespace cleanups which have been applied into this
branch to keep the prerequisites for the ongoing work confined.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B/TMTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoYA6EAC7r/bCMxBelljT3b7LkBbiJcocJ+zK
OSzWU9miJGTAvYqn4/ciLKg4dA424b/1rBFlF1hBTCQ0HL5Cv4lajxdKEZCO5WCC
WWTCz+MC60aWFaH3VNoywiLGb39H2IbqWbS9yNPd/wBkLHiMAD6NPQntOvcPaD4j
1lyrMtLzfrWlrHxvxdI3kt5ZpFLYNXr2xk61xQjTz0ROFQBhf2sDsuhHhiYVLPj7
JwYktpbBiPeaw2+I18NPymNPY+VfY8LCTgLl5M+rbKyCqebKaedZQJ7QXFhAEqKC
Y2f+gJsKWtTDzGP2mk/5kF0uP7cd0vJK35ZCXtLZ9BbcNtFZU6w+ADqRo4pJBHRY
QRzo/AWrdkuTJF0CrP6mcneNC7NwWLSdKrE1z77RQCHUPVvhHhRDZsgdLcZ/KKwx
y1ji22trwNB+7LmI2fUOU5RRHZBIuNvQT+mPt24febJuHpZKul62dd3cqTGeSTC+
MYVknYDSg/+jk+83DhuZnTyb9lWTbq/0Q1HRDu6l2LrMIH7YMPpY5Ea64ZFYzWXy
s0+iHEM4mUzltwNauHIntjbwXi3C0l2k1WQyG0gun2eS6SXfu0lb93V4msFj/N1+
oHavH2n2A4XrRr+Ob87fsl7nfXJibWP7R9xPblrWP2sNdqfjSyGd49rnsvpWqWMK
Fj0d7tQ78+/SwA==
=tWXS
-----END PGP SIGNATURE-----
Merge tag 'x86-entry-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 entry code updates from Thomas Gleixner:
- Convert the 32bit syscalls to be pt_regs based which removes the
requirement to push all 6 potential arguments onto the stack and
consolidates the interface with the 64bit variant
- The first small portion of the exception and syscall related entry
code consolidation which aims to address the recently discovered
issues vs. RCU, int3, NMI and some other exceptions which can
interrupt any context. The bulk of the changes is still work in
progress and aimed for 5.8.
- A few lockdep namespace cleanups which have been applied into this
branch to keep the prerequisites for the ongoing work confined.
* tag 'x86-entry-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
x86/entry: Fix build error x86 with !CONFIG_POSIX_TIMERS
lockdep: Rename trace_{hard,soft}{irq_context,irqs_enabled}()
lockdep: Rename trace_softirqs_{on,off}()
lockdep: Rename trace_hardirq_{enter,exit}()
x86/entry: Rename ___preempt_schedule
x86: Remove unneeded includes
x86/entry: Drop asmlinkage from syscalls
x86/entry/32: Enable pt_regs based syscalls
x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments
x86/entry/32: Rename 32-bit specific syscalls
x86/entry/32: Clean up syscall_32.tbl
x86/entry: Remove ABI prefixes from functions in syscall tables
x86/entry/64: Add __SYSCALL_COMMON()
x86/entry: Remove syscall qualifier support
x86/entry/64: Remove ptregs qualifier from syscall table
x86/entry: Move max syscall number calculation to syscallhdr.sh
x86/entry/64: Split X32 syscall table into its own file
x86/entry/64: Move sys_ni_syscall stub to common.c
x86/entry/64: Use syscall wrappers for x32_rt_sigreturn
x86/entry: Refactor SYS_NI macros
...
Core:
- Consolidation of the vDSO build infrastructure to address the
difficulties of cross-builds for ARM64 compat vDSO libraries by
restricting the exposure of header content to the vDSO build.
This is achieved by splitting out header content into separate
headers. which contain only the minimaly required information which is
necessary to build the vDSO. These new headers are included from the
kernel headers and the vDSO specific files.
- Enhancements to the generic vDSO library allowing more fine grained
control over the compiled in code, further reducing architecture
specific storage and preparing for adopting the generic library by PPC.
- Cleanup and consolidation of the exit related code in posix CPU timers.
- Small cleanups and enhancements here and there
Drivers:
- The obligatory new drivers: Ingenic JZ47xx and X1000 TCU support
- Correct the clock rate of PIT64b global clock
- setup_irq() cleanup
- Preparation for PWM and suspend support for the TI DM timer
- Expand the fttmr010 driver to support ast2600 systems
- The usual small fixes, enhancements and cleanups all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B+QETHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYofJ5D/94s5fpaqiuNcaAsLq2D3DRIrTnqxx7
yEeAOPcbYV1bM1SgY/M83L5yGc2S8ny787e26abwRTCZhZV3eAmRTphIFFIZR0Xk
xS+i67odscbdJTRtztKj3uQ9rFxefszRuphyaa89pwSY9nnyMWLcahGSQOGs0LJK
hvmgwPjyM1drNfPxgPiaFg7vDr2XxNATpQr/FBt+BhelvVan8TlAfrkcNPiLr++Y
Axz925FP7jMaRRbZ1acji34gLiIAZk0jLCUdbix7YkPrqDB4GfO+v8Vez+fGClbJ
uDOYeR4r1+Be/BtSJtJ2tHqtsKCcAL6agtaE2+epZq5HbzaZFRvBFaxgFNF8WVcn
3FFibdEMdsRNfZTUVp5wwgOLN0UIqE/7LifE12oLEL2oFB5H2PiNEUw3E02XHO11
rL3zgHhB6Ke1sXKPCjSGdmIQLbxZmV5kOlQFy7XuSeo5fmRapVzKNffnKcftIliF
1HNtZbgdA+3tdxMFCqoo1QX+kotl9kgpslmdZ0qHAbaRb3xqLoSskbqEjFRMuSCC
8bjJrwboD9T5GPfwodSCgqs/58CaSDuqPFbIjCay+p90Fcg6wWAkZtyG04ZLdPRc
GgNNdN4gjTD9bnrRi8cH47z1g8OO4vt4K4SEbmjo8IlDW+9jYMxuwgR88CMeDXd7
hu7aKsr2I2q/WQ==
=5o9G
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timekeeping and timer updates from Thomas Gleixner:
"Core:
- Consolidation of the vDSO build infrastructure to address the
difficulties of cross-builds for ARM64 compat vDSO libraries by
restricting the exposure of header content to the vDSO build.
This is achieved by splitting out header content into separate
headers. which contain only the minimaly required information which
is necessary to build the vDSO. These new headers are included from
the kernel headers and the vDSO specific files.
- Enhancements to the generic vDSO library allowing more fine grained
control over the compiled in code, further reducing architecture
specific storage and preparing for adopting the generic library by
PPC.
- Cleanup and consolidation of the exit related code in posix CPU
timers.
- Small cleanups and enhancements here and there
Drivers:
- The obligatory new drivers: Ingenic JZ47xx and X1000 TCU support
- Correct the clock rate of PIT64b global clock
- setup_irq() cleanup
- Preparation for PWM and suspend support for the TI DM timer
- Expand the fttmr010 driver to support ast2600 systems
- The usual small fixes, enhancements and cleanups all over the
place"
* tag 'timers-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits)
Revert "clocksource/drivers/timer-probe: Avoid creating dead devices"
vdso: Fix clocksource.h macro detection
um: Fix header inclusion
arm64: vdso32: Enable Clang Compilation
lib/vdso: Enable common headers
arm: vdso: Enable arm to use common headers
x86/vdso: Enable x86 to use common headers
mips: vdso: Enable mips to use common headers
arm64: vdso32: Include common headers in the vdso library
arm64: vdso: Include common headers in the vdso library
arm64: Introduce asm/vdso/processor.h
arm64: vdso32: Code clean up
linux/elfnote.h: Replace elf.h with UAPI equivalent
scripts: Fix the inclusion order in modpost
common: Introduce processor.h
linux/ktime.h: Extract common header for vDSO
linux/jiffies.h: Extract common header for vDSO
linux/time64.h: Extract common header for vDSO
linux/time32.h: Extract common header for vDSO
linux/time.h: Extract common header for vDSO
...
- Support for locked CSD objects in smp_call_function_single_async()
which allows to simplify callsites in the scheduler core and MIPS
- Treewide consolidation of CPU hotplug functions which ensures the
consistency between the sysfs interface and kernel state. The low level
functions cpu_up/down() are now confined to the core code and not
longer accessible from random code.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B9VQTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYodCyD/0WFYAe7LkOfNjkbLa0IeuyLjF9rnCi
ilcSXMLpaVwwoQvm7MopwkXUDdmEIyeJ0B641j3mC3AKCRap4+O36H2IEg2byrj7
twOvQNCfxpVVmCCD11FTH9aQa74LEB6AikTgjevhrRWj6eHsal7c2Ak26AzCgrt+
0eEkOAOWJbLAlbIiPdHlCZ3TMldcs3gg+lRSYd5QCGQVkZFnwpXzyOvpyJEUGGbb
R/JuvwJoLhRMiYAJDILoQQQg/J07ODuivse/R8PWaH2djkn+2NyRGrD794PhyyOg
QoTU0ZrYD3Z48ACXv+N3jLM7wXMcFzjYtr1vW1E3O/YGA7GVIC6XHGbMQ7tEihY0
ajtwq8DcnpKtuouviYnf7NuKgqdmJXkaZjz3Gms6n8nLXqqSVwuQELWV2CXkxNe6
9kgnnKK+xXMOGI4TUhN8bejvkXqRCmKMeQJcWyf+7RA9UIhAJw5o7WGo8gXfQWUx
tazCqDy/inYjqGxckW615fhi2zHfemlYTbSzIGOuMB1TEPKFcrgYAii/VMsYHQVZ
5amkYUXGQ5brlCOzOn38lzp5OkALBnFzD7xgvOcQgWT3ynVpdqADfBytXiEEHh4J
KSkSgSSRcS58397nIxnDcJgJouHLvAWYyPZ4UC6mfynuQIic31qMHGVqwdbEKMY3
4M5dGgqIfOBgYw==
=jwCg
-----END PGP SIGNATURE-----
Merge tag 'smp-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core SMP updates from Thomas Gleixner:
"CPU (hotplug) updates:
- Support for locked CSD objects in smp_call_function_single_async()
which allows to simplify callsites in the scheduler core and MIPS
- Treewide consolidation of CPU hotplug functions which ensures the
consistency between the sysfs interface and kernel state. The low
level functions cpu_up/down() are now confined to the core code and
not longer accessible from random code"
* tag 'smp-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
cpu/hotplug: Ignore pm_wakeup_pending() for disable_nonboot_cpus()
cpu/hotplug: Hide cpu_up/down()
cpu/hotplug: Move bringup of secondary CPUs out of smp_init()
torture: Replace cpu_up/down() with add/remove_cpu()
firmware: psci: Replace cpu_up/down() with add/remove_cpu()
xen/cpuhotplug: Replace cpu_up/down() with device_online/offline()
parisc: Replace cpu_up/down() with add/remove_cpu()
sparc: Replace cpu_up/down() with add/remove_cpu()
powerpc: Replace cpu_up/down() with add/remove_cpu()
x86/smp: Replace cpu_up/down() with add/remove_cpu()
arm64: hibernate: Use bringup_hibernate_cpu()
cpu/hotplug: Provide bringup_hibernate_cpu()
arm64: Use reboot_cpu instead of hardconding it to 0
arm64: Don't use disable_nonboot_cpus()
ARM: Use reboot_cpu instead of hardcoding it to 0
ARM: Don't use disable_nonboot_cpus()
ia64: Replace cpu_down() with smp_shutdown_nonboot_cpus()
cpu/hotplug: Create a new function to shutdown nonboot cpus
cpu/hotplug: Add new {add,remove}_cpu() functions
sched/core: Remove rq.hrtick_csd_pending
...
Treewide:
- Cleanup of setup_irq() which is not longer required because the
memory allocator is available early. Most cleanup changes come
through the various maintainer trees, so the final removal of
setup_irq() is postponed towards the end of the merge window.
Core:
- Protection against unsafe invocation of interrupt handlers and unsafe
interrupt injection including a fixup of the offending PCI/AER error
injection mechanism.
Invoking interrupt handlers from arbitrary contexts, i.e. outside of
an actual interrupt, can cause inconsistent state on the fragile
x86 interrupt affinity changing hardware trainwreck.
Drivers:
- Second wave of support for the new ARM GICv4.1
- Multi-instance support for Xilinx and PLIC interrupt controllers
- CPU-Hotplug support for PLIC
- The obligatory new driver for X1000 TCU
- Enhancements, cleanups and fixes all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B888THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoeMJD/9v8GcI/DSY87Fmo7s4odLFVU0J8zZ6
7QlYjSPm4yWv4pqn1TEnEF2pKz5X9Euhoh8BmdMKtdXBqlS4Ix9N+pH8ModcxyQo
aX97zuRUxvqfeeVE+yQRwbbMREj9jj9RW8FRtA39+l5H3uC1GDcc+2aAMIaykQ7+
8lo/6wBd8ZrZ0gsNf4KjlBwMDYAlQSRWxrff38PQ2XRpGKowdp8JFYZuq5Vp0ljJ
r2cE75ldmFSfmtuhhVroBRY0GAqW4/8v8/syAN3Q9jOEII60qhA0dqR085B9veWa
DHSqgLmzyUFFXN7Ntzt/fDirJVsIM4BE9qGu3ftCYHMaPB8hG+xqjbZe9E3D2e/d
+0Pb3TG8EHVOIwzv1t9+6462qYGkBhmBXtbj6GptPYk2Ai4HZlNaSsa8jUNyHvGz
WDegdRjt7O5RjqDH/VwrQxW/AEp05f/1egweBXbq9aF6j9nqeOur75c/PdxZxAX5
WUMtouXP2WN+sMW8k1T5cmVMGWxLGBB0wwG4LC/mXzHnkDiN1+2wEUHmhS8Voi3q
3HXeYBJeukUYbVvMKRvWVAD330TxFjAyd6pPwCdoNY2ZngJnQWlDD9vbYYX2osoW
kP+KhIANNBVqdK7NqlLoqcr3SdHn01pQYuVHejNzxb7E6/mmpMlaYDJc/rMPi/eM
0/rzl8fAj/WyBQ==
=DZ/G
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates for the interrupt subsystem:
Treewide:
- Cleanup of setup_irq() which is not longer required because the
memory allocator is available early.
Most cleanup changes come through the various maintainer trees, so
the final removal of setup_irq() is postponed towards the end of
the merge window.
Core:
- Protection against unsafe invocation of interrupt handlers and
unsafe interrupt injection including a fixup of the offending
PCI/AER error injection mechanism.
Invoking interrupt handlers from arbitrary contexts, i.e. outside
of an actual interrupt, can cause inconsistent state on the
fragile x86 interrupt affinity changing hardware trainwreck.
Drivers:
- Second wave of support for the new ARM GICv4.1
- Multi-instance support for Xilinx and PLIC interrupt controllers
- CPU-Hotplug support for PLIC
- The obligatory new driver for X1000 TCU
- Enhancements, cleanups and fixes all over the place"
* tag 'irq-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
unicore32: Replace setup_irq() by request_irq()
sh: Replace setup_irq() by request_irq()
hexagon: Replace setup_irq() by request_irq()
c6x: Replace setup_irq() by request_irq()
alpha: Replace setup_irq() by request_irq()
irqchip/gic-v4.1: Eagerly vmap vPEs
irqchip/gic-v4.1: Add VSGI property setup
irqchip/gic-v4.1: Add VSGI allocation/teardown
irqchip/gic-v4.1: Move doorbell management to the GICv4 abstraction layer
irqchip/gic-v4.1: Plumb set_vcpu_affinity SGI callbacks
irqchip/gic-v4.1: Plumb get/set_irqchip_state SGI callbacks
irqchip/gic-v4.1: Plumb mask/unmask SGI callbacks
irqchip/gic-v4.1: Add initial SGI configuration
irqchip/gic-v4.1: Plumb skeletal VSGI irqchip
irqchip/stm32: Retrigger both in eoi and unmask callbacks
irqchip/gic-v3: Move irq_domain_update_bus_token to after checking for NULL domain
irqchip/xilinx: Do not call irq_set_default_host()
irqchip/xilinx: Enable generic irq multi handler
irqchip/xilinx: Fill error code when irq domain registration fails
irqchip/xilinx: Add support for multiple instances
...
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle are:
- Various NUMA scheduling updates: harmonize the load-balancer and
NUMA placement logic to not work against each other. The intended
result is better locality, better utilization and fewer migrations.
- Introduce Thermal Pressure tracking and optimizations, to improve
task placement on thermally overloaded systems.
- Implement frequency invariant scheduler accounting on (some) x86
CPUs. This is done by observing and sampling the 'recent' CPU
frequency average at ~tick boundaries. The CPU provides this data
via the APERF/MPERF MSRs. This hopefully makes our capacity
estimates more precise and keeps tasks on the same CPU better even
if it might seem overloaded at a lower momentary frequency. (As
usual, turbo mode is a complication that we resolve by observing
the maximum frequency and renormalizing to it.)
- Add asymmetric CPU capacity wakeup scan to improve capacity
utilization on asymmetric topologies. (big.LITTLE systems)
- PSI fixes and optimizations.
- RT scheduling capacity awareness fixes & improvements.
- Optimize the CONFIG_RT_GROUP_SCHED constraints code.
- Misc fixes, cleanups and optimizations - see the changelog for
details"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
threads: Update PID limit comment according to futex UAPI change
sched/fair: Fix condition of avg_load calculation
sched/rt: cpupri_find: Trigger a full search as fallback
kthread: Do not preempt current task if it is going to call schedule()
sched/fair: Improve spreading of utilization
sched: Avoid scale real weight down to zero
psi: Move PF_MEMSTALL out of task->flags
MAINTAINERS: Add maintenance information for psi
psi: Optimize switching tasks inside shared cgroups
psi: Fix cpu.pressure for cpu.max and competing cgroups
sched/core: Distribute tasks within affinity masks
sched/fair: Fix enqueue_task_fair warning
thermal/cpu-cooling, sched/core: Move the arch_set_thermal_pressure() API to generic scheduler code
sched/rt: Remove unnecessary push for unfit tasks
sched/rt: Allow pulling unfitting task
sched/rt: Optimize cpupri_find() on non-heterogenous systems
sched/rt: Re-instate old behavior in select_task_rq_rt()
sched/rt: cpupri_find: Implement fallback mechanism for !fit case
sched/fair: Fix reordering of enqueue/dequeue_task_fair()
sched/fair: Fix runnable_avg for throttled cfs
...
Pull perf updates from Ingo Molnar:
"The main changes in this cycle were:
Kernel side changes:
- A couple of x86/cpu cleanups and changes were grandfathered in due
to patch dependencies. These clean up the set of CPU model/family
matching macros with a consistent namespace and C99 initializer
style.
- A bunch of updates to various low level PMU drivers:
* AMD Family 19h L3 uncore PMU
* Intel Tiger Lake uncore support
* misc fixes to LBR TOS sampling
- optprobe fixes
- perf/cgroup: optimize cgroup event sched-in processing
- misc cleanups and fixes
Tooling side changes are to:
- perf {annotate,expr,record,report,stat,test}
- perl scripting
- libapi, libperf and libtraceevent
- vendor events on Intel and S390, ARM cs-etm
- Intel PT updates
- Documentation changes and updates to core facilities
- misc cleanups, fixes and other enhancements"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (89 commits)
cpufreq/intel_pstate: Fix wrong macro conversion
x86/cpu: Cleanup the now unused CPU match macros
hwrng: via_rng: Convert to new X86 CPU match macros
crypto: Convert to new CPU match macros
ASoC: Intel: Convert to new X86 CPU match macros
powercap/intel_rapl: Convert to new X86 CPU match macros
PCI: intel-mid: Convert to new X86 CPU match macros
mmc: sdhci-acpi: Convert to new X86 CPU match macros
intel_idle: Convert to new X86 CPU match macros
extcon: axp288: Convert to new X86 CPU match macros
thermal: Convert to new X86 CPU match macros
hwmon: Convert to new X86 CPU match macros
platform/x86: Convert to new CPU match macros
EDAC: Convert to new X86 CPU match macros
cpufreq: Convert to new X86 CPU match macros
ACPI: Convert to new X86 CPU match macros
x86/platform: Convert to new CPU match macros
x86/kernel: Convert to new CPU match macros
x86/kvm: Convert to new CPU match macros
x86/perf/events: Convert to new CPU match macros
...
Pull EFI updates from Ingo Molnar:
"The EFI changes in this cycle are much larger than usual, for two
(positive) reasons:
- The GRUB project is showing signs of life again, resulting in the
introduction of the generic Linux/UEFI boot protocol, instead of
x86 specific hacks which are increasingly difficult to maintain.
There's hope that all future extensions will now go through that
boot protocol.
- Preparatory work for RISC-V EFI support.
The main changes are:
- Boot time GDT handling changes
- Simplify handling of EFI properties table on arm64
- Generic EFI stub cleanups, to improve command line handling, file
I/O, memory allocation, etc.
- Introduce a generic initrd loading method based on calling back
into the firmware, instead of relying on the x86 EFI handover
protocol or device tree.
- Introduce a mixed mode boot method that does not rely on the x86
EFI handover protocol either, and could potentially be adopted by
other architectures (if another one ever surfaces where one
execution mode is a superset of another)
- Clean up the contents of 'struct efi', and move out everything that
doesn't need to be stored there.
- Incorporate support for UEFI spec v2.8A changes that permit
firmware implementations to return EFI_UNSUPPORTED from UEFI
runtime services at OS runtime, and expose a mask of which ones are
supported or unsupported via a configuration table.
- Partial fix for the lack of by-VA cache maintenance in the
decompressor on 32-bit ARM.
- Changes to load device firmware from EFI boot service memory
regions
- Various documentation updates and minor code cleanups and fixes"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
efi/libstub/arm: Fix spurious message that an initrd was loaded
efi/libstub/arm64: Avoid image_base value from efi_loaded_image
partitions/efi: Fix partition name parsing in GUID partition entry
efi/x86: Fix cast of image argument
efi/libstub/x86: Use ULONG_MAX as upper bound for all allocations
efi: Fix a mistype in comments mentioning efivar_entry_iter_begin()
efi/libstub: Avoid linking libstub/lib-ksyms.o into vmlinux
efi/x86: Preserve %ebx correctly in efi_set_virtual_address_map()
efi/x86: Ignore the memory attributes table on i386
efi/x86: Don't relocate the kernel unless necessary
efi/x86: Remove extra headroom for setup block
efi/x86: Add kernel preferred address to PE header
efi/x86: Decompress at start of PE image load address
x86/boot/compressed/32: Save the output address instead of recalculating it
efi/libstub/x86: Deal with exit() boot service returning
x86/boot: Use unsigned comparison for addresses
efi/x86: Avoid using code32_start
efi/x86: Make efi32_pe_entry() more readable
efi/x86: Respect 32-bit ABI in efi32_pe_entry()
efi/x86: Annotate the LOADED_IMAGE_PROTOCOL_GUID with SYM_DATA
...