There is a lot of infrastructure for functionality which is used
exclusively in __{save,restore}_processor_state() on the suspend/resume
path.
cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by
lapic_{suspend,resume}(). Saving and restoring cr8 independently of the
rest of the Local APIC state isn't a clever thing to be doing.
Delete the suspend/resume cr8 handling, which shrinks the size of struct
saved_context, and allows for the removal of both PVOPS.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lkml.kernel.org/r/20190715151641.29210-1-andrew.cooper3@citrix.com
Pull x86 fixes from Thomas Gleixner:
"A set of x86 specific fixes and updates:
- The CR2 corruption fixes which store CR2 early in the entry code
and hand the stored address to the fault handlers.
- Revert a forgotten leftover of the dropped FSGSBASE series.
- Plug a memory leak in the boot code.
- Make the Hyper-V assist functionality robust by zeroing the shadow
page.
- Remove a useless check for dead processes with LDT
- Update paravirt and VMware maintainers entries.
- A few cleanup patches addressing various compiler warnings"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/entry/64: Prevent clobbering of saved CR2 value
x86/hyper-v: Zero out the VP ASSIST PAGE on allocation
x86, boot: Remove multiple copy of static function sanitize_boot_params()
x86/boot/compressed/64: Remove unused variable
x86/boot/efi: Remove unused variables
x86/mm, tracing: Fix CR2 corruption
x86/entry/64: Update comments and sanity tests for create_gap
x86/entry/64: Simplify idtentry a little
x86/entry/32: Simplify common_exception
x86/paravirt: Make read_cr2() CALLEE_SAVE
MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE
x86/process: Delete useless check for dead process with LDT
x86: math-emu: Hide clang warnings for 16-bit overflow
x86/e820: Use proper booleans instead of 0/1
x86/apic: Silence -Wtype-limits compiler warnings
x86/mm: Free sme_early_buffer after init
x86/boot: Fix memory leak in default_get_smp_config()
Revert "x86/ptrace: Prevent ptrace from clearing the FS/GS selector" and fix the test
Pull core fixes from Thomas Gleixner:
- A collection of objtool fixes which address recent fallout partially
exposed by newer toolchains, clang, BPF and general code changes.
- Force USER_DS for user stack traces
[ Note: the "objtool fixes" are not all to objtool itself, but for
kernel code that triggers objtool warnings.
Things like missing function size annotations, or code that confuses
the unwinder etc. - Linus]
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
objtool: Support conditional retpolines
objtool: Convert insn type to enum
objtool: Fix seg fault on bad switch table entry
objtool: Support repeated uses of the same C jump table
objtool: Refactor jump table code
objtool: Refactor sibling call detection logic
objtool: Do frame pointer check before dead end check
objtool: Change dead_end_function() to return boolean
objtool: Warn on zero-length functions
objtool: Refactor function alias logic
objtool: Track original function across branches
objtool: Add mcsafe_handle_tail() to the uaccess safe list
bpf: Disable GCC -fgcse optimization for ___bpf_prog_run()
x86/uaccess: Remove redundant CLACs in getuser/putuser error paths
x86/uaccess: Don't leak AC flag into fentry from mcsafe_handle_tail()
x86/uaccess: Remove ELF function annotation from copy_user_handle_tail()
x86/head/64: Annotate start_cpu0() as non-callable
x86/entry: Fix thunk function ELF sizes
x86/kvm: Don't call kvm_spurious_fault() from .fixup
x86/kvm: Replace vmx_vmenter()'s call to kvm_spurious_fault() with UD2
...
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXTFdBAAKCRCAXGG7T9hj
vkwEAQDKDApCcJymAaq+BP2/lU/kErzFFXQ7seDN84q13ZMfcwEAzDz7vU1zicMP
Sdq1LzFdiuXjk34BBi2PURXZAVoaXgU=
=KkHz
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.3a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
"Fixes and features:
- A series to introduce a common command line parameter for disabling
paravirtual extensions when running as a guest in virtualized
environment
- A fix for int3 handling in Xen pv guests
- Removal of the Xen-specific tmem driver as support of tmem in Xen
has been dropped (and it was experimental only)
- A security fix for running as Xen dom0 (XSA-300)
- A fix for IRQ handling when offlining cpus in Xen guests
- Some small cleanups"
* tag 'for-linus-5.3a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: let alloc_xenballooned_pages() fail if not enough memory free
xen/pv: Fix a boot up hang revealed by int3 self test
x86/xen: Add "nopv" support for HVM guest
x86/paravirt: Remove const mark from x86_hyper_xen_hvm variable
xen: Map "xen_nopv" parameter to "nopv" and mark it obsolete
x86: Add "nopv" parameter to disable PV extensions
x86/xen: Mark xen_hvm_need_lapic() and xen_x2apic_para_available() as __init
xen: remove tmem driver
Revert "x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized"
xen/events: fix binding user event channels to cpus
After making a change to improve objtool's sibling call detection, it
started showing the following warning:
arch/x86/kvm/vmx/nested.o: warning: objtool: .fixup+0x15: sibling call from callable instruction with modified stack frame
The problem is the ____kvm_handle_fault_on_reboot() macro. It does a
fake call by pushing a fake RIP and doing a jump. That tricks the
unwinder into printing the function which triggered the exception,
rather than the .fixup code.
Instead of the hack to make it look like the original function made the
call, just change the macro so that the original function actually does
make the call. This allows removal of the hack, and also makes objtool
happy.
I triggered a vmx instruction exception and verified that the stack
trace is still sane:
kernel BUG at arch/x86/kvm/x86.c:358!
invalid opcode: 0000 [#1] SMP PTI
CPU: 28 PID: 4096 Comm: qemu-kvm Not tainted 5.2.0+ #16
Hardware name: Lenovo THINKSYSTEM SD530 -[7X2106Z000]-/-[7X2106Z000]-, BIOS -[TEE113Z-1.00]- 07/17/2017
RIP: 0010:kvm_spurious_fault+0x5/0x10
Code: 00 00 00 00 00 8b 44 24 10 89 d2 45 89 c9 48 89 44 24 10 8b 44 24 08 48 89 44 24 08 e9 d4 40 22 00 0f 1f 40 00 0f 1f 44 00 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 55 49 89 fd 41
RSP: 0018:ffffbf91c683bd00 EFLAGS: 00010246
RAX: 000061f040000000 RBX: ffff9e159c77bba0 RCX: ffff9e15a5c87000
RDX: 0000000665c87000 RSI: ffff9e15a5c87000 RDI: ffff9e159c77bba0
RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9e15a5c87000
R10: 0000000000000000 R11: fffff8f2d99721c0 R12: ffff9e159c77bba0
R13: ffffbf91c671d960 R14: ffff9e159c778000 R15: 0000000000000000
FS: 00007fa341cbe700(0000) GS:ffff9e15b7400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdd38356804 CR3: 00000006759de003 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
loaded_vmcs_init+0x4f/0xe0
alloc_loaded_vmcs+0x38/0xd0
vmx_create_vcpu+0xf7/0x600
kvm_vm_ioctl+0x5e9/0x980
? __switch_to_asm+0x40/0x70
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70
? __switch_to_asm+0x34/0x70
? free_one_page+0x13f/0x4e0
do_vfs_ioctl+0xa4/0x630
ksys_ioctl+0x60/0x90
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x55/0x1c0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fa349b1ee5b
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/64a9b64d127e87b6920a97afde8e96ea76f6524e.1563413318.git.jpoimboe@redhat.com
The __raw_callee_save_*() functions have an ELF symbol size of zero,
which confuses objtool and other tools.
Fixes a bunch of warnings like the following:
arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_pte_val() is missing an ELF size annotation
arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_pgd_val() is missing an ELF size annotation
arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_make_pte() is missing an ELF size annotation
arch/x86/xen/mmu_pv.o: warning: objtool: __raw_callee_save_xen_make_pgd() is missing an ELF size annotation
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/afa6d49bb07497ca62e4fc3b27a2d0cece545b4e.1563413318.git.jpoimboe@redhat.com
- Add user space specific memory reading for kprobes
- Allow kprobes to be executed earlier in boot
The rest are mostly just various clean ups and small fixes.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXS88txQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qhaPAQDHaAmu6wXtZjZE6GU4ZP61UNgDECmZ
4wlGrNc1AAlqAQD/QC8339p37aDCp9n27VY1wmJwF3nca+jAHfQLqWkkYgw=
=n/tz
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"The main changes in this release include:
- Add user space specific memory reading for kprobes
- Allow kprobes to be executed earlier in boot
The rest are mostly just various clean ups and small fixes"
* tag 'trace-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
tracing: Make trace_get_fields() global
tracing: Let filter_assign_type() detect FILTER_PTR_STRING
tracing: Pass type into tracing_generic_entry_update()
ftrace/selftest: Test if set_event/ftrace_pid exists before writing
ftrace/selftests: Return the skip code when tracing directory not configured in kernel
tracing/kprobe: Check registered state using kprobe
tracing/probe: Add trace_event_call accesses APIs
tracing/probe: Add probe event name and group name accesses APIs
tracing/probe: Add trace flag access APIs for trace_probe
tracing/probe: Add trace_event_file access APIs for trace_probe
tracing/probe: Add trace_event_call register API for trace_probe
tracing/probe: Add trace_probe init and free functions
tracing/uprobe: Set print format when parsing command
tracing/kprobe: Set print format right after parsed command
kprobes: Fix to init kprobes in subsys_initcall
tracepoint: Use struct_size() in kmalloc()
ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS
ftrace: Enable trampoline when rec count returns back to one
tracing/kprobe: Do not run kprobe boot tests if kprobe_event is on cmdline
tracing: Make a separate config for trace event self tests
...
Despite the current efforts to read CR2 before tracing happens there still
exist a number of possible holes:
idtentry page_fault do_page_fault has_error_code=1
call error_entry
TRACE_IRQS_OFF
call trace_hardirqs_off*
#PF // modifies CR2
CALL_enter_from_user_mode
__context_tracking_exit()
trace_user_exit(0)
#PF // modifies CR2
call do_page_fault
address = read_cr2(); /* whoopsie */
And similar for i386.
Fix it by pulling the CR2 read into the entry code, before any of that
stuff gets a chance to run and ruin things.
Reported-by: He Zhe <zhe.he@windriver.com>
Reported-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: bp@alien8.de
Cc: rostedt@goodmis.org
Cc: torvalds@linux-foundation.org
Cc: hpa@zytor.com
Cc: dave.hansen@linux.intel.com
Cc: jgross@suse.com
Cc: joel@joelfernandes.org
Link: https://lkml.kernel.org/r/20190711114336.116812491@infradead.org
Debugged-by: Steven Rostedt <rostedt@goodmis.org>
Commit 7457c0da02 ("x86/alternatives: Add int3_emulate_call()
selftest") is used to ensure there is a gap setup in int3 exception stack
which could be used for inserting call return address.
This gap is missed in XEN PV int3 exception entry path, then below panic
triggered:
[ 0.772876] general protection fault: 0000 [#1] SMP NOPTI
[ 0.772886] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.2.0+ #11
[ 0.772893] RIP: e030:int3_magic+0x0/0x7
[ 0.772905] RSP: 3507:ffffffff82203e98 EFLAGS: 00000246
[ 0.773334] Call Trace:
[ 0.773334] alternative_instructions+0x3d/0x12e
[ 0.773334] check_bugs+0x7c9/0x887
[ 0.773334] ? __get_locked_pte+0x178/0x1f0
[ 0.773334] start_kernel+0x4ff/0x535
[ 0.773334] ? set_init_arg+0x55/0x55
[ 0.773334] xen_start_kernel+0x571/0x57a
For 64bit PV guests, Xen's ABI enters the kernel with using SYSRET, with
%rcx/%r11 on the stack. To convert back to "normal" looking exceptions,
the xen thunks do 'xen_*: pop %rcx; pop %r11; jmp *'.
E.g. Extracting 'xen_pv_trap xenint3' we have:
xen_xenint3:
pop %rcx;
pop %r11;
jmp xenint3
As xenint3 and int3 entry code are same except xenint3 doesn't generate
a gap, we can fix it by using int3 and drop useless xenint3.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
PVH guest needs PV extentions to work, so "nopv" parameter should be
ignored for PVH but not for HVM guest.
If PVH guest boots up via the Xen-PVH boot entry, xen_pvh is set early,
we know it's PVH guest and ignore "nopv" parameter directly.
If PVH guest boots up via the normal boot entry same as HVM guest, it's
hard to distinguish PVH and HVM guest at that time. In this case, we
have to panic early if PVH is detected and nopv is enabled to avoid a
worse situation later.
Remove static from bool_x86_init_noop/x86_op_int_noop so they could be
used globally. Move xen_platform_hvm() after xen_hvm_guest_late_init()
to avoid compile error.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
.. as "nopv" support needs it to be changeable at boot up stage.
Checkpatch reports warning, so move variable declarations from
hypervisor.c to hypervisor.h
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
In virtualization environment, PV extensions (drivers, interrupts,
timers, etc) are enabled in the majority of use cases which is the
best option.
However, in some cases (kexec not fully working, benchmarking)
we want to disable PV extensions. We have "xen_nopv" for that purpose
but only for XEN. For a consistent admin experience a common command
line parameter "nopv" set across all PV guest implementations is a
better choice.
There are guest types which just won't work without PV extensions,
like Xen PV, Xen PVH and jailhouse. add a "ignore_nopv" member to
struct hypervisor_x86 set to true for those guest types and call
the detect functions only if nopv is false or ignore_nopv is true.
Suggested-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
.. as they are only called at early bootup stage. In fact, other
functions in x86_hyper_xen_hvm.init.* are all marked as __init.
Unexport xen_hvm_need_lapic as it's never used outside.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined
with the long-out-of-date comment can lead to the impression than an
architecture may just enable it (since __add_pages() now "comprehends
device memory" for itself) and expect things to work.
In practice, however, ZONE_DEVICE users have little chance of
functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean
that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper
dependency so the real situation is clearer.
Link: http://lkml.kernel.org/r/87554aa78478a02a63f2c4cf60a847279ae3eb3b.1558547956.git.robin.murphy@arm.com
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
isa_page_to_bus() is deprecated and is no longer used anywhere. Remove
it entirely.
Link: http://lkml.kernel.org/r/20190613161155.16946-1-steve@sk2.org
Signed-off-by: Stephen Kitt <steve@sk2.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are many compiler warnings like this,
In file included from ./arch/x86/include/asm/smp.h:13,
from ./arch/x86/include/asm/mmzone_64.h:11,
from ./arch/x86/include/asm/mmzone.h:5,
from ./include/linux/mmzone.h:969,
from ./include/linux/gfp.h:6,
from ./include/linux/mm.h:10,
from arch/x86/kernel/apic/io_apic.c:34:
arch/x86/kernel/apic/io_apic.c: In function 'check_timer':
./arch/x86/include/asm/apic.h:37:11: warning: comparison of unsigned
expression >= 0 is always true [-Wtype-limits]
if ((v) <= apic_verbosity) \
^~
arch/x86/kernel/apic/io_apic.c:2160:2: note: in expansion of macro
'apic_printk'
apic_printk(APIC_QUIET, KERN_INFO "..TIMER: vector=0x%02X "
^~~~~~~~~~~
./arch/x86/include/asm/apic.h:37:11: warning: comparison of unsigned
expression >= 0 is always true [-Wtype-limits]
if ((v) <= apic_verbosity) \
^~
arch/x86/kernel/apic/io_apic.c:2207:4: note: in expansion of macro
'apic_printk'
apic_printk(APIC_QUIET, KERN_ERR "..MP-BIOS bug: "
^~~~~~~~~~~
APIC_QUIET is 0, so silence them by making apic_verbosity type int.
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1562621805-24789-1-git-send-email-cai@lca.pw
ASUS WMI driver got a big refactoring in order to support the TUF Gaming
laptops. Besides that, the regression with backlight being permanently off
on various EeePC laptops has been fixed.
Accelerometer on HP ProBook 450 G0 shows wrong measurements due to
X axis being inverted. This has been fixed.
Intel PMC core driver has been extended to be ACPI enumerated
if the DSDT provides device with _HID "INT33A1". This allows
to convert the driver to be pure platform and support new hardware
purely based on ACPI DSDT.
From now on the Intel Speed Select Technology is supported thru
a corresponding driver. This driver provides an access to the features
of the ISST, such as Performance Profile, Core Power, Base frequency and
Turbo Frequency.
Mellanox platform drivers has been refactored and now extended
to support more systems, including new coming ones.
The OLPC XO-1.75 platform is now supported.
CB4063 Beckhoff Automation board is using PMC clocks,
provided via pmc_atom driver, for ethernet controllers in a way
that they can't be managed by the clock driver. The quirk
has been extended to cover this case.
Touchscreen on Chuwi Hi10 Plus tablet has been enabled. Meanwhile
the information of Chuwi Hi10 Air has been fixed to cover more models
based on the same platform.
Xiaomi notebooks have WMI interface enabled. Thus, the driver to support it
has been provided. It required some extension of the generic WMI library,
which allows to propagate opaque context to the ->probe() of the
individual drivers.
This release includes debugfs clean up from Greg KH for several drivers
that drop return code check and make debugfs absence or failure non-fatal.
Miscellaneous fixes here and there, mostly for Acer WMI and
various Intel drivers.
The listed below commits are duplicated due to previously pushed fixes in v5.2 cycle:
- 1dd93f873d platform/x86: asus-wmi: Only Tell EC the OS will handle display hotkeys from asus_nb_wmi
- 89ae3a0736 platform/x86: intel-vbtn: Report switch events when event wakes device
- fa882fc80d platform/x86: mlx-platform: Fix parent device in i2c-mux-reg device registration
- 0bfcd24b39 platform/mellanox: mlxreg-hotplug: Add devm_free_irq call to remove flow
The following is an automated git shortlog grouped by driver:
acer-wmi:
- Mark expected switch fall-throughs
- no need to check return value of debugfs_create functions
asus-nb-wmi:
- Add microphone mute key code
asus-wmi:
- Use dev_get_drvdata()
- Do not disable keyboard backlight on unloading
- Switch fan boost mode
- Enhance detection of thermal data
- Organize code into sections
- Refactor error handling
- Support WMI event queue
- Refactor WMI event handling
- Improve DSTS WMI method ID detection
- Increase input buffer size of WMI methods
- Fix preserving keyboard backlight intensity on load
- Fix hwmon device cleanup
- no need to check return value of debugfs_create functions
- Only Tell EC the OS will handle display hotkeys from asus_nb_wmi
dell-laptop:
- no need to check return value of debugfs_create functions
hp_accel:
- Add support for HP ProBook 450 G0
ideapad-laptop:
- no need to check return value of debugfs_create functions
intel_int0002_vgpio:
- Get rid of custom ICPU() macro
intel_menlow:
- avoid null pointer deference error
intel_pmc:
- no need to check return value of debugfs_create functions
intel_pmc_core:
- Attach using APCI HID "INT33A1"
- transform Pkg C-state residency from TSC ticks into microseconds
intel_telemetry:
- no need to check return value of debugfs_create functions
intel-vbtn:
- Report switch events when event wakes device
ISST:
- Restore state on resume
- Add Intel Speed Select PUNIT MSR interface
- Add Intel Speed Select mailbox interface via MSRs
- Add Intel Speed Select mailbox interface via PCI
- Add Intel Speed Select mmio interface
- Add IOCTL to Translate Linux logical CPU to PUNIT CPU number
- Store per CPU information
- Add common API to register and handle ioctls
- Update ioctl-number.txt for Intel Speed Select interface
- A tool to validate Intel Speed Select commands
- Add .gitignore file
MAINTAINERS:
- Update for Intel Speed Select Technology
mlx-platform:
- Fix error handling in mlxplat_init()
- Add more reset cause attributes
- Modify DMI matching order
- Add regmap structure for the next generation systems
- Change API for i2c-mlxcpld driver activation
- Move regmap initialization before all drivers activation
- Fix parent device in i2c-mux-reg device registration
- Add new attribute for mlxreg-io sysfs interfaces
pcengines-apuv2:
- Make two symbols static
- Fix PCENGINES_APU2 Kconfig warning
OLPC:
- Add a config menu category for XO 1.75
- Require CONFIG_POWER_SUPPLY for XO-1.75 EC
- Fix olpc_xo175_ec_cmd() return value
- Make olpc_dt_compatible_match() static __init
- Add INPUT dependencies
- Fix build error without CONFIG_SPI
- Add a regulator for the DCON
- Add XO-1.75 EC driver
- Use BIT() and GENMASK() for event masks
- Avoid a warning if the EC didn't register yet
- Move EC-specific functionality out from x86
- Remove an unused include
- Add OLPC XO-1.75 EC bindings
platform/mellanox:
- mlxreg-hotplug: Add devm_free_irq call to remove flow
pmc_atom:
- Add CB4063 Beckhoff Automation board to critclk_systems DMI table
- no need to check return value of debugfs_create functions
Kconfig:
- Remove left-over BACKLIGHT_LCD_SUPPORT
samsung-laptop:
- no need to check return value of debugfs_create functions
touchscreen_dmi:
- Update Hi10 Air filter
- Add info for the CHUWI Hi10 Plus tablet.
wmi:
- add Xiaomi WMI key driver
- add context argument to the probe function
- add context pointer field to struct wmi_device_id
- Add function to get _UID of WMI device
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEqaflIX74DDDzMJJtb7wzTHR8rCgFAl0rP88ACgkQb7wzTHR8
rCis8BAAnRgRgi8x1C7xn66gAUHsDXpY0tF9cp/Fw3HyTmFCQkRSmnLkMM2DqGi+
dvB9U1zPcGWwdwryKFsJXioEK3erYpiYyT2VwLtW4S7P5jQ+N9biT4TZ8yFp0MEr
MZC50LZDV1JTp1a0GQyrMpfoMBnE7UhR2GL8UbGli/WwXFE5BLkrJ1pdrjhYZOHl
rJcgq3HPAhV5qkUkIU7gTC2GGSPydjBqk0OhVIU4dPsYwXIb2gXc0yR0QVwKm5x3
I/NQwBOBMKmdI6uJ8BJyg/p888Strw65YJaTe5wtvG8ljuIbcN/aQ3ZmClNrUnc0
58byqJCpRhg9HN39VpF9rsApEGxKTlitAUAUKy7lgue7/mycHbA1Syzz29AIM+2v
ey2/zgFeeWtgh1cuh2cUWlCE6woW7ED4VpDxhkXlX4xGUp+CILEiFqcsULlcc4j5
sgojCLRPs78roYj9Y84CwYbsd7J/Ce4r2evBpKYPqYxDbUiuH2aVQtEdPTKv9/xC
yHtBuJJSxY7a+sf4OZONRo13dfvRoZIPjcccR8yTOakS2/1Fqph7MpHyDkwFAfeS
M2f+OcJn9IECol1391PTLj9Dx3jApyVk21HJdiIj7sKZgJOSS54AFm0/Ywk0MFpY
XScXKulV48SdL4ZKup5aIpDzyP5zuvXszKQboRitep1dHiR9bl0=
=DC5j
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v5.3-1' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver updates from Andy Shevchenko:
"Gathered a bunch of x86 platform driver changes. It's rather big,
since includes two big refactors and completely new driver:
- ASUS WMI driver got a big refactoring in order to support the TUF
Gaming laptops. Besides that, the regression with backlight being
permanently off on various EeePC laptops has been fixed.
- Accelerometer on HP ProBook 450 G0 shows wrong measurements due to
X axis being inverted. This has been fixed.
- Intel PMC core driver has been extended to be ACPI enumerated if
the DSDT provides device with _HID "INT33A1". This allows to
convert the driver to be pure platform and support new hardware
purely based on ACPI DSDT.
- From now on the Intel Speed Select Technology is supported thru a
corresponding driver. This driver provides an access to the
features of the ISST, such as Performance Profile, Core Power, Base
frequency and Turbo Frequency.
- Mellanox platform drivers has been refactored and now extended to
support more systems, including new coming ones.
- The OLPC XO-1.75 platform is now supported.
- CB4063 Beckhoff Automation board is using PMC clocks, provided via
pmc_atom driver, for ethernet controllers in a way that they can't
be managed by the clock driver. The quirk has been extended to
cover this case.
- Touchscreen on Chuwi Hi10 Plus tablet has been enabled. Meanwhile
the information of Chuwi Hi10 Air has been fixed to cover more
models based on the same platform.
- Xiaomi notebooks have WMI interface enabled. Thus, the driver to
support it has been provided. It required some extension of the
generic WMI library, which allows to propagate opaque context to
the ->probe() of the individual drivers.
This release includes debugfs clean up from Greg KH for several
drivers that drop return code check and make debugfs absence or
failure non-fatal.
Also miscellaneous fixes here and there, mostly for Acer WMI and
various Intel drivers"
* tag 'platform-drivers-x86-v5.3-1' of git://git.infradead.org/linux-platform-drivers-x86: (74 commits)
platform/x86: Fix PCENGINES_APU2 Kconfig warning
tools/power/x86/intel-speed-select: Add .gitignore file
platform/x86: mlx-platform: Fix error handling in mlxplat_init()
platform/x86: intel_pmc_core: Attach using APCI HID "INT33A1"
platform/x86: intel_pmc_core: transform Pkg C-state residency from TSC ticks into microseconds
platform/x86: asus-wmi: Use dev_get_drvdata()
Documentation/ABI: Add new attribute for mlxreg-io sysfs interfaces
platform/x86: mlx-platform: Add more reset cause attributes
platform/x86: mlx-platform: Modify DMI matching order
platform/x86: mlx-platform: Add regmap structure for the next generation systems
platform/x86: mlx-platform: Change API for i2c-mlxcpld driver activation
platform/x86: mlx-platform: Move regmap initialization before all drivers activation
MAINTAINERS: Update for Intel Speed Select Technology
tools/power/x86: A tool to validate Intel Speed Select commands
platform/x86: ISST: Restore state on resume
platform/x86: ISST: Add Intel Speed Select PUNIT MSR interface
platform/x86: ISST: Add Intel Speed Select mailbox interface via MSRs
platform/x86: ISST: Add Intel Speed Select mailbox interface via PCI
platform/x86: ISST: Add Intel Speed Select mmio interface
platform/x86: ISST: Add IOCTL to Translate Linux logical CPU to PUNIT CPU number
...
The asm-generic changes for 5.3 consist of a cleanup series from
Christoph Hellwig, who explains:
"asm-generic/ptrace.h is a little weird in that it doesn't actually
implement any functionality, but it provided multiple layers of macros
that just implement trivial inline functions. We implement those
directly in the few architectures and be off with a much simpler
design."
Link: https://lore.kernel.org/lkml/20190624054728.30966-1-hch@lst.de/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJdKPURAAoJEJpsee/mABjZalgP/idFZKlL9jb32p9eVacW1ngm
CzwKk+49UBpLlimTh3ZtpiSJEHQyXP/QYlJ0/kV65YJriq5FsqlBrkPnWgsgMG9x
HBhJEEfVtXolXK3yNEsFIt/0j0Xh7+uCyBNZNuJrIRy/9x2z2nhBgDAenWPpZNuT
qjpArBAVEWQMsWgmgZUlCKOT7ziSx5+w1bfqiiUZDjwjqimPhLUBfoZmUWHtO49M
4/95RVOIMoLlIcaCUfqsvfkf7v6mfFAADhTrB/FZWVNX839fnpifqQL9BmOlgrEM
kxn5wM/dxRDwRT8+mVRyB8ax4/rIgMIFoaA7Hrv+hoUsiOVD7AkNXynZKQh1hhjl
449j68esoA6vlfdFIhagpKKTiQcWXJDbEgAoSJcM0WIl3JAjc+3nVWShTAAEW65r
Z+Bgy1OczoCsRXbYR/TwpThHj3197xMRQEluzaLnd5Zx5feUDUKuDcxhPpev/ceO
qmV5FeGqxRlZhJjVK8lmcHNZP0e4pkodwrNKC/2NIlIp6EKmMNI0nCjVqINigHGC
97Kc7N94WHdQ3tA7GB8YaUfd8w86W5ZOgRh+uuZ0brPziL1MR5lD/NvzjVSfyvVp
7UHNP7stNbavg20vDhlWGIsWiwoDlJf0YLUA6kXHryb9i/fh8sqWjz99QFu6QIfs
BTgeLtNP8hKhMkgew2XL
=jkfI
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"The asm-generic changes for 5.3 consist of a cleanup series to remove
ptrace.h from Christoph Hellwig, who explains:
'asm-generic/ptrace.h is a little weird in that it doesn't actually
implement any functionality, but it provided multiple layers of
macros that just implement trivial inline functions. We implement
those directly in the few architectures and be off with a much
simpler design.'
at https://lore.kernel.org/lkml/20190624054728.30966-1-hch@lst.de/"
* tag 'asm-generic-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: remove ptrace.h
x86: don't use asm-generic/ptrace.h
sh: don't use asm-generic/ptrace.h
powerpc: don't use asm-generic/ptrace.h
arm64: don't use asm-generic/ptrace.h
* support for chained PMU counters in guests
* improved SError handling
* handle Neoverse N1 erratum #1349291
* allow side-channel mitigation status to be migrated
* standardise most AArch64 system register accesses to msr_s/mrs_s
* fix host MPIDR corruption on 32bit
* selftests ckleanups
x86:
* PMU event {white,black}listing
* ability for the guest to disable host-side interrupt polling
* fixes for enlightened VMCS (Hyper-V pv nested virtualization),
* new hypercall to yield to IPI target
* support for passing cstate MSRs through to the guest
* lots of cleanups and optimizations
Generic:
* Some txt->rST conversions for the documentation
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJdJzdIAAoJEL/70l94x66DQDoH/i83/8kX4I8AWDlushPru4ts
Q4lCE5VAPha+o4pLb1dtfFL3gTmSbsB1N++JSlqK3JOo6LphIOy6b0wBjQBbAa6U
3CT1dJaHJoScLLj09vyBlvClGUH2ZKEQTWOiquCCf7JfPofxwPUA6vJ7TYsdkckx
zR3ygbADWmnfS7hFfiqN3JzuYh9eoooGNWSU+Giq6VF41SiL3IqhBGZhWS0zE9c2
2c5lpqqdeHmAYNBqsyzNiDRKp7+zLFSmZ7Z5/0L755L8KYwR6F5beTnmBMHvb4lA
PWH/SWOC8EYR+PEowfrH+TxKZwp0gMn1kcAKjilHk0uCRwG1IzuHAr2jlNxICCk=
=t/Oq
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- support for chained PMU counters in guests
- improved SError handling
- handle Neoverse N1 erratum #1349291
- allow side-channel mitigation status to be migrated
- standardise most AArch64 system register accesses to msr_s/mrs_s
- fix host MPIDR corruption on 32bit
- selftests ckleanups
x86:
- PMU event {white,black}listing
- ability for the guest to disable host-side interrupt polling
- fixes for enlightened VMCS (Hyper-V pv nested virtualization),
- new hypercall to yield to IPI target
- support for passing cstate MSRs through to the guest
- lots of cleanups and optimizations
Generic:
- Some txt->rST conversions for the documentation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (128 commits)
Documentation: virtual: Add toctree hooks
Documentation: kvm: Convert cpuid.txt to .rst
Documentation: virtual: Convert paravirt_ops.txt to .rst
KVM: x86: Unconditionally enable irqs in guest context
KVM: x86: PMU Event Filter
kvm: x86: Fix -Wmissing-prototypes warnings
KVM: Properly check if "page" is valid in kvm_vcpu_unmap
KVM: arm/arm64: Initialise host's MPIDRs by reading the actual register
KVM: LAPIC: Retry tune per-vCPU timer_advance_ns if adaptive tuning goes insane
kvm: LAPIC: write down valid APIC registers
KVM: arm64: Migrate _elx sysreg accessors to msr_s/mrs_s
KVM: doc: Add API documentation on the KVM_REG_ARM_WORKAROUNDS register
KVM: arm/arm64: Add save/restore support for firmware workaround state
arm64: KVM: Propagate full Spectre v2 workaround state to KVM guests
KVM: arm/arm64: Support chained PMU counters
KVM: arm/arm64: Remove pmc->bitmask
KVM: arm/arm64: Re-create event when setting counter value
KVM: arm/arm64: Extract duplicated code to own function
KVM: arm/arm64: Rename kvm_pmu_{enable/disable}_counter functions
KVM: LAPIC: ARBPRI is a reserved register for x2APIC
...
- Rework some vmbus code to separate architecture specifics out to
arch/x86/. This is part of the work of adding arm64 support to Hyper-V.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAl0nfBEACgkQ3qZv95d3
LNyf8Q//fn+Eb+Pun3a5Cus8qYBQIYrVpt/bUmMuisV8A+MdRIOSwUU/oEbzCEwa
RhKKGKkNSkItKHz0QyqC9FwHZ18+6iLmSF1VwHzKVV+GJ1kQqI9+r4dC23yO/OCz
cuB6Uc+Jd7eLV9021U0jgmYI2HOLOfd5IUNqnyxuw6BPYYaJhrf3J9qSnlKmJ2tW
qhzSfKC2nn0+QbYSC5ww990o10vPMw3mcoXu23HKOU7Q2aqKkdK0HWjNrv49eent
NPbtjFoSABSVS+4eTzW1BrK2Rkwa+TVpAC75a4OPOd4KPUvibqCWjmEl8IE55Pre
BXwUoZnZN5cC1ayUGYuhyOmzFJtwxSWqgXlZRrmr+/XDgNk4k4V2bv253yFegZw2
1hMTbGwYrk4MrGNQB3YvnzJz8ObrTmNR98JvJz/bKPtHKmKmIbHcCWkR5gYAApuy
cY9PqKR5wWQ5A+leePzpXIXDtWlDHy7SDhUU14mbEj2bvQ3tC0yodHabpzl0Lx8e
feXGuqt/g5Lop+KGcRJj32tx23JBRkwfgIyagfRAdzVCs4urp9pt+tKezPkEpWW6
dd+Gc43Rnjd94neTCoo1seS9KH7n56ldnmeygMEDzI0rgW0q1m/Qwys0+6BIQR2G
lmcL1qJlquy3Fjfs1aYkkPqkQYoIqDwGQivSUgIDZ0TLBIsWAfY=
=wIj3
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyper-v updates from Sasha Levin:
- Add a module description to the Hyper-V vmbus module.
- Rework some vmbus code to separate architecture specifics out to
arch/x86/. This is part of the work of adding arm64 support to
Hyper-V.
* tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
Drivers: hv: vmbus: Break out ISA independent parts of mshyperv.h
drivers: hv: Add a module description line to the hv_vmbus driver
Most architectures have identical or very similar implementation of
pte_alloc_one_kernel(), pte_alloc_one(), pte_free_kernel() and
pte_free().
Add a generic implementation that can be reused across architectures and
enable its use on x86.
The generic implementation uses
GFP_KERNEL | __GFP_ZERO
for the kernel page tables and
GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT
for the user page tables.
The "base" functions for PTE allocation, namely __pte_alloc_one_kernel()
and __pte_alloc_one() are intended for the architectures that require
additional actions after actual memory allocation or must use non-default
GFP flags.
x86 is switched to use generic pte_alloc_one_kernel(), pte_free_kernel() and
pte_free().
x86 still implements pte_alloc_one() to allow run-time control of GFP
flags required for "userpte" command line option.
Link: http://lkml.kernel.org/r/1557296232-15361-2-git-send-email-rppt@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sam Creasey <sammy@sammy.net>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The split low/high access is the only non-READ_ONCE version of gup_get_pte
that did show up in the various arch implemenations. Lift it to common
code and drop the ifdef based arch override.
Link: http://lkml.kernel.org/r/20190625143715.1689-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: James Hogan <jhogan@kernel.org>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pass in the already calculated end value instead of recomputing it, and
leave the end > start check in the callers instead of duplicating them in
the arch code.
Link: http://lkml.kernel.org/r/20190625143715.1689-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: James Hogan <jhogan@kernel.org>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds a new header to asm-generic to allow optionally instrumenting
architecture-specific asm implementations of bitops.
This change includes the required change for x86 as reference and
changes the kernel API doc to point to bitops-instrumented.h instead.
Rationale: the functions in x86's bitops.h are no longer the kernel API
functions, but instead the arch_ prefixed functions, which are then
instrumented via bitops-instrumented.h.
Other architectures can similarly add support for asm implementations of
bitops.
The documentation text was derived from x86 and existing bitops
asm-generic versions: 1) references to x86 have been removed; 2) as a
result, some of the text had to be reworded for clarity and consistency.
Tested using lib/test_kasan with bitops tests (pre-requisite patch).
Bugzilla ref: https://bugzilla.kernel.org/show_bug.cgi?id=198439
Link: http://lkml.kernel.org/r/20190613125950.197667-4-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 fixes from Thomas Gleixner:
"A collection of assorted fixes:
- Fix for the pinned cr0/4 fallout which escaped all testing efforts
because the kvm-intel module was never loaded when the kernel was
compiled with CONFIG_PARAVIRT=n. The cr0/4 accessors are moved out
of line and static key is now solely used in the core code and
therefore can stay in the RO after init section. So the kvm-intel
and other modules do not longer reference the (read only) static
key which the module loader tried to update.
- Prevent an infinite loop in arch_stack_walk_user() by breaking out
of the loop once the return address is detected to be 0.
- Prevent the int3_emulate_call() selftest from corrupting the stack
when KASAN is enabled. KASASN clobbers more registers than covered
by the emulated call implementation. Convert the int3_magic()
selftest to a ASM function so the compiler cannot KASANify it.
- Unbreak the build with old GCC versions and with the Gold linker by
reverting the 'Move of _etext to the actual end of .text'. In both
cases the build fails with 'Invalid absolute R_X86_64_32S
relocation: _etext'
- Initialize the context lock for init_mm, which was never an issue
until the alternatives code started to use a temporary mm for
patching.
- Fix a build warning vs. the LOWMEM_PAGES constant where clang
complains rightfully about a signed integer overflow in the shift
operation by converting the operand to an ULL.
- Adjust the misnamed ENDPROC() of common_spurious in the 32bit entry
code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/stacktrace: Prevent infinite loop in arch_stack_walk_user()
x86/asm: Move native_write_cr0/4() out of line
x86/pgtable/32: Fix LOWMEM_PAGES constant
x86/alternatives: Fix int3_emulate_call() selftest stack corruption
x86/entry/32: Fix ENDPROC of common_spurious
Revert "x86/build: Move _etext to actual end of .text"
x86/ldt: Initialize the context lock for init_mm
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXSMhhgAKCRCRxhvAZXjc
or7kAP9VzDcQaK/WoDd2ezh2C7Wh5hNy9z/qJVCa6Tb+N+g1UgEAxbhFUg55uGOA
JNf7fGar5JF5hBMIXR+NqOi1/sb4swg=
=ELWo
-----END PGP SIGNATURE-----
Merge tag 'clone3-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull clone3 system call from Christian Brauner:
"This adds the clone3 syscall which is an extensible successor to clone
after we snagged the last flag with CLONE_PIDFD during the 5.2 merge
window for clone(). It cleanly supports all of the flags from clone()
and thus all legacy workloads.
There are few user visible differences between clone3 and clone.
First, CLONE_DETACHED will cause EINVAL with clone3 so we can reuse
this flag. Second, the CSIGNAL flag is deprecated and will cause
EINVAL to be reported. It is superseeded by a dedicated "exit_signal"
argument in struct clone_args thus freeing up even more flags. And
third, clone3 gives CLONE_PIDFD a dedicated return argument in struct
clone_args instead of abusing CLONE_PARENT_SETTID's parent_tidptr
argument.
The clone3 uapi is designed to be easy to handle on 32- and 64 bit:
/* uapi */
struct clone_args {
__aligned_u64 flags;
__aligned_u64 pidfd;
__aligned_u64 child_tid;
__aligned_u64 parent_tid;
__aligned_u64 exit_signal;
__aligned_u64 stack;
__aligned_u64 stack_size;
__aligned_u64 tls;
};
and a separate kernel struct is used that uses proper kernel typing:
/* kernel internal */
struct kernel_clone_args {
u64 flags;
int __user *pidfd;
int __user *child_tid;
int __user *parent_tid;
int exit_signal;
unsigned long stack;
unsigned long stack_size;
unsigned long tls;
};
The system call comes with a size argument which enables the kernel to
detect what version of clone_args userspace is passing in. clone3
validates that any additional bytes a given kernel does not know about
are set to zero and that the size never exceeds a page.
A nice feature is that this patchset allowed us to cleanup and
simplify various core kernel codepaths in kernel/fork.c by making the
internal _do_fork() function take struct kernel_clone_args even for
legacy clone().
This patch also unblocks the time namespace patchset which wants to
introduce a new CLONE_TIMENS flag.
Note, that clone3 has only been wired up for x86{_32,64}, arm{64}, and
xtensa. These were the architectures that did not require special
massaging.
Other architectures treat fork-like system calls individually and
after some back and forth neither Arnd nor I felt confident that we
dared to add clone3 unconditionally to all architectures. We agreed to
leave this up to individual architecture maintainers. This is why
there's an additional patch that introduces __ARCH_WANT_SYS_CLONE3
which any architecture can set once it has implemented support for
clone3. The patch also adds a cond_syscall(clone3) for architectures
such as nios2 or h8300 that generate their syscall table by simply
including asm-generic/unistd.h. The hope is to get rid of
__ARCH_WANT_SYS_CLONE3 and cond_syscall() rather soon"
* tag 'clone3-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
arch: handle arches who do not yet define clone3
arch: wire-up clone3() syscall
fork: add clone3
- Add support for chained PMU counters in guests
- Improve SError handling
- Handle Neoverse N1 erratum #1349291
- Allow side-channel mitigation status to be migrated
- Standardise most AArch64 system register accesses to msr_s/mrs_s
- Fix host MPIDR corruption on 32bit
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl0kge4VHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDYyQP/3XY5tFcLKkp/h9rnGaCXwAxhNzn
TyF/IZEFBKFTSoDMXKLLc8KllvoPQ7aUl03heYbuayYpyKR1+LCx7lDwu1MYyEf+
aSSuOKlbG//tLUEGp09pTRCgjs2mhhZYqOj5GF2mZ7xpovFVSNOPzTazbXDNQ7tw
zUAs43YNg+bUMwj+SLWpBlizjrLr7T34utIr6daKJE/GSfmIrcYXhGbZqUh0zbO0
z5LNasebws8/pHyeGI7+/yoMIKaQ8foMgywTpsRpBsx6YI+AbOLjEmCk2IBOPcEK
pm9KkSIBZEO2CSxZKl3NQiEow/Qd/lnz2xLMCSfh4XrYoI2Th4gNcsbJpiBDWP5a
0eZ5jSiexxKngIbM+to7jR3m0yc9RgcuzceJg3Uly7Ya0vb5RqKwOX4Ge4XP4VDT
DzIVFdQjxDKdVIf3EvGp1cj4P7dRUU3xbZcbzyuRPEmT3vgjEnbxawmPLs3QMAl1
31Wd2wIsPB86kSxzSMel27Vs5VgMhgyHE26zN91R745CvhDXaDKydIWjGjdVMHsB
GuX/h2kL+ohx+N/OpZPgwsVUAGLSOQFP3pE/EcGtqc2kkfqa+bx12DKcZ3zdmJvy
+cu5ixU8q5thPH/pZob/C3hKUY/eLy02emS34RK0Jh2sZHbQgAOtMsiqUxNHEjUm
6TkpdWa5SRd7CtGV
=yfCs
-----END PGP SIGNATURE-----
Merge tag 'kvm-arm-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm updates for 5.3
- Add support for chained PMU counters in guests
- Improve SError handling
- Handle Neoverse N1 erratum #1349291
- Allow side-channel mitigation status to be migrated
- Standardise most AArch64 system register accesses to msr_s/mrs_s
- Fix host MPIDR corruption on 32bit
Some events can provide a guest with information about other guests or the
host (e.g. L3 cache stats); providing the capability to restrict access
to a "safe" set of events would limit the potential for the PMU to be used
in any side channel attacks. This change introduces a new VM ioctl that
sets an event filter. If the guest attempts to program a counter for
any blacklisted or non-whitelisted event, the kernel counter won't be
created, so any RDPMC/RDMSR will show 0 instances of that event.
Signed-off-by: Eric Hankland <ehankland@google.com>
[Lots of changes. All remaining bugs are probably mine. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The pinning of sensitive CR0 and CR4 bits caused a boot crash when loading
the kvm_intel module on a kernel compiled with CONFIG_PARAVIRT=n.
The reason is that the static key which controls the pinning is marked RO
after init. The kvm_intel module contains a CR4 write which requires to
update the static key entry list. That obviously does not work when the key
is in a RO section.
With CONFIG_PARAVIRT enabled this does not happen because the CR4 write
uses the paravirt indirection and the actual write function is built in.
As the key is intended to be immutable after init, move
native_write_cr0/4() out of line.
While at it consolidate the update of the cr4 shadow variable and store the
value right away when the pinning is initialized on a booting CPU. No point
in reading it back 20 instructions later. This allows to confine the static
key and the pinning variable to cpu/common and allows to mark them static.
Fixes: 8dbec27a24 ("x86/asm: Pin sensitive CR0 bits")
Fixes: 873d50d58f ("x86/asm: Pin sensitive CR4 bits")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Xi Ruoyao <xry111@mengyan1223.wang>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Xi Ruoyao <xry111@mengyan1223.wang>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907102140340.1758@nanos.tec.linutronix.de
clang points out that the computation of LOWMEM_PAGES causes a signed
integer overflow on 32-bit x86:
arch/x86/kernel/head32.c:83:20: error: signed shift result (0x100000000) requires 34 bits to represent, but 'int' only has 32 bits [-Werror,-Wshift-overflow]
(PAGE_TABLE_SIZE(LOWMEM_PAGES) << PAGE_SHIFT);
^~~~~~~~~~~~
arch/x86/include/asm/pgtable_32.h:109:27: note: expanded from macro 'LOWMEM_PAGES'
#define LOWMEM_PAGES ((((2<<31) - __PAGE_OFFSET) >> PAGE_SHIFT))
~^ ~~
arch/x86/include/asm/pgtable_32.h:98:34: note: expanded from macro 'PAGE_TABLE_SIZE'
#define PAGE_TABLE_SIZE(pages) ((pages) / PTRS_PER_PGD)
Use the _ULL() macro to make it a 64-bit constant.
Fixes: 1e620f9b23 ("x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190710130522.1802800-1-arnd@arndb.de
- A fair pile of RST conversions, many from Mauro. These create more
than the usual number of simple but annoying merge conflicts with other
trees, unfortunately. He has a lot more of these waiting on the wings
that, I think, will go to you directly later on.
- A new document on how to use merges and rebases in kernel repos, and one
on Spectre vulnerabilities.
- Various improvements to the build system, including automatic markup of
function() references because some people, for reasons I will never
understand, were of the opinion that :c:func:``function()`` is
unattractive and not fun to type.
- We now recommend using sphinx 1.7, but still support back to 1.4.
- Lots of smaller improvements, warning fixes, typo fixes, etc.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl0krAEPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5Yg98H/AuLqO9LpOgUjF4LhyjxGPdzJkY9RExSJ7km
gznyreLCZgFaJR+AY6YDsd4Jw6OJlPbu1YM/Qo3C3WrZVFVhgL/s2ebvBgCo50A8
raAFd8jTf4/mGCHnAqRotAPQ3mETJUk315B66lBJ6Oc+YdpRhwXWq8ZW2bJxInFF
3HDvoFgMf0KhLuMHUkkL0u3fxH1iA+KvDu8diPbJYFjOdOWENz/CV8wqdVkXRSEW
DJxIq89h/7d+hIG3d1I7Nw+gibGsAdjSjKv4eRKauZs4Aoxd1Gpl62z0JNk6aT3m
dtq4joLdwScydonXROD/Twn2jsu4xYTrPwVzChomElMowW/ZBBY=
=D0eO
-----END PGP SIGNATURE-----
Merge tag 'docs-5.3' of git://git.lwn.net/linux
Pull Documentation updates from Jonathan Corbet:
"It's been a relatively busy cycle for docs:
- A fair pile of RST conversions, many from Mauro. These create more
than the usual number of simple but annoying merge conflicts with
other trees, unfortunately. He has a lot more of these waiting on
the wings that, I think, will go to you directly later on.
- A new document on how to use merges and rebases in kernel repos,
and one on Spectre vulnerabilities.
- Various improvements to the build system, including automatic
markup of function() references because some people, for reasons I
will never understand, were of the opinion that
:c:func:``function()`` is unattractive and not fun to type.
- We now recommend using sphinx 1.7, but still support back to 1.4.
- Lots of smaller improvements, warning fixes, typo fixes, etc"
* tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits)
docs: automarkup.py: ignore exceptions when seeking for xrefs
docs: Move binderfs to admin-guide
Disable Sphinx SmartyPants in HTML output
doc: RCU callback locks need only _bh, not necessarily _irq
docs: format kernel-parameters -- as code
Doc : doc-guide : Fix a typo
platform: x86: get rid of a non-existent document
Add the RCU docs to the core-api manual
Documentation: RCU: Add TOC tree hooks
Documentation: RCU: Rename txt files to rst
Documentation: RCU: Convert RCU UP systems to reST
Documentation: RCU: Convert RCU linked list to reST
Documentation: RCU: Convert RCU basic concepts to reST
docs: filesystems: Remove uneeded .rst extension on toctables
scripts/sphinx-pre-install: fix out-of-tree build
docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/
Documentation: PGP: update for newer HW devices
Documentation: Add section about CPU vulnerabilities for Spectre
Documentation: platform: Delete x86-laptop-drivers.txt
docs: Note that :c:func: should no longer be used
...
Pull x865 kdump updates from Thomas Gleixner:
"Yet more kexec/kdump updates:
- Properly support kexec when AMD's memory encryption (SME) is
enabled
- Pass reserved e820 ranges to the kexec kernel so both PCI and SME
can work"
* 'x86-kdump-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
fs/proc/vmcore: Enable dumping of encrypted memory when SEV was active
x86/kexec: Set the C-bit in the identity map page table when SEV is active
x86/kexec: Do not map kexec area as decrypted when SEV is active
x86/crash: Add e820 reserved ranges to kdump kernel's e820 table
x86/mm: Rework ioremap resource mapping determination
x86/e820, ioport: Add a new I/O resource descriptor IORES_DESC_RESERVED
x86/mm: Create a workarea in the kernel for SME early encryption
x86/mm: Identify the end of the kernel area to be reserved
The mutex mm->context->lock for init_mm is not initialized for init_mm.
This wasn't a problem because it remained unused. This changed however
since commit
4fc19708b1 ("x86/alternatives: Initialize temporary mm for patching")
Initialize the mutex for init_mm.
Fixes: 4fc19708b1 ("x86/alternatives: Initialize temporary mm for patching")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20190701173354.2pe62hhliok2afea@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull force_sig() argument change from Eric Biederman:
"A source of error over the years has been that force_sig has taken a
task parameter when it is only safe to use force_sig with the current
task.
The force_sig function is built for delivering synchronous signals
such as SIGSEGV where the userspace application caused a synchronous
fault (such as a page fault) and the kernel responded with a signal.
Because the name force_sig does not make this clear, and because the
force_sig takes a task parameter the function force_sig has been
abused for sending other kinds of signals over the years. Slowly those
have been fixed when the oopses have been tracked down.
This set of changes fixes the remaining abusers of force_sig and
carefully rips out the task parameter from force_sig and friends
making this kind of error almost impossible in the future"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits)
signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus
signal: Remove the signal number and task parameters from force_sig_info
signal: Factor force_sig_info_to_task out of force_sig_info
signal: Generate the siginfo in force_sig
signal: Move the computation of force into send_signal and correct it.
signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal
signal: Remove the task parameter from force_sig_fault
signal: Use force_sig_fault_to_task for the two calls that don't deliver to current
signal: Explicitly call force_sig_fault on current
signal/unicore32: Remove tsk parameter from __do_user_fault
signal/arm: Remove tsk parameter from __do_user_fault
signal/arm: Remove tsk parameter from ptrace_break
signal/nds32: Remove tsk parameter from send_sigtrap
signal/riscv: Remove tsk parameter from do_trap
signal/sh: Remove tsk parameter from force_sig_info_fault
signal/um: Remove task parameter from send_sigtrap
signal/x86: Remove task parameter from send_sigtrap
signal: Remove task parameter from force_sig_mceerr
signal: Remove task parameter from force_sig
signal: Remove task parameter from force_sigsegv
...
Pull x86 topology updates from Ingo Molnar:
"Implement multi-die topology support on Intel CPUs and expose the die
topology to user-space tooling, by Len Brown, Kan Liang and Zhang Rui.
These changes should have no effect on the kernel's existing
understanding of topologies, i.e. there should be no behavioral impact
on cache, NUMA, scheduler, perf and other topologies and overall
system performance"
* 'x86-topology-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/rapl: Cosmetic rename internal variables in response to multi-die/pkg support
perf/x86/intel/uncore: Cosmetic renames in response to multi-die/pkg support
hwmon/coretemp: Cosmetic: Rename internal variables to zones from packages
thermal/x86_pkg_temp_thermal: Cosmetic: Rename internal variables to zones from packages
perf/x86/intel/cstate: Support multi-die/package
perf/x86/intel/rapl: Support multi-die/package
perf/x86/intel/uncore: Support multi-die/package
topology: Create core_cpus and die_cpus sysfs attributes
topology: Create package_cpus sysfs attribute
hwmon/coretemp: Support multi-die/package
powercap/intel_rapl: Update RAPL domain name and debug messages
thermal/x86_pkg_temp_thermal: Support multi-die/package
powercap/intel_rapl: Support multi-die/package
powercap/intel_rapl: Simplify rapl_find_package()
x86/topology: Define topology_logical_die_id()
x86/topology: Define topology_die_id()
cpu/topology: Export die_id
x86/topology: Create topology_max_die_per_package()
x86/topology: Add CPUID.1F multi-die/package support
Pull x86 platform updayes from Ingo Molnar:
"Most of the commits add ACRN hypervisor guest support, plus two
cleanups"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/jailhouse: Mark jailhouse_x2apic_available() as __init
x86/platform/geode: Drop <linux/gpio.h> includes
x86/acrn: Use HYPERVISOR_CALLBACK_VECTOR for ACRN guest upcall vector
x86: Add support for Linux guests on an ACRN hypervisor
x86/Kconfig: Add new X86_HV_CALLBACK_VECTOR config symbol
Pull x86 paravirt updates from Ingo Molnar:
"A handful of paravirt patching code enhancements to make it more
robust against patching failures, and related cleanups and not so
related cleanups - by Thomas Gleixner and myself"
* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/paravirt: Rename paravirt_patch_site::instrtype to paravirt_patch_site::type
x86/paravirt: Standardize 'insn_buff' variable names
x86/paravirt: Match paravirt patchlet field definition ordering to initialization ordering
x86/paravirt: Replace the paravirt patch asm magic
x86/paravirt: Unify the 32/64 bit paravirt patching code
x86/paravirt: Detect over-sized patching bugs in paravirt_patch_call()
x86/paravirt: Detect over-sized patching bugs in paravirt_patch_insns()
x86/paravirt: Remove bogus extern declarations
Pull x86 asm updates from Ingo Molnar:
"Most of the changes relate to Peter Zijlstra's cleanup of ptregs
handling, in particular the i386 part is now much simplified and
standardized - no more partial ptregs stack frames via the esp/ss
oddity. This simplifies ftrace, kprobes, the unwinder, ptrace, kdump
and kgdb.
There's also a CR4 hardening enhancements by Kees Cook, to make the
generic platform functions such as native_write_cr4() less useful as
ROP gadgets that disable SMEP/SMAP. Also protect the WP bit of CR0
against similar attacks.
The rest is smaller cleanups/fixes"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/alternatives: Add int3_emulate_call() selftest
x86/stackframe/32: Allow int3_emulate_push()
x86/stackframe/32: Provide consistent pt_regs
x86/stackframe, x86/ftrace: Add pt_regs frame annotations
x86/stackframe, x86/kprobes: Fix frame pointer annotations
x86/stackframe: Move ENCODE_FRAME_POINTER to asm/frame.h
x86/entry/32: Clean up return from interrupt preemption path
x86/asm: Pin sensitive CR0 bits
x86/asm: Pin sensitive CR4 bits
Documentation/x86: Fix path to entry_32.S
x86/asm: Remove unused TASK_TI_flags from asm-offsets.c
Pull locking updates from Ingo Molnar:
"The main changes in this cycle are:
- rwsem scalability improvements, phase #2, by Waiman Long, which are
rather impressive:
"On a 2-socket 40-core 80-thread Skylake system with 40 reader
and writer locking threads, the min/mean/max locking operations
done in a 5-second testing window before the patchset were:
40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810
40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255
After the patchset, they became:
40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741
40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098"
There's a lot of changes to the locking implementation that makes
it similar to qrwlock, including owner handoff for more fair
locking.
Another microbenchmark shows how across the spectrum the
improvements are:
"With a locking microbenchmark running on 5.1 based kernel, the
total locking rates (in kops/s) on a 2-socket Skylake system
with equal numbers of readers and writers (mixed) before and
after this patchset were:
# of Threads Before Patch After Patch
------------ ------------ -----------
2 2,618 4,193
4 1,202 3,726
8 802 3,622
16 729 3,359
32 319 2,826
64 102 2,744"
The changes are extensive and the patch-set has been through
several iterations addressing various locking workloads. There
might be more regressions, but unless they are pathological I
believe we want to use this new implementation as the baseline
going forward.
- jump-label optimizations by Daniel Bristot de Oliveira: the primary
motivation was to remove IPI disturbance of isolated RT-workload
CPUs, which resulted in the implementation of batched jump-label
updates. Beyond the improvement of the real-time characteristics
kernel, in one test this patchset improved static key update
overhead from 57 msecs to just 1.4 msecs - which is a nice speedup
as well.
- atomic64_t cross-arch type cleanups by Mark Rutland: over the last
~10 years of atomic64_t existence the various types used by the
APIs only had to be self-consistent within each architecture -
which means they became wildly inconsistent across architectures.
Mark puts and end to this by reworking all the atomic64
implementations to use 's64' as the base type for atomic64_t, and
to ensure that this type is consistently used for parameters and
return values in the API, avoiding further problems in this area.
- A large set of small improvements to lockdep by Yuyang Du: type
cleanups, output cleanups, function return type and othr cleanups
all around the place.
- A set of percpu ops cleanups and fixes by Peter Zijlstra.
- Misc other changes - please see the Git log for more details"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits)
locking/lockdep: increase size of counters for lockdep statistics
locking/atomics: Use sed(1) instead of non-standard head(1) option
locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING
x86/jump_label: Make tp_vec_nr static
x86/percpu: Optimize raw_cpu_xchg()
x86/percpu, sched/fair: Avoid local_clock()
x86/percpu, x86/irq: Relax {set,get}_irq_regs()
x86/percpu: Relax smp_processor_id()
x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}()
locking/rwsem: Guard against making count negative
locking/rwsem: Adaptive disabling of reader optimistic spinning
locking/rwsem: Enable time-based spinning on reader-owned rwsem
locking/rwsem: Make rwsem->owner an atomic_long_t
locking/rwsem: Enable readers spinning on writer
locking/rwsem: Clarify usage of owner's nonspinaable bit
locking/rwsem: Wake up almost all readers in wait queue
locking/rwsem: More optimal RT task handling of null owner
locking/rwsem: Always release wait_lock before waking up tasks
locking/rwsem: Implement lock handoff to prevent lock starvation
locking/rwsem: Make rwsem_spin_on_owner() return owner state
...
Break out parts of mshyperv.h that are ISA independent into a
separate file in include/asm-generic. This move facilitates
ARM64 code reusing these definitions and avoids code
duplication. No functionality or behavior is changed.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Pull x86 pti updates from Thomas Gleixner:
"The speculative paranoia departement delivers a few more plugs for
possible (probably theoretical) spectre/mds leaks"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tls: Fix possible spectre-v1 in do_get_thread_area()
x86/ptrace: Fix possible spectre-v1 in ptrace_get_debugreg()
x86/speculation/mds: Eliminate leaks by trace_hardirqs_on()
Pull x86 timer updates from Thomas Gleixner:
"A rather large series consolidating the HPET code, which was triggered
by the attempt to bolt HPET NMI watchdog support on to the existing
maze with the usual duct tape and super glue approach.
This mainly removes two separate partially redundant storage layers
and consolidates them into a single one which provides a consistent
view of the different HPET channels and their usage and allows to
integrate HPET NMI watchdog support (if it turns out to be feasible)
in a non intrusive way"
* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
x86/hpet: Use channel for legacy clockevent storage
x86/hpet: Use common init for legacy clockevent
x86/hpet: Carve out shareable parts of init_one_hpet_msi_clockevent()
x86/hpet: Consolidate clockevent functions
x86/hpet: Wrap legacy clockevent in hpet_channel
x86/hpet: Use cached info instead of extra flags
x86/hpet: Move clockevents into channels
x86/hpet: Rename variables to prepare for switching to channels
x86/hpet: Add function to select a /dev/hpet channel
x86/hpet: Add mode information to struct hpet_channel
x86/hpet: Use cached channel data
x86/hpet: Introduce struct hpet_base and struct hpet_channel
x86/hpet: Coding style cleanup
x86/hpet: Clean up comments
x86/hpet: Make naming consistent
x86/hpet: Remove not required includes
x86/hpet: Decapitalize and rename EVT_TO_HPET_DEV
x86/hpet: Simplify counter validation
x86/hpet: Separate counter check out of clocksource register code
x86/hpet: Shuffle code around for readability sake
...
Pull x86 CPU feature updates from Thomas Gleixner:
"Updates for x86 CPU features:
- Support for UMWAIT/UMONITOR, which allows to use MWAIT and MONITOR
instructions in user space to save power e.g. in HPC workloads
which spin wait on synchronization points.
The maximum time a MWAIT can halt in userspace is controlled by the
kernel and can be adjusted by the sysadmin.
- Speed up the MTRR handling code on CPUs which support cache
self-snooping correctly.
On those CPUs the wbinvd() invocations can be omitted which speeds
up the MTRR setup by a factor of 50.
- Support for the new x86 vendor Zhaoxin who develops processors
based on the VIA Centaur technology.
- Prevent 'cat /proc/cpuinfo' from affecting isolated NOHZ_FULL CPUs
by sending IPIs to retrieve the CPU frequency and use the cached
values instead.
- The addition and late revert of the FSGSBASE support. The revert
was required as it turned out that the code still has hard to
diagnose issues. Yet another engineering trainwreck...
- Small fixes, cleanups, improvements and the usual new Intel CPU
family/model addons"
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
x86/fsgsbase: Revert FSGSBASE support
selftests/x86/fsgsbase: Fix some test case bugs
x86/entry/64: Fix and clean up paranoid_exit
x86/entry/64: Don't compile ignore_sysret if 32-bit emulation is enabled
selftests/x86: Test SYSCALL and SYSENTER manually with TF set
x86/mtrr: Skip cache flushes on CPUs with cache self-snooping
x86/cpu/intel: Clear cache self-snoop capability in CPUs with known errata
Documentation/ABI: Document umwait control sysfs interfaces
x86/umwait: Add sysfs interface to control umwait maximum time
x86/umwait: Add sysfs interface to control umwait C0.2 state
x86/umwait: Initialize umwait control values
x86/cpufeatures: Enumerate user wait instructions
x86/cpu: Disable frequency requests via aperfmperf IPI for nohz_full CPUs
x86/acpi/cstate: Add Zhaoxin processors support for cache flush policy in C3
ACPI, x86: Add Zhaoxin processors support for NONSTOP TSC
x86/cpu: Create Zhaoxin processors architecture support file
x86/cpu: Split Tremont based Atoms from the rest
Documentation/x86/64: Add documentation for GS/FS addressing mode
x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2
x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit
...
Pull x86 FPU updates from Thomas Gleixner:
"A small set of updates for the FPU code:
- Make the no387/nofxsr command line options useful by restricting
them to 32bit and actually clearing all dependencies to prevent
random crashes and malfunction.
- Simplify and cleanup the kernel_fpu_*() helpers"
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Inline fpu__xstate_clear_all_cpu_caps()
x86/fpu: Make 'no387' and 'nofxsr' command line options useful
x86/fpu: Remove the fpu__save() export
x86/fpu: Simplify kernel_fpu_begin()
x86/fpu: Simplify kernel_fpu_end()
Pull x86 vsyscall updates from Thomas Gleixner:
"Further hardening of the legacy vsyscall by providing support for
execute only mode and switching the default to it.
This prevents a certain class of attacks which rely on the vsyscall
page being accessible at a fixed address in the canonical kernel
address space"
* 'x86-entry-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/x86: Add a test for process_vm_readv() on the vsyscall page
x86/vsyscall: Add __ro_after_init to global variables
x86/vsyscall: Change the default vsyscall mode to xonly
selftests/x86/vsyscall: Verify that vsyscall=none blocks execution
x86/vsyscall: Document odd SIGSEGV error code for vsyscalls
x86/vsyscall: Show something useful on a read fault
x86/vsyscall: Add a new vsyscall=xonly mode
Documentation/admin: Remove the vsyscall=native documentation