Pull livepatch updates from Jiri Kosina:
- a per-task consistency model is being added for architectures that
support reliable stack dumping (extending this, currently rather
trivial set, is currently in the works).
This extends the nature of the types of patches that can be applied
by live patching infrastructure. The code stems from the design
proposal made [1] back in November 2014. It's a hybrid of SUSE's
kGraft and RH's kpatch, combining advantages of both: it uses
kGraft's per-task consistency and syscall barrier switching combined
with kpatch's stack trace switching. There are also a number of
fallback options which make it quite flexible.
Most of the heavy lifting done by Josh Poimboeuf with help from
Miroslav Benes and Petr Mladek
[1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz
- module load time patch optimization from Zhou Chengming
- a few assorted small fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: add missing printk newlines
livepatch: Cancel transition a safe way for immediate patches
livepatch: Reduce the time of finding module symbols
livepatch: make klp_mutex proper part of API
livepatch: allow removal of a disabled patch
livepatch: add /proc/<pid>/patch_state
livepatch: change to a per-task consistency model
livepatch: store function sizes
livepatch: use kstrtobool() in enabled_store()
livepatch: move patching functions into patch.c
livepatch: remove unnecessary object loaded check
livepatch: separate enabled and patched states
livepatch/s390: add TIF_PATCH_PENDING thread flag
livepatch/s390: reorganize TIF thread flag bits
livepatch/powerpc: add TIF_PATCH_PENDING thread flag
livepatch/x86: add TIF_PATCH_PENDING thread flag
livepatch: create temporary klp_update_patch_state() stub
x86/entry: define _TIF_ALLWORK_MASK flags explicitly
stacktrace/x86: add function for detecting reliable stack traces
Pull networking updates from David Millar:
"Here are some highlights from the 2065 networking commits that
happened this development cycle:
1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)
2) Add a generic XDP driver, so that anyone can test XDP even if they
lack a networking device whose driver has explicit XDP support
(me).
3) Sparc64 now has an eBPF JIT too (me)
4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
Starovoitov)
5) Make netfitler network namespace teardown less expensive (Florian
Westphal)
6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)
7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)
8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)
9) Multiqueue support in stmmac driver (Joao Pinto)
10) Remove TCP timewait recycling, it never really could possibly work
well in the real world and timestamp randomization really zaps any
hint of usability this feature had (Soheil Hassas Yeganeh)
11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
Aleksandrov)
12) Add socket busy poll support to epoll (Sridhar Samudrala)
13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
and several others)
14) IPSEC hw offload infrastructure (Steffen Klassert)"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
tipc: refactor function tipc_sk_recv_stream()
tipc: refactor function tipc_sk_recvmsg()
net: thunderx: Optimize page recycling for XDP
net: thunderx: Support for XDP header adjustment
net: thunderx: Add support for XDP_TX
net: thunderx: Add support for XDP_DROP
net: thunderx: Add basic XDP support
net: thunderx: Cleanup receive buffer allocation
net: thunderx: Optimize CQE_TX handling
net: thunderx: Optimize RBDR descriptor handling
net: thunderx: Support for page recycling
ipx: call ipxitf_put() in ioctl error path
net: sched: add helpers to handle extended actions
qed*: Fix issues in the ptp filter config implementation.
qede: Fix concurrency issue in PTP Tx path processing.
stmmac: Add support for SIMATIC IOT2000 platform
net: hns: fix ethtool_get_strings overflow in hns driver
tcp: fix wraparound issue in tcp_lp
bpf, arm64: fix jit branch offset related to ldimm64
bpf, arm64: implement jiting of BPF_XADD
...
Pull s390 updates from Martin Schwidefsky:
- three merges for KVM/s390 with changes for vfio-ccw and cpacf. The
patches are included in the KVM tree as well, let git sort it out.
- add the new 'trng' random number generator
- provide the secure key verification API for the pkey interface
- introduce the z13 cpu counters to perf
- add a new system call to set up the guarded storage facility
- simplify TASK_SIZE and arch_get_unmapped_area
- export the raw STSI data related to CPU topology to user space
- ... and the usual churn of bug-fixes and cleanups.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (74 commits)
s390/crypt: use the correct module alias for paes_s390.
s390/cpacf: Introduce kma instruction
s390/cpacf: query instructions use unique parameters for compatibility with KMA
s390/trng: Introduce s390 TRNG device driver.
s390/crypto: Provide s390 specific arch random functionality.
s390/crypto: Add new subfunctions to the cpacf PRNO function.
s390/crypto: Renaming PPNO to PRNO.
s390/pageattr: avoid unnecessary page table splitting
s390/mm: simplify arch_get_unmapped_area[_topdown]
s390/mm: make TASK_SIZE independent from the number of page table levels
s390/gs: add regset for the guarded storage broadcast control block
s390/kvm: Add use_cmma field to mm_context_t
s390/kvm: Add PGSTE manipulation functions
vfio: ccw: improve error handling for vfio_ccw_mdev_remove
vfio: ccw: remove unnecessary NULL checks of a pointer
s390/spinlock: remove compare and delay instruction
s390/spinlock: use atomic primitives for spinlocks
s390/cpumf: simplify detection of guest samples
s390/pci: remove forward declaration
s390/pci: increase the PCI_NR_FUNCTIONS default
...
Pull x86 mm updates from Ingo Molnar:
"The main x86 MM changes in this cycle were:
- continued native kernel PCID support preparation patches to the TLB
flushing code (Andy Lutomirski)
- various fixes related to 32-bit compat syscall returning address
over 4Gb in applications, launched from 64-bit binaries - motivated
by C/R frameworks such as Virtuozzo. (Dmitry Safonov)
- continued Intel 5-level paging enablement: in particular the
conversion of x86 GUP to the generic GUP code. (Kirill A. Shutemov)
- x86/mpx ABI corner case fixes/enhancements (Joerg Roedel)
- ... plus misc updates, fixes and cleanups"
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash
x86/mm: Fix flush_tlb_page() on Xen
x86/mm: Make flush_tlb_mm_range() more predictable
x86/mm: Remove flush_tlb() and flush_tlb_current_task()
x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
x86/mm/64: Fix crash in remove_pagetable()
Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation"
x86/boot/e820: Remove a redundant self assignment
x86/mm: Fix dump pagetables for 4 levels of page tables
x86/mpx, selftests: Only check bounds-vs-shadow when we keep shadow
x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space
Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()"
x86/espfix: Add support for 5-level paging
x86/kasan: Extend KASAN to support 5-level paging
x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=y
x86/paravirt: Add 5-level support to the paravirt code
x86/mm: Define virtual memory map for 5-level paging
x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assert
x86/boot: Detect 5-level paging support
x86/mm/numa: Remove numa_nodemask_from_meminfo()
...
Pull x86 asm updates from Ingo Molnar:
"The main changes in this cycle were:
- unwinder fixes and enhancements
- improve ftrace interaction with the unwinder
- optimize the code footprint of WARN() and related debugging
constructs
- ... plus misc updates, cleanups and fixes"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/unwind: Dump all stacks in unwind_dump()
x86/unwind: Silence more entry-code related warnings
x86/ftrace: Fix ebp in ftrace_regs_caller that screws up unwinder
x86/unwind: Remove unused 'sp' parameter in unwind_dump()
x86/unwind: Prepend hex mask value with '0x' in unwind_dump()
x86/unwind: Properly zero-pad 32-bit values in unwind_dump()
x86/unwind: Ensure stack pointer is aligned
debug: Avoid setting BUGFLAG_WARNING twice
x86/unwind: Silence entry-related warnings
x86/unwind: Read stack return address in update_stack_state()
x86/unwind: Move common code into update_stack_state()
debug: Fix __bug_table[] in arch linker scripts
debug: Add _ONCE() logic to report_bug()
x86/debug: Define BUG() again for !CONFIG_BUG
x86/debug: Implement __WARN() using UD0
x86/ftrace: Use Makefile logic instead of #ifdef for compiling ftrace_*.o
x86/ftrace: Add -mfentry support to x86_32 with DYNAMIC_FTRACE set
x86/ftrace: Clean up ftrace_regs_caller
x86/ftrace: Add stack frame pointer to ftrace_caller
x86/ftrace: Move the ftrace specific code out of entry_32.S
...
Pull timer updates from Thomas Gleixner:
"The timer departement delivers:
- more year 2038 rework
- a massive rework of the arm achitected timer
- preparatory patches to allow NTP correction of clock event devices
to avoid early expiry
- the usual pile of fixes and enhancements all over the place"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (91 commits)
timer/sysclt: Restrict timer migration sysctl values to 0 and 1
arm64/arch_timer: Mark errata handlers as __maybe_unused
Clocksource/mips-gic: Remove redundant non devicetree init
MIPS/Malta: Probe gic-timer via devicetree
clocksource: Use GENMASK_ULL in definition of CLOCKSOURCE_MASK
acpi/arm64: Add SBSA Generic Watchdog support in GTDT driver
clocksource: arm_arch_timer: add GTDT support for memory-mapped timer
acpi/arm64: Add memory-mapped timer support in GTDT driver
clocksource: arm_arch_timer: simplify ACPI support code.
acpi/arm64: Add GTDT table parse driver
clocksource: arm_arch_timer: split MMIO timer probing.
clocksource: arm_arch_timer: add structs to describe MMIO timer
clocksource: arm_arch_timer: move arch_timer_needs_of_probing into DT init call
clocksource: arm_arch_timer: refactor arch_timer_needs_probing
clocksource: arm_arch_timer: split dt-only rate handling
x86/uv/time: Set ->min_delta_ticks and ->max_delta_ticks
unicore32/time: Set ->min_delta_ticks and ->max_delta_ticks
um/time: Set ->min_delta_ticks and ->max_delta_ticks
tile/time: Set ->min_delta_ticks and ->max_delta_ticks
score/time: Set ->min_delta_ticks and ->max_delta_ticks
...
Pull uaccess unification updates from Al Viro:
"This is the uaccess unification pile. It's _not_ the end of uaccess
work, but the next batch of that will go into the next cycle. This one
mostly takes copy_from_user() and friends out of arch/* and gets the
zero-padding behaviour in sync for all architectures.
Dealing with the nocache/writethrough mess is for the next cycle;
fortunately, that's x86-only. Same for cleanups in iov_iter.c (I am
sold on access_ok() in there, BTW; just not in this pile), same for
reducing __copy_... callsites, strn*... stuff, etc. - there will be a
pile about as large as this one in the next merge window.
This one sat in -next for weeks. -3KLoC"
* 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (96 commits)
HAVE_ARCH_HARDENED_USERCOPY is unconditional now
CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now
m32r: switch to RAW_COPY_USER
hexagon: switch to RAW_COPY_USER
microblaze: switch to RAW_COPY_USER
get rid of padding, switch to RAW_COPY_USER
ia64: get rid of copy_in_user()
ia64: sanitize __access_ok()
ia64: get rid of 'segment' argument of __do_{get,put}_user()
ia64: get rid of 'segment' argument of __{get,put}_user_check()
ia64: add extable.h
powerpc: get rid of zeroing, switch to RAW_COPY_USER
esas2r: don't open-code memdup_user()
alpha: fix stack smashing in old_adjtimex(2)
don't open-code kernel_setsockopt()
mips: switch to RAW_COPY_USER
mips: get rid of tail-zeroing in primitives
mips: make copy_from_user() zero tail explicitly
mips: clean and reorder the forest of macros...
mips: consolidate __invoke_... wrappers
...
For automatic module loading (e.g. as it is used with cryptsetup)
an alias "paes" for the paes_s390 kernel module is needed.
Correct the paes_s390 module alias from "aes-all" to "paes".
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Provide a kma instruction definition for use by callers of __cpacf_query.
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The new KMA instruction requires unique parameters. Update __cpacf_query to
generate a compatible assembler instruction.
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Acked-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch introduces s390 specific arch random functionality.
There exists a generic kernel API for arch specific random
number implementation (see include/linux/random.h). Here
comes the header file and a very small static code part
implementing the arch_random_* API based on the TRNG
subfunction coming with the reworked PRNG instruction.
The arch random implementation hooks into the kernel
initialization and checks for availability of the TRNG
function. In accordance to the arch random API all functions
return false if the TRNG is not available. Otherwise the new
high quality entropy source provides fresh random on each
invocation.
The s390 arch random feature build is controlled via
CONFIG_ARCH_RANDOM. This config option located in
arch/s390/Kconfig is enabled by default and appears
as entry "s390 architectural random number generation API"
in the submenu "Processor type and features" for s390 builds.
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There is a new TRNG extension in the subcodes for the cpacf
PRNO function. This patch introduces new defines and a new
cpacf_trng inline function to provide these new features for
other kernel code parts.
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The PPNO (Perform Pseudorandom Number Operation) instruction
has been renamed to PRNO (Perform Random Number Operation).
To avoid confusion and conflicts with future extensions with
this instruction (like e.g. provide a true random number
generator) this patch renames all occurences in cpacf.h and
adjusts the only exploiter code which is the prng device
driver and one line in the s390 kvm feature check.
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The kernel page table splitting code will split page tables even for
features the CPU does not support. E.g. a CPU may not support the NX
feature.
In order to avoid this, remove those bits from the flags parameter
that correlate with unsupported CPU features within __set_memory(). In
addition add an early exit if the flags parameter does not have any
bits set afterwards.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With TASK_SIZE now reflecting the maximum size of the address space for
a process the code for arch_get_unmapped_area[_topdown] can be simplified.
Just let the logic pick a suitable address and deal with the page table
upgrade after the address has been selected.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The TASK_SIZE for a process should be maximum possible size of the address
space, 2GB for a 31-bit process and 8PB for a 64-bit process. The number
of page table levels required for a given memory layout is a consequence
of the mapped memory areas and their location.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Both conflict were simple overlapping changes.
In the kaweth case, Eric Dumazet's skb_cow() bug fix overlapped the
conversion of the driver in net-next to use in-netdev stats.
Signed-off-by: David S. Miller <davem@davemloft.net>
The guarded storage interface allows to register a control block for
each thread that is activated with the guarded storage broadcast event.
To retrieve the complete state of a process from the kernel a register
set for the stored broadcast control block is required.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add use_cmma field to mm_context_t, like we do for storage keys.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Acked-by: Janosch Frank <frankja@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add PGSTE manipulation functions:
* set_pgste_bits sets specific bits in a PGSTE
* get_pgste returns the whole PGSTE
* pgste_perform_essa manipulates a PGSTE to set specific storage states
* ESSA_[SG]ET_* macros used to indicate the action for manipulate_pgste
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Reviewed-by: Janosch Frank <frankja@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Conflicts were simply overlapping changes. In the net/ipv4/route.c
case the code had simply moved around a little bit and the same fix
was made in both 'net' and 'net-next'.
In the net/sched/sch_generic.c case a fix in 'net' happened at
the same time that a new argument was added to qdisc_hash_add().
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for making the clockevents core NTP correction aware,
all clockevent device drivers must set ->min_delta_ticks and
->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
clockevent device's rate is going to change dynamically and thus, the
ratio of ns to ticks ceases to stay invariant.
Currently, the s390's CPU timer clockevent device is initialized as
follows:
cd->min_delta_ns = 1;
cd->max_delta_ns = LONG_MAX;
Note that the device's time to cycle conversion factor, i.e.
cd->mult / (2^cd->shift), is approx. equal to 4.
Hence, this would translate to
cd->min_delta_ticks = 4;
cd->max_delta_ticks = 4 * LONG_MAX;
However, a minimum value of 1ns is in the range of noise anyway and the
clockevent core will take care of this by increasing it to 1us or so.
Furthermore, 4*LONG_MAX would overflow the unsigned long argument the
clockevent devices gets programmed with.
Thus, initialize ->min_delta_ticks with 1 and ->max_delta_ticks with
ULONG_MAX.
This patch alone doesn't introduce any change in functionality as the
clockevents core still looks exclusively at the (untouched) ->min_delta_ns
and ->max_delta_ns. As soon as this has changed, a followup patch will
purge the initialization of ->min_delta_ns and ->max_delta_ns from this
driver.
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
The CAD instruction never worked quite as expected for the spinlock
code. It has been disabled by default with git commit 61b0b01686,
if the "cad" kernel parameter is specified it is enabled for both user
space and the spinlock code. Leave the option to enable the instruction
for user space but remove it from the spinlock code.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add a couple more __atomic_xxx function to atomic_ops.h and use them
to replace the compare-and-swap inlines in the spinlock code. This
changes the type of the lock value from unsigned int to int.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On heavy paging with KSM I see guest data corruption. Turns out that
KSM will add pages to its tree, where the mapping return true for
pte_unused (or might become as such later). KSM will unmap such pages
and reinstantiate with different attributes (e.g. write protected or
special, e.g. in replace_page or write_protect_page)). This uncovered
a bug in our pagetable handling: We must remove the unused flag as
soon as an entry becomes present again.
Cc: stable@vger.kernel.org
Signed-of-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce a new getsockopt operation to retrieve the socket cookie
for a specific socket based on the socket fd. It returns a unique
non-decreasing cookie for each socket.
Tested: https://android-review.googlesource.com/#/c/358163/
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ARM:
- Fix a problem with GICv3 userspace save/restore
- Clarify GICv2 userspace save/restore ABI
- Be more careful in clearing GIC LRs
- Add missing synchronization primitive to our MMU handling code
PPC:
- Check for a NULL return from kzalloc
s390:
- Prevent translation exception errors on valid page tables for the
instruction-exection-protection support
x86:
- Fix Page-Modification Logging when running a nested guest
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJY5/X8AAoJEED/6hsPKofo8hQH/As3CbihZMysaK6JJTx5oMZw
b3W8p8xVXVu4dKM8WnXa6m5xBDFmOa7eBB+CtT3gP68XnFvMpr/vPmDv6v6i9p8q
7VyALDqqk2fxDmgHEwuETw9XZyuhdyCz/GaINCdnAJs25wTFOA7r0WEW5W8qRJpA
9nQirapdJcknymIch1JqeWlYYmbIaFzT8jItfA9QQ7F9mG4pxC8D1k2D56lNYwTf
FJIgXgkMPe7CPDXmgc/KqT5+iVsc/+SgzP/WdH6bX/007TV71sksxxfz6fIrao0X
RtcL2WIZTXBdSNrvXflHhCfYgogPgCnYp8AsYTIa+IEijcfteJx7UiET47Ne0Ow=
=/SPG
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- Fix a problem with GICv3 userspace save/restore
- Clarify GICv2 userspace save/restore ABI
- Be more careful in clearing GIC LRs
- Add missing synchronization primitive to our MMU handling code
PPC:
- Check for a NULL return from kzalloc
s390:
- Prevent translation exception errors on valid page tables for the
instruction-exection-protection support
x86:
- Fix Page-Modification Logging when running a nested guest"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PPC: Book3S HV: Check for kmalloc errors in ioctl
KVM: nVMX: initialize PML fields in vmcs02
KVM: nVMX: do not leak PML full vmexit to L1
KVM: arm/arm64: vgic: Fix GICC_PMR uaccess on GICv3 and clarify ABI
KVM: arm64: Ensure LRs are clear when they should be
kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
KVM: s390: remove change-recording override support
arm/arm64: KVM: Take mmap_sem in kvm_arch_prepare_memory_region
arm/arm64: KVM: Take mmap_sem in stage2_unmap_vm
Mostly simple cases of overlapping changes (adding code nearby,
a function whose name changes, for example).
Signed-off-by: David S. Miller <davem@davemloft.net>
There are three different code levels in regard to the identification
of guest samples. They differ in the way the LPP instruction is used.
1) Old kernels without the LPP instruction. The guest program parameter
is always zero.
2) Newer kernels load the process pid into the program parameter with LPP.
The guest program parameter is non-zero if the guest executes in a
process != idle.
3) The latest kernels load ((1UL << 31) | pid) with LPP to make the value
non-zero even for the idle task. The guest program parameter is non-zero
if the guest is running.
All kernels load the process pid to CR4 on context switch. The CPU sampling
code uses the value in CR4 to decide between guest and host samples in case
the guest program parameter is zero. The three cases:
1) CR4==pid, gpp==0
2) CR4==pid, gpp==pid
3) CR4==pid, gpp==((1UL << 31) | pid)
The load-control instruction to load the pid into CR4 is expensive and the
goal is to remove it. To distinguish the host CR4 from the guest pid for
the idle process the maximum value 0xffff for the PASN is used.
This adds a fourth case for a guest OS with an updated kernel:
4) CR4==0xffff, gpp=((1UL << 31) | pid)
The host kernel will have CR4==0xffff and will use (gpp!=0 || CR4!==0xffff)
to identify guest samples. This works nicely with all 4 cases, the only
possible issue would be a guest with an old kernel (gpp==0) and a process
pid of 0xffff. Well, don't do that..
Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move a struct definition to get rid of a forward declaration.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Users complained that they are hitting the limit of 64 functions (some
physical functions can spawn lots of virtual functions). Double the
default limit. With the latest savings in static data usage this
increases the image size by only 520 bytes.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Commit c506fff3d3 ("s390/pci: resize iomap") reduced the iomap
to NR_FUNCTIONS * PCI_BAR_COUNT elements. Since we only support
functions with 64bit BARs we can cut that number in half.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Address space identifiers are already defined in <asm/pci_insn.h>.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
barsize was never used. Get rid of it.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The 32-bit lctl instruction is quite a bit slower than the 64-bit
counter part lctlg. Use the faster instruction.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This is a fix that prevents translation exception errors
on valid page tables for the instruction-exection-protection
support. This feature was added during the 4.11 merge window.
We have to remove an old check that would trigger if the
change-recording override is not available (e.g. edat1 disabled
via cpu model).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJY4ktxAAoJEBF7vIC1phx8lroP/3miwUoUReBrQ4s4dFmMdRs9
YJ7y9uqHFNlGHiyLANEjQN5BQzyCZ7SeTvzQqvnBojhVVWV1ylFRIQltopJ4Hd3+
DIApcUXxrSLfSv3rXih/g1/RS7ZbN+HK8jb+uHj47drs96Y6g9/qomea060eIZiV
6WU7MYcf9hf+xVM/r8Hjo40iz+LFKk5qIqEGOGjR8bRvIdjJWyFYjEShyuX+O7F9
F83XryU386xJjLcDgpKIRmJpQ9h0a32e0TLxKSzh5YElIirmeUCNDrHkuBcljHBm
tUp3P2ST6XRA1KtybLedNSS00HP+2pZsq2pZPmm/ACsKQtSItm1VUHjO13SUI1sB
RHlJvsObMtSPbmlbctmTFtJ1L87BTygGTepYTwZIp9LzlmX0ddC7pCJHO+9OJlL1
N8kPmK1P8evVEntO0ej6R7JRPw/kLBb7+gXHBsMifqQQT7akVLMZKolv9z2yYms2
s1rc2XtbS/9QrexqsGu74dWrnjoTfd+SFslMNJK1GDOhgKljbZmU4Ca6wnsopTds
1JGf4Rc++pFANpxKOhkhavo4m/uDd7T6mCD4yPuhJA0lnGMBYUb26ycpL3gbeA1S
PWo1quee1fmGcBjpDHJElyvc5DcgcaAm5b58nBwCusiRMUSmmMt+6sBEiH+n873h
zwDoIwU9K/D0ogyEhvJD
=G97j
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-master-4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux
From: Christian Borntraeger <borntraeger@de.ibm.com>
KVM: s390: Fix instruction-execution-protection/change-recording override
This is a fix that prevents translation exception errors
on valid page tables for the instruction-exection-protection
support. This feature was added during the 4.11 merge window.
We have to remove an old check that would trigger if the
change-recording override is not available (e.g. edat1 disabled
via cpu model).
Pull s390 fixes from Martin Schwidefsky:
"Four bug fixes, two of them for stable:
- avoid initrd corruptions in the kernel decompressor
- prevent inconsistent dumps if the boot CPU does not have address
zero
- fix the new pkey interface added with the merge window for 4.11
- a fix for a fix, another issue with user copy zero padding"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/uaccess: get_user() should zero on failure (again)
s390/pkey: Fix wrong handling of secure key with old MKVP
s390/smp: fix ipl from cpu with non-zero address
s390/decompressor: fix initrd corruption caused by bss clear
Change-recording override (CO) was never implemented in any
machine. According to the architecture it is unpredictable if a
translation-specification exception will be recognized if the bit is
set and EDAT1 does not apply.
Therefore the easiest solution is to simply ignore the bit.
This also fixes commit cd1836f583 ("KVM: s390:
instruction-execution-protection support"). A guest may enable
instruction-execution-protection (IEP) but not EDAT1. In such a case
the guest_translate() function (arch/s390/kvm/gaccess.c) will report a
specification exception on pages that have the IEP bit set while it
should not.
It might make sense to add full IEP support to guest_translate() and
the GACC_IFETCH case. However, as far as I can tell the GACC_IFETCH
case is currently only used after an instruction was executed in order
to fetch the failing instruction. So there is no additional problem
*currently*.
Fixes: cd1836f583 ("KVM: s390: instruction-execution-protection support")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
A section name for .data..ro_after_init was added by both:
commit d07a980c1b ("s390: add proper __ro_after_init support")
and
commit d7c19b066d ("mm: kmemleak: scan .data.ro_after_init")
The latter adds incorrect wrapping around the existing s390 section, and
came later. I'd prefer the s390 naming, so this moves the s390-specific
name up to the asm-generic/sections.h and renames the section as used by
kmemleak (and in the future, kernel/extable.c).
Link: http://lkml.kernel.org/r/20170327192213.GA129375@beast
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390 parts]
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Eddie Kovsky <ewk@edkovsky.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull vfio-ccw branch to add the basic channel I/O passthrough
intrastructure based on vfio.
The focus is on supporting dasd-eckd(cu_type/dev_type = 0x3990/0x3390)
as the target device.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Although Linux does not use format-0 channel command words (CCW0)
these are a non-optional part of the platform spec, and for the sake
of platform compliance, and possibly some non-Linux guests, we have
to support CCW0.
Making the kernel execute a format 0 channel program is too much hassle
because we would need to allocate and use memory which can be addressed
by 24 bit physical addresses (because of CCW0.cda). So we implement CCW0
support by translating the channel program into an equivalent CCW1
program instead.
Based upon an orginal patch by Kai Yue Wang.
Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Message-Id: <20170317031743.40128-16-bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
To make vfio support subchannel devices, we need to leverage the
mediated device framework to create a mediated device for the
subchannel device.
This registers the subchannel device to the mediated device
framework during probe to enable mediated device creation.
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Dong Jia Shi <bjsdjshi@linux.vnet.ibm.com>
Message-Id: <20170317031743.40128-7-bjsdjshi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>