.boot.preserved.data is a better fit for ipl block than .boot.data
which is discarded after init. Reusing .boot.preserved.data allows to
simplify code a little bit and avoid copying data from .boot.data to
persistent variables.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce .boot.preserve.data section which is similar to .boot.data and
"shared" between the decompressor code and the decompressed kernel. The
decompressor will store values in it, and copy over to the decompressed
image before starting it. This method allows to avoid using pre-defined
addresses and other hacks to pass values between those boot phases.
Unlike .boot.data section .boot.preserved.data is NOT a part of init data,
and hence will be preserved for the kernel life time.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Hook up asm-generic/mmiowb.h to Kbuild for all architectures so that we
can subsequently include asm/mmiowb.h from core code.
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
As the generic rwsem-xadd code is using the appropriate acquire and
release versions of the atomic operations, the arch specific rwsem.h
files will not be that much faster than the generic code as long as the
atomic functions are properly implemented. So we can remove those arch
specific rwsem.h and stop building asm/rwsem.h to reduce maintenance
effort.
Currently, only x86, alpha and ia64 have implemented architecture
specific fast paths. I don't have access to alpha and ia64 systems for
testing, but they are legacy systems that are not likely to be updated
to the latest kernel anyway.
By using a rwsem microbenchmark, the total locking rates on a 4-socket
56-core 112-thread x86-64 system before and after the patch were as
follows (mixed means equal # of read and write locks):
Before Patch After Patch
# of Threads wlock rlock mixed wlock rlock mixed
------------ ----- ----- ----- ----- ----- -----
1 29,201 30,143 29,458 28,615 30,172 29,201
2 6,807 13,299 1,171 7,725 15,025 1,804
4 6,504 12,755 1,520 7,127 14,286 1,345
8 6,762 13,412 764 6,826 13,652 726
16 6,693 15,408 662 6,599 15,938 626
32 6,145 15,286 496 5,549 15,487 511
64 5,812 15,495 60 5,858 15,572 60
There were some run-to-run variations for the multi-thread tests. For
x86-64, using the generic C code fast path seems to be a little bit
faster than the assembly version with low lock contention. Looking at
the assembly version of the fast paths, there are assembly to/from C
code wrappers that save and restore all the callee-clobbered registers
(7 registers on x86-64). The assembly generated from the generic C
code doesn't need to do that. That may explain the slight performance
gain here.
The generic asm rwsem.h can also be merged into kernel/locking/rwsem.h
with no code change as no other code other than those under
kernel/locking needs to access the internal rwsem macros and functions.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-riscv@lists.infradead.org
Cc: linux-um@lists.infradead.org
Cc: linux-xtensa@linux-xtensa.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: nios2-dev@lists.rocketboards.org
Cc: openrisc@lists.librecores.org
Cc: uclinux-h8-devel@lists.sourceforge.jp
Link: https://lkml.kernel.org/r/20190322143008.21313-2-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Move the mmu_gather::page_size things into the generic code instead of
PowerPC specific bits.
No change in behavior intended.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- Fix early free of the channel program in vfio
- On AP device removal make sure that all messages are flushed
with the driver still attached that queued the message
- Limit brk randomization to 32MB to reduce the chance that the
heap of ld.so is placed after the main stack
- Add a rolling average for the steal time of a CPU, this will be
needed for KVM to decide when to do busy waiting
- Fix a warning in the CPU-MF code
- Add a notification handler for AP configuration change to react
faster to new AP devices
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJcnIq7AAoJEDjwexyKj9rgddUH/3VQP6BMvq2fwAsLqx8JeYgT
082xzP2nHli3tO6m8fFHmtqrSg5KTEDfuQVafqp92LeEMKUNWQI6kRu7rXeAVBct
M6hx21mqkm9VNjAlAjSq8IAUXP2K6/K0BMD5mYInYYYVRvJm3on4sHnkEj0kvXbm
OGxwnNBd9UnH5g6ti2vW4cyDvs0aqj1eDbSudy5KedumQz5J2XdFPn4f4Ej6p2+t
nuvlZFDnZ2Z4rliE3RFCuKExZR+YFZgS1urm6pcklncfvbJRsqFJ+nvhurskDUI3
4gOp1Yv1tvGNv/cNVEtnz8g/Kg8/sI7evjQBtxhtEsV/W0sbZPnjCt+28Cf1DN4=
=4nL7
-----END PGP SIGNATURE-----
Merge tag 's390-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Improvements and bug fixes for 5.1-rc2:
- Fix early free of the channel program in vfio
- On AP device removal make sure that all messages are flushed with
the driver still attached that queued the message
- Limit brk randomization to 32MB to reduce the chance that the heap
of ld.so is placed after the main stack
- Add a rolling average for the steal time of a CPU, this will be
needed for KVM to decide when to do busy waiting
- Fix a warning in the CPU-MF code
- Add a notification handler for AP configuration change to react
faster to new AP devices"
* tag 's390-5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/cpumf: Fix warning from check_processor_id
zcrypt: handle AP Info notification from CHSC SEI command
vfio: ccw: only free cp on final interrupt
s390/vtime: steal time exponential moving average
s390/zcrypt: revisit ap device remove procedure
s390: limit brk randomization to 32MB
The generic-y is redundant under the following condition:
- arch has its own implementation
- the same header is added to generated-y
- the same header is added to mandatory-y
If a redundant generic-y is found, the warning like follows is displayed:
scripts/Makefile.asm-generic:20: redundant generic-y found in arch/arm/include/asm/Kbuild: timex.h
I fixed up arch Kbuild files found by this.
Suggested-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
for 32-bit guests
s390: interrupt cleanup, introduction of the Guest Information Block,
preparation for processor subfunctions in cpu models
PPC: bug fixes and improvements, especially related to machine checks
and protection keys
x86: many, many cleanups, including removing a bunch of MMU code for
unnecessary optimizations; plus AVIC fixes.
Generic: memcg accounting
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJci+7XAAoJEL/70l94x66DUMkIAKvEefhceySHYiTpfefjLjIC
16RewgHa+9CO4Oo5iXiWd90fKxtXLXmxDQOS4VGzN0rxvLGRw/fyXIxL1MDOkaAO
l8SLSNuewY4XBUgISL3PMz123r18DAGOuy9mEcYU/IMesYD2F+wy5lJ17HIGq6X2
RpoF1p3qO1jfkPTKOob6Ixd4H5beJNPKpdth7LY3PJaVhDxgouj32fxnLnATVSnN
gENQ10fnt8BCjshRYW6Z2/9bF15JCkUFR1xdBW2/xh1oj+kvPqqqk2bEN1eVQzUy
2hT/XkwtpthqjSbX8NNavWRSFnOnbMLTRKQyIXmFVsM5VoSrwtiGsCFzBgcT++I=
=XIzU
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"ARM:
- some cleanups
- direct physical timer assignment
- cache sanitization for 32-bit guests
s390:
- interrupt cleanup
- introduction of the Guest Information Block
- preparation for processor subfunctions in cpu models
PPC:
- bug fixes and improvements, especially related to machine checks
and protection keys
x86:
- many, many cleanups, including removing a bunch of MMU code for
unnecessary optimizations
- AVIC fixes
Generic:
- memcg accounting"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (147 commits)
kvm: vmx: fix formatting of a comment
KVM: doc: Document the life cycle of a VM and its resources
MAINTAINERS: Add KVM selftests to existing KVM entry
Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()"
KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()
KVM: PPC: Fix compilation when KVM is not enabled
KVM: Minor cleanups for kvm_main.c
KVM: s390: add debug logging for cpu model subfunctions
KVM: s390: implement subfunction processor calls
arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2
KVM: arm/arm64: Remove unused timer variable
KVM: PPC: Book3S: Improve KVM reference counting
KVM: PPC: Book3S HV: Fix build failure without IOMMU support
Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()"
x86: kvmguest: use TSC clocksource if invariant TSC is exposed
KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_start
KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter
KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_ns
KVM: x86/mmu: Consolidate kvm_mmu_zap_all() and kvm_mmu_zap_mmio_sptes()
KVM: x86/mmu: WARN if zapping a MMIO spte results in zapping children
...
The current AP bus implementation periodically polls the AP configuration
to detect changes. When the AP configuration is dynamically changed via the
SE or an SCLP instruction, the changes will not be reflected to sysfs until
the next time the AP configuration is polled. The CHSC architecture
provides a Store Event Information (SEI) command to make notification of an
AP configuration change. This patch introduces a handler to process
notification from the CHSC SEI command by immediately kicking off an AP bus
scan-after-event.
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.ibm.com>
Reviewed-by: Harald Freudenberger <FREUDE@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Merge misc updates from Andrew Morton:
- a few misc things
- ocfs2 updates
- most of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (159 commits)
tools/testing/selftests/proc/proc-self-syscall.c: remove duplicate include
proc: more robust bulk read test
proc: test /proc/*/maps, smaps, smaps_rollup, statm
proc: use seq_puts() everywhere
proc: read kernel cpu stat pointer once
proc: remove unused argument in proc_pid_lookup()
fs/proc/thread_self.c: code cleanup for proc_setup_thread_self()
fs/proc/self.c: code cleanup for proc_setup_self()
proc: return exit code 4 for skipped tests
mm,mremap: bail out earlier in mremap_to under map pressure
mm/sparse: fix a bad comparison
mm/memory.c: do_fault: avoid usage of stale vm_area_struct
writeback: fix inode cgroup switching comment
mm/huge_memory.c: fix "orig_pud" set but not used
mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC
mm/memcontrol.c: fix bad line in comment
mm/cma.c: cma_declare_contiguous: correct err handling
mm/page_ext.c: fix an imbalance with kmemleak
mm/compaction: pass pgdat to too_many_isolated() instead of zone
mm: remove zone_lru_lock() function, access ->lru_lock directly
...
To be able to judge the current overcommitment ratio for a CPU add
a lowcore field with the exponential moving average of the steal time.
The average is updated every tick.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For a 64-bit process the randomization of the program break is quite
large with 1GB. That is as big as the randomization of the anonymous
mapping base, for a test case started with '/lib/ld64.so.1 <exec>'
it can happen that the heap is placed after the stack. To avoid
this limit the program break randomization to 32MB for 64-bit and
keep 8MB for 31-bit.
Reported-by: Stefan Liebler <stli@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Architectures like ppc64 require to do a conditional tlb flush based on
the old and new value of pte. Enable that by passing old pte value as
the arg.
Link: http://lkml.kernel.org/r/20190116085035.29729-3-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "NestMMU pte upgrade workaround for mprotect", v5.
We can upgrade pte access (R -> RW transition) via mprotect. We need to
make sure we follow the recommended pte update sequence as outlined in
commit bd5050e38a ("powerpc/mm/radix: Change pte relax sequence to
handle nest MMU hang") for such updates. This patch series does that.
This patch (of 5):
Some architectures may want to call flush_tlb_range from these helpers.
Link: http://lkml.kernel.org/r/20190116085035.29729-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull year 2038 updates from Thomas Gleixner:
"Another round of changes to make the kernel ready for 2038. After lots
of preparatory work this is the first set of syscalls which are 2038
safe:
403 clock_gettime64
404 clock_settime64
405 clock_adjtime64
406 clock_getres_time64
407 clock_nanosleep_time64
408 timer_gettime64
409 timer_settime64
410 timerfd_gettime64
411 timerfd_settime64
412 utimensat_time64
413 pselect6_time64
414 ppoll_time64
416 io_pgetevents_time64
417 recvmmsg_time64
418 mq_timedsend_time64
419 mq_timedreceiv_time64
420 semtimedop_time64
421 rt_sigtimedwait_time64
422 futex_time64
423 sched_rr_get_interval_time64
The syscall numbers are identical all over the architectures"
* 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
riscv: Use latest system call ABI
checksyscalls: fix up mq_timedreceive and stat exceptions
unicore32: Fix __ARCH_WANT_STAT64 definition
asm-generic: Make time32 syscall numbers optional
asm-generic: Drop getrlimit and setrlimit syscalls from default list
32-bit userspace ABI: introduce ARCH_32BIT_OFF_T config option
compat ABI: use non-compat openat and open_by_handle_at variants
y2038: add 64-bit time_t syscalls to all 32-bit architectures
y2038: rename old time and utime syscalls
y2038: remove struct definition redirects
y2038: use time32 syscall names on 32-bit
syscalls: remove obsolete __IGNORE_ macros
y2038: syscalls: rename y2038 compat syscalls
x86/x32: use time64 versions of sigtimedwait and recvmmsg
timex: change syscalls to use struct __kernel_timex
timex: use __kernel_timex internally
sparc64: add custom adjtimex/clock_adjtime functions
time: fix sys_timer_settime prototype
time: Add struct __kernel_timex
time: make adjtime compat handling available for 32 bit
...
- A copy of Arnds compat wrapper generation series
- Pass information about the KVM guest to the host in form the control
program code and the control program version code
- Map IOV resources to support PCI physical functions on s390
- Add vector load and store alignment hints to improve performance
- Use the "jdd" constraint with gcc 9 to make jump labels working again
- Remove amode workaround for old z/VM releases from the DCSS code
- Add support for in-kernel performance measurements using the
CPU measurement counter facility
- Introduce a new PMU device cpum_cf_diag to capture counters and
store thenn as event raw data.
- Bug fixes and cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJcfh4QAAoJEDjwexyKj9rgXVAH/RzVbi3vznldujSNfCFTZKPu
EmFFAZIfbhifW3szfylyOJL52pFhxjcWzY0hkFEkbs2t90sn8l1BNkDscYZtfNHC
XvN3N9LsHyxOeyxvQuWLSio58qm+Lr1L0UrIhbMvqyAVkOLmIHvybFwi83OkMptm
djoL8NbuNsAA2s26y2bZLNtU7FmOW5smJIlnt7H4dmK4SFylqZKS/EnUZxGDgn+7
UrrTTOQUir0QZ8vraANsP1M0/LqPcd2YusLmj4jOdZ5Muc2Ch2AA991FofqdKShO
/8cGlsIzwHWGgdnP/YDea5gbetvonayYduixKy3EnYpWQ9iogiBjH4G7QNxcncs=
=v26J
-----END PGP SIGNATURE-----
Merge tag 's390-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
- A copy of Arnds compat wrapper generation series
- Pass information about the KVM guest to the host in form the control
program code and the control program version code
- Map IOV resources to support PCI physical functions on s390
- Add vector load and store alignment hints to improve performance
- Use the "jdd" constraint with gcc 9 to make jump labels working again
- Remove amode workaround for old z/VM releases from the DCSS code
- Add support for in-kernel performance measurements using the CPU
measurement counter facility
- Introduce a new PMU device cpum_cf_diag to capture counters and store
thenn as event raw data.
- Bug fixes and cleanups
* tag 's390-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (54 commits)
Revert "s390/cpum_cf: Add kernel message exaplanations"
s390/dasd: fix read device characteristic with CONFIG_VMAP_STACK=y
s390/suspend: fix prefix register reset in swsusp_arch_resume
s390: warn about clearing als implied facilities
s390: allow overriding facilities via command line
s390: clean up redundant facilities list setup
s390/als: remove duplicated in-place implementation of stfle
s390/cio: Use cpa range elsewhere within vfio-ccw
s390/cio: Fix vfio-ccw handling of recursive TICs
s390: vfio_ap: link the vfio_ap devices to the vfio_ap bus subsystem
s390/cpum_cf: Handle EBUSY return code from CPU counter facility reservation
s390/cpum_cf: Add kernel message exaplanations
s390/cpum_cf_diag: Add support for s390 counter facility diagnostic trace
s390/cpum_cf: add ctr_stcctm() function
s390/cpum_cf: move common functions into a separate file
s390/cpum_cf: introduce kernel_cpumcf_avail() function
s390/cpu_mf: replace stcctm5() with the stcctm() function
s390/cpu_mf: add store cpu counter multiple instruction support
s390/cpum_cf: Add minimal in-kernel interface for counter measurements
s390/cpum_cf: introduce kernel_cpumcf_alert() to obtain measurement alerts
...
Pull networking updates from David Miller:
"Here we go, another merge window full of networking and #ebpf changes:
1) Snoop DHCPACKS in batman-adv to learn MAC/IP pairs in the DHCP
range without dealing with floods of ARP traffic, from Linus
Lüssing.
2) Throttle buffered multicast packet transmission in mt76, from
Felix Fietkau.
3) Support adaptive interrupt moderation in ice, from Brett Creeley.
4) A lot of struct_size conversions, from Gustavo A. R. Silva.
5) Add peek/push/pop commands to bpftool, as well as bash completion,
from Stanislav Fomichev.
6) Optimize sk_msg_clone(), from Vakul Garg.
7) Add SO_BINDTOIFINDEX, from David Herrmann.
8) Be more conservative with local resends due to local congestion,
from Yuchung Cheng.
9) Allow vetoing of unsupported VXLAN FDBs, from Petr Machata.
10) Add health buffer support to devlink, from Eran Ben Elisha.
11) Add TXQ scheduling API to mac80211, from Toke Høiland-Jørgensen.
12) Add statistics to basic packet scheduler filter, from Cong Wang.
13) Add GRE tunnel support for mlxsw Spectrum-2, from Nir Dotan.
14) Lots of new IP tunneling forwarding tests, also from Nir Dotan.
15) Add 3ad stats to bonding, from Nikolay Aleksandrov.
16) Lots of probing improvements for bpftool, from Quentin Monnet.
17) Various nfp drive #ebpf JIT improvements from Jakub Kicinski.
18) Allow #ebpf programs to access gso_segs from skb shared info, from
Eric Dumazet.
19) Add sock_diag support for AF_XDP sockets, from Björn Töpel.
20) Support 22260 iwlwifi devices, from Luca Coelho.
21) Use rbtree for ipv6 defragmentation, from Peter Oskolkov.
22) Add JMP32 instruction class support to #ebpf, from Jiong Wang.
23) Add spinlock support to #ebpf, from Alexei Starovoitov.
24) Support 256-bit keys and TLS 1.3 in ktls, from Dave Watson.
25) Add device infomation API to devlink, from Jakub Kicinski.
26) Add new timestamping socket options which are y2038 safe, from
Deepa Dinamani.
27) Add RX checksum offloading for various sh_eth chips, from Sergei
Shtylyov.
28) Flow offload infrastructure, from Pablo Neira Ayuso.
29) Numerous cleanups, improvements, and bug fixes to the PHY layer
and many drivers from Heiner Kallweit.
30) Lots of changes to try and make packet scheduler classifiers run
lockless as much as possible, from Vlad Buslov.
31) Support BCM957504 chip in bnxt_en driver, from Erik Burrows.
32) Add concurrency tests to tc-tests infrastructure, from Vlad
Buslov.
33) Add hwmon support to aquantia, from Heiner Kallweit.
34) Allow 64-bit values for SO_MAX_PACING_RATE, from Eric Dumazet.
And I would be remiss if I didn't thank the various major networking
subsystem maintainers for integrating much of this work before I even
saw it. Alexei Starovoitov, Daniel Borkmann, Pablo Neira Ayuso,
Johannes Berg, Kalle Valo, and many others. Thank you!"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2207 commits)
net/sched: avoid unused-label warning
net: ignore sysctl_devconf_inherit_init_net without SYSCTL
phy: mdio-mux: fix Kconfig dependencies
net: phy: use phy_modify_mmd_changed in genphy_c45_an_config_aneg
net: dsa: mv88e6xxx: add call to mv88e6xxx_ports_cmode_init to probe for new DSA framework
selftest/net: Remove duplicate header
sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79
net/mlx5e: Update tx reporter status in case channels were successfully opened
devlink: Add support for direct reporter health state update
devlink: Update reporter state to error even if recover aborted
sctp: call iov_iter_revert() after sending ABORT
team: Free BPF filter when unregistering netdev
ip6mr: Do not call __IP6_INC_STATS() from preemptible context
isdn: mISDN: Fix potential NULL pointer dereference of kzalloc
net: dsa: mv88e6xxx: support in-band signalling on SGMII ports with external PHYs
cxgb4/chtls: Prefix adapter flags with CXGB4
net-sysfs: Switch to bitmap_zalloc()
mellanox: Switch to bitmap_zalloc()
bpf: add test cases for non-pointer sanitiation logic
mlxsw: i2c: Extend initialization by querying resources data
...
Every in-kernel use of this function defined it to KERNEL_DS (either as
an actual define, or as an inline function). It's an entirely
historical artifact, and long long long ago used to actually read the
segment selector valueof '%ds' on x86.
Which in the kernel is always KERNEL_DS.
Inspired by a patch from Jann Horn that just did this for a very small
subset of users (the ones in fs/), along with Al who suggested a script.
I then just took it to the logical extreme and removed all the remaining
gunk.
Roughly scripted with
git grep -l '(get_ds())' -- :^tools/ | xargs sed -i 's/(get_ds())/(KERNEL_DS)/'
git grep -lw 'get_ds' -- :^tools/ | xargs sed -i '/^#define get_ds()/d'
plus manual fixups to remove a few unusual usage patterns, the couple of
inline function cases and to fix up a comment that had become stale.
The 'get_ds()' function remains in an x86 kvm selftest, since in user
space it actually does something relevant.
Inspired-by: Jann Horn <jannh@google.com>
Inspired-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Clarify KVM related kernel messages
- Interrupt cleanup
- Introduction of the Guest Information Block (GIB)
- Preparation for processor subfunctions in cpu model
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJcb9iJAAoJEONU5rjiOLn4mI8P/1J3IyN3imnZUkTM5JOMWhNM
TSwh5obvn7dT6URZ6BgM1DWIz9E/FKrEb2kU4xr1hwf/a69Q1cYKVmHSnzzIpxHQ
ZNjr7QbcBCsVJ8LtasOoMmgGnVvtBYKKHr4J8UcqeW9raP3YfPJmqyETufiE2lFy
G50r8EBFr9rPh7nK7ImAabKC/7Q/qxZ0729m71cu729/uBb/Wf6frqaDmFlA8362
YZC7KY+xEHZbWqKQqAt/x1TWAOb7nA5dCzemeRckNrs5+FN7rSBrje6SbWApZPfn
weteCVbJMLCoRMUTFjRy3YNz1x0gAC9VQT6Qz5Kz7dColVfJjTPWdYuKpbRsj+n1
PEv1uuDBNbDqdS29KG3Dk9cfzUgAU12g+Xsb+3168HsQbU7XU1v6gCoRaR8ccaoq
3k8Em0xusHa+uGI6K4knKmWboRrCA6FWHIaink4B2K7qIaVdWqTebhHaDiDx8qB8
JRNjxQDho92FpRzxHyajHtamFKPjGT/Guc0yWMIrPHBn97GktUnDD6E5AdhTRVxs
aXTZv7XFq5j307lc3qWsdAf4zGEaPbi9f2nHgFK8hJf+z560CmNbye9Rw6L96Lil
gy0rvSQgN+3xBtSKvq3DNrgoouupOS6kFu5iyYLBS8UUOztXttKEzTCs+M87/3AP
fphwixKEEXsMRWR2SJvG
=YeIb
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next
KVM: s390: Features for 5.1
- Clarify KVM related kernel messages
- Interrupt cleanup
- Introduction of the Guest Information Block (GIB)
- Preparation for processor subfunctions in cpu model
While we will not implement interception for query functions yet, we can
and should disable functions that have a control bit based on the given
CPU model.
Let us start with enabling the subfunction interface.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Introduce a PMU device named cpum_cf_diag. It extracts the
values of all counters in all authorized counter sets and stores
them as event raw data. This is done with the STORE CPU COUNTER
MULTIPLE instruction to speed up access. All counter sets
fit into one buffer. The values of each counter are taken
when the event is started on the performance sub-system and when
the event is stopped.
This results in counter values available at the start and
at the end of the measurement time frame. The difference is
calculated for each counter. The differences of all
counters are then saved as event raw data in the perf.data
file.
The counter values are accompanied by the time stamps
when the counter set was started and when the counter set
was stopped. This data is part of a trailer entry which
describes the time frame, counter set version numbers,
CPU speed, and machine type for later analysis.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce the ctr_stcctm() function as wrapper function to extract counters
from a particular counter set. Note that the counter set is part of the
stcctm instruction opcode, few indirections are necessary to specify the
counter set as variable.
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
A preparation to move out common CPU-MF counter facility support
functions, first introduce a function that indicates whether the
support is ready to use.
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove the stcctm5() function to extract counters from the MT-diagnostic
counter set with the stcctm() function. For readability, introduce an
enum to map the counter sets names to respective numbers for the stcctm
instruction.
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add support for the STORE CPU COUNTER MULTIPLE instruction to extract
a range of counters from a counter set.
An assembler macro is used to create the instruction opcode because
the counter set identifier is part of the instruction and, thus,
cannot be easily specified as parameter.
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce a minimal interface for doing counter measurements of small
units of work within the kernel. Use the kernel_cpumcf_begin() function
start a measurement session and, later, stop it with kernel_cpumcf_end().
During the measreument session, you can enable and start/stop counter sets
by using ctr_set_* functions. To make these changes effective use the
lcctl() function. You can then use the ecctr() function to extract counters
from the different counter sets.
Please note that you have to check whether the counter sets to be enabled
are authorized.
Note that when a measurement session is active, other users cannot perform
counter measurements. In such cases, kernel_cpumcf_begin() indicates this
with returning -EBUSY. If the counter facility is not available,
kernel_cpumcf_begin() returns -ENODEV.
Note that this interface is restricted to the current CPU and, thus,
preemption must be turned off.
Example:
u32 state, err;
u64 cycles, insn;
err = kernel_cpumcf_begin();
if (err)
goto out_busy;
state = 0;
ctr_set_enable(&state, CPUMF_CTR_SET_BASIC);
ctr_set_start(&state, CPUMF_CTR_SET_BASIC);
err = lcctl(state);
if (err)
goto ;
/* ... do your work ... */
ctr_set_stop(&state, CPUMF_CTR_SET_BASIC);
err = lcctl(state);
if (err)
goto out;
cycles = insn = 0;
ecctr(0, &cycles);
ecctr(1, &insn);
/* ... */
kernel_cpumcf_end();
out_busy:
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
During a __kernel_cpumcf_begin()/end() session, save measurement alerts
for the counter facility in the per-CPU cpu_cf_events variable.
Users can obtain and, optionally, clear the alerts by calling
kernel_cpumcf_alert() to specifically handle alerts.
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Make the struct cpu_cf_events and the respective per-CPU variable available
to in-kernel users. Access to this per-CPU variable shall be done between
the calls to __kernel_cpumcf_begin() and __kernel_cpumcf_end().
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Prepare the counter facility support to be used by other in-kernel
users. The first step introduces the __kernel_cpumcf_begin() and
__kernel_cpumcf_end() functions to reserve the counter facility
for doing measurements and to release after the measurements are
done.
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move counter set specific controls and functions to the asm/cpu_mcf.h
header file containg all counter facility support definitions. Also
adapt few variable names and header file includes. No functional changes.
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
kvm_arch_memslots_updated() is at this point in time an x86-specific
hook for handling MMIO generation wraparound. x86 stashes 19 bits of
the memslots generation number in its MMIO sptes in order to avoid
full page fault walks for repeat faults on emulated MMIO addresses.
Because only 19 bits are used, wrapping the MMIO generation number is
possible, if unlikely. kvm_arch_memslots_updated() alerts x86 that
the generation has changed so that it can invalidate all MMIO sptes in
case the effective MMIO generation has wrapped so as to avoid using a
stale spte, e.g. a (very) old spte that was created with generation==0.
Given that the purpose of kvm_arch_memslots_updated() is to prevent
consuming stale entries, it needs to be called before the new generation
is propagated to memslots. Invalidating the MMIO sptes after updating
memslots means that there is a window where a vCPU could dereference
the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO
spte that was created with (pre-wrap) generation==0.
Fixes: e59dbe09f8 ("KVM: Introduce kvm_arch_memslots_updated()")
Cc: <stable@vger.kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[heiko.carstens@de.ibm.com]:
-----
Laura Abbott reported that the kernel doesn't build anymore with gcc 9,
due to the "X" constraint. Ilya provided the gcc 9 patch "S/390:
Introduce jdd constraint" which introduces the new "jdd" constraint
which fixes this.
-----
The support for section anchors on S/390 introduced in gcc9 has changed
the behavior of "X" constraint, which can now produce register
references. Since existing constraints, in particular, "i", do not fit
the intended use case on S/390, the new machine-specific "jdd"
constraint was introduced. This patch makes jump labels use "jdd"
constraint when building with gcc9.
Reported-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This series finally gets us to the point of having system calls with
64-bit time_t on all architectures, after a long time of incremental
preparation patches.
There was actually one conversion that I missed during the summer,
i.e. Deepa's timex series, which I now updated based the 5.0-rc1 changes
and review comments.
The following system calls are now added on all 32-bit architectures
using the same system call numbers:
403 clock_gettime64
404 clock_settime64
405 clock_adjtime64
406 clock_getres_time64
407 clock_nanosleep_time64
408 timer_gettime64
409 timer_settime64
410 timerfd_gettime64
411 timerfd_settime64
412 utimensat_time64
413 pselect6_time64
414 ppoll_time64
416 io_pgetevents_time64
417 recvmmsg_time64
418 mq_timedsend_time64
419 mq_timedreceiv_time64
420 semtimedop_time64
421 rt_sigtimedwait_time64
422 futex_time64
423 sched_rr_get_interval_time64
Each one of these corresponds directly to an existing system call
that includes a 'struct timespec' argument, or a structure containing
a timespec or (in case of clock_adjtime) timeval. Not included here
are new versions of getitimer/setitimer and getrusage/waitid, which
are planned for the future but only needed to make a consistent API
rather than for correct operation beyond y2038. These four system
calls are based on 'timeval', and it has not been finally decided
what the replacement kernel interface will use instead.
So far, I have done a lot of build testing across most architectures,
which has found a number of bugs. Runtime testing so far included
testing LTP on 32-bit ARM with the existing system calls, to ensure
we do not regress for existing binaries, and a test with a 32-bit
x86 build of LTP against a modified version of the musl C library
that has been adapted to the new system call interface [3].
This library can be used for testing on all architectures supported
by musl-1.1.21, but it is not how the support is getting integrated
into the official musl release. Official musl support is planned
but will require more invasive changes to the library.
Link: https://lore.kernel.org/lkml/20190110162435.309262-1-arnd@arndb.de/T/
Link: https://lore.kernel.org/lkml/20190118161835.2259170-1-arnd@arndb.de/
Link: https://git.linaro.org/people/arnd/musl-y2038.git/ [2]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJcXf7/AAoJEGCrR//JCVInPSUP/RhsQSCKMGtONB/vVICQhwep
PybhzBSpHWFxszzTi6BEPN1zS9B069G9mDollRBYZCckyPqL/Bv6sI/vzQZdNk01
Q6Nw92OnNE1QP8owZ5TjrZhpbtopWdqIXjsbGZlloUemvuJP2JwvKovQUcn5CPTQ
jbnqU04CVyFFJYVxAnGJ+VSeWNrjW/cm/m+rhLFjUcwW7Y3aodxsPqPP6+K9hY9P
yIWfcH42WBeEWGm1RSBOZOScQl4SGCPUAhFydl/TqyEQagyegJMIyMOv9wZ5AuTT
xK644bDVmNsrtJDZDpx+J8hytXCk1LrnKzkHR/uK80iUIraF/8D7PlaPgTmEEjko
XcrywEkvkXTVU3owCm2/sbV+8fyFKzSPipnNfN1JNxEX71A98kvMRtPjDueQq/GA
Yh81rr2YLF2sUiArkc2fNpENT7EGhrh1q6gviK3FB8YDgj1kSgPK5wC/X0uolC35
E7iC2kg4NaNEIjhKP/WKluCaTvjRbvV+0IrlJLlhLTnsqbA57ZKCCteiBrlm7wQN
4csUtCyxchR9Ac2o/lj+Mf53z68Zv74haIROp18K2dL7ZpVcOPnA3XHeauSAdoyp
wy2Ek6ilNvlNB+4x+mRntPoOsyuOUGv7JXzB9JvweLWUd9G7tvYeDJQp/0YpDppb
K4UWcKnhtEom0DgK08vY
=IZVb
-----END PGP SIGNATURE-----
Merge tag 'y2038-new-syscalls' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground into timers/2038
Pull y2038 - time64 system calls from Arnd Bergmann:
This series finally gets us to the point of having system calls with 64-bit
time_t on all architectures, after a long time of incremental preparation
patches.
There was actually one conversion that I missed during the summer,
i.e. Deepa's timex series, which I now updated based the 5.0-rc1 changes
and review comments.
The following system calls are now added on all 32-bit architectures using
the same system call numbers:
403 clock_gettime64
404 clock_settime64
405 clock_adjtime64
406 clock_getres_time64
407 clock_nanosleep_time64
408 timer_gettime64
409 timer_settime64
410 timerfd_gettime64
411 timerfd_settime64
412 utimensat_time64
413 pselect6_time64
414 ppoll_time64
416 io_pgetevents_time64
417 recvmmsg_time64
418 mq_timedsend_time64
419 mq_timedreceiv_time64
420 semtimedop_time64
421 rt_sigtimedwait_time64
422 futex_time64
423 sched_rr_get_interval_time64
Each one of these corresponds directly to an existing system call that
includes a 'struct timespec' argument, or a structure containing a timespec
or (in case of clock_adjtime) timeval. Not included here are new versions
of getitimer/setitimer and getrusage/waitid, which are planned for the
future but only needed to make a consistent API rather than for correct
operation beyond y2038. These four system calls are based on 'timeval', and
it has not been finally decided what the replacement kernel interface will
use instead.
So far, I have done a lot of build testing across most architectures, which
has found a number of bugs. Runtime testing so far included testing LTP on
32-bit ARM with the existing system calls, to ensure we do not regress for
existing binaries, and a test with a 32-bit x86 build of LTP against a
modified version of the musl C library that has been adapted to the new
system call interface [3]. This library can be used for testing on all
architectures supported by musl-1.1.21, but it is not how the support is
getting integrated into the official musl release. Official musl support is
planned but will require more invasive changes to the library.
Link: https://lore.kernel.org/lkml/20190110162435.309262-1-arnd@arndb.de/T/
Link: https://lore.kernel.org/lkml/20190118161835.2259170-1-arnd@arndb.de/
Link: https://git.linaro.org/people/arnd/musl-y2038.git/ [2]
The system call tables have diverged a bit over the years, and a number
of the recent additions never made it into all architectures, for one
reason or another.
This is an attempt to clean it up as far as we can without breaking
compatibility, doing a number of steps:
- Add system calls that have not yet been integrated into all
architectures but that we definitely want there. This includes
{,f}statfs64() and get{eg,eu,g,p,u,pp}id() on alpha, which have
been missing traditionally.
- The s390 compat syscall handling is cleaned up to be more like
what we do on other architectures, while keeping the 31-bit
pointer extension. This was merged as a shared branch by the
s390 maintainers and is included here in order to base the other
patches on top.
- Add the separate ipc syscalls on all architectures that
traditionally only had sys_ipc(). This version is done without
support for IPC_OLD that is we have in sys_ipc. The
new semtimedop_time64 syscall will only be added here, not
in sys_ipc
- Add syscall numbers for a couple of syscalls that we probably
don't need everywhere, in particular pkey_* and rseq,
for the purpose of symmetry: if it's in asm-generic/unistd.h,
it makes sense to have it everywhere. I expect that any future
system calls will get assigned on all platforms together, even
when they appear to be specific to a single architecture.
- Prepare for having the same system call numbers for any future
calls. In combination with the generated tables, this hopefully
makes it easier to add new calls across all architectures
together.
All of the above are technically separate from the y2038 work,
but are done as preparation before we add the new 64-bit time_t
system calls everywhere, providing a common baseline set of system
calls.
I expect that glibc and other libraries that want to use 64-bit
time_t will require linux-5.1 kernel headers for building in
the future, and at a much later point may also require linux-5.1
or a later version as the minimum kernel at runtime. Having a
common baseline then allows the removal of many architecture or
kernel version specific workarounds.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJcXf6XAAoJEGCrR//JCVInIm4P/AlkMmQRa/B2ziWMW6PifPoI
v18r44017rA1BPENyZvumJUdM5mDvNofOW8F2DYQ7Uiys2YtXenwe/Cf8LHn2n6c
TMXGQryQpvNmfDCyU+0UjF8m2+poFMrL4aRTXtjODh1YTsPNgeDC+KFMCAAtZmZd
cVbXFudtbdYKD/pgCX4SI1CWAMBiXe2e+ukPdJVr+iqusCMTApf+GOuyvDBZY9s/
vURb+tIS87HZ/jehWfZFSuZt+Gu7b3ijUXNC8v9qSIxNYekw62vBNl6F09HE79uB
Bv4OujAODqKvI9gGyydBzLJNzaMo0ryQdusyqcJHT7MY/8s+FwcYAXyTlQ3DbbB4
2u/c+58OwJ9Zk12p4LXZRA47U+vRhQt2rO4+zZWs2txNNJY89ZvCm/Z04KOiu5Xz
1Nnj607KGzthYRs2gs68AwzGGyf0uykIQ3RcaJLIBlX1Nd8BWO0ZgAguCvkXbQMX
XNXJTd92HmeuKKpiO0n/M4/mCeP0cafBRPCZbKlHyTl0Jeqd/HBQEO9Z8Ifwyju3
mXz9JCR9VlPCkX605keATbjtPGZf3XQtaXlQnezitDudXk8RJ33EpPcbhx76wX7M
Rux37ByqEOzk4wMGX9YQyNU7z7xuVg4sJAa2LlJqYeKXHtym+u3gG7SGP5AsYjmk
6mg2+9O2yZuLhQtOtrwm
=s4wf
-----END PGP SIGNATURE-----
Merge tag 'y2038-syscall-cleanup' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground into timers/2038
Pull preparatory work for y2038 changes from Arnd Bergmann:
System call unification and cleanup
The system call tables have diverged a bit over the years, and a number of
the recent additions never made it into all architectures, for one reason
or another.
This is an attempt to clean it up as far as we can without breaking
compatibility, doing a number of steps:
- Add system calls that have not yet been integrated into all architectures
but that we definitely want there. This includes {,f}statfs64() and
get{eg,eu,g,p,u,pp}id() on alpha, which have been missing traditionally.
- The s390 compat syscall handling is cleaned up to be more like what we
do on other architectures, while keeping the 31-bit pointer
extension. This was merged as a shared branch by the s390 maintainers
and is included here in order to base the other patches on top.
- Add the separate ipc syscalls on all architectures that traditionally
only had sys_ipc(). This version is done without support for IPC_OLD
that is we have in sys_ipc. The new semtimedop_time64 syscall will only
be added here, not in sys_ipc
- Add syscall numbers for a couple of syscalls that we probably don't need
everywhere, in particular pkey_* and rseq, for the purpose of symmetry:
if it's in asm-generic/unistd.h, it makes sense to have it everywhere. I
expect that any future system calls will get assigned on all platforms
together, even when they appear to be specific to a single architecture.
- Prepare for having the same system call numbers for any future calls. In
combination with the generated tables, this hopefully makes it easier to
add new calls across all architectures together.
All of the above are technically separate from the y2038 work, but are done
as preparation before we add the new 64-bit time_t system calls everywhere,
providing a common baseline set of system calls.
I expect that glibc and other libraries that want to use 64-bit time_t will
require linux-5.1 kernel headers for building in the future, and at a much
later point may also require linux-5.1 or a later version as the minimum
kernel at runtime. Having a common baseline then allows the removal of many
architecture or kernel version specific workarounds.
There is no need to define these PNETID related constants in
the pnet.h file, since they are just used locally within pnet.c.
Just code cleanup, no functional change.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The z14 introduced alignment hints to increase the performance of
vector loads and stores. The kernel uses an implicit alignmenet
of 8 bytes for the vector registers, set the alignment hint to 3.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There is no need to use void pointers, all drivers are in agreement
about the underlying data structure of the SBAL arrays.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The time, stime, utime, utimes, and futimesat system calls are only
used on older architectures, and we do not provide y2038 safe variants
of them, as they are replaced by clock_gettime64, clock_settime64,
and utimensat_time64.
However, for consistency it seems better to have the 32-bit architectures
that still use them call the "time32" entry points (leaving the
traditional handlers for the 64-bit architectures), like we do for system
calls that now require two versions.
Note: We used to always define __ARCH_WANT_SYS_TIME and
__ARCH_WANT_SYS_UTIME and only set __ARCH_WANT_COMPAT_SYS_TIME and
__ARCH_WANT_SYS_UTIME32 for compat mode on 64-bit kernels. Now this is
reversed: only 64-bit architectures set __ARCH_WANT_SYS_TIME/UTIME, while
we need __ARCH_WANT_SYS_TIME32/UTIME32 for 32-bit architectures and compat
mode. The resulting asm/unistd.h changes look a bit counterintuitive.
This is only a cleanup patch and it should not change any behavior.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
These are all for ignoring the lack of obsolete system calls,
which have been marked the same way in scripts/checksyscall.sh,
so these can be removed.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The patch implements a handler for GIB alert interruptions
on the host. Its task is to alert guests that interrupts are
pending for them.
A GIB alert interrupt statistic counter is added as well:
$ cat /proc/interrupts
CPU0 CPU1
...
GAL: 23 37 [I/O] GIB Alert
...
Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20190131085247.13826-14-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Add the Interruption Alert Mask (IAM) to the architecture specific
kvm struct. This mask in the GISA is used to define for which ISC
a GIB alert will be issued.
The functions kvm_s390_gisc_register() and kvm_s390_gisc_unregister()
are used to (un)register a GISC (guest ISC) with a virtual machine and
its GISA.
Upon successful completion, kvm_s390_gisc_register() returns the
ISC to be used for GIB alert interruptions. A negative return code
indicates an error during registration.
Theses functions will be used by other adapter types like AP and PCI to
request pass-through interruption support.
Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190131085247.13826-12-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Adding the kvm reference to struct sie_page2 will allow to
determine the kvm a given gisa belongs to:
container_of(gisa, struct sie_page2, gisa)->kvm
This functionality will be required to process a gisa in
gib alert interruption context.
Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-11-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The Guest Information Block (GIB) links the GISA of all guests
that have adapter interrupts pending. These interrupts cannot be
delivered because all vcpus of these guests are currently in WAIT
state or have masked the respective Interruption Sub Class (ISC).
If enabled, a GIB alert is issued on the host to schedule these
guests to run suitable vcpus to consume the pending interruptions.
This mechanism allows to process adapter interrupts for currently
not running guests.
The GIB is created during host initialization and associated with
the Adapter Interruption Facility in case an Adapter Interruption
Virtualization Facility is available.
The GIB initialization and thus the activation of the related code
will be done in an upcoming patch of this series.
Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-10-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch implements the Set Guest Information Block operation
to request association or disassociation of a Guest Information
Block (GIB) with the Adapter Interruption Facility. The operation
is required to receive GIB alert interrupts for guest adapters
in conjunction with AIV and GISA.
Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190131085247.13826-9-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Use this struct analog to the kvm interruption structs
for kvm emulated floating and local interruptions.
GIB handling will add further fields to this structure as
required.
Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-8-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The vcpu idle_mask state is used by but not specific
to the emulated floating interruptions. The state is
relevant to gisa related interruptions as well.
Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-4-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Use a consistent bitmap declaration throughout the code.
Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-3-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Sebastian Ott <sebott@linux.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The s390x diagnose 318 instruction sets the control program name code (CPNC)
and control program version code (CPVC) to provide useful information
regarding the OS during debugging. The CPNC is explicitly set to 4 to
indicate a Linux/KVM environment.
The CPVC is a 7-byte value containing:
- 3-byte Linux version code, currently set to 0
- 3-byte unique value, currently set to 0
- 1-byte trailing null
Signed-off-by: Collin Walling <walling@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <1544135405-22385-2-git-send-email-walling@linux.ibm.com>
[set version code to 0 until the structure is fully defined]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Most architectures define system call numbers for the rseq and pkey system
calls, even when they don't support the features, and perhaps never will.
Only a few architectures are missing these, so just define them anyway
for consistency. If we decide to add them later to one of these, the
system call numbers won't get out of sync then.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
When converting to autogenerated compat syscall wrappers all system
call entry points got a different symbol name: they all got a __s390x_
prefix.
This caused breakage with system call tracing, since an appropriate
arch_syscall_match_sym_name() was not provided. Add this function, and
while at it also add code to avoid compat system call tracing. s390
has different system call tables for native 64 bit system calls and
compat system calls. This isn't really supported in the common
code. However there are hardly any compat binaries left, therefore
just ignore compat system calls, like x86 and arm64 also do for the
same reason.
Fixes: aa0d6e70d3 ("s390: autogenerate compat syscall wrappers")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Avoid using arch specific implementations of string/memory functions
with KASAN since gcc cannot instrument asm code memory accesses and
many bugs could be missed.
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Any system call that takes a pointer argument on s390 requires
a wrapper function to do a 31-to-64 zero-extension, these are
currently generated in arch/s390/kernel/compat_wrapper.c.
On arm64 and x86, we already generate similar wrappers for all
system calls in the place of their definition, just for a different
purpose (they load the arguments from pt_regs).
We can do the same thing here, by adding an asm/syscall_wrapper.h
file with a copy of all the relevant macros to override the generic
version. Besides the addition of the compat entry point, these also
rename the entry points with a __s390_ or __s390x_ prefix, similar
to what we do on arm64 and x86. This in turn requires renaming
a few things, and adding a proper ni_syscall() entry point.
In order to still compile system call definitions that pass an
loff_t argument, the __SC_COMPAT_CAST() macro checks for that
and forces an -ENOSYS error, which was the best I could come up
with. Those functions must obviously not get called from user
space, but instead require hand-written compat_sys_*() handlers,
which fortunately already exist.
Link: https://lore.kernel.org/lkml/20190116131527.2071570-5-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
[heiko.carstens@de.ibm.com: compile fix for !CONFIG_COMPAT]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
While "s390/vdso: avoid 64-bit vdso mapping for compat tasks" fixed
64-bit vdso mapping for compat tasks under gdb it introduced another
problem. "compat_mm" flag is not inherited during fork and when
31-bit process forks a child (but does not perform exec) it ends up
with 64-bit vdso. To address that, init_new_context (which is called
during fork and exec) now initialize compat_mm based on thread TIF_31BIT
flag. Later compat_mm is adjusted in arch_setup_additional_pages, which
is called during exec.
Fixes: d1befa6582 ("s390/vdso: avoid 64-bit vdso mapping for compat tasks")
Reported-by: Stefan Liebler <stli@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@vger.kernel.org> # v4.20+
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The ASCE of an mm_struct can be modified after a task has been created,
e.g. via crst_table_downgrade for a compat process. The active_mm logic
to avoid the switch_mm call if the next task is a kernel thread can
lead to a situation where switch_mm is called where 'prev == next' is
true but 'prev->context.asce == next->context.asce' is not.
This can lead to a situation where a CPU uses the outdated ASCE to run
a task. The result can be a crash, endless loops and really subtle
problem due to TLBs being created with an invalid ASCE.
Cc: stable@kernel.org # v3.15+
Fixes: 53e857f308 ("s390/mm,tlb: race of lazy TLB flush vs. recreation")
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Merge more updates from Andrew Morton:
- procfs updates
- various misc bits
- lib/ updates
- epoll updates
- autofs
- fatfs
- a few more MM bits
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (58 commits)
mm/page_io.c: fix polled swap page in
checkpatch: add Co-developed-by to signature tags
docs: fix Co-Developed-by docs
drivers/base/platform.c: kmemleak ignore a known leak
fs: don't open code lru_to_page()
fs/: remove caller signal_pending branch predictions
mm/: remove caller signal_pending branch predictions
arch/arc/mm/fault.c: remove caller signal_pending_branch predictions
kernel/sched/: remove caller signal_pending branch predictions
kernel/locking/mutex.c: remove caller signal_pending branch predictions
mm: select HAVE_MOVE_PMD on x86 for faster mremap
mm: speed up mremap by 20x on large regions
mm: treewide: remove unused address argument from pte_alloc functions
initramfs: cleanup incomplete rootfs
scripts/gdb: fix lx-version string output
kernel/kcov.c: mark write_comp_data() as notrace
kernel/sysctl: add panic_print into sysctl
panic: add options to print system info when panic happens
bfs: extra sanity checking and static inode bitmap
exec: separate MM_ANONPAGES and RLIMIT_STACK accounting
...
Patch series "Add support for fast mremap".
This series speeds up the mremap(2) syscall by copying page tables at
the PMD level even for non-THP systems. There is concern that the extra
'address' argument that mremap passes to pte_alloc may do something
subtle architecture related in the future that may make the scheme not
work. Also we find that there is no point in passing the 'address' to
pte_alloc since its unused. This patch therefore removes this argument
tree-wide resulting in a nice negative diff as well. Also ensuring
along the way that the enabled architectures do not do anything funky
with the 'address' argument that goes unnoticed by the optimization.
Build and boot tested on x86-64. Build tested on arm64. The config
enablement patch for arm64 will be posted in the future after more
testing.
The changes were obtained by applying the following Coccinelle script.
(thanks Julia for answering all Coccinelle questions!).
Following fix ups were done manually:
* Removal of address argument from pte_fragment_alloc
* Removal of pte_alloc_one_fast definitions from m68k and microblaze.
// Options: --include-headers --no-includes
// Note: I split the 'identifier fn' line, so if you are manually
// running it, please unsplit it so it runs for you.
virtual patch
@pte_alloc_func_def depends on patch exists@
identifier E2;
identifier fn =~
"^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
type T2;
@@
fn(...
- , T2 E2
)
{ ... }
@pte_alloc_func_proto_noarg depends on patch exists@
type T1, T2, T3, T4;
identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@
(
- T3 fn(T1, T2);
+ T3 fn(T1);
|
- T3 fn(T1, T2, T4);
+ T3 fn(T1, T2);
)
@pte_alloc_func_proto depends on patch exists@
identifier E1, E2, E4;
type T1, T2, T3, T4;
identifier fn =~
"^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@
(
- T3 fn(T1 E1, T2 E2);
+ T3 fn(T1 E1);
|
- T3 fn(T1 E1, T2 E2, T4 E4);
+ T3 fn(T1 E1, T2 E2);
)
@pte_alloc_func_call depends on patch exists@
expression E2;
identifier fn =~
"^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@
fn(...
-, E2
)
@pte_alloc_macro depends on patch exists@
identifier fn =~
"^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
identifier a, b, c;
expression e;
position p;
@@
(
- #define fn(a, b, c) e
+ #define fn(a, b) e
|
- #define fn(a, b) e
+ #define fn(a) e
)
Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When testing in userspace, UBSAN pointed out that shifting into the sign
bit is undefined behaviour. It doesn't really make sense to ask for the
highest set bit of a negative value, so just turn the argument type into
an unsigned int.
Some architectures (eg ppc) already had it declared as an unsigned int,
so I don't expect too many problems.
Link: http://lkml.kernel.org/r/20181105221117.31828-1-willy@infradead.org
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.
It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access. But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.
A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model. And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.
This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
There were a couple of notable cases:
- csky still had the old "verify_area()" name as an alias.
- the iter_iov code had magical hardcoded knowledge of the actual
values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
really used it)
- microblaze used the type argument for a debug printout
but other than those oddities this should be a total no-op patch.
I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something. Any missed conversion should be trivially fixable, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- A larger update for the zcrypt / AP bus code
+ Update two inline assemblies in the zcrypt driver to make gcc happy
+ Add a missing reply code for invalid special commands for zcrypt
+ Allow AP device reset to be triggered from user space
+ Split the AP scan function into smaller, more readable functions
- Updates for vfio-ccw and vfio-ap
+ Add maintainers and reviewer for vfio-ccw
+ Include facility.h in vfio_ap_drv.c to avoid fragile include chain
+ Simplicy vfio-ccw state machine
- Use the common code version of bust_spinlocks
- Make use of the DEFINE_SHOW_ATTRIBUTE
- Fix three incorrect file permissions in the DASD driver
- Remove bit spin-lock from the PCI interrupt handler
- Fix GFP_ATOMIC vs GFP_KERNEL in the PCI code
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJcLGoIAAoJEDjwexyKj9rgyN8IANaQvHbVBA3vz/Ssb6ZiR/K6
rTBoXjJQqyJ/cf6RZeFi1b4Douv4QWJw3s06KXbrdmK/ONm5rypXVfXlAhY71pg5
40BUb92MGXhJw6JFDQ50Udd6Z5r7r6RYR1puyg4tzHmBuNVL7FB5RqFm92UOkMOD
ZI03G1sfA6/1XUKhNfCfNBB6Jt6V+iAAex8bgrp09wAeoGnAO20oFuis9u7pLlNm
a5Cp9n7faXEN+qes1iBtVDr5o7opuhanwWKnhvsYTAbpOo7jGJ/47IPKT2Wfmurd
wkMZBEC+Ntk/IfkaBzp7azeISZD5EbucTcgo/I9nzq/aWeflfXXeYl7My0aQB48=
=Lqrh
-----END PGP SIGNATURE-----
Merge tag 's390-4.21-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
- A larger update for the zcrypt / AP bus code:
+ Update two inline assemblies in the zcrypt driver to make gcc happy
+ Add a missing reply code for invalid special commands for zcrypt
+ Allow AP device reset to be triggered from user space
+ Split the AP scan function into smaller, more readable functions
- Updates for vfio-ccw and vfio-ap
+ Add maintainers and reviewer for vfio-ccw
+ Include facility.h in vfio_ap_drv.c to avoid fragile include chain
+ Simplicy vfio-ccw state machine
- Use the common code version of bust_spinlocks
- Make use of the DEFINE_SHOW_ATTRIBUTE
- Fix three incorrect file permissions in the DASD driver
- Remove bit spin-lock from the PCI interrupt handler
- Fix GFP_ATOMIC vs GFP_KERNEL in the PCI code
* tag 's390-4.21-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/zcrypt: rework ap scan bus code
s390/zcrypt: make sysfs reset attribute trigger queue reset
s390/pci: fix sleeping in atomic during hotplug
s390/pci: remove bit_lock usage in interrupt handler
s390/drivers: fix proc/debugfs file permissions
s390: convert to DEFINE_SHOW_ATTRIBUTE
MAINTAINERS/vfio-ccw: add Farhan and Eric, make Halil Reviewer
vfio: ccw: Merge BUSY and BOXED states
s390: use common bust_spinlocks()
s390/zcrypt: improve special ap message cmd handling
s390/ap: rework assembler functions to use unions for in/out register variables
s390: vfio-ap: include <asm/facility> for test_facility()
PREEMPT_NEED_RESCHED is never used directly, so move it into the arch
code where it can potentially be implemented using either a different
bit in the preempt count or as an entirely separate entity.
Cc: Robert Love <rml@tech9.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The inline assembler functions ap_aqic() and ap_qact() used two
variables declared on the very same register. One variable was for
input only, the other for output. Looks like newer versions of the gcc
don't like this. Anyway it is a better coding to use one variable
(which may have a union data type) on one register for input and
output. So this patch introduces unions and uses only one variable now
for input and output for GR1 for the PQAP(QACT) and PQAP(QIC)
invocation.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
- A fix for the pgtable_bytes misaccounting on s390. The patch changes
common code part in regard to page table folding and adds extra
checks to mm_[inc|dec]_nr_[pmds|puds].
- Add FORCE for all build targets using if_changed
- Use non-loadable phdr for the .vmlinux.info section to avoid
a segment overlap that confuses kexec
- Cleanup the attribute definition for the diagnostic sampling
- Increase stack size for CONFIG_KASAN=y builds
- Export __node_distance to fix a build error
- Correct return code of a PMU event init function
- An update for the default configs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJb5TIMAAoJEDjwexyKj9rgIH8H/0daZTyxcLwY9gbigaq1Qs4R
/ScmAJJc2U/Qj8b9UskhsmHAUuAufF2oljU16SquP7CBGhtkLRrjPtdh1AMiiZGM
reVF7X5LU8MH0QUoNnKPWAL4DD1q2E99IAEH5TeGIODUG6srqvIHBNtXDWNLPtBf
fpOhJ/NssgxyuYUXi/WnoEjIyP8KABeG6SlpcLzYbmY1hUOIXcixuv39UrL0G5OO
P8ciL+W5rTcPZCnpJ1Xk9hKploT8gWXhMT5QhNnakgMF/25v80+TZy5xRZMuLAmQ
T5SFP6B71o05nLK7fLi3VAIKPv/QibjiyJOEf9uUHdo1XZcD5uRu0EQ/LklLUBU=
=4H06
-----END PGP SIGNATURE-----
Merge tag 's390-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
- A fix for the pgtable_bytes misaccounting on s390. The patch changes
common code part in regard to page table folding and adds extra
checks to mm_[inc|dec]_nr_[pmds|puds].
- Add FORCE for all build targets using if_changed
- Use non-loadable phdr for the .vmlinux.info section to avoid a
segment overlap that confuses kexec
- Cleanup the attribute definition for the diagnostic sampling
- Increase stack size for CONFIG_KASAN=y builds
- Export __node_distance to fix a build error
- Correct return code of a PMU event init function
- An update for the default configs
* tag 's390-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/perf: Change CPUM_CF return code in event init function
s390: update defconfigs
s390/mm: Fix ERROR: "__node_distance" undefined!
s390/kasan: increase instrumented stack size to 64k
s390/cpum_sf: Rework attribute definition for diagnostic sampling
s390/mm: fix mis-accounting of pgtable_bytes
mm: add mm_pxd_folded checks to pgtable_bytes accounting functions
mm: introduce mm_[p4d|pud|pmd]_folded
mm: make the __PAGETABLE_PxD_FOLDED defines non-empty
s390: avoid vmlinux segments overlap
s390/vdso: add missing FORCE to build targets
s390/decompressor: add missing FORCE to build targets
The __no_sanitize_address_or_inline and __no_kasan_or_inline defines
are almost identical. The only difference is that __no_kasan_or_inline
does not have the 'notrace' attribute.
To be able to replace __no_sanitize_address_or_inline with the older
definition, add 'notrace' to __no_kasan_or_inline and change to two
users of __no_sanitize_address_or_inline in the s390 code.
The 'notrace' option is necessary for e.g. the __load_psw_mask function
in arch/s390/include/asm/processor.h. Without the option it is possible
to trace __load_psw_mask which leads to kernel stack overflow.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pointed-out-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In case a fork or a clone system fails in copy_process and the error
handling does the mmput() at the bad_fork_cleanup_mm label, the
following warning messages will appear on the console:
BUG: non-zero pgtables_bytes on freeing mm: 16384
The reason for that is the tricks we play with mm_inc_nr_puds() and
mm_inc_nr_pmds() in init_new_context().
A normal 64-bit process has 3 levels of page table, the p4d level and
the pud level are folded. On process termination the free_pud_range()
function in mm/memory.c will subtract 16KB from pgtable_bytes with a
mm_dec_nr_puds() call, but there actually is not really a pud table.
One issue with this is the fact that pgtable_bytes is usually off
by a few kilobytes, but the more severe problem is that for a failed
fork or clone the free_pgtables() function is not called. In this case
there is no mm_dec_nr_puds() or mm_dec_nr_pmds() that go together with
the mm_inc_nr_puds() and mm_inc_nr_pmds in init_new_context().
The pgtable_bytes will be off by 16384 or 32768 bytes and we get the
BUG message. The message itself is purely cosmetic, but annoying.
To fix this override the mm_pmd_folded, mm_pud_folded and mm_p4d_folded
function to check for the true size of the address space.
Reported-by: Li Wang <liwang@redhat.com>
Tested-by: Li Wang <liwang@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Prefer _THIS_IP_ defined in linux/kernel.h.
Most definitions of current_text_addr were the same as _THIS_IP_, but
a few archs had inline assembly instead.
This patch removes the final call site of current_text_addr, making all
of the definitions dead code.
[akpm@linux-foundation.org: fix arch/csky/include/asm/processor.h]
Link: http://lkml.kernel.org/r/20180911182413.180715-1-ndesaulniers@google.com
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ARM:
- Improved guest IPA space support (32 to 52 bits)
- RAS event delivery for 32bit
- PMU fixes
- Guest entry hardening
- Various cleanups
- Port of dirty_log_test selftest
PPC:
- Nested HV KVM support for radix guests on POWER9. The performance is
much better than with PR KVM. Migration and arbitrary level of
nesting is supported.
- Disable nested HV-KVM on early POWER9 chips that need a particular hardware
bug workaround
- One VM per core mode to prevent potential data leaks
- PCI pass-through optimization
- merge ppc-kvm topic branch and kvm-ppc-fixes to get a better base
s390:
- Initial version of AP crypto virtualization via vfio-mdev
- Improvement for vfio-ap
- Set the host program identifier
- Optimize page table locking
x86:
- Enable nested virtualization by default
- Implement Hyper-V IPI hypercalls
- Improve #PF and #DB handling
- Allow guests to use Enlightened VMCS
- Add migration selftests for VMCS and Enlightened VMCS
- Allow coalesced PIO accesses
- Add an option to perform nested VMCS host state consistency check
through hardware
- Automatic tuning of lapic_timer_advance_ns
- Many fixes, minor improvements, and cleanups
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJb0FINAAoJEED/6hsPKofoI60IAJRS3vOAQ9Fav8cJsO1oBHcX
3+NexfnBke1bzrjIR3SUcHKGZbdnVPNZc+Q4JjIbPpPmmOMU5jc9BC1dmd5f4Vzh
BMnQ0yCvgFv3A3fy/Icx1Z8NJppxosdmqdQLrQrNo8aD3cjnqY2yQixdXrAfzLzw
XEgKdIFCCz8oVN/C9TT4wwJn6l9OE7BM5bMKGFy5VNXzMu7t64UDOLbbjZxNgi1g
teYvfVGdt5mH0N7b2GPPWRbJmgnz5ygVVpVNQUEFrdKZoCm6r5u9d19N+RRXAwan
ZYFj10W2T8pJOUf3tryev4V33X7MRQitfJBo4tP5hZfi9uRX89np5zP1CFE7AtY=
=yEPW
-----END PGP SIGNATURE-----
Merge tag 'kvm-4.20-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Radim Krčmář:
"ARM:
- Improved guest IPA space support (32 to 52 bits)
- RAS event delivery for 32bit
- PMU fixes
- Guest entry hardening
- Various cleanups
- Port of dirty_log_test selftest
PPC:
- Nested HV KVM support for radix guests on POWER9. The performance
is much better than with PR KVM. Migration and arbitrary level of
nesting is supported.
- Disable nested HV-KVM on early POWER9 chips that need a particular
hardware bug workaround
- One VM per core mode to prevent potential data leaks
- PCI pass-through optimization
- merge ppc-kvm topic branch and kvm-ppc-fixes to get a better base
s390:
- Initial version of AP crypto virtualization via vfio-mdev
- Improvement for vfio-ap
- Set the host program identifier
- Optimize page table locking
x86:
- Enable nested virtualization by default
- Implement Hyper-V IPI hypercalls
- Improve #PF and #DB handling
- Allow guests to use Enlightened VMCS
- Add migration selftests for VMCS and Enlightened VMCS
- Allow coalesced PIO accesses
- Add an option to perform nested VMCS host state consistency check
through hardware
- Automatic tuning of lapic_timer_advance_ns
- Many fixes, minor improvements, and cleanups"
* tag 'kvm-4.20-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
KVM/nVMX: Do not validate that posted_intr_desc_addr is page aligned
Revert "kvm: x86: optimize dr6 restore"
KVM: PPC: Optimize clearing TCEs for sparse tables
x86/kvm/nVMX: tweak shadow fields
selftests/kvm: add missing executables to .gitignore
KVM: arm64: Safety check PSTATE when entering guest and handle IL
KVM: PPC: Book3S HV: Don't use streamlined entry path on early POWER9 chips
arm/arm64: KVM: Enable 32 bits kvm vcpu events support
arm/arm64: KVM: Rename function kvm_arch_dev_ioctl_check_extension()
KVM: arm64: Fix caching of host MDCR_EL2 value
KVM: VMX: enable nested virtualization by default
KVM/x86: Use 32bit xor to clear registers in svm.c
kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD
kvm: vmx: Defer setting of DR6 until #DB delivery
kvm: x86: Defer setting of CR2 until #PF delivery
kvm: x86: Add payload operands to kvm_multiple_exception
kvm: x86: Add exception payload fields to kvm_vcpu_events
kvm: x86: Add has_payload and payload to kvm_queued_exception
KVM: Documentation: Fix omission in struct kvm_vcpu_events
KVM: selftests: add Enlightened VMCS test
...
Pull timekeeping updates from Thomas Gleixner:
"The timers and timekeeping departement provides:
- Another large y2038 update with further preparations for providing
the y2038 safe timespecs closer to the syscalls.
- An overhaul of the SHCMT clocksource driver
- SPDX license identifier updates
- Small cleanups and fixes all over the place"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
tick/sched : Remove redundant cpu_online() check
clocksource/drivers/dw_apb: Add reset control
clocksource: Remove obsolete CLOCKSOURCE_OF_DECLARE
clocksource/drivers: Unify the names to timer-* format
clocksource/drivers/sh_cmt: Add R-Car gen3 support
dt-bindings: timer: renesas: cmt: document R-Car gen3 support
clocksource/drivers/sh_cmt: Properly line-wrap sh_cmt_of_table[] initializer
clocksource/drivers/sh_cmt: Fix clocksource width for 32-bit machines
clocksource/drivers/sh_cmt: Fixup for 64-bit machines
clocksource/drivers/sh_tmu: Convert to SPDX identifiers
clocksource/drivers/sh_mtu2: Convert to SPDX identifiers
clocksource/drivers/sh_cmt: Convert to SPDX identifiers
clocksource/drivers/renesas-ostm: Convert to SPDX identifiers
clocksource: Convert to using %pOFn instead of device_node.name
tick/broadcast: Remove redundant check
RISC-V: Request newstat syscalls
y2038: signal: Change rt_sigtimedwait to use __kernel_timespec
y2038: socket: Change recvmmsg to use __kernel_timespec
y2038: sched: Change sched_rr_get_interval to use __kernel_timespec
y2038: utimes: Rework #ifdef guards for compat syscalls
...
Pull locking and misc x86 updates from Ingo Molnar:
"Lots of changes in this cycle - in part because locking/core attracted
a number of related x86 low level work which was easier to handle in a
single tree:
- Linux Kernel Memory Consistency Model updates (Alan Stern, Paul E.
McKenney, Andrea Parri)
- lockdep scalability improvements and micro-optimizations (Waiman
Long)
- rwsem improvements (Waiman Long)
- spinlock micro-optimization (Matthew Wilcox)
- qspinlocks: Provide a liveness guarantee (more fairness) on x86.
(Peter Zijlstra)
- Add support for relative references in jump tables on arm64, x86
and s390 to optimize jump labels (Ard Biesheuvel, Heiko Carstens)
- Be a lot less permissive on weird (kernel address) uaccess faults
on x86: BUG() when uaccess helpers fault on kernel addresses (Jann
Horn)
- macrofy x86 asm statements to un-confuse the GCC inliner. (Nadav
Amit)
- ... and a handful of other smaller changes as well"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
locking/lockdep: Make global debug_locks* variables read-mostly
locking/lockdep: Fix debug_locks off performance problem
locking/pvqspinlock: Extend node size when pvqspinlock is configured
locking/qspinlock_stat: Count instances of nested lock slowpaths
locking/qspinlock, x86: Provide liveness guarantee
x86/asm: 'Simplify' GEN_*_RMWcc() macros
locking/qspinlock: Rework some comments
locking/qspinlock: Re-order code
locking/lockdep: Remove duplicated 'lock_class_ops' percpu array
x86/defconfig: Enable CONFIG_USB_XHCI_HCD=y
futex: Replace spin_is_locked() with lockdep
locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y
x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs
x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs
x86/extable: Macrofy inline assembly code to work around GCC inlining bugs
x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops
x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs
x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs
x86/refcount: Work around GCC inlining bug
x86/objtool: Use asm macros to work around GCC inlining bugs
...
When the kernel is built with:
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
"stfle" function used by kasan initialization code makes additional
call to preempt_count_add/preempt_count_sub. To avoid removing kasan
instrumentation from sched code where those functions leave split stfle
function and provide __stfle variant without preemption handling to be
used by Kasan.
Reported-by: Benjamin Block <bblock@linux.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce a new ioctl API and in-kernel API to transform
a variable length key blob of any supported type into a
protected key.
Transforming a secure key blob uses the already existing
function pkey_sec2protk().
Transforming a protected key blob also verifies if the
protected key is still valid. If not, -ENODEV is returned.
Both APIs are described in detail in the header files
arch/s390/include/asm/pkey.h and arch/s390/include/uapi/asm/pkey.h.
Signed-off-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce a new ioctl API and in-kernel API to verify if a
random protected key is still valid. A protected key is
invalid when its wrapping key verification pattern does not
match the verification pattern of the LPAR. Each time an LPAR
is activated, a new LPAR wrapping key is generated and the
wrapping key verification pattern is updated.
Both APIs are described in detail in the header files
arch/s390/include/asm/pkey.h and arch/s390/include/uapi/asm/pkey.h.
Signed-off-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch introduces a new ioctl API and in-kernel API to
generate a random protected key. The protected key is generated
in a way that the effective clear key is never exposed in clear.
Both APIs are described in detail in the header files
arch/s390/include/asm/pkey.h and arch/s390/include/uapi/asm/pkey.h.
Signed-off-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Early boot stack uses predefined 4 pages of memory 0x8000-0xC000. This
stack is used to run not instumented decompressor/facilities
verification C code. It doesn't make sense to double its size when
the kernel is built with KASAN support. BOOT_STACK_ORDER is introduced
to avoid that.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
By default 3-level paging is used when the kernel is compiled with
kasan support. Add 4-level paging option to support systems with more
then 3TB of physical memory and to cover 4-level paging specific code
with kasan as well.
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Kasan initialization code is changed to populate persistent shadow
first, save allocator position into pgalloc_freeable and proceed with
early identity mapping creation. This way early identity mapping paging
structures could be freed at once after switching to swapper_pg_dir
when early identity mapping is not needed anymore.
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
By defining KASAN_SHADOW_OFFSET in Kconfig stack and global variables
memory access check instrumentation is enabled. gcc version 4.9.2 or
newer is also required.
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Some functions from both arch/s390/kernel/ipl.c and
arch/s390/kernel/machine_kexec.c are called without DAT enabled
(or with and without DAT enabled code paths). There is no easy way
to partially disable kasan for those files without a substantial
rework. Disable kasan for both files for now.
To avoid disabling kasan for arch/s390/kernel/diag.c DAT flag is
enabled in diag308 call. pcpu_delegate which disables DAT is marked
with __no_sanitize_address to disable instrumentation for that one
function.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
smp_start_secondary function is called without DAT enabled. To avoid
disabling kasan instrumentation for entire arch/s390/kernel/smp.c
smp_start_secondary has been split in 2 parts. smp_start_secondary has
instrumentation disabled, it does minimal setup and enables DAT. Then
instrumentated __smp_start_secondary is called to do the rest.
__load_psw_mask function instrumentation has been disabled as well
to be able to call it from smp_start_secondary.
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
To lower memory footprint and speed up kasan initialisation detect
EDAT availability and use large pages if possible. As we know how
much memory is needed for initialisation, another simplistic large
page allocator is introduced to avoid memory fragmentation.
Since facilities list is retrieved anyhow, detect noexec support and
adjust pages attributes. Handle noexec kernel option to avoid inconsistent
kasan shadow memory pages flags.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Kasan stack instrumentation pads stack variables with redzones, which
increases stack frames size significantly. Stack sizes are increased
from 16k to 32k in the code, as well as for the kernel stack overflow
detection option (CHECK_STACK).
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Kasan needs 1/8 of kernel virtual address space to be reserved as the
shadow area. And eventually it requires the shadow memory offset to be
known at compile time (passed to the compiler when full instrumentation
is enabled). Any value picked as the shadow area offset for 3-level
paging would eat up identity mapping on 4-level paging (with 1PB
shadow area size). So, the kernel sticks to 3-level paging when kasan
is enabled. 3TB border is picked as the shadow offset. The memory
layout is adjusted so, that physical memory border does not exceed
KASAN_SHADOW_START and vmemmap does not go below KASAN_SHADOW_END.
Due to the fact that on s390 paging is set up very late and to cover
more code with kasan instrumentation, temporary identity mapping and
final shadow memory are set up early. The shadow memory mapping is
later carried over to init_mm.pgd during paging_init.
For the needs of paging structures allocation and shadow memory
population a primitive allocator is used, which simply chops off
memory blocks from the end of the physical memory.
Kasan currenty doesn't track vmemmap and vmalloc areas.
Current memory layout (for 3-level paging, 2GB physical memory).
---[ Identity Mapping ]---
0x0000000000000000-0x0000000000100000
---[ Kernel Image Start ]---
0x0000000000100000-0x0000000002b00000
---[ Kernel Image End ]---
0x0000000002b00000-0x0000000080000000 2G <- physical memory border
0x0000000080000000-0x0000030000000000 3070G PUD I
---[ Kasan Shadow Start ]---
0x0000030000000000-0x0000030010000000 256M PMD RW X <- shadow for 2G memory
0x0000030010000000-0x0000037ff0000000 523776M PTE RO NX <- kasan zero ro page
0x0000037ff0000000-0x0000038000000000 256M PMD RW X <- shadow for 2G modules
---[ Kasan Shadow End ]---
0x0000038000000000-0x000003d100000000 324G PUD I
---[ vmemmap Area ]---
0x000003d100000000-0x000003e080000000
---[ vmalloc Area ]---
0x000003e080000000-0x000003ff80000000
---[ Modules Area ]---
0x000003ff80000000-0x0000040000000000 2G
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add pgd_page primitive which is required by kasan common code.
Also fixes typo in p4d_page definition.
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Kasan common code requires MAX_PTRS_PER_P4D definition, which in case
of s390 is always PTRS_PER_P4D.
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Follow the common kasan approach:
"KASan replaces memory functions with manually instrumented
variants. Original functions declared as weak symbols so strong
definitions in mm/kasan/kasan.c could replace them. Original
functions have aliases with '__' prefix in name, so we could call
non-instrumented variant if needed."
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
kasan common code uses pfn_to_kaddr, which is defined by many other
architectures. Adding it as well to avoid a build error.
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
To distinguish zfcpdump case and to be able to parse some of the kernel
command line arguments early (e.g. mem=) ipl block retrieval and command
line construction code is moved to the early boot phase.
"memory_end" is set up correctly respecting "mem=" and hsa_size in case
of the zfcpdump.
arch/s390/boot/string.c is introduced to provide string handling and
command line parsing functions to early boot phase code for the compressed
kernel image case.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce sclp_early_get_hsa_size function to be used during early
memory detection. This function allows to find a memory limit imposed
during zfcpdump.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In a situation when other memory detection methods are not available
(no SCLP and no z/VM diag260), continuous online memory is assumed.
Replacing tprot loop with faster binary search, as only online memory
end has to be found.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When neither SCLP storage info, nor z/VM diag260 "storage configuration"
are available assume a continuous online memory of size specified by
SCLP info.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In the case when z/VM memory is defined with "define storage config"
command, SCLP storage info is not available. Utilize diag260 "storage
configuration" call, to get information about z/VM specific guest memory
definitions with potential memory holes.
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
SCLP storage info allows to detect continuous and non-continuous online
memory under LPAR, z/VM and KVM, when standby memory is defined.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move memory detection to early boot phase. To store online memory
regions "struct mem_detect_info" has been introduced together with
for_each_mem_detect_block iterator. mem_detect_info is later converted
to memblock.
Also introduces sclp_early_get_meminfo function to get maximum physical
memory and maximum increment number.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
To enable early online memory detection sclp_early_read_info has
been moved to sclp_early_core.c. sclp_info_sccb has been made a part
of .boot.data, which allows to reuse it later during early kernel
startup and make sclp_early_read_info call just once.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce .boot.data section which is "shared" between the decompressor
code and the decompressed kernel. The decompressor will store values in
it, and copy over to the decompressed image before starting it. This
method allows to avoid using pre-defined addresses and other hacks to
pass values between those boot phases.
.boot.data section is a part of init data, and will be freed after kernel
initialization is complete.
For uncompressed kernel image, .boot.data section is basically the same
as .init.data
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove STACK_ORDER and STACK_SIZE in favour of identical THREAD_SIZE_ORDER
and THREAD_SIZE definitions. THREAD_SIZE and THREAD_SIZE_ORDER naming is
misleading since it is used as general kernel stack size information. But
both those definitions are used in the common code and throughout
architectures specific code, so changing the naming is problematic.
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With virtually mapped kernel stacks the kernel stack overflow detection
is now fault based, every stack has a guard page in the vmalloc space.
The panic_stack is renamed to nodat_stack and is used for all function
that need to run without DAT, e.g. memcpy_real or do_start_kdump.
The main effect is a reduction in the kernel image size as with vmap
stacks the old style overflow checking that adds two instructions per
function is not needed anymore. Result from bloat-o-meter:
add/remove: 20/1 grow/shrink: 13/26854 up/down: 2198/-216240 (-214042)
In regard to performance the micro-benchmark for fork has a hit of a
few microseconds, allocating 4 pages in vmalloc space is more expensive
compare to an order-2 page allocation. But with real workload I could
not find a noticeable difference.
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In preparation for CONFIG_VMAP_STACK=y move the allocation of the
struct appldata_parameter_list to the caller of appldata_asm().
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Provide function to find a ccwgroup device by its busid.
Acked-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
kvm_arch_crypto_set_masks is a new function to centralize
the setup the APCB masks inside the CRYCB SIE satellite.
To trace APCB mask changes, we add KVM_EVENT() tracing to
both kvm_arch_crypto_set_masks and kvm_arch_crypto_clear_masks.
Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <1538728270-10340-2-git-send-email-pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
A host program identifier (HPID) provides information regarding the
underlying host environment. A level-2 (VM) guest will have an HPID
denoting Linux/KVM, which is set during VCPU setup. A level-3 (VM on a
VM) and beyond guest will have an HPID denoting KVM vSIE, which is set
for all shadow control blocks, overriding the original value of the
HPID.
Signed-off-by: Collin Walling <walling@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Message-Id: <1535734279-10204-4-git-send-email-walling@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Implements the open callback on the mediated matrix device.
The function registers a group notifier to receive notification
of the VFIO_GROUP_NOTIFY_SET_KVM event. When notified,
the vfio_ap device driver will get access to the guest's
kvm structure. The open callback must ensure that only one
mediated device shall be opened per guest.
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Farhan Ali <alifm@linux.ibm.com>
Tested-by: Pierre Morel <pmorel@linux.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20180925231641.4954-12-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Introduces a new KVM function to clear the APCB0 and APCB1 in the guest's
CRYCB. This effectively clears all bits of the APM, AQM and ADM masks
configured for the guest. The VCPUs are taken out of SIE to ensure the
VCPUs do not get out of sync.
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Farhan Ali <alifm@linux.ibm.com>
Tested-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20180925231641.4954-11-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch refactors the code that initializes and sets up the
crypto configuration for a guest. The following changes are
implemented via this patch:
1. Introduces a flag indicating AP instructions executed on
the guest shall be interpreted by the firmware. This flag
is used to set a bit in the guest's state description
indicating AP instructions are to be interpreted.
2. Replace code implementing AP interfaces with code supplied
by the AP bus to query the AP configuration.
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Farhan Ali <alifm@linux.ibm.com>
Message-Id: <20180925231641.4954-4-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
When we change the crycb (or execution controls), we also have to make sure
that the vSIE shadow datastructures properly consider the changed
values before rerunning the vSIE. We can achieve that by simply using a
VCPU request now.
This has to be a synchronous request (== handled before entering the
(v)SIE again).
The request will make sure that the vSIE handler is left, and that the
request will be processed (NOP), therefore forcing a reload of all
vSIE data (including rebuilding the crycb) when re-entering the vSIE
interception handler the next time.
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20180925231641.4954-3-akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
I've stumbled over this too many times now... AOBs are only ever used on
Output Queues. So in qdio_kick_handler(), move the call to their handler
into the Output-only path, and get rid of the convoluted contains_aobs()
helper. No functional change.
While at it, also remove
1. the unused sbal_state->aob field. For processing an async completion,
upper-layer drivers get their AOB pointer from the CQ buffer.
2. an unused EXPORT for qdio_allocate_aob(). External users would have
no way of passing an allocated AOB back into qdio.ko anyways...
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
vdso_fault used is_compat_task function (on s390 it tests "current"
thread_info flags) to distinguish compat tasks and map 31-bit vdso
pages. But "current" task might not correspond to mm context.
When 31-bit compat inferior is executed under gdb, gdb does
PTRACE_PEEKTEXT on vdso page, causing vdso_fault with "current" being
64-bit gdb process. So, 31-bit inferior ends up with 64-bit vdso mapped.
To avoid this problem a new compat_mm flag has been introduced into
mm context. This flag is used in vdso_fault and vdso_mremap instead
of is_compat_task.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The SCLP event 24 "Adapter Error Notification" supports three different
action qualifier of which 'adapter reset' is currently not enabled in
the sysfs interface. However, userspace tools might want to be able
to use the reset functionality as well. Enable the 'adapter reset'
qualifier.
Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The resume code checks if the resume cpu is the same as the suspend cpu.
If not, and if it is also not possible to switch to the suspend cpu, an
error message should be printed and the resume process should be stopped
by loading a disabled wait psw.
The current logic is broken in multiple ways, the message is never printed,
and the disabled wait psw never loaded because the kernel panics before that:
- sam31 and SIGP_SET_ARCHITECTURE to ESA mode is wrong, this will break
on the first 64bit instruction in sclp_early_printk().
- The init stack should be used, but the stack pointer is not set up correctly
(missing aghi %r15,-STACK_FRAME_OVERHEAD).
- __sclp_early_printk() checks the sclp_init_state. If it is not
sclp_init_state_uninitialized, it simply returns w/o printing anything.
In the resumed kernel however, sclp_init_state will never be uninitialized.
This patch fixes those issues by removing the sam31/ESA logic, adding a
correct init stack pointer, and also introducing sclp_early_printk_force()
to allow using sclp_early_printk() even when sclp_init_state is not
uninitialized.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We have to do down_write on the mm semaphore to set a bitfield in the
mm context.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Fixes: a4499382 ("KVM: s390: Add huge page enablement control")
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
After changing over to 64-bit time_t syscalls, many architectures will
want compat_sys_utimensat() but not respective handlers for utime(),
utimes() and futimesat(). This adds a new __ARCH_WANT_SYS_UTIME32 to
complement __ARCH_WANT_SYS_UTIME. For now, all 64-bit architectures that
support CONFIG_COMPAT set it, but future 64-bit architectures will not
(tile would not have needed it either, but got removed).
As older 32-bit architectures get converted to using CONFIG_64BIT_TIME,
they will have to use __ARCH_WANT_SYS_UTIME32 instead of
__ARCH_WANT_SYS_UTIME. Architectures using the generic syscall ABI don't
need either of them as they never had a utime syscall.
Since the compat_utimbuf structure is now required outside of
CONFIG_COMPAT, I'm moving it into compat_time.h.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
changed from last version:
- renamed __ARCH_WANT_COMPAT_SYS_UTIME to __ARCH_WANT_SYS_UTIME32
The sys_llseek sytem call is needed on all 32-bit architectures and
none of the 64-bit ones, so we can remove the __ARCH_WANT_SYS_LLSEEK guard
and simplify the include/asm-generic/unistd.h header further.
Since 32-bit tasks can run either natively or in compat mode on 64-bit
architectures, we have to check for both !CONFIG_64BIT and CONFIG_COMPAT.
There are a few 64-bit architectures that also reference sys_llseek
in their 64-bit ABI (e.g. sparc), but I verified that those all
select CONFIG_COMPAT, so the #if check is still correct here. It's
a bit odd to include it in the syscall table though, as it's the
same as sys_lseek() on 64-bit, but with strange calling conventions.
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
While converting compat system call handlers to work on 32-bit
architectures, I found a number of types used in those handlers
that are identical between all architectures.
Let's move all the identical ones into asm-generic/compat.h to avoid
having to add even more identical definitions of those types.
For unknown reasons, mips defines __compat_gid32_t, __compat_uid32_t
and compat_caddr_t as signed, while all others have them unsigned.
This seems to be a mistake, but I'm leaving it alone here. The other
types all differ by size or alignment on at least on architecture.
compat_aio_context_t is currently defined in linux/compat.h but
also needed for compat_sys_io_getevents(), so let's move it into
the same place.
While we still have not decided whether the 32-bit time handling
will always use the compat syscalls, or in which form, I think this
is a useful cleanup that we can merge regardless.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
We have four generations of stat() syscalls:
- the oldstat syscalls that are only used on the older architectures
- the newstat family that is used on all 64-bit architectures but
lacked support for large files on 32-bit architectures.
- the stat64 family that is used mostly on 32-bit architectures to
replace newstat
- statx() to replace all of the above, adding 64-bit timestamps among
other things.
We already compile stat64 only on those architectures that need it,
but newstat is always built, including on those that don't reference
it. This adds a new __ARCH_WANT_NEW_STAT symbol along the lines of
__ARCH_WANT_OLD_STAT and __ARCH_WANT_STAT64 to control compilation of
newstat. All architectures that need it use an explict define, the
others now get a little bit smaller, and future architecture (including
64-bit targets) won't ever see it.
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull s390 updates from Martin Schwidefsky:
- A couple of patches for the zcrypt driver:
+ Add two masks to determine which AP cards and queues are host
devices, this will be useful for KVM AP device passthrough
+ Add-on patch to improve the parsing of the new apmask and aqmask
+ Some code beautification
- Second try to reenable the GCC plugins, the first patch set had a
patch to do this but the merge somehow missed this
- Remove the s390 specific GCC version check and use the generic one
- Three patches for kdump, two bug fixes and one cleanup
- Three patches for the PCI layer, one bug fix and two cleanups
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: remove gcc version check (4.3 or newer)
s390/zcrypt: hex string mask improvements for apmask and aqmask.
s390/zcrypt: AP bus support for alternate driver(s)
s390/zcrypt: code beautify
s390/zcrypt: switch return type to bool for ap_instructions_available()
s390/kdump: Remove kzalloc_panic
s390/kdump: Fix memleak in nt_vmcoreinfo
s390/kdump: Make elfcorehdr size calculation ABI compliant
s390/pci: remove fmb address from debug output
s390/pci: remove stale rc
s390/pci: fix out of bounds access during irq setup
s390/zcrypt: fix ap_instructions_available() returncodes
s390: reenable gcc plugins for real
- Restructure of lockdep and latency tracers
This is the biggest change. Joel Fernandes restructured the hooks
from irqs and preemption disabling and enabling. He got rid of
a lot of the preprocessor #ifdef mess that they caused.
He turned both lockdep and the latency tracers to use trace events
inserted in the preempt/irqs disabling paths. But unfortunately,
these started to cause issues in corner cases. Thus, parts of the
code was reverted back to where lockde and the latency tracers
just get called directly (without using the trace events).
But because the original change cleaned up the code very nicely
we kept that, as well as the trace events for preempt and irqs
disabling, but they are limited to not being called in NMIs.
- Have trace events use SRCU for "rcu idle" calls. This was required
for the preempt/irqs off trace events. But it also had to not
allow them to be called in NMI context. Waiting till Paul makes
an NMI safe SRCU API.
- New notrace SRCU API to allow trace events to use SRCU.
- Addition of mcount-nop option support
- SPDX headers replacing GPL templates.
- Various other fixes and clean ups.
- Some fixes are marked for stable, but were not fully tested
before the merge window opened.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCW3ruhRQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qiM7AP47NhYdSnCFCRUJfrt6PovXmQtuCHt3
c3QMoGGdvzh9YAEAqcSXwh7uLhpHUp1LjMAPkXdZVwNddf4zJQ1zyxQ+EAU=
=vgEr
-----END PGP SIGNATURE-----
Merge tag 'trace-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
- Restructure of lockdep and latency tracers
This is the biggest change. Joel Fernandes restructured the hooks
from irqs and preemption disabling and enabling. He got rid of a lot
of the preprocessor #ifdef mess that they caused.
He turned both lockdep and the latency tracers to use trace events
inserted in the preempt/irqs disabling paths. But unfortunately,
these started to cause issues in corner cases. Thus, parts of the
code was reverted back to where lockdep and the latency tracers just
get called directly (without using the trace events). But because the
original change cleaned up the code very nicely we kept that, as well
as the trace events for preempt and irqs disabling, but they are
limited to not being called in NMIs.
- Have trace events use SRCU for "rcu idle" calls. This was required
for the preempt/irqs off trace events. But it also had to not allow
them to be called in NMI context. Waiting till Paul makes an NMI safe
SRCU API.
- New notrace SRCU API to allow trace events to use SRCU.
- Addition of mcount-nop option support
- SPDX headers replacing GPL templates.
- Various other fixes and clean ups.
- Some fixes are marked for stable, but were not fully tested before
the merge window opened.
* tag 'trace-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits)
tracing: Fix SPDX format headers to use C++ style comments
tracing: Add SPDX License format tags to tracing files
tracing: Add SPDX License format to bpf_trace.c
blktrace: Add SPDX License format header
s390/ftrace: Add -mfentry and -mnop-mcount support
tracing: Add -mcount-nop option support
tracing: Avoid calling cc-option -mrecord-mcount for every Makefile
tracing: Handle CC_FLAGS_FTRACE more accurately
Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
Uprobes: Simplify uprobe_register() body
tracepoints: Free early tracepoints after RCU is initialized
uprobes: Use synchronize_rcu() not synchronize_sched()
tracing: Fix synchronizing to event changes with tracepoint_synchronize_unregister()
ftrace: Remove unused pointer ftrace_swapper_pid
tracing: More reverting of "tracing: Centralize preemptirq tracepoints and unify their usage"
tracing/irqsoff: Handle preempt_count for different configs
tracing: Partial revert of "tracing: Centralize preemptirq tracepoints and unify their usage"
tracing: irqsoff: Account for additional preempt_disable
trace: Use rcu_dereference_raw for hooks from trace-event subsystem
tracing/kprobes: Fix within_notrace_func() to check only notrace functions
...
Function ap_instructions_available() had returntype int but
in fact returned 1 for true and 0 for false. Changed returntype
to bool.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For x86 this brings in PCID emulation and CR3 caching for shadow page
tables, nested VMX live migration, nested VMCS shadowing, an optimized
IPI hypercall, and some optimizations.
ARM will come next week.
There is a semantic conflict because tip also added an .init_platform
callback to kvm.c. Please keep the initializer from this branch,
and add a call to kvmclock_init (added by tip) inside kvm_init_platform
(added here).
Also, there is a backmerge from 4.18-rc6. This is because of a
refactoring that conflicted with a relatively late bugfix and
resulted in a particularly hellish conflict. Because the conflict
was only due to unfortunate timing of the bugfix, I backmerged and
rebased the refactoring rather than force the resolution on you.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJbdwNFAAoJEL/70l94x66DiPEH/1cAGZWGd85Y3yRu1dmTmqiz
kZy0V+WTQ5kyJF4ZsZKKOp+xK7Qxh5e9kLdTo70uPZCHwLu9IaGKN9+dL9Jar3DR
yLPX5bMsL8UUed9g9mlhdaNOquWi7d7BseCOnIyRTolb+cqnM5h3sle0gqXloVrS
UQb4QogDz8+86czqR8tNfazjQRKW/D2HEGD5NDNVY1qtpY+leCDAn9/u6hUT5c6z
EtufgyDh35UN+UQH0e2605gt3nN3nw3FiQJFwFF1bKeQ7k5ByWkuGQI68XtFVhs+
2WfqL3ftERkKzUOy/WoSJX/C9owvhMcpAuHDGOIlFwguNGroZivOMVnACG1AI3I=
=9Mgw
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull first set of KVM updates from Paolo Bonzini:
"PPC:
- minor code cleanups
x86:
- PCID emulation and CR3 caching for shadow page tables
- nested VMX live migration
- nested VMCS shadowing
- optimized IPI hypercall
- some optimizations
ARM will come next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (85 commits)
kvm: x86: Set highest physical address bits in non-present/reserved SPTEs
KVM/x86: Use CC_SET()/CC_OUT in arch/x86/kvm/vmx.c
KVM: X86: Implement PV IPIs in linux guest
KVM: X86: Add kvm hypervisor init time platform setup callback
KVM: X86: Implement "send IPI" hypercall
KVM/x86: Move X86_CR4_OSXSAVE check into kvm_valid_sregs()
KVM: x86: Skip pae_root shadow allocation if tdp enabled
KVM/MMU: Combine flushing remote tlb in mmu_set_spte()
KVM: vmx: skip VMWRITE of HOST_{FS,GS}_BASE when possible
KVM: vmx: skip VMWRITE of HOST_{FS,GS}_SEL when possible
KVM: vmx: always initialize HOST_{FS,GS}_BASE to zero during setup
KVM: vmx: move struct host_state usage to struct loaded_vmcs
KVM: vmx: compute need to reload FS/GS/LDT on demand
KVM: nVMX: remove a misleading comment regarding vmcs02 fields
KVM: vmx: rename __vmx_load_host_state() and vmx_save_host_state()
KVM: vmx: add dedicated utility to access guest's kernel_gs_base
KVM: vmx: track host_state.loaded using a loaded_vmcs pointer
KVM: vmx: refactor segmentation code in vmx_save_host_state()
kvm: nVMX: Fix fault priority for VMX operations
kvm: nVMX: Fix fault vector for VMX operation at CPL > 0
...
During review of KVM patches it was complained that the
ap_instructions_available() function returns 0 if AP
instructions are available and -ENODEV if not. The function
acts like a boolean function to check for AP instructions
available and thus should return 0 on failure and != 0 on
success. Changed to the suggested behaviour and adapted
the one and only caller of this function which is the ap
bus core code.
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Utilize -mfentry and -mnop-mcount gcc options together with
-mrecord-mcount to get compiler generated calls to the profiling functions
as nops which are compatible with current -mhotpatch=0,3 approach. At the
same time -mrecord-mcount enables __mcount_loc section generation by
the compiler which allows to avoid using scripts/recordmcount.pl script.
Link: http://lkml.kernel.org/r/patch-4.thread-aa7b8d.git-aa7b8dbf236f.your-ad-here.call-01533557518-ext-9465@work.hours
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Pull s390 updates from Heiko Carstens:
"Since Martin is on vacation you get the s390 pull request from me:
- Host large page support for KVM guests. As the patches have large
impact on arch/s390/mm/ this series goes out via both the KVM and
the s390 tree.
- Add an option for no compression to the "Kernel compression mode"
menu, this will come in handy with the rework of the early boot
code.
- A large rework of the early boot code that will make life easier
for KASAN and KASLR. With the rework the bootable uncompressed
image is not generated anymore, only the bzImage is available. For
debuggung purposes the new "no compression" option is used.
- Re-enable the gcc plugins as the issue with the latent entropy
plugin is solved with the early boot code rework.
- More spectre relates changes:
+ Detect the etoken facility and remove expolines automatically.
+ Add expolines to a few more indirect branches.
- A rewrite of the common I/O layer trace points to make them
consumable by 'perf stat'.
- Add support for format-3 PCI function measurement blocks.
- Changes for the zcrypt driver:
+ Add attributes to indicate the load of cards and queues.
+ Restructure some code for the upcoming AP device support in KVM.
- Build flags improvements in various Makefiles.
- A few fixes for the kdump support.
- A couple of patches for gcc 8 compile warning cleanup.
- Cleanup s390 specific proc handlers.
- Add s390 support to the restartable sequence self tests.
- Some PTR_RET vs PTR_ERR_OR_ZERO cleanup.
- Lots of bug fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (107 commits)
s390/dasd: fix hanging offline processing due to canceled worker
s390/dasd: fix panic for failed online processing
s390/mm: fix addressing exception after suspend/resume
rseq/selftests: add s390 support
s390: fix br_r1_trampoline for machines without exrl
s390/lib: use expoline for all bcr instructions
s390/numa: move initial setup of node_to_cpumask_map
s390/kdump: Fix elfcorehdr size calculation
s390/cpum_sf: save TOD clock base in SDBs for time conversion
KVM: s390: Add huge page enablement control
s390/mm: Add huge page gmap linking support
s390/mm: hugetlb pages within a gmap can not be freed
KVM: s390: Add skey emulation fault handling
s390/mm: Add huge pmd storage key handling
s390/mm: Clear skeys for newly mapped huge guest pmds
s390/mm: Clear huge page storage keys on enable_skey
s390/mm: Add huge page dirty sync support
s390/mm: Add gmap pmd invalidation and clearing
s390/mm: Add gmap pmd notification bit setting
s390/mm: Add gmap pmd linking
...
Pull perf update from Thomas Gleixner:
"The perf crowd presents:
Kernel updates:
- Removal of jprobes
- Cleanup and consolidatation the handling of kprobes
- Cleanup and consolidation of hardware breakpoints
- The usual pile of fixes and updates to PMUs and event descriptors
Tooling updates:
- Updates and improvements all over the place. Nothing outstanding,
just the (good) boring incremental grump work"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
perf trace: Do not require --no-syscalls to suppress strace like output
perf bpf: Include uapi/linux/bpf.h from the 'perf trace' script's bpf.h
perf tools: Allow overriding MAX_NR_CPUS at compile time
perf bpf: Show better message when failing to load an object
perf list: Unify metric group description format with PMU event description
perf vendor events arm64: Update ThunderX2 implementation defined pmu core events
perf cs-etm: Generate branch sample for CS_ETM_TRACE_ON packet
perf cs-etm: Generate branch sample when receiving a CS_ETM_TRACE_ON packet
perf cs-etm: Support dummy address value for CS_ETM_TRACE_ON packet
perf cs-etm: Fix start tracing packet handling
perf build: Fix installation directory for eBPF
perf c2c report: Fix crash for empty browser
perf tests: Fix indexing when invoking subtests
perf trace: Beautify the AF_INET & AF_INET6 'socket' syscall 'protocol' args
perf trace beauty: Add beautifiers for 'socket''s 'protocol' arg
perf trace beauty: Do not print NULL strarray entries
perf beauty: Add a generator for IPPROTO_ socket's protocol constants
tools include uapi: Grab a copy of linux/in.h
perf tests: Fix complex event name parsing
perf evlist: Fix error out while applying initial delay and LBR
...
Processing the samples in the AUX-area by perf requires the computation
of respective time stamps. The time stamps used by perf are based on
the monotonic clock. To convert the TOD clock value contained in an
SDB to a monotonic clock value, the TOD clock base is required. Hence,
also save the TOD clock base in the SDB.
Suggested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
- must be enabled via module parameter hpage=1
- cannot be used together with nested
- does support migration
- does support hugetlbfs
- no THP yet
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJbX4AwAAoJEBF7vIC1phx85eMP/ifsNHwqfAOrZBdlJuLVPla5
47J8iY4i4DOKGhKI4YOTcJQhn1izKZhECXS8d8hghB/sQUCE2CLVr1X/r1Udy2Pq
bpKG4apYtcJZBF6qn7yDMjBGkIRK4OCBD1pkuKEq2NyvUgPsHUVUgpuq2gngMTBk
ZN9MIfRQMdIEJsT389D6T9as0lwABJ0MJap5AudkQwguN2dDhQGeZv8l0QYV8C2I
WqRI2VsI1QEo3cJr1lJ5li/F9fC7q0l6QwlvPVocIHJAnq01zJvOekeAgQ4hzz16
JIoQckJq8m4d4PqZ7aWmAaMEemoQ9llmCavovspJNtFT79jho6cWWtBEvq+t0GLQ
qTsG9Yi20hONZMWAw+JIdSdOuFMD0HCpOWdUtSMjENFRbr8LLHUr91dGIxRLjF8Z
gv3vDJrbGzCQ+b9qPA8SrAN7U3VNCZG384MEmobwTuv5hxOopWp6chcK7RCriV/m
7cFDfO7+2pZymdW7D4DWlFiZl4mWpwOxip32C9tCt0CQveqeYSZsb5Qb9Pe+50vr
JhpB74UL79Wffvd65InGlu5jx1SdGG0QAzmBOkdOsAhX+0WMmXRB1ddn4whu7HPU
ssNtdKgLt9KkM/kIsB9RC/YLvUFK1lBVHrfnzUmLw3CBHP3QeO+V+arLwdVLVDjV
PA/LPECBWtGtQtxGWb2H
=Y0Wl
-----END PGP SIGNATURE-----
Merge tag 'hlp_stage1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into features
Pull hlp_stage1 from Christian Borntraeger with the following changes:
KVM: s390: initial host large page support
- must be enabled via module parameter hpage=1
- cannot be used together with nested
- does support migration
- does support hugetlbfs
- no THP yet
- must be enabled via module parameter hpage=1
- cannot be used together with nested
- does support migration
- does support hugetlbfs
- no THP yet
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJbX4AwAAoJEBF7vIC1phx85eMP/ifsNHwqfAOrZBdlJuLVPla5
47J8iY4i4DOKGhKI4YOTcJQhn1izKZhECXS8d8hghB/sQUCE2CLVr1X/r1Udy2Pq
bpKG4apYtcJZBF6qn7yDMjBGkIRK4OCBD1pkuKEq2NyvUgPsHUVUgpuq2gngMTBk
ZN9MIfRQMdIEJsT389D6T9as0lwABJ0MJap5AudkQwguN2dDhQGeZv8l0QYV8C2I
WqRI2VsI1QEo3cJr1lJ5li/F9fC7q0l6QwlvPVocIHJAnq01zJvOekeAgQ4hzz16
JIoQckJq8m4d4PqZ7aWmAaMEemoQ9llmCavovspJNtFT79jho6cWWtBEvq+t0GLQ
qTsG9Yi20hONZMWAw+JIdSdOuFMD0HCpOWdUtSMjENFRbr8LLHUr91dGIxRLjF8Z
gv3vDJrbGzCQ+b9qPA8SrAN7U3VNCZG384MEmobwTuv5hxOopWp6chcK7RCriV/m
7cFDfO7+2pZymdW7D4DWlFiZl4mWpwOxip32C9tCt0CQveqeYSZsb5Qb9Pe+50vr
JhpB74UL79Wffvd65InGlu5jx1SdGG0QAzmBOkdOsAhX+0WMmXRB1ddn4whu7HPU
ssNtdKgLt9KkM/kIsB9RC/YLvUFK1lBVHrfnzUmLw3CBHP3QeO+V+arLwdVLVDjV
PA/LPECBWtGtQtxGWb2H
=Y0Wl
-----END PGP SIGNATURE-----
Merge tag 'hlp_stage1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvms390/next
KVM: s390: initial host large page support
- must be enabled via module parameter hpage=1
- cannot be used together with nested
- does support migration
- does support hugetlbfs
- no THP yet
Let's allow huge pmd linking when enabled through the
KVM_CAP_S390_HPAGE_1M capability. Also we can now restrict gmap
invalidation and notification to the cases where the capability has
been activated and save some cycles when that's not the case.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Let's introduce an explicit check if skeys have already been enabled
for the vcpu, so we don't have to check the mm context if we don't have
the storage key facility.
This lets us check for enablement without having to take the mm
semaphore and thus speedup skey emulation.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Similarly to the pte skey handling, where we set the storage key to
the default key for each newly mapped pte, we have to also do that for
huge pmds.
With the PG_arch_1 flag we keep track if the area has already been
cleared of its skeys.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
To do dirty loging with huge pages, we protect huge pmds in the
gmap. When they are written to, we unprotect them and mark them dirty.
We introduce the function gmap_test_and_clear_dirty_pmd which handles
dirty sync for huge pages.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
If the host invalidates a pmd, we also have to invalidate the
corresponding gmap pmds, as well as flush them from the TLB. This is
necessary, as we don't share the pmd tables between host and guest as
we do with ptes.
The clearing part of these three new functions sets a guest pmd entry
to _SEGMENT_ENTRY_EMPTY, so the guest will fault on it and we will
re-link it.
Flushing the gmap is not necessary in the host's lazy local and csp
cases. Both purge the TLB completely.
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Like for ptes, we also need invalidation notification for pmds, to
make sure the guest lowcore pages are always accessible and later
addition of shadowed pmds.
With PMDs we do not have PGSTEs or some other bits we could use in the
host PMD. Instead we pick one of the free bits in the gmap PMD. Every
time a host pmd will be invalidated, we will check if the respective
gmap PMD has the bit set and in that case fire up the notifier.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Let's allow pmds to be linked into gmap for the upcoming s390 KVM huge
page support.
Before this patch we copied the full userspace pmd entry. This is not
correct, as it contains SW defined bits that might be interpreted
differently in the GMAP context. Now we only copy over all hardware
relevant information leaving out the software bits.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Currently we use the software PGSTE bits PGSTE_IN_BIT and
PGSTE_VSIE_BIT to notify before an invalidation occurs on a prefix
page or a VSIE page respectively. Both bits are pgste specific, but
are used when protecting a memory range.
Let's introduce abstract GMAP_NOTIFY_* bits that will be realized into
the respective bits when gmap DAT table entries are protected.
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
We want to provide facility 156 (etoken facility) to our
guests. This includes migration support (via sync regs) and
VSIE changes. The tokens are being reset on clear reset. This
has to be implemented by userspace (via sync regs).
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
This is a fix for several issues that were found in the original code
for storage attributes migration.
Now no bitmap is allocated to keep track of dirty storage attributes;
the extra bits of the per-memslot bitmap that are always present anyway
are now used for this purpose.
The code has also been refactored a little to improve readability.
Fixes: 190df4a212 ("KVM: s390: CMMA tracking, ESSA emulation, migration mode")
Fixes: 4036e3874a ("KVM: s390: ioctls to get and set guest storage attributes")
Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Message-Id: <1525106005-13931-3-git-send-email-imbrenda@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Currently there are some variables in the purgatory (e.g. kernel_entry)
which are defined twice, once in assembler- and once in c-code. The reason
for this is that these variables are set during purgatory load, where
sanity checks on the corresponding Elf_Sym's are made, while they are used
in assembler-code. Thus adding a second definition in c-code is a handy
workaround to guarantee correct Elf_Sym's are created.
When the purgatory is compiled with -fcommon (default for gcc on s390) this
is no problem because both symbols are merged by the linker. However this
is not required by ISO C and when the purgatory is built with -fno-common
the linker fails with errors like
arch/s390/purgatory/purgatory.o:(.bss+0x18): multiple definition of `kernel_entry'
arch/s390/purgatory/head.o:/.../arch/s390/purgatory/head.S:230: first defined here
Thus remove the duplicate definitions and add the required size and type
information to the assembler definition. Also add -fno-common to the
command line options to prevent similar hacks in the future.
Signed-off-by: Philipp Rudo <prudo@linux.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add support for format 3 function measurement blocks.
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Since the plain vmlinux ELF file no longer carries all necessary parts
for starting up (like the entry point and decompressor), add a check
which would block boot process and encourage users to use bzImage or
arch/s390/boot/compressed/vmlinux instead.
The check relies on s390 linux entry point ABI definition, which is only
present in bzImage and arch/s390/boot/compressed/vmlinux.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Aligning struct lowcore to double page size allows to get rid of this
gcc warning:
In file included from ./arch/s390/include/asm/setup.h:56,
from ./arch/s390/include/asm/page.h:36,
from ./arch/s390/include/asm/user.h:11,
from ./include/linux/user.h:1,
from ./include/linux/elfcore.h:5,
from ./include/linux/crash_core.h:6,
from ./include/linux/kexec.h:18,
from arch/s390/purgatory/purgatory.c:10:
./arch/s390/include/asm/lowcore.h:189:1: warning: alignment 1 of 'struct
lowcore' is less than 8 [-Wpacked-not-aligned]
} __packed;
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Since startup code now reserves memory ranges [0, PARMAREA_END] and
[_stext, <end of kernel>] _ehead symbol is not used and could be
cleaned up.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move all the inline functions from the ap bus header
file ap_asm.h into the in-kernel api header file
arch/s390/include/asm/ap.h so that KVM can make use
of all the low level AP functions.
Signed-off-by: Harald Freudenberger <freude@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce PARMAREA_END, and use it for memblock reserve of low
memory, which is used for lowcore, kdump data mover code and page
buffer, early stack and parmarea. There is no need to reserve an
area between PARMAREA_END and the decompressor _ehead.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
To avoid a mixture of asm code with expolines and c code without them,
propagate CC_USING_EXPOLINE to KBUILD_AFLAGS and use it to detect
whether asm code should have expolines or not.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When allocating a new AOB fails, handle_outbound() is still capable of
transmitting the selected buffer (just without async completion).
But if a previous transfer on this queue slot used async completion, its
sbal_state flags field is still set to QDIO_OUTBUF_STATE_FLAG_PENDING.
So when the upper layer driver sees this stale flag, it expects an async
completion that never happens.
Fix this by unconditionally clearing the flags field.
Fixes: 104ea556ee ("qdio: support asynchronous delivery of storage blocks")
Cc: <stable@vger.kernel.org> #v3.2+
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The conditional inc/dec ops differ for atomic_t and atomic64_t:
- atomic_inc_unless_positive() is optional for atomic_t, and doesn't exist for atomic64_t.
- atomic_dec_unless_negative() is optional for atomic_t, and doesn't exist for atomic64_t.
- atomic_dec_if_positive is optional for atomic_t, and is mandatory for atomic64_t.
Let's make these consistently optional for both. At the same time, let's
clean up the existing fallbacks to use atomic_try_cmpxchg().
The instrumented atomics are updated accordingly.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-18-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Many of the inc/dec ops are mandatory, but for most architectures inc/dec are
simply trivial wrappers around their corresponding add/sub ops.
Let's make all the inc/dec ops optional, so that we can get rid of these
boilerplate wrappers.
The instrumented atomics are updated accordingly.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-17-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Some of the atomics return the result of a test applied after the atomic
operation, and almost all architectures implement these as trivial
wrappers around the underlying atomic. Specifically:
* <atomic>_inc_and_test(v) is (<atomic>_inc_return(v) == 0)
* <atomic>_dec_and_test(v) is (<atomic>_dec_return(v) == 0)
* <atomic>_sub_and_test(i, v) is (<atomic>_sub_return(i, v) == 0)
* <atomic>_add_negative(i, v) is (<atomic>_add_return(i, v) < 0)
Rather than have these definitions duplicated in all architectures, with
minor inconsistencies in formatting and documentation, let's make these
operations optional, with default fallbacks as above. Implementations
must now provide a preprocessor symbol.
The instrumented atomics are updated accordingly.
Both x86 and m68k have custom implementations, which are left as-is,
given preprocessor symbols to avoid being overridden.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-16-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Architectures with atomic64_fetch_add_unless() provide a preprocessor
symbol if they do so, and all other architectures have trivial C
implementations of atomic64_add_unless() which are near-identical.
Let's unify the trivial definitions of atomic64_fetch_add_unless() in
<linux/atomic.h>, so that we always have both
atomic64_fetch_add_unless() and atomic64_add_unless() with less
boilerplate code.
This means that atomic64_add_unless() is always implemented in core
code, and the instrumented atomics are updated accordingly.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-15-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Several architectures these have a near-identical implementation based
on atomic_read() and atomic_cmpxchg() which we can instead define in
<linux/atomic.h>, so let's do so, using something close to the existing
x86 implementation with try_cmpxchg().
Where an architecture provides its own atomic_fetch_add_unless(), it
must define a preprocessor symbol for it. The instrumented atomics are
updated accordingly.
Note that arch/arc's existing atomic_fetch_add_unless() had redundant
barriers, as these are already present in its atomic_cmpxchg()
implementation.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Link: https://lore.kernel.org/lkml/20180621121321.4761-7-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We define a trivial fallback for atomic_inc_not_zero(), but don't do
the same for atomic64_inc_not_zero(), leading most architectures to
define the same boilerplate.
Let's add a fallback in <linux/atomic.h>, and remove the redundant
implementations. Note that atomic64_add_unless() is always defined in
<linux/atomic.h>, and promotes its arguments to the requisite types, so
we need not do this explicitly.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-6-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While __atomic_add_unless() was originally intended as a building-block
for atomic_add_unless(), it's now used in a number of places around the
kernel. It's the only common atomic operation named __atomic*(), rather
than atomic_*(), and for consistency it would be better named
atomic_fetch_add_unless().
This lack of consistency is slightly confusing, and gets in the way of
scripting atomics. Given that, let's clean things up and promote it to
an official part of the atomics API, in the form of
atomic_fetch_add_unless().
This patch converts definitions and invocations over to the new name,
including the instrumented version, using the following script:
----
git grep -w __atomic_add_unless | while read line; do
sed -i '{s/\<__atomic_add_unless\>/atomic_fetch_add_unless/}' "${line%%:*}";
done
git grep -w __arch_atomic_add_unless | while read line; do
sed -i '{s/\<__arch_atomic_add_unless\>/arch_atomic_fetch_add_unless/}' "${line%%:*}";
done
----
Note that we do not have atomic{64,_long}_fetch_add_unless(), which will
be introduced by later patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-2-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull s390 updates from Martin Schwidefsky:
"common I/O layer
- Fix bit-fields crossing storage-unit boundaries in css_general_char
dasd driver
- Avoid a sparse warning in regard to the queue lock
- Allocate the struct dasd_ccw_req as per request data. Only for
internal I/O is the structure allocated separately
- Remove the unused function dasd_kmalloc_set_cda
- Save a few bytes in struct dasd_ccw_req by reordering fields
- Convert remaining users of dasd_kmalloc_request to
dasd_smalloc_request and remove the now unused function
vfio/ccw
- Refactor and improve pfn_array_alloc_pin/pfn_array_pin
- Add a new tracepoint for failed vfio/ccw requests
- Add a CCW translation improvement to accept more requests as valid
- Bug fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/dasd: only use preallocated requests
s390/dasd: reshuffle struct dasd_ccw_req
s390/dasd: remove dasd_kmalloc_set_cda
s390/dasd: move dasd_ccw_req to per request data
s390/dasd: simplify locking in process_final_queue
s390/cio: sanitize css_general_characteristics definition
vfio: ccw: add tracepoints for interesting error paths
vfio: ccw: set ccw->cda to NULL defensively
vfio: ccw: refactor and improve pfn_array_alloc_pin()
vfio: ccw: shorten kernel doc description for pfn_array_pin()
vfio: ccw: push down unsupported IDA check
vfio: ccw: fix error return in vfio_ccw_sch_event
s390/archrandom: Rework arch random implementation.
s390/net: add pnetid support
* ARM: lazy context-switching of FPSIMD registers on arm64, "split"
regions for vGIC redistributor
* s390: cleanups for nested, clock handling, crypto, storage keys and
control register bits
* x86: many bugfixes, implement more Hyper-V super powers,
implement lapic_timer_advance_ns even when the LAPIC timer
is emulated using the processor's VMX preemption timer. Two
security-related bugfixes at the top of the branch.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJbH8Z/AAoJEL/70l94x66DF+UIAJeOuTp6LGasT/9uAb2OovaN
+5kGmOPGFwkTcmg8BQHI2fXT4vhxMXWPFcQnyig9eXJVxhuwluXDOH4P9IMay0yw
VDCBsWRdMvZDQad2hn6Z5zR4Jx01XrSaG/KqvXbbDKDCy96mWG7SYAY2m3ZwmeQi
3Pa3O3BTijr7hBYnMhdXGkSn4ZyU8uPaAgIJ8795YKeOJ2JmioGYk6fj6y2WCxA3
ztJymBjTmIoZ/F8bjuVouIyP64xH4q9roAyw4rpu7vnbWGqx1fjPYJoB8yddluWF
JqCPsPzhKDO7mjZJy+lfaxIlzz2BN7tKBNCm88s5GefGXgZwk3ByAq/0GQ2M3rk=
=H5zI
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"Small update for KVM:
ARM:
- lazy context-switching of FPSIMD registers on arm64
- "split" regions for vGIC redistributor
s390:
- cleanups for nested
- clock handling
- crypto
- storage keys
- control register bits
x86:
- many bugfixes
- implement more Hyper-V super powers
- implement lapic_timer_advance_ns even when the LAPIC timer is
emulated using the processor's VMX preemption timer.
- two security-related bugfixes at the top of the branch"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (79 commits)
kvm: fix typo in flag name
kvm: x86: use correct privilege level for sgdt/sidt/fxsave/fxrstor access
KVM: x86: pass kvm_vcpu to kvm_read_guest_virt and kvm_write_guest_virt_system
KVM: x86: introduce linear_{read,write}_system
kvm: nVMX: Enforce cpl=0 for VMX instructions
kvm: nVMX: Add support for "VMWRITE to any supported field"
kvm: nVMX: Restrict VMX capability MSR changes
KVM: VMX: Optimize tscdeadline timer latency
KVM: docs: nVMX: Remove known limitations as they do not exist now
KVM: docs: mmu: KVM support exposing SLAT to guests
kvm: no need to check return value of debugfs_create functions
kvm: Make VM ioctl do valloc for some archs
kvm: Change return type to vm_fault_t
KVM: docs: mmu: Fix link to NPT presentation from KVM Forum 2008
kvm: x86: Amend the KVM_GET_SUPPORTED_CPUID API documentation
KVM: x86: hyperv: declare KVM_CAP_HYPERV_TLBFLUSH capability
KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}_EX implementation
KVM: x86: hyperv: simplistic HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} implementation
KVM: introduce kvm_make_vcpus_request_mask() API
KVM: x86: hyperv: do rep check for each hypercall separately
...
Change css_general_characteristics such that the bitfields don't
straddle storage-unit boundaries of the base types.
This does not change the offsets of the structs members but now
we do as documented and also fix the following sparse complaint:
drivers/s390/cio/chsc.c:926:56:
warning: invalid access past the end of 'css_general_characteristics' (16 18)
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Currently the PTE special supports is turned on in per architecture
header files. Most of the time, it is defined in
arch/*/include/asm/pgtable.h depending or not on some other per
architecture static definition.
This patch introduce a new configuration variable to manage this
directly in the Kconfig files. It would later replace
__HAVE_ARCH_PTE_SPECIAL.
Here notes for some architecture where the definition of
__HAVE_ARCH_PTE_SPECIAL is not obvious:
arm
__HAVE_ARCH_PTE_SPECIAL which is currently defined in
arch/arm/include/asm/pgtable-3level.h which is included by
arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set.
So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE.
powerpc
__HAVE_ARCH_PTE_SPECIAL is defined in 2 files:
- arch/powerpc/include/asm/book3s/64/pgtable.h
- arch/powerpc/include/asm/pte-common.h
The first one is included if (PPC_BOOK3S & PPC64) while the second is
included in all the other cases.
So select ARCH_HAS_PTE_SPECIAL all the time.
sparc:
__HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) &&
defined(__arch64__) which are defined through the compiler in
sparc/Makefile if !SPARC32 which I assume to be if SPARC64.
So select ARCH_HAS_PTE_SPECIAL if SPARC64
There is no functional change introduced by this patch.
Link: http://lkml.kernel.org/r/1523433816-14460-2-git-send-email-ldufour@linux.vnet.ibm.com
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Suggested-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <albert@sifive.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Christophe LEROY <christophe.leroy@c-s.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull s390 updates from Martin Schwidefsky:
- A rework for the s390 arch random code, the TRNG instruction is
rather slow and should not be used on the interrupt path
- A fix for a memory leak in the zcrypt driver
- Changes to the early boot code to add a compile time check for code
that may not use the .bss section, with the goal to avoid initrd
corruptions
- Add an interface to get the physical network ID (pnetid), this is
useful to group network devices that are attached to the same network
- Some cleanup for the linker script
- Some code improvement for the dasd driver
- Two fixes for the perf sampling support
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/zcrypt: Fix CCA and EP11 CPRB processing failure memory leak.
s390/archrandom: Rework arch random implementation.
s390/net: add pnetid support
s390/dasd: simplify locking in dasd_times_out
s390/cio: add test for ccwgroup device
s390/cio: add helper to query utility strings per given ccw device
s390: remove no-op macro VMLINUX_SYMBOL()
s390: remove closung punctuation from spectre messages
s390: introduce compile time check for empty .bss section
s390/early: move functions which may not access bss section to extra file
s390/early: get rid of #ifdef CONFIG_BLK_DEV_INITRD
s390/early: get rid of memmove_early
s390/cpum_sf: Add data entry sizes to sampling trailer entry
perf: fix invalid bit in diagnostic entry
Pull timers and timekeeping updates from Thomas Gleixner:
- Core infrastucture work for Y2038 to address the COMPAT interfaces:
+ Add a new Y2038 safe __kernel_timespec and use it in the core
code
+ Introduce config switches which allow to control the various
compat mechanisms
+ Use the new config switch in the posix timer code to control the
32bit compat syscall implementation.
- Prevent bogus selection of CPU local clocksources which causes an
endless reselection loop
- Remove the extra kthread in the clocksource code which has no value
and just adds another level of indirection
- The usual bunch of trivial updates, cleanups and fixlets all over the
place
- More SPDX conversions
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
clocksource/drivers/mxs_timer: Switch to SPDX identifier
clocksource/drivers/timer-imx-tpm: Switch to SPDX identifier
clocksource/drivers/timer-imx-gpt: Switch to SPDX identifier
clocksource/drivers/timer-imx-gpt: Remove outdated file path
clocksource/drivers/arc_timer: Add comments about locking while read GFRC
clocksource/drivers/mips-gic-timer: Add pr_fmt and reword pr_* messages
clocksource/drivers/sprd: Fix Kconfig dependency
clocksource: Move inline keyword to the beginning of function declarations
timer_list: Remove unused function pointer typedef
timers: Adjust a kernel-doc comment
tick: Prefer a lower rating device only if it's CPU local device
clocksource: Remove kthread
time: Change nanosleep to safe __kernel_* types
time: Change types to new y2038 safe __kernel_* types
time: Fix get_timespec64() for y2038 safe compat interfaces
time: Add new y2038 safe __kernel_timespec
posix-timers: Make compat syscalls depend on CONFIG_COMPAT_32BIT_TIME
time: Introduce CONFIG_COMPAT_32BIT_TIME
time: Introduce CONFIG_64BIT_TIME in architectures
compat: Enable compat_get/put_timespec64 always
...
Pull irq updates from Thomas Gleixner:
- Consolidation of softirq pending:
The softirq mask and its accessors/mutators have many implementations
scattered around many architectures. Most do the same things
consisting in a field in a per-cpu struct (often irq_cpustat_t)
accessed through per-cpu ops. We can provide instead a generic
efficient version that most of them can use. In fact s390 is the only
exception because the field is stored in lowcore.
- Support for level!?! triggered MSI (ARM)
Over the past couple of years, we've seen some SoCs coming up with
ways of signalling level interrupts using a new flavor of MSIs, where
the MSI controller uses two distinct messages: one that raises a
virtual line, and one that lowers it. The target MSI controller is in
charge of maintaining the state of the line.
This allows for a much simplified HW signal routing (no need to have
hundreds of discrete lines to signal level interrupts if you already
have a memory bus), but results in a departure from the current idea
the kernel has of MSIs.
- Support for Meson-AXG GPIO irqchip
- Large stm32 irqchip rework (suspend/resume, hierarchical domains)
- More SPDX conversions
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
ARM: dts: stm32: Add exti support to stm32mp157 pinctrl
ARM: dts: stm32: Add exti support for stm32mp157c
pinctrl/stm32: Add irq_eoi for stm32gpio irqchip
irqchip/stm32: Add suspend/resume support for hierarchy domain
irqchip/stm32: Add stm32mp1 support with hierarchy domain
irqchip/stm32: Prepare common functions
irqchip/stm32: Add host and driver data structures
irqchip/stm32: Add suspend support
irqchip/stm32: Add falling pending register support
irqchip/stm32: Checkpatch fix
irqchip/stm32: Optimizes and cleans up stm32-exti irq_domain
irqchip/meson-gpio: Add support for Meson-AXG SoCs
dt-bindings: interrupt-controller: New binding for Meson-AXG SoC
dt-bindings: interrupt-controller: Fix the double quotes
softirq/s390: Move default mutators of overwritten softirq mask to s390
softirq/x86: Switch to generic local_softirq_pending() implementation
softirq/sparc: Switch to generic local_softirq_pending() implementation
softirq/powerpc: Switch to generic local_softirq_pending() implementation
softirq/parisc: Switch to generic local_softirq_pending() implementation
softirq/ia64: Switch to generic local_softirq_pending() implementation
...
- replaceme the force_dma flag with a dma_configure bus method.
(Nipun Gupta, although one patch is іncorrectly attributed to me
due to a git rebase bug)
- use GFP_DMA32 more agressively in dma-direct. (Takashi Iwai)
- remove PCI_DMA_BUS_IS_PHYS and rely on the dma-mapping API to do the
right thing for bounce buffering.
- move dma-debug initialization to common code, and apply a few cleanups
to the dma-debug code.
- cleanup the Kconfig mess around swiotlb selection
- swiotlb comment fixup (Yisheng Xie)
- a trivial swiotlb fix. (Dan Carpenter)
- support swiotlb on RISC-V. (based on a patch from Palmer Dabbelt)
- add a new generic dma-noncoherent dma_map_ops implementation and use
it for arc, c6x and nds32.
- improve scatterlist validity checking in dma-debug. (Robin Murphy)
- add a struct device quirk to limit the dma-mask to 32-bit due to
bridge/system issues, and switch x86 to use it instead of a local
hack for VIA bridges.
- handle devices without a dma_mask more gracefully in the dma-direct
code.
-----BEGIN PGP SIGNATURE-----
iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlsU1hwLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYPraxAAocC7JiFKW133/VugCtGA1x9uE8DPHealtsWTAeEq
KOOB3GxWMU2hKqQ4km5tcfdWoGJvvab6hmDXcitzZGi2JajO7Ae0FwIy3yvxSIKm
iH/ON7c4sJt8gKrXYsLVylmwDaimNs4a6xfODoCRgnWuovI2QrrZzupnlzPNsiOC
lv8ezzcW+Ay/gvDD/r72psO+w3QELETif/OzR/qTOtvLrVabM06eHmPQ8Wb98smu
/UPMMv6/3XwQnxpxpdyqN+p/gUdneXithzT261wTeZ+8gDXmcWBwHGcMBCimcoBi
FklW52moazIPIsTysqoNlVFsLGJTeS4p2D3BLAp5NwWYsLv+zHUVZsI1JY/8u5Ox
mM11LIfvu9JtUzaqD9SvxlxIeLhhYZZGnUoV3bQAkpHSQhN/xp2YXd5NWSo5ac2O
dch83+laZkZgd6ryw6USpt/YTPM/UHBYy7IeGGHX/PbmAke0ZlvA6Rae7kA5DG59
7GaLdwQyrHp8uGFgwze8P+R4POSk1ly73HHLBT/pFKnDD7niWCPAnBzuuEQGJs00
0zuyWLQyzOj1l6HCAcMNyGnYSsMp8Fx0fvEmKR/EYs8O83eJKXi6L9aizMZx4v1J
0wTolUWH6SIIdz474YmewhG5YOLY7mfe9E8aNr8zJFdwRZqwaALKoteRGUxa3f6e
zUE=
=6Acj
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- replace the force_dma flag with a dma_configure bus method. (Nipun
Gupta, although one patch is іncorrectly attributed to me due to a
git rebase bug)
- use GFP_DMA32 more agressively in dma-direct. (Takashi Iwai)
- remove PCI_DMA_BUS_IS_PHYS and rely on the dma-mapping API to do the
right thing for bounce buffering.
- move dma-debug initialization to common code, and apply a few
cleanups to the dma-debug code.
- cleanup the Kconfig mess around swiotlb selection
- swiotlb comment fixup (Yisheng Xie)
- a trivial swiotlb fix. (Dan Carpenter)
- support swiotlb on RISC-V. (based on a patch from Palmer Dabbelt)
- add a new generic dma-noncoherent dma_map_ops implementation and use
it for arc, c6x and nds32.
- improve scatterlist validity checking in dma-debug. (Robin Murphy)
- add a struct device quirk to limit the dma-mask to 32-bit due to
bridge/system issues, and switch x86 to use it instead of a local
hack for VIA bridges.
- handle devices without a dma_mask more gracefully in the dma-direct
code.
* tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mapping: (48 commits)
dma-direct: don't crash on device without dma_mask
nds32: use generic dma_noncoherent_ops
nds32: implement the unmap_sg DMA operation
nds32: consolidate DMA cache maintainance routines
x86/pci-dma: switch the VIA 32-bit DMA quirk to use the struct device flag
x86/pci-dma: remove the explicit nodac and allowdac option
x86/pci-dma: remove the experimental forcesac boot option
Documentation/x86: remove a stray reference to pci-nommu.c
core, dma-direct: add a flag 32-bit dma limits
dma-mapping: remove unused gfp_t parameter to arch_dma_alloc_attrs
dma-debug: check scatterlist segments
c6x: use generic dma_noncoherent_ops
arc: use generic dma_noncoherent_ops
arc: fix arc_dma_{map,unmap}_page
arc: fix arc_dma_sync_sg_for_{cpu,device}
arc: simplify arc_dma_sync_single_for_{cpu,device}
dma-mapping: provide a generic dma-noncoherent implementation
dma-mapping: simplify Kconfig dependencies
riscv: add swiotlb support
riscv: only enable ZONE_DMA32 for 64-bit
...
The arch_get_random_seed_long() invocation done by the random device
driver is done in interrupt context and may be invoked very very
frequently. The existing s390 arch_get_random_seed*() implementation
uses the PRNO(TRNG) instruction which produces excellent high quality
entropy but is relatively slow and thus expensive.
This fix reworks the arch_get_random_seed* implementation. It
introduces a buffer concept to decouple the delivery of random data
via arch_get_random_seed*() from the generation of new random
bytes. The buffer of random data is filled asynchronously by a
workqueue thread.
If there are enough bytes in the buffer the s390_arch_random_generate()
just delivers these bytes. Otherwise false is returned until the worker
thread refills the buffer.
The worker fills the rng buffer by pulling fresh entropy from the
high quality (but slow) true hardware random generator. This entropy
is then spread over the buffer with an pseudo random generator.
As the arch_get_random_seed_long() fetches 8 bytes and the calling
function add_interrupt_randomness() counts this as 1 bit entropy the
distribution needs to make sure there is in fact 1 bit entropy
contained in 8 bytes of the buffer. The current values pull 32 byte
entropy and scatter this into a 2048 byte buffer. So 8 byte in the
buffer will contain 1 bit of entropy.
The worker thread is rescheduled based on the charge level of the
buffer but at least with 500 ms delay to avoid too much cpu consumption.
So the max. amount of rng data delivered via arch_get_random_seed is
limited to 4Kb per second.
Signed-off-by: Harald Freudenberger <freude@de.ibm.com>
Reviewed-by: Patrick Steuer <patrick.steuer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
s390 hardware supports the definition of a so-call Physical NETwork
IDentifier (short PNETID) per network device port. These PNETIDS
can be used to identify network devices that are attached to the same
physical network (broadcast domain).
This patch provides the interface to extract the PNETID of a port of
a device attached to the ccw-bus or pci-bus.
Parts of this patch are based on an initial implementation by
Thomas Richter.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add a test to check if a given device is a ccwgroup device.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In KVM code we use masks to test/set control registers.
Let's define the ones we use in arch/s390/include/asm/ctl_reg.h and
replace all occurrences in KVM code.
As we will be needing the define for Clock-comparator sign control soon,
let's also add it.
Suggested-by: Collin L. Walling <walling@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Collin Walling <walling@linux.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Up to now we always expected to have the storage key facility
available for our (non-VSIE) KVM guests. For huge page support, we
need to be able to disable it, so let's introduce that now.
We add the use_skf variable to manage KVM storage key facility
usage. Also we rename use_skey in the mm context struct to uses_skeys
to make it more clear that it is an indication that the vm actively
uses storage keys.
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
s390 is now the last architecture that entirely overwrites
local_softirq_pending() and uses the according default definitions of
set_softirq_pending() and or_softirq_pending().
Just move these to s390 to debloat the generic code complexity.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/1525786706-22846-12-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The CPU Measurement sampling facility creates a trailer entry for each
Sample-Data-Block of stored samples. The trailer entry contains the sizes
(in bytes) of the stored sampling types:
- basic-sampling data entry size
- diagnostic-sampling data entry size
Both sizes are 2 bytes long.
This patch changes the trailer entry definition to reflect this.
Fixes: fcc77f5073 ("s390/cpum_sf: Atomically reset trailer entry fields of sample-data-blocks")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The s390 CPU measurement facility sampling mode supports basic entries
and diagnostic entries. Each entry has a valid bit to indicate the
status of the entry as valid or invalid.
This bit is bit 31 in the diagnostic entry, but the bit mask definition
refers to bit 30.
Fix this by making the reserved field one bit larger.
Fixes: 7e75fc3ff4 ("s390/cpum_sf: Add raw data sampling to support the diagnostic-sampling function")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The BPF JIT uses a 'b <disp>(%r<x>)' instruction in the definition
of the sk_load_word and sk_load_half functions.
Add support for branch-on-condition instructions contained in the
thunk code of an expoline.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The return from the ftrace_stub, _mcount, ftrace_caller and
return_to_handler functions is done with "br %r14" and "br %r1".
These are indirect branches as well and need to use execute
trampolines for CONFIG_EXPOLINE=y.
The ftrace_caller function is a special case as it returns to the
start of a function and may only use %r0 and %r1. For a pre z10
machine the standard execute trampoline uses a LARL + EX to do
this, but this requires *two* registers in the range %r1..%r15.
To get around this the 'br %r1' located in the lowcore is used,
then the EX instruction does not need an address register.
But the lowcore trick may only be used for pre z14 machines,
with noexec=on the mapping for the first page may not contain
instructions. The solution for that is an ALTERNATIVE in the
expoline THUNK generated by 'GEN_BR_THUNK %r1' to switch to
EXRL, this relies on the fact that a machine that supports
noexec=on has EXRL as well.
Cc: stable@vger.kernel.org # 4.16
Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
To be able to use the expoline branches in different assembler
files move the associated macros from entry.S to a new header
nospec-insn.h.
While we are at it make the macros a bit nicer to use.
Cc: stable@vger.kernel.org # 4.16
Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This was used by the ide, scsi and networking code in the past to
determine if they should bounce payloads. Now that the dma mapping
always have to support dma to all physical memory (thanks to swiotlb
for non-iommu systems) there is no need to this crude hack any more.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Palmer Dabbelt <palmer@sifive.com> (for riscv)
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Fix the following sparse complaints:
arch/s390/purgatory/purgatory.c:18:5: warning: symbol 'kernel_entry' was not declared. Should it be static?
arch/s390/purgatory/purgatory.c:19:5: warning: symbol 'kernel_type' was not declared. Should it be static?
arch/s390/purgatory/purgatory.c:21:5: warning: symbol 'crash_start' was not declared. Should it be static?
arch/s390/purgatory/purgatory.c:22:5: warning: symbol 'crash_size' was not declared. Should it be static?
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In a multi-threaded program any thread can call execve(). If this
is not done by the thread group leader, the de_thread() function
replaces the pid of the task that calls execve() with the pid of
thread group leader. If the task reaches user space again without
going over __switch_to() the sampling tag is still set to the old
pid.
Define the arch_setup_new_exec function to verify the task pid
and udpate the tag with LPP if it has changed.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The s390 msgbuf/sembuf/shmbuf header files are all identical to the
version from asm-generic.
This patch removes the files and replaces them with 'generic-y'
statements, to avoid having to modify each copy when we extend sysvipc
to deal with 64-bit time_t in 32-bit user space.
Note that unlike alpha and ia64, the ipcbuf.h header file is slightly
different here, so I'm leaving the private copy.
To deal with 32-bit compat tasks, we also have to adapt the definitions
of compat_{shm,sem,msg}id_ds to match the changes to the respective
asm-generic files.
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Add an ELF loader for kexec_file. The main task here is to do proper sanity
checks on the ELF file. Basically all other functionality was already
implemented for the image loader.
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add an image loader for kexec_file_load. For simplicity first skip crash
support. The functions defined in machine_kexec_file will later be shared
with the ELF loader.
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The common code expects the architecture to have a purgatory that runs
between the two kernels. Add it now. For simplicity first skip crash
support.
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
kexec_file_load needs to prepare the new kernels before they are loaded.
For that it has to know the offsets in head.S, e.g. to register the new
command line. Unfortunately there are no macros right now defining those
offsets. Define them now.
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With CONFIG_EXPOLINE_AUTO=y the call of spectre_v2_auto_early() via
early_initcall is done *after* the early_param functions. This
overwrites any settings done with the nobp/no_spectre_v2/spectre_v2
parameters. The code patching for the kernel is done after the
evaluation of the early parameters but before the early_initcall
is done. The end result is a kernel image that is patched correctly
but the kernel modules are not.
Make sure that the nospec auto detection function is called before the
early parameters are evaluated and before the code patching is done.
Fixes: 6e179d6412 ("s390: add automatic detection of the spectre defense")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There was an artificial restriction on the card/adapter id
to only 6 bits but all the AP commands do support adapter
ids with 8 bit. This patch removes this restriction to 64
adapters and now up to 256 adapter can get addressed.
Some of the ioctl calls work on the max number of cards
possible (which was 64). These ioctls are now deprecated
but still supported. All the defines, structs and ioctl
interface declarations have been kept for compabibility.
There are now new ioctls (and defines for these) with an
additional '2' appended which provide the extended versions
with 256 cards supported.
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
s390 kdump reipl implementation relies on os_info kernel structure
residing in old memory being dumped. os_info contains reipl block,
which is used (if valid) by the kdump kernel for reipl parameters.
The problem is that the reipl block and its checksum inside
os_info is updated only when /sys/firmware/reipl/reipl_type is
written. This sets an offset of a reipl block for "reipl_type" and
re-calculates reipl block checksum. Any further alteration of values
under /sys/firmware/reipl/{reipl_type}/ without subsequent write to
/sys/firmware/reipl/reipl_type lead to incorrect os_info reipl block
checksum. In such a case kdump kernel ignores it and reboots using
default logic.
To fix this, os_info reipl block update is moved right before kdump
execution.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
do_reipl, do_halt and do_poff are not defined anywhere. Cleaning up
functions declaration.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
diag308 set has been available for many machine generations, and
alternative reipl code paths has not been exercised and seems to be
broken without noticing for a while now. So, cleaning up all obsolete
reipl methods except currently used ones, assuming that diag308 set
always works.
Also removing not longer needed reset callbacks.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For both ccw and fcp boot retrieve ipl info from ipl block received via
diag308 store. Old scsi ipl parm block handling and cio_get_iplinfo are
removed. Ipl type is deducted from ipl block (if valid).
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
ipl_flags and corresponding enum are not used outside of ipl.c and will
be reworked in later commits.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Ipl parm blocks received via "diag308 store" and during scsi boot at
IPL_PARMBLOCK_ORIGIN are merged into the "ipl_block".
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
- VHE optimizations
- EL2 address space randomization
- speculative execution mitigations ("variant 3a", aka execution past invalid
privilege register access)
- bugfixes and cleanups
PPC:
- improvements for the radix page fault handler for HV KVM on POWER9
s390:
- more kvm stat counters
- virtio gpu plumbing
- documentation
- facilities improvements
x86:
- support for VMware magic I/O port and pseudo-PMCs
- AMD pause loop exiting
- support for AMD core performance extensions
- support for synchronous register access
- expose nVMX capabilities to userspace
- support for Hyper-V signaling via eventfd
- use Enlightened VMCS when running on Hyper-V
- allow userspace to disable MWAIT/HLT/PAUSE vmexits
- usual roundup of optimizations and nested virtualization bugfixes
Generic:
- API selftest infrastructure (though the only tests are for x86 as of now)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJay19UAAoJEL/70l94x66DGKYIAIu9PTHAEwaX0et15fPW5y2x
rrtS355lSAmMrPJ1nePRQ+rProD/1B0Kizj3/9O+B9OTKKRsorRYNa4CSu9neO2k
N3rdE46M1wHAPwuJPcYvh3iBVXtgbMayk1EK5aVoSXaMXEHh+PWZextkl+F+G853
kC27yDy30jj9pStwnEFSBszO9ua/URdKNKBATNx8WUP6d9U/dlfm5xv3Dc3WtKt2
UMGmog2wh0i7ecXo7hRkMK4R7OYP3ZxAexq5aa9BOPuFp+ZdzC/MVpN+jsjq2J/M
Zq6RNyA2HFyQeP0E9QgFsYS2BNOPeLZnT5Jg1z4jyiD32lAZ/iC51zwm4oNKcDM=
=bPlD
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- VHE optimizations
- EL2 address space randomization
- speculative execution mitigations ("variant 3a", aka execution past
invalid privilege register access)
- bugfixes and cleanups
PPC:
- improvements for the radix page fault handler for HV KVM on POWER9
s390:
- more kvm stat counters
- virtio gpu plumbing
- documentation
- facilities improvements
x86:
- support for VMware magic I/O port and pseudo-PMCs
- AMD pause loop exiting
- support for AMD core performance extensions
- support for synchronous register access
- expose nVMX capabilities to userspace
- support for Hyper-V signaling via eventfd
- use Enlightened VMCS when running on Hyper-V
- allow userspace to disable MWAIT/HLT/PAUSE vmexits
- usual roundup of optimizations and nested virtualization bugfixes
Generic:
- API selftest infrastructure (though the only tests are for x86 as
of now)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (174 commits)
kvm: x86: fix a prototype warning
kvm: selftests: add sync_regs_test
kvm: selftests: add API testing infrastructure
kvm: x86: fix a compile warning
KVM: X86: Add Force Emulation Prefix for "emulate the next instruction"
KVM: X86: Introduce handle_ud()
KVM: vmx: unify adjacent #ifdefs
x86: kvm: hide the unused 'cpu' variable
KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig
Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown"
kvm: Add emulation for movups/movupd
KVM: VMX: raise internal error for exception during invalid protected mode state
KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending
KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
KVM: x86: Fix misleading comments on handling pending exceptions
KVM: x86: Rename interrupt.pending to interrupt.injected
KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt
x86/kvm: use Enlightened VMCS when running on Hyper-V
x86/hyper-v: detect nested features
x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits
...
Pull s390 updates from Martin Schwidefsky:
- Improvements for the spectre defense:
* The spectre related code is consolidated to a single file
nospec-branch.c
* Automatic enable/disable for the spectre v2 defenses (expoline vs.
nobp)
* Syslog messages for specve v2 are added
* Enable CONFIG_GENERIC_CPU_VULNERABILITIES and define the attribute
functions for spectre v1 and v2
- Add helper macros for assembler alternatives and use them to shorten
the code in entry.S.
- Add support for persistent configuration data via the SCLP Store Data
interface. The H/W interface requires a page table that uses 4K pages
only, the code to setup such an address space is added as well.
- Enable virtio GPU emulation in QEMU. To do this the depends
statements for a few common Kconfig options are modified.
- Add support for format-3 channel path descriptors and add a binary
sysfs interface to export the associated utility strings.
- Add a sysfs attribute to control the IFCC handling in case of
constant channel errors.
- The vfio-ccw changes from Cornelia.
- Bug fixes and cleanups.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (40 commits)
s390/kvm: improve stack frame constants in entry.S
s390/lpp: use assembler alternatives for the LPP instruction
s390/entry.S: use assembler alternatives
s390: add assembler macros for CPU alternatives
s390: add sysfs attributes for spectre
s390: report spectre mitigation via syslog
s390: add automatic detection of the spectre defense
s390: move nobp parameter functions to nospec-branch.c
s390/cio: add util_string sysfs attribute
s390/chsc: query utility strings via fmt3 channel path descriptor
s390/cio: rename struct channel_path_desc
s390/cio: fix unbind of io_subchannel_driver
s390/qdio: split up CCQ handling for EQBS / SQBS
s390/qdio: don't retry EQBS after CCQ 96
s390/qdio: restrict buffer merging to eligible devices
s390/qdio: don't merge ERROR output buffers
s390/qdio: simplify math in get_*_buffer_frontier()
s390/decompressor: trim uncompressed image head during the build
s390/crypto: Fix kernel crash on aes_s390 module remove.
s390/defkeymap: fix global init to zero
...
With the new macros for CPU alternatives the MACHINE_FLAG_LPP check
around the LPP instruction can be optimized. After this is done the
flag can be removed.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add a header with macros usable in assembler files to emit alternative
code sequences. It works analog to the alternatives for inline assmeblies
in C files, with the same restrictions and capabilities.
The syntax is
ALTERNATIVE "<default instructions sequence>", \
"<alternative instructions sequence>", \
"<features-bit>"
and
ALTERNATIVE_2 "<default instructions sequence>", \
"<alternative instructions sqeuence #1>", \
"<feature-bit #1>",
"<alternative instructions sqeuence #2>", \
"<feature-bit #2>"
Reviewed-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Automatically decide between nobp vs. expolines if the spectre_v2=auto
kernel parameter is specified or CONFIG_EXPOLINE_AUTO=y is set.
The decision made at boot time due to CONFIG_EXPOLINE_AUTO=y being set
can be overruled with the nobp, nospec and spectre_v2 kernel parameters.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add support for format 3 channel path descriptors and use them to
gather utility strings.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Rename struct channel_path_desc to struct channel_path_desc_fmt0
to fit the scheme. Provide a macro for the function wrappers that
gather this and related data from firmware.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
- more kvm stat counters
- virtio gpu plumbing. The 3 non-KVM/s390 patches have Acks from
Bartlomiej Zolnierkiewicz, Heiko Carstens and Greg Kroah-Hartman
but all belong together to make virtio-gpu work as a tty. So
I carried them in the KVM/s390 tree.
- document some KVM_CAPs
- cpu-model only facilities
- cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJaqXZWAAoJEBF7vIC1phx87qIP/0Zr9yC8uhU2nqtNVXf9Ezhl
x2Qh7Psk52LQEORcVHzbV5o5YIQrZxDraJ6DHLT6L6Qqpnk3SkUbtUAc+V3nHfFu
iIOGQ8GgAsjY2E6A5xKp0JcuJB9/3S7C+CDYMUuFhKHSLNwQvZmRQjn2CILugxPU
lHO3WpT3wrlVxiI/27uD3T3M+PNsyrMSeB75NBbnW5oTGB3/TNl2159aNdbuvl6C
Kn74P5FrzgGhAPcmIZkQHt6e8RLi1UBoMO/LC6+u0WFMH9E7GriiMt0bSxlsWdt8
7wCmQ5Pvv6oZm/SNDgULWfbF1iRL06pMMzArjpnAGUcMKcdTnmzs0lBSYNmbDSh4
LTQ/LBFxhqf39Hn+D/QaPAJiL0wWIHo6tsRJ9JGDatWuU7vgyfMYTBJmvvK4cGIw
70iSeOer6TqwCcDIOcsQpbA58kVJcYP+FDKV8ADAj6cVfTD+s6npVB+JwDtX2KKJ
d2t4GUsRmEJUvQjRHxYspSyYDVcPO7KDZ5TpVW5GHo1fbim1P1LO/ti6eo9u9c/T
By+t+m3CAMubd+zVeB5uz64wdt8ZhgYTFdvYN4L2chh3JWs3wf7Lkeot4vJ7HgoE
VWNHVxbYovPNMrUVYNzU6zSbDkQz9hvG5rK2mwv7jmxYGUmv/mItDiF7gFsiRWXg
N63FBCebeaVlfEJniWRz
=vw46
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: fixes and features
- more kvm stat counters
- virtio gpu plumbing. The 3 non-KVM/s390 patches have Acks from
Bartlomiej Zolnierkiewicz, Heiko Carstens and Greg Kroah-Hartman
but all belong together to make virtio-gpu work as a tty. So
I carried them in the KVM/s390 tree.
- document some KVM_CAPs
- cpu-model only facilities
- cleanups
For testing the exitless interrupt support it turned out useful to
have separate counters for inject and delivery of I/O interrupt.
While at it do the same for all interrupt types. For timer
related interrupts (clock comparator and cpu timer) we even had
no delivery counters. Fix this as well. On this way some counters
are being renamed to have a similar name.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
This counter can be used for administration, debug or test purposes.
Suggested-by: Vladislav Mironov <mironov@de.ibm.com>
Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
We want to count IO exit requests in kvm_stat. At the same time
we can get rid of the handle_noop function.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
use_cmma in kvm_arch means that the KVM hypervisor is allowed to use
cmma, whereas use_cmma in the mm context means cmm has been used before.
Let's rename the context one to uses_cmm, as the vm does use
collaborative memory management but the host uses the cmm assist
(interpretation facility).
Also let's introduce use_pfmfi, so we can remove the pfmfi disablement
when we activate cmma and rather not activate it in the first place.
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Message-Id: <1518779775-256056-2-git-send-email-frankja@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch introduces kvm_para_has_hint() to query for hints about
the configuration of the guests. The first hint KVM_HINTS_DEDICATED,
is set if the guest has dedicated physical CPUs for each vCPU (i.e.
pinning and no over-commitment). This allows optimizing spinlocks
and tells the guest to avoid PV TLB flush.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
When running s390 images with 'compat' processes, the following
BUG is seen repeatedly.
BUG: non-zero pgtables_bytes on freeing mm: -16384
Bisect points to commit b4e98d9ac7 ("mm: account pud page tables").
Analysis shows that init_new_context() is called with
mm->context.asce_limit set to _REGION3_SIZE. In this situation,
pgtables_bytes remains set to 0 and is not increased. The message is
displayed when the affected process dies and mm_dec_nr_puds() is called.
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Fixes: b4e98d9ac7 ("mm: account pud page tables")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This function is checking for the suspend control, not the function
control.
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Provide base_asce_alloc() and base_asce_free() helper functions which
can be used to allocate an ASCE and all required region, segment and
page tables required to access memory regions of the virtual kernel
address space.
Both, the ASCE and all tables, do not use any features that correspond
to e.g. enhanced DAT features. This is required for some I/O functions
that pass an ASCE, like e.g. some service call requests, but which may
not use any enhanced features.
Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
ARM:
- Include icache invalidation optimizations, improving VM startup time
- Support for forwarded level-triggered interrupts, improving
performance for timers and passthrough platform devices
- A small fix for power-management notifiers, and some cosmetic changes
PPC:
- Add MMIO emulation for vector loads and stores
- Allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without
requiring the complex thread synchronization of older CPU versions
- Improve the handling of escalation interrupts with the XIVE interrupt
controller
- Support decrement register migration
- Various cleanups and bugfixes.
s390:
- Cornelia Huck passed maintainership to Janosch Frank
- Exitless interrupts for emulated devices
- Cleanup of cpuflag handling
- kvm_stat counter improvements
- VSIE improvements
- mm cleanup
x86:
- Hypervisor part of SEV
- UMIP, RDPID, and MSR_SMI_COUNT emulation
- Paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit
- Allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more AVX512
features
- Show vcpu id in its anonymous inode name
- Many fixes and cleanups
- Per-VCPU MSR bitmaps (already merged through x86/pti branch)
- Stable KVM clock when nesting on Hyper-V (merged through x86/hyperv)
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJafvMtAAoJEED/6hsPKofo6YcH/Rzf2RmshrWaC3q82yfIV0Qz
Z8N8yJHSaSdc3Jo6cmiVj0zelwAxdQcyjwlT7vxt5SL2yML+/Q0st9Hc3EgGGXPm
Il99eJEl+2MYpZgYZqV8ff3mHS5s5Jms+7BITAeh6Rgt+DyNbykEAvzt+MCHK9cP
xtsIZQlvRF7HIrpOlaRzOPp3sK2/MDZJ1RBE7wYItK3CUAmsHim/LVYKzZkRTij3
/9b4LP1yMMbziG+Yxt1o682EwJB5YIat6fmDG9uFeEVI5rWWN7WFubqs8gCjYy/p
FX+BjpOdgTRnX+1m9GIj0Jlc/HKMXryDfSZS07Zy4FbGEwSiI5SfKECub4mDhuE=
=C/uD
-----END PGP SIGNATURE-----
Merge tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Radim Krčmář:
"ARM:
- icache invalidation optimizations, improving VM startup time
- support for forwarded level-triggered interrupts, improving
performance for timers and passthrough platform devices
- a small fix for power-management notifiers, and some cosmetic
changes
PPC:
- add MMIO emulation for vector loads and stores
- allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without
requiring the complex thread synchronization of older CPU versions
- improve the handling of escalation interrupts with the XIVE
interrupt controller
- support decrement register migration
- various cleanups and bugfixes.
s390:
- Cornelia Huck passed maintainership to Janosch Frank
- exitless interrupts for emulated devices
- cleanup of cpuflag handling
- kvm_stat counter improvements
- VSIE improvements
- mm cleanup
x86:
- hypervisor part of SEV
- UMIP, RDPID, and MSR_SMI_COUNT emulation
- paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit
- allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more
AVX512 features
- show vcpu id in its anonymous inode name
- many fixes and cleanups
- per-VCPU MSR bitmaps (already merged through x86/pti branch)
- stable KVM clock when nesting on Hyper-V (merged through
x86/hyperv)"
* tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (197 commits)
KVM: PPC: Book3S: Add MMIO emulation for VMX instructions
KVM: PPC: Book3S HV: Branch inside feature section
KVM: PPC: Book3S HV: Make HPT resizing work on POWER9
KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code
KVM: PPC: Book3S PR: Fix broken select due to misspelling
KVM: x86: don't forget vcpu_put() in kvm_arch_vcpu_ioctl_set_sregs()
KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled
KVM: PPC: Book3S HV: Drop locks before reading guest memory
kvm: x86: remove efer_reload entry in kvm_vcpu_stat
KVM: x86: AMD Processor Topology Information
x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested
kvm: embed vcpu id to dentry of vcpu anon inode
kvm: Map PFN-type memory regions as writable (if possible)
x86/kvm: Make it compile on 32bit and with HYPYERVISOR_GUEST=n
KVM: arm/arm64: Fixup userspace irqchip static key optimization
KVM: arm/arm64: Fix userspace_irqchip_in_use counting
KVM: arm/arm64: Fix incorrect timer_is_pending logic
MAINTAINERS: update KVM/s390 maintainers
MAINTAINERS: add Halil as additional vfio-ccw maintainer
MAINTAINERS: add David as a reviewer for KVM/s390
...
Add CONFIG_EXPOLINE to enable the use of the new -mindirect-branch= and
-mfunction_return= compiler options to create a kernel fortified against
the specte v2 attack.
With CONFIG_EXPOLINE=y all indirect branches will be issued with an
execute type instruction. For z10 or newer the EXRL instruction will
be used, for older machines the EX instruction. The typical indirect
call
basr %r14,%r1
is replaced with a PC relative call to a new thunk
brasl %r14,__s390x_indirect_jump_r1
The thunk contains the EXRL/EX instruction to the indirect branch
__s390x_indirect_jump_r1:
exrl 0,0f
j .
0: br %r1
The detour via the execute type instruction has a performance impact.
To get rid of the detour the new kernel parameter "nospectre_v2" and
"spectre_v2=[on,off,auto]" can be used. If the parameter is specified
the kernel and module code will be patched at runtime.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Define TIF_ISOLATE_BP and TIF_ISOLATE_BP_GUEST and add the necessary
plumbing in entry.S to be able to run user space and KVM guests with
limited branch prediction.
To switch a user space process to limited branch prediction the
s390_isolate_bp() function has to be call, and to run a vCPU of a KVM
guest associated with the current task with limited branch prediction
call s390_isolate_bp_guest().
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add the PPA instruction to the system entry and exit path to switch
the kernel to a different branch prediction behaviour. The instructions
are added via CPU alternatives and can be disabled with the "nospec"
or the "nobp=0" kernel parameter. If the default behaviour selected
with CONFIG_KERNEL_NOBP is set to "n" then the "nobp=1" parameter can be
used to enable the changed kernel branch prediction.
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
To be able to switch off specific CPU alternatives with kernel parameters
make a copy of the facility bit mask provided by STFLE and use the copy
for the decision to apply an alternative.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add an optimized version of the array_index_mask_nospec function for
s390 based on a compare and a subtract with borrow.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The man page for the s390_runtime_instr() system call refers to a
header file that was never exported. Therefore it's time to finally
provide the header file, and move large parts of the runtime
instrumentation header file to a new uapi header file.
Reported-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With z14, the store system information instruction provides an
licensed internal code identifier. Display it in /proc/sysinfo.
For more information, see the z/Architecture Principles of Operation.
(SA22-7832-11).
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Commit 2a842acab1 ("block: introduce new block status code type")
added blk_status_t usage to the eadm subchannel driver. However
blk_status_t is unknown when included via <linux/blkdev.h> for CONFIG_BLOCK=n.
Only include <linux/blk_types.h> since this is the only dependency eadm has.
This fixes build failures like below:
In file included from drivers/s390/cio/eadm_sch.c:24:0:
./arch/s390/include/asm/eadm.h:111:4: error: unknown type name 'blk_status_t'; did you mean 'si_status'?
blk_status_t error);
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
to the clk rate protection support added by Jerome Brunet. This feature
will allow consumers to lock in a certain rate on the output of a clk so
that things like audio playback don't hear pops when the clk frequency
changes due to shared parent clks changing rates. Currently the clk
API doesn't guarantee the rate of a clk stays at the rate you request
after clk_set_rate() is called, so this new API will allow drivers
to express that requirement. Beyond this, the core got some debugfs
pretty printing patches and a couple minor non-critical fixes.
Looking outside of the core framework diff we have some new driver
additions and the removal of a legacy TI clk driver. Both of these hit
high in the dirstat. Also, the removal of the asm-generic/clkdev.h file
causes small one-liners in all the architecture Kbuild files. Overall, the
driver diff seems to be the normal stuff that comes all the time to
fix little problems here and there and to support new hardware.
Core:
- Clk rate protection
- Symbolic clk flags in debugfs output
- Clk registration enabled clks while doing bookkeeping updates
New Drivers:
- Spreadtrum SC9860
- HiSilicon hi3660 stub
- Qualcomm A53 PLL, SPMI clkdiv, and MSM8916 APCS
- Amlogic Meson-AXG
- ASPEED BMC
Removed Drivers:
- TI OMAP 3xxx legacy clk (non-DT) support
- asm*/clkdev.h got removed (not really a driver)
Updates:
- Renesas FDP1-0 module clock on R-Car M3-W
- Renesas LVDS module clock on R-Car V3M
- Misc fixes to pr_err() prints
- Qualcomm MSM8916 audio fixes
- Qualcomm IPQ8074 rounded out support for more peripherals
- Qualcomm Alpha PLL variants
- Divider code was using container_of() on bad pointers
- Allwinner DE2 clks on H3
- Amlogic minor data fixes and dropping of CLK_IGNORE_UNUSED
- Mediatek clk driver compile test support
- AT91 PMC clk suspend/resume restoration support
- PLL issues fixed on si5351
- Broadcom IProc PLL calculation updates
- DVFS support for Armada mvebu CPU clks
- Allwinner fixed post-divider support
- TI clkctrl fixes and support for newer SoCs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJac5vRAAoJEK0CiJfG5JUlUaIP/Riq0tbApfc4k4GMvSvaieR/
AwZFIMCxOxO+KGdUsBWj7UUoDfBYmxyknHZkVUA/m+Lm7cRH/YHHMghEceZLaBYW
zPQmDfkTl/QkwysXZMCw9vg4vO0tt5gWbHljQnvVhxVVTCkIRpaE8Vkktj1RZzpY
WU/TkvPbVGY3SNm504TRXKWC9KpMTEXVvzqlg6zLDJ/jE7PGzBKtewqMoLDCBH2L
q6b50BSXDo2Hep0vm6e5xneXKjLNR4kgN4PkbM4Yoi4iWLLbgAu79NfyOvvr/imS
HxOHRms9tejtyaiR6bQSF0pbLOERZ3QSbMFEbxdxnCTuPEfy3Nw/2W7mNJlhJa8g
EGLMnLL4WdloL4Z83dAcMrj9OmxYf7Yobf5dMidLrQT5EYuafdj0ParbI8TQpWSB
eTqaffSUGPE/7xuKouYBcbvocpXXWCcokrP/mEn3OEHXkIeeut1Jd3RmEvsi3gtJ
pNraJTIpvt4c05rj6yLUOhWfyqlA+fH3p4Fx3rrH1tmKEiG+lrhKoxF26uALZe0V
OvarhG+LPIE10pCIYlQjZjQVnYLGCxsGAIoK1uz7VYvFPh2T0cxQlzzeqFgrlTyN
32hMj3LhkQw82FG9xZqjTX1935R35mySRlx63x7HStI1YFief2X9+RHjJR/lofG0
nC0JWTp5sC/pKf54QBXj
=bGPp
-----END PGP SIGNATURE-----
Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"The core framework has a handful of patches this time around, mostly
due to the clk rate protection support added by Jerome Brunet.
This feature will allow consumers to lock in a certain rate on the
output of a clk so that things like audio playback don't hear pops
when the clk frequency changes due to shared parent clks changing
rates. Currently the clk API doesn't guarantee the rate of a clk stays
at the rate you request after clk_set_rate() is called, so this new
API will allow drivers to express that requirement.
Beyond this, the core got some debugfs pretty printing patches and a
couple minor non-critical fixes.
Looking outside of the core framework diff we have some new driver
additions and the removal of a legacy TI clk driver. Both of these hit
high in the dirstat. Also, the removal of the asm-generic/clkdev.h
file causes small one-liners in all the architecture Kbuild files.
Overall, the driver diff seems to be the normal stuff that comes all
the time to fix little problems here and there and to support new
hardware.
Summary:
Core:
- Clk rate protection
- Symbolic clk flags in debugfs output
- Clk registration enabled clks while doing bookkeeping updates
New Drivers:
- Spreadtrum SC9860
- HiSilicon hi3660 stub
- Qualcomm A53 PLL, SPMI clkdiv, and MSM8916 APCS
- Amlogic Meson-AXG
- ASPEED BMC
Removed Drivers:
- TI OMAP 3xxx legacy clk (non-DT) support
- asm*/clkdev.h got removed (not really a driver)
Updates:
- Renesas FDP1-0 module clock on R-Car M3-W
- Renesas LVDS module clock on R-Car V3M
- Misc fixes to pr_err() prints
- Qualcomm MSM8916 audio fixes
- Qualcomm IPQ8074 rounded out support for more peripherals
- Qualcomm Alpha PLL variants
- Divider code was using container_of() on bad pointers
- Allwinner DE2 clks on H3
- Amlogic minor data fixes and dropping of CLK_IGNORE_UNUSED
- Mediatek clk driver compile test support
- AT91 PMC clk suspend/resume restoration support
- PLL issues fixed on si5351
- Broadcom IProc PLL calculation updates
- DVFS support for Armada mvebu CPU clks
- Allwinner fixed post-divider support
- TI clkctrl fixes and support for newer SoCs"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (125 commits)
clk: aspeed: Handle inverse polarity of USB port 1 clock gate
clk: aspeed: Fix return value check in aspeed_cc_init()
clk: aspeed: Add reset controller
clk: aspeed: Register gated clocks
clk: aspeed: Add platform driver and register PLLs
clk: aspeed: Register core clocks
clk: Add clock driver for ASPEED BMC SoCs
clk: mediatek: adjust dependency of reset.c to avoid unexpectedly being built
clk: fix reentrancy of clk_enable() on UP systems
clk: meson-axg: fix potential NULL dereference in axg_clkc_probe()
clk: Simplify debugfs registration
clk: Fix debugfs_create_*() usage
clk: Show symbolic clock flags in debugfs
clk: renesas: r8a7796: Add FDP clock
clk: Move __clk_{get,put}() into private clk.h API
clk: sunxi: Use CLK_IS_CRITICAL flag for critical clks
clk: Improve flags doc for of_clk_detect_critical()
arch: Remove clkdev.h asm-generic from Kbuild
clk: sunxi-ng: a83t: Add M divider to TCON1 clock
clk: Prepare to remove asm-generic/clkdev.h
...
Pull s390 updates from Martin Schwidefsky:
"Bug fixes, small improvements and one notable change: the system call
table and the unistd.h header are now generated automatically with a
shell script from a text file"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/decompressor: discard __ksymtab and .eh_frame sections
s390: fix handling of -1 in set{,fs}[gu]id16 syscalls
s390/tools: generate header files in arch/s390/include/generated/
s390/syscalls: use generated syscall_table.h and unistd.h header files
s390/syscalls: add Makefile to generate system call header files
s390/syscalls: add syscalltbl script
s390/syscalls: add system call table
s390/decompressor: swap .text and .rodata.compressed sections
s390/sclp: fix .data section specification
s390/ipl: avoid usage of __section(.data)
s390/head: replace hard coded values with constants
s390/disassembler: add generated gen_opcode_table tool to .gitignore
s390: remove bogus system call table entries
s390/kprobes: remove duplicate includes
s390/dasd: Remove dead return code checks
s390/dasd: Simplify code
s390/vdso: revise CFI annotations of vDSO functions
s390/kernel: emit CFI data in .debug_frame and discard .eh_frame sections
Merge updates from Andrew Morton:
- misc fixes
- ocfs2 updates
- most of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits)
mm: remove PG_highmem description
tools, vm: new option to specify kpageflags file
mm/swap.c: make functions and their kernel-doc agree
mm, memory_hotplug: fix memmap initialization
mm: correct comments regarding do_fault_around()
mm: numa: do not trap faults on shared data section pages.
hugetlb, mbind: fall back to default policy if vma is NULL
hugetlb, mempolicy: fix the mbind hugetlb migration
mm, hugetlb: further simplify hugetlb allocation API
mm, hugetlb: get rid of surplus page accounting tricks
mm, hugetlb: do not rely on overcommit limit during migration
mm, hugetlb: integrate giga hugetlb more naturally to the allocation path
mm, hugetlb: unify core page allocation accounting and initialization
mm/memcontrol.c: try harder to decrease [memory,memsw].limit_in_bytes
mm/memcontrol.c: make local symbol static
mm/hmm: fix uninitialized use of 'entry' in hmm_vma_walk_pmd()
include/linux/mmzone.h: fix explanation of lower bits in the SPARSEMEM mem_map pointer
mm/compaction.c: fix comment for try_to_compact_pages()
mm/page_ext.c: make page_ext_init a noop when CONFIG_PAGE_EXTENSION but nothing uses it
zsmalloc: use U suffix for negative literals being shifted
...
Pull networking updates from David Miller:
1) Significantly shrink the core networking routing structures. Result
of http://vger.kernel.org/~davem/seoul2017_netdev_keynote.pdf
2) Add netdevsim driver for testing various offloads, from Jakub
Kicinski.
3) Support cross-chip FDB operations in DSA, from Vivien Didelot.
4) Add a 2nd listener hash table for TCP, similar to what was done for
UDP. From Martin KaFai Lau.
5) Add eBPF based queue selection to tun, from Jason Wang.
6) Lockless qdisc support, from John Fastabend.
7) SCTP stream interleave support, from Xin Long.
8) Smoother TCP receive autotuning, from Eric Dumazet.
9) Lots of erspan tunneling enhancements, from William Tu.
10) Add true function call support to BPF, from Alexei Starovoitov.
11) Add explicit support for GRO HW offloading, from Michael Chan.
12) Support extack generation in more netlink subsystems. From Alexander
Aring, Quentin Monnet, and Jakub Kicinski.
13) Add 1000BaseX, flow control, and EEE support to mvneta driver. From
Russell King.
14) Add flow table abstraction to netfilter, from Pablo Neira Ayuso.
15) Many improvements and simplifications to the NFP driver bpf JIT,
from Jakub Kicinski.
16) Support for ipv6 non-equal cost multipath routing, from Ido
Schimmel.
17) Add resource abstration to devlink, from Arkadi Sharshevsky.
18) Packet scheduler classifier shared filter block support, from Jiri
Pirko.
19) Avoid locking in act_csum, from Davide Caratti.
20) devinet_ioctl() simplifications from Al viro.
21) More TCP bpf improvements from Lawrence Brakmo.
22) Add support for onlink ipv6 route flag, similar to ipv4, from David
Ahern.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1925 commits)
tls: Add support for encryption using async offload accelerator
ip6mr: fix stale iterator
net/sched: kconfig: Remove blank help texts
openvswitch: meter: Use 64-bit arithmetic instead of 32-bit
tcp_nv: fix potential integer overflow in tcpnv_acked
r8169: fix RTL8168EP take too long to complete driver initialization.
qmi_wwan: Add support for Quectel EP06
rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK
ipmr: Fix ptrdiff_t print formatting
ibmvnic: Wait for device response when changing MAC
qlcnic: fix deadlock bug
tcp: release sk_frag.page in tcp_disconnect
ipv4: Get the address of interface correctly.
net_sched: gen_estimator: fix lockdep splat
net: macb: Handle HRESP error
net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoring
ipv6: addrconf: break critical section in addrconf_verify_rtnl()
ipv6: change route cache aging logic
i40e/i40evf: Update DESC_NEEDED value to reflect larger value
bnxt_en: cleanup DIM work on device shutdown
...
This pull requests contains a consolidation of the generic no-IOMMU code,
a well as the glue code for swiotlb. All the code is based on the x86
implementation with hooks to allow all architectures that aren't cache
coherent to use it. The x86 conversion itself has been deferred because
the x86 maintainers were a little busy in the last months.
-----BEGIN PGP SIGNATURE-----
iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlpxcVoLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYN/Lw/+Je9teM4NPQ8lU/ncbJN/bUzCFGJ6dFt2eVX/6xs3
sfl8vBdeHt6CBM02rRNecEr31z3+orjQes5JnlEJFYeG3jumV0zCPw/zbxqjzbJ1
3n6cckLxbxzy8Ca1G/BVjHLAUX5eWp1ujn/Q4d03VKVQZhJvFYlqDbP3TrNVx7xn
k86u37p/o+ngjwX66UdZ3C4iIBF8zqy6n2kkpv4HUQtHHzPwEvliN39eNilovb56
iGOzjDX1UWHAu4xCTVnPHSG4fA4XU41NWzIN3DIVPE25lYSISSl9TFAdR8GeZA0G
0Yj6sW53pRSoUwco1ocoS44/FgrPOB5/vHIL06pABvicXBiomje1QylqcK7zAczk
esjkfPEZrmZuu99GtqFyDNKEvKKdy+aBGaTZ3y+NxsuBs+0xS2Owz1IE4Tk28xaw
xh7zn+CVdk2fJh6ZIdw5Eu9b9VN08UriqDmDzO/ylDlcNGcDi7wcxiSTEkHJ1ON/
g9nletV6f3egL0wljDcOnhCJCHTvmWEeq3z8lE55QzPzSH0hHpnGQ2WD0tKrroxz
kjOZp0TdXa4F5iysOHe2xl2sftOH0zIkBQJ+oBcK12mTaLu21+yeuCggQXJ/CBdk
1Ol7l9g9T0TDuZPfiTHt5+6jmECQs92LElWA8x7uF7Fpix3BpnafWaaSMSsosF3F
D1Y=
=Nrl9
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-4.16' of git://git.infradead.org/users/hch/dma-mapping
Pull dma mapping updates from Christoph Hellwig:
"Except for a runtime warning fix from Christian this is all about
consolidation of the generic no-IOMMU code, a well as the glue code
for swiotlb.
All the code is based on the x86 implementation with hooks to allow
all architectures that aren't cache coherent to use it.
The x86 conversion itself has been deferred because the x86
maintainers were a little busy in the last months"
* tag 'dma-mapping-4.16' of git://git.infradead.org/users/hch/dma-mapping: (57 commits)
MAINTAINERS: add the iommu list for swiotlb and xen-swiotlb
arm64: use swiotlb_alloc and swiotlb_free
arm64: replace ZONE_DMA with ZONE_DMA32
mips: use swiotlb_{alloc,free}
mips/netlogic: remove swiotlb support
tile: use generic swiotlb_ops
tile: replace ZONE_DMA with ZONE_DMA32
unicore32: use generic swiotlb_ops
ia64: remove an ifdef around the content of pci-dma.c
ia64: clean up swiotlb support
ia64: use generic swiotlb_ops
ia64: replace ZONE_DMA with ZONE_DMA32
swiotlb: remove various exports
swiotlb: refactor coherent buffer allocation
swiotlb: refactor coherent buffer freeing
swiotlb: wire up ->dma_supported in swiotlb_dma_ops
swiotlb: add common swiotlb_map_ops
swiotlb: rename swiotlb_free to swiotlb_exit
x86: rename swiotlb_dma_ops
powerpc: rename swiotlb_dma_ops
...
Pull siginfo cleanups from Eric Biederman:
"Long ago when 2.4 was just a testing release copy_siginfo_to_user was
made to copy individual fields to userspace, possibly for efficiency
and to ensure initialized values were not copied to userspace.
Unfortunately the design was complex, it's assumptions unstated, and
humans are fallible and so while it worked much of the time that
design failed to ensure unitialized memory is not copied to userspace.
This set of changes is part of a new design to clean up siginfo and
simplify things, and hopefully make the siginfo handling robust enough
that a simple inspection of the code can be made to ensure we don't
copy any unitializied fields to userspace.
The design is to unify struct siginfo and struct compat_siginfo into a
single definition that is shared between all architectures so that
anyone adding to the set of information shared with struct siginfo can
see the whole picture. Hopefully ensuring all future si_code
assignments are arch independent.
The design is to unify copy_siginfo_to_user32 and
copy_siginfo_from_user32 so that those function are complete and cope
with all of the different cases documented in signinfo_layout. I don't
think there was a single implementation of either of those functions
that was complete and correct before my changes unified them.
The design is to introduce a series of helpers including
force_siginfo_fault that take the values that are needed in struct
siginfo and build the siginfo structure for their callers. Ensuring
struct siginfo is built correctly.
The remaining work for 4.17 (unless someone thinks it is post -rc1
material) is to push usage of those helpers down into the
architectures so that architecture specific code will not need to deal
with the fiddly work of intializing struct siginfo, and then when
struct siginfo is guaranteed to be fully initialized change copy
siginfo_to_user into a simple wrapper around copy_to_user.
Further there is work in progress on the issues that have been
documented requires arch specific knowledge to sort out.
The changes below fix or at least document all of the issues that have
been found with siginfo generation. Then proceed to unify struct
siginfo the 32 bit helpers that copy siginfo to and from userspace,
and generally clean up anything that is not arch specific with regards
to siginfo generation.
It is a lot but with the unification you can of siginfo you can
already see the code reduction in the kernel"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (45 commits)
signal/memory-failure: Use force_sig_mceerr and send_sig_mceerr
mm/memory_failure: Remove unused trapno from memory_failure
signal/ptrace: Add force_sig_ptrace_errno_trap and use it where needed
signal/powerpc: Remove unnecessary signal_code parameter of do_send_trap
signal: Helpers for faults with specialized siginfo layouts
signal: Add send_sig_fault and force_sig_fault
signal: Replace memset(info,...) with clear_siginfo for clarity
signal: Don't use structure initializers for struct siginfo
signal/arm64: Better isolate the COMPAT_TASK portion of ptrace_hbptriggered
ptrace: Use copy_siginfo in setsiginfo and getsiginfo
signal: Unify and correct copy_siginfo_to_user32
signal: Remove the code to clear siginfo before calling copy_siginfo_from_user32
signal: Unify and correct copy_siginfo_from_user32
signal/blackfin: Remove pointless UID16_SIGINFO_COMPAT_NEEDED
signal/blackfin: Move the blackfin specific si_codes to asm-generic/siginfo.h
signal/tile: Move the tile specific si_codes to asm-generic/siginfo.h
signal/frv: Move the frv specific si_codes to asm-generic/siginfo.h
signal/ia64: Move the ia64 specific si_codes to asm-generic/siginfo.h
signal/powerpc: Remove redefinition of NSIGTRAP on powerpc
signal: Move addr_lsb into the _sigfault union for clarity
...
-----BEGIN PGP SIGNATURE-----
iQIVAwUAWl80tvSw1s6N8H32AQJq8A//ViRN5fExrd678Eh2Bz1ytrJYMUfYY3Hv
QTH5TH9zFyLFyWLB1Iwe13sdLVTTM88O0qcDb54Lx9fWUqeMZyYvBhLtWPc00lTU
0m3EyYR87MFWaEV+VxaVWgWaWkMDkd39KubDitcS+YIBDszTuMpYodhPUsgLt7lr
pePX7eurXKdQPTh4NUOjGA2NaZot3tga76J6D8NKruGYUstQCGxpP1ryiFfACnwf
NLWNO8ZBMtlDwX1mHYOOMFMaBzFzXorPm7jY4HJDf3mUM84xI3ach6CuH9RTSzfq
A+qB1U3QILPVFo2HtqOHui4bFjRwqOf6uIrI/KcnioJ37w1O+KFcMJeDnX2I211q
f2lXehJLQA7kPmxQw8T3//HDRaLXc0Qxt7IPZRFinrlkcN4oh3DD5euMfCFBSoZG
PTbjxlgMfzJPoZtqAcy0rV5L54a/F4h915OQPJCKLwujIsXD2nT993vNmGDyq4zh
BzNMxSXJC8p+jYvQpNhWyyxwDBBT/YsVQo/ACwg4eJnD3blVTAioRT9ZZcAcsY0F
0z1eWW5RiknzIaXQWvjfK0gYKpO+aMSu9+gipHfMbU3yXG+sPj/H6zAHYzqX3uQZ
jb5Iujjnu49W/YD+RiMenuu59lNXUnLSeRnlV7dw0qxGK1FzGo24+ZzKFhJhKvzG
tdfUsev1Mc8=
=jhWg
-----END PGP SIGNATURE-----
Merge tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull init_task initializer cleanups from David Howells:
"It doesn't seem useful to have the init_task in a header file rather
than in a normal source file. We could consolidate init_task handling
instead and expand out various macros.
Here's a series of patches that consolidate init_task handling:
(1) Make THREAD_SIZE available to vmlinux.lds for cris, hexagon and
openrisc.
(2) Alter the INIT_TASK_DATA linker script macro to set
init_thread_union and init_stack rather than defining these in C.
Insert init_task and init_thread_into into the init_stack area in
the linker script as appropriate to the configuration, with
different section markers so that they end up correctly ordered.
We can then get merge ia64's init_task.c into the main one.
We then have a bunch of single-use INIT_*() macros that seem only
to be macros because they used to be used per-arch. We can then
expand these in place of the user and get rid of a few lines and
a lot of backslashes.
(3) Expand INIT_TASK() in place.
(4) Expand in place various small INIT_*() macros that are defined
conditionally. Expand them and surround them by #if[n]def/#endif
in the .c file as it takes fewer lines.
(5) Expand INIT_SIGNALS() and INIT_SIGHAND() in place.
(6) Expand INIT_STRUCT_PID in place.
These macros can then be discarded"
* tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
Expand INIT_STRUCT_PID and remove
Expand the INIT_SIGNALS and INIT_SIGHAND macros and remove
Expand various INIT_* macros and remove
Expand INIT_TASK() in init/init_task.c and remove
Construct init thread stack in the linker script rather than by union
openrisc: Make THREAD_SIZE available to vmlinux.lds
hexagon: Make THREAD_SIZE available to vmlinux.lds
cris: Make THREAD_SIZE available to vmlinux.lds
The patch modifies the previously defined GISA data structure to be
able to store two GISA formats, format-0 and format-1. Additionally,
it verifies the availability of the GISA format facility and enables
the use of a format-1 GISA in the SIE control block accordingly.
A format-1 can do everything that format-0 can and we will need it
for real HW passthrough. As there are systems with only format-0
we keep both variants.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The GISA format facility is required by the host to be able to process
a format-1 GISA. If not available, the used GISA format will be format-0.
All format-1 related extension will not be available in this case.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The adapter interruption virtualization (AIV) facility is an
optional facility that comes with functionality expected to increase
the performance of adapter interrupt handling for both emulated and
passed-through adapter interrupts. With AIV, adapter interrupts can be
delivered to the guest without exiting SIE.
This patch provides some preparations for using AIV for emulated adapter
interrupts (including virtio) if it's available. When using AIV, the
interrupts are delivered at the so called GISA by setting the bit
corresponding to its Interruption Subclass (ISC) in the Interruption
Pending Mask (IPM) instead of inserting a node into the floating interrupt
list.
To keep the change reasonably small, the handling of this new state is
deferred in get_all_floating_irqs and handle_tpi. This patch concentrates
on the code handling enqueuement of emulated adapter interrupts, and their
delivery to the guest.
Note that care is still required for adapter interrupts using AIV,
because there is no guarantee that AIV is going to deliver the adapter
interrupts pending at the GISA (consider all vcpus idle). When delivering
GISA adapter interrupts by the host (usual mechanism) special attention
is required to honor interrupt priorities.
Empirical results show that the time window between making an interrupt
pending at the GISA and doing kvm_s390_deliver_pending_interrupts is
sufficient for a guest with at least moderate cpu activity to get adapter
interrupts delivered within the SIE, and potentially save some SIE exits
(if not other deliverable interrupts).
The code will be activated with a follow-up patch.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The patch adds an indication for the presence Adapter Interruption
Virtualization facility (AIV) of the general channel subsystem
characteristics.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[change wording]
This patch adds a MSB0 bit numbering version of test_and_clear_bit().
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
In preperation to support pass-through adapter interrupts, the Guest
Interruption State Area (GISA) and the Adapter Interruption Virtualization
(AIV) features will be introduced here.
This patch introduces format-0 GISA (that is defines the struct describing
the GISA, allocates storage for it, and introduces fields for the
GISA address in kvm_s390_sie_block and kvm_s390_vsie).
As the GISA requires storage below 2GB, it is put in sie_page2, which is
already allocated in ZONE_DMA. In addition, The GISA requires alignment to
its integral boundary. This is already naturally aligned via the
padding in the sie_page2.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch prepares a simplification of bit operations between the irq
pending mask for emulated interrupts and the Interruption Pending Mask
(IPM) which is part of the Guest Interruption State Area (GISA), a feature
that allows interrupt delivery to guests by means of the SIE instruction.
Without that change, a bit-wise *or* operation on parts of these two masks
would either require a look-up table of size 256 bytes to map the IPM
to the emulated irq pending mask bit orientation (all bits mirrored at half
byte) or a sequence of up to 8 condidional branches to perform tests of
single bit positions. Both options are to be rejected either by performance
or space utilization reasons.
Beyond that this change will be transparent.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The overall instruction counter is larger than the sum of the
single counters. We should try to catch all instruction handlers
to make this match the summary counter.
Let us add sck,tb,sske,iske,rrbe,tb,tpi,tsch,lpsw,pswe....
and remove other unused ones.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
en_rx_am.c was deleted in 'net-next' but had a bug fixed in it in
'net'.
The esp{4,6}_offload.c conflicts were overlapping changes.
The 'out' label is removed so we just return ERR_PTR(-EINVAL)
directly.
Signed-off-by: David S. Miller <davem@davemloft.net>