The 64 bit system call table contains three entries that come without
a matching NR_<name> entry in unistd.h. In fact all three of them do
not make sense on 64 bit, but only for compat processes.
llseek and mmap2 were specifically introduced for 32 bit / compat
processes. getrlimit is wired up twice, so that only the entry that
comes with a corresponding NR_getrlimit needs to be kept.
The other entries can be removed, since it seems very unlikely that
this will break user space.
Reported-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
These duplicate includes have been found with scripts/checkincludes.pl but
they have been removed manually to avoid removing false positives.
Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The new firmware interfaces for branch prediction behaviour changes
are transparently available for the guest. Nevertheless, there is
new state attached that should be migrated and properly resetted.
Provide a mechanism for handling reset, migration and VSIE.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[Changed capability number to 152. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Having a pure_initcall() callback just to permanently enable BPF
JITs under CONFIG_BPF_JIT_ALWAYS_ON is unnecessary and could leave
a small race window in future where JIT is still disabled on boot.
Since we know about the setting at compilation time anyway, just
initialize it properly there. Also consolidate all the individual
bpf_jit_enable variables into a single one and move them under one
location. Moreover, don't allow for setting unspecified garbage
values on them.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- add the virtio-ccw transport for kvmconfig
- more debug tracing for cpu model
- cleanups and fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJaXhcpAAoJEBF7vIC1phx8YdwP/1FYC24FZVqKZ3NO4ItSh7xc
QdithL2dqfeudmwc/nU6AilMbvgTdR6QmWOICh7fc2HklrIxqkFcjZeHDe2mp5NB
aI1WVtt3EpqZWsimXUkWYUY0Az3DF36Yc/vYw7ubUvPzb5aN9c7G666ADfUwgIjP
IgFgqyEKeT7uP5KVF5Ysz/WaYSGY1BsbwfNfWWjWYQgcj77cA4FkBrM4Krq7GYsO
sGI/IeI9RjtNyExLljpV/eg1rfO6iV+9k8QR4DOYccHooG3tZNhRTbOWTIbvDQir
ryoDeAe2ndDa6BpWDPWRjsricq53+hXuDhx344hro15Uiv949cNMS5d6UFsAnuHR
JYoX/TLmqaETTEC2krn0OgviEU7RcEUAaiEbdegHRTgCNVsYnxoqO91OMudaiyml
zyzUKQYt73t2rBsciRPi3p+nSe6i56uE2yvAi1HtKSM5JMJweVp0VYsQB/0MTFnz
8VIrQjWhj/GEbUufHwWTTwPvEy1Aj9yr4xM6Jxe+C0hnFnB9n2BQQr89QWLkLt2L
0YGviq17Xbk3dgvhp28wY6kPTYipY3VJy2MiyH5DZDY9+5MsMo2VY/y6GyXEe4HZ
ycGyRdvyyNxwiAOI7NVHQYufiVjcdX4kV9uKC6VcfB2tcJF16l3s3u60EE324+t5
lf1LrFVP0xgBrKfAA8SV
=Cc57
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux
KVM: s390: Fixes and features for 4.16
- add the virtio-ccw transport for kvmconfig
- more debug tracing for cpu model
- cleanups and fixes
"wq" is not used at all. "cpuflags" can be access directly via the vcpu,
just as "float_int" via vcpu->kvm.
While at it, reuse _set_cpuflag() to make the code look nicer.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180108193747.10818-1-david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
It is not required to take to a lock to protect access to the cpuflags
of the local interrupt structure of a vcpu as the performed operation
is an atomic_or.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The cpu model already traces the cpu facilities, the ibc and
guest CPU ids. We should do the same for the cpu features (on
success only).
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
commit a03825bbd0 ("KVM: s390: use kvm->created_vcpus") introduced
kvm->created_vcpus to avoid races with the existing kvm->online_vcpus
scheme. One place was "forgotten" and one new place was "added".
Let's fix those.
Reported-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Fixes: 4e0b1ab72b ("KVM: s390: gs support for kvm guests")
Fixes: a03825bbd0 ("KVM: s390: use kvm->created_vcpus")
gmap_mprotect_notify() refuses shadow gmaps. Turns out that
a) gmap_protect_range()
b) gmap_read_table()
c) gmap_pte_op_walk()
Are never called for gmap shadows. And never should be. This dates back
to gmap shadow prototypes where we allowed to call mprotect_notify() on
the gmap shadow (to get notified about the prefix pages getting removed).
This is avoided by always getting notified about any change on the gmap
shadow.
The only real function for walking page tables on shadow gmaps is
gmap_table_walk().
So, essentially, these functions should never get called and
gmap_pte_op_walk() can be cleaned up. Add some checks to callers of
gmap_pte_op_walk().
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20171110151805.7541-1-david@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Among the existing architecture specific versions of
copy_siginfo_to_user32 there are several different implementation
problems. Some architectures fail to handle all of the cases in in
the siginfo union. Some architectures perform a blind copy of the
siginfo union when the si_code is negative. A blind copy suggests the
data is expected to be in 32bit siginfo format, which means that
receiving such a signal via signalfd won't work, or that the data is
in 64bit siginfo and the code is copying nonsense to userspace.
Create a single instance of copy_siginfo_to_user32 that all of the
architectures can share, and teach it to handle all of the cases in
the siginfo union correctly, with the assumption that siginfo is
stored internally to the kernel is 64bit siginfo format.
A special case is made for x86 x32 format. This is needed as presence
of both x32 and ia32 on x86_64 results in two different 32bit signal
formats. By allowing this small special case there winds up being
exactly one code base that needs to be maintained between all of the
architectures. Vastly increasing the testing base and the chances of
finding bugs.
As the x86 copy of copy_siginfo_to_user32 the call of the x86
signal_compat_build_tests were moved into sigaction_compat_abi, so
that they will keep running.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The function copy_siginfo_from_user32 is used for two things, in ptrace
since the dawn of siginfo for arbirarily modifying a signal that
user space sees, and in sigqueueinfo to send a signal with arbirary
siginfo data.
Create a single copy of copy_siginfo_from_user32 that all architectures
share, and teach it to handle all of the cases in the siginfo union.
In the generic version of copy_siginfo_from_user32 ensure that all
of the fields in siginfo are initialized so that the siginfo structure
can be safely copied to userspace if necessary.
When copying the embedded sigval union copy the si_int member. That
ensures the 32bit values passes through the kernel unchanged.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
--EWB Added #ifdef CONFIG_X86_X32_ABI to arch/x86/kernel/signal_compat.c
Changed #ifdef CONFIG_X86_X32 to #ifdef CONFIG_X86_X32_ABI in
linux/compat.h
CONFIG_X86_X32 is set when the user requests X32 support.
CONFIG_X86_X32_ABI is set when the user requests X32 support
and the tool-chain has X32 allowing X32 support to be built.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
The trivial direct mapping implementation already does a virtual to
physical translation which isn't strictly a noop, and will soon learn
to do non-direct but linear physical to dma translations through the
device offset and a few small tricks. Rename it to a better fitting
name.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
For architectures that just use the generic dma_noop_ops we can provide
a generic version of dma-mapping.h.
Signed-off-by: Christoph Hellwig <hch@lst.de>
We need to consistently enforce that keyed hashes cannot be used without
setting the key. To do this we need a reliable way to determine whether
a given hash algorithm is keyed or not. AF_ALG currently does this by
checking for the presence of a ->setkey() method. However, this is
actually slightly broken because the CRC-32 algorithms implement
->setkey() but can also be used without a key. (The CRC-32 "key" is not
actually a cryptographic key but rather represents the initial state.
If not overridden, then a default initial state is used.)
Prepare to fix this by introducing a flag CRYPTO_ALG_OPTIONAL_KEY which
indicates that the algorithm has a ->setkey() method, but it is not
required to be called. Then set it on all the CRC-32 algorithms.
The same also applies to the Adler-32 implementation in Lustre.
Also, the cryptd and mcryptd templates have to pass through the flag
from their underlying algorithm.
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Construct the init thread stack in the linker script rather than doing it
by means of a union so that ia64's init_task.c can be got rid of.
The following symbols are then made available from INIT_TASK_DATA() linker
script macro:
init_thread_union
init_stack
INIT_TASK_DATA() also expands the region to THREAD_SIZE to accommodate the
size of the init stack. init_thread_union is given its own section so that
it can be placed into the stack space in the right order. I'm assuming
that the ia64 ordering is correct and that the task_struct is first and the
thread_info second.
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Will Deacon <will.deacon@arm.com> (arm64)
Tested-by: Palmer Dabbelt <palmer@sifive.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
We can just pass this on instead of having to do a radix tree lookup
without proper locking a few levels into the callchain.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
We can just pass this on instead of having to do a radix tree lookup
without proper locking 2 levels into the callchain.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
We can just pass this on instead of having to do a radix tree lookup
without proper locking a few levels into the callchain.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
We can just pass this on instead of having to do a radix tree lookup
without proper locking 2 levels into the callchain.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
s390:
* Two fixes for potential bitmap overruns in the cmma migration code
x86:
* Clear guest provided GPRs to defeat the Project Zero PoC for CVE
2017-5715
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJaUTJ4AAoJEED/6hsPKofohk0IAJAFlMG66u5MxC0kSM61U4Zf
1vkzRwAkBbcN82LpGQKbqabVyTq0F3aLipyOn6WO5SN0K5m+OI2OV/aAroPyX8bI
F7nWIqTXLhJ9X6KXINFvyavHMprvWl8PA72tR/B/7GhhfShrZ2wGgqhl0vv/kCUK
/8q+5e693yJqw8ceemin9a6kPJrLpmjeH+Oy24KIlGbvJWV4UrIE86pRHnAnBtg8
L7Vbxn5+ezKmakvBh+zF8NKcD1zHDcmQZHoYFPsQT0vX5GPoYqT2bcO6gsh1Grmp
8ti6KkrnP+j2A/OEna4LBWfwKI/1xHXneB22BYrAxvNjHt+R4JrjaPpx82SEB4Y=
=URMR
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"s390:
- Two fixes for potential bitmap overruns in the cmma migration code
x86:
- Clear guest provided GPRs to defeat the Project Zero PoC for CVE
2017-5715"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: vmx: Scrub hardware GPRs at VM-exit
KVM: s390: prevent buffer overrun on memory hotplug during migration
KVM: s390: fix cmma migration for multiple memory slots
Pull s390 fixes from Martin Schwidefsky:
"Four bug fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/dasd: fix wrongly assigned configuration data
s390: fix preemption race in disable_sacf_uaccess
s390/sclp: disable FORTIFY_SOURCE for early sclp code
s390/pci: handle insufficient resources during dma tlb flush
Now that every architecture is using the generic clkdev.h file
and we no longer include asm/clkdev.h anywhere in the tree, we
can remove it.
Cc: Russell King <linux@armlinux.org.uk>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: <linux-arch@vger.kernel.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
With subcode 0x24, diag26c returns all sorts of VNIC-related information.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lots of overlapping changes. Also on the net-next side
the XDP state management is handled more in the generic
layers so undo the 'net' nfp fix which isn't applicable
in net-next.
Include a necessary change by Jakub Kicinski, with log message:
====================
cls_bpf no longer takes care of offload tracking. Make sure
netdevsim performs necessary checks. This fixes a warning
caused by TC trying to remove a filter it has not added.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We must not go beyond the pre-allocated buffer. This can happen when
a new memory slot is added during migration.
Reported-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: stable@vger.kernel.org # 4.13+
Fixes: 190df4a212 (KVM: s390: CMMA tracking, ESSA emulation, migration mode)
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
When multiple memory slots are present the cmma migration code
does not allocate enough memory for the bitmap. The memory slots
are sorted in reverse order, so we must use gfn and size of
slot[0] instead of the last one.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 4.13+
Fixes: 190df4a212 (KVM: s390: CMMA tracking, ESSA emulation, migration mode)
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Daniel Borkmann says:
====================
pull-request: bpf-next 2017-12-18
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Allow arbitrary function calls from one BPF function to another BPF function.
As of today when writing BPF programs, __always_inline had to be used in
the BPF C programs for all functions, unnecessarily causing LLVM to inflate
code size. Handle this more naturally with support for BPF to BPF calls
such that this __always_inline restriction can be overcome. As a result,
it allows for better optimized code and finally enables to introduce core
BPF libraries in the future that can be reused out of different projects.
x86 and arm64 JIT support was added as well, from Alexei.
2) Add infrastructure for tagging functions as error injectable and allow for
BPF to return arbitrary error values when BPF is attached via kprobes on
those. This way of injecting errors generically eases testing and debugging
without having to recompile or restart the kernel. Tags for opting-in for
this facility are added with BPF_ALLOW_ERROR_INJECTION(), from Josef.
3) For BPF offload via nfp JIT, add support for bpf_xdp_adjust_head() helper
call for XDP programs. First part of this work adds handling of BPF
capabilities included in the firmware, and the later patches add support
to the nfp verifier part and JIT as well as some small optimizations,
from Jakub.
4) The bpftool now also gets support for basic cgroup BPF operations such
as attaching, detaching and listing current BPF programs. As a requirement
for the attach part, bpftool can now also load object files through
'bpftool prog load'. This reuses libbpf which we have in the kernel tree
as well. bpftool-cgroup man page is added along with it, from Roman.
5) Back then commit e87c6bc385 ("bpf: permit multiple bpf attachments for
a single perf event") added support for attaching multiple BPF programs
to a single perf event. Given they are configured through perf's ioctl()
interface, the interface has been extended with a PERF_EVENT_IOC_QUERY_BPF
command in this work in order to return an array of one or multiple BPF
prog ids that are currently attached, from Yonghong.
6) Various minor fixes and cleanups to the bpftool's Makefile as well
as a new 'uninstall' and 'doc-uninstall' target for removing bpftool
itself or prior installed documentation related to it, from Quentin.
7) Add CONFIG_CGROUP_BPF=y to the BPF kernel selftest config file which is
required for the test_dev_cgroup test case to run, from Naresh.
8) Fix reporting of XDP prog_flags for nfp driver, from Jakub.
9) Fix libbpf's exit code from the Makefile when libelf was not found in
the system, also from Jakub.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2017-12-17
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a corner case in generic XDP where we have non-linear skbs
but enough tailroom in the skb to not miss to linearizing there,
from Song.
2) Fix BPF JIT bugs in s390x and ppc64 to not recache skb data when
BPF context is not skb, from Daniel.
3) Fix a BPF JIT bug in sparc64 where recaching skb data after helper
call would use the wrong register for the skb, from Daniel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
global bpf_jit_enable variable is tested multiple times in JITs,
blinding and verifier core. The malicious root can try to toggle
it while loading the programs. This race condition was accounted
for and there should be no issues, but it's safer to avoid
this race condition.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This reverts commits 5c9d2d5c26, c7da82b894, and e7fe7b5cae.
We'll probably need to revisit this, but basically we should not
complicate the get_user_pages_fast() case, and checking the actual page
table protection key bits will require more care anyway, since the
protection keys depend on the exact state of the VM in question.
Particularly when doing a "remote" page lookup (ie in somebody elses VM,
not your own), you need to be much more careful than this was. Dave
Hansen says:
"So, the underlying bug here is that we now a get_user_pages_remote()
and then go ahead and do the p*_access_permitted() checks against the
current PKRU. This was introduced recently with the addition of the
new p??_access_permitted() calls.
We have checks in the VMA path for the "remote" gups and we avoid
consulting PKRU for them. This got missed in the pkeys selftests
because I did a ptrace read, but not a *write*. I also didn't
explicitly test it against something where a COW needed to be done"
It's also not entirely clear that it makes sense to check the protection
key bits at this level at all. But one possible eventual solution is to
make the get_user_pages_fast() case just abort if it sees protection key
bits set, which makes us fall back to the regular get_user_pages() case,
which then has a vma and can do the check there if we want to.
We'll see.
Somewhat related to this all: what we _do_ want to do some day is to
check the PAGE_USER bit - it should obviously always be set for user
pages, but it would be a good check to have back. Because we have no
generic way to test for it, we lost it as part of moving over from the
architecture-specific x86 GUP implementation to the generic one in
commit e585513b76 ("x86/mm/gup: Switch GUP to the generic
get_user_page_fast() implementation").
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The assumption of unconditionally reloading skb pointers on
BPF helper calls where bpf_helper_changes_pkt_data() holds
true is wrong. There can be different contexts where the
BPF helper would enforce a reload such as in case of XDP.
Here, we do have a struct xdp_buff instead of struct sk_buff
as context, thus this will access garbage.
JITs only ever need to deal with cached skb pointer reload
when ld_abs/ind was seen, therefore guard the reload behind
SEEN_SKB only. Tested on s390x.
Fixes: 9db7f2b818 ("s390/bpf: recache skb->data/hlen for skb_vlan_push/pop")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
With CONFIG_PREEMPT=y there is a possible race in disable_sacf_uaccess.
The new set_fs value needs to be stored the the task structure first,
the control register update needs to be second. Otherwise a preemptive
schedule may interrupt the code right after the control register update
has been done and the next time the task is scheduled we get an incorrect
value in the control register due to the old set_fs setting.
Fixes: 0aaba41b58 ("s390: remove all code using the access register mode")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In testing, we found that nfsd threads may call set_groups in parallel
for the same entry cached in auth.unix.gid, racing in the call of
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.
This patch:
- Make groups_sort globally visible.
- Move the call to groups_sort to the modifiers of group_info
- Remove the call to groups_sort from set_groups
Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After the vcpu_load/vcpu_put pushdown, the handling of asynchronous VCPU
ioctl is already much clearer in that it is obvious that they bypass
vcpu_load and vcpu_put.
However, it is still not perfect in that the different state of the VCPU
mutex is still hidden in the caller. Separate those ioctls into a new
function kvm_arch_vcpu_async_ioctl that returns -ENOIOCTLCMD for more
"traditional" synchronous ioctls.
Cc: James Hogan <jhogan@kernel.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Suggested-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the calls to vcpu_load() and vcpu_put() in to the architecture
specific implementations of kvm_arch_vcpu_ioctl() which dispatches
further architecture-specific ioctls on to other functions.
Some architectures support asynchronous vcpu ioctls which cannot call
vcpu_load() or take the vcpu->mutex, because that would prevent
concurrent execution with a running VCPU, which is the intended purpose
of these ioctls, for example because they inject interrupts.
We repeat the separate checks for these specifics in the architecture
code for MIPS, S390 and PPC, and avoid taking the vcpu->mutex and
calling vcpu_load for these ioctls.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_set_fpu().
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_get_fpu().
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_set_guest_debug().
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_set_mpstate().
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_get_mpstate().
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_set_sregs().
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_get_sregs().
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_set_regs().
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_get_regs().
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move vcpu_load() and vcpu_put() into the architecture specific
implementations of kvm_arch_vcpu_ioctl_run().
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> # s390 parts
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[Rebased. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Revise and add CFI CFA and register rule annotations to the vDSO
functions for proper stack unwinding and debugging.
Because glibc might call the vDSO in special ways, the vDSO code
does not rely on a stack frame created by the caller. The TOD clock
value can be therefore not stored in the pre-allocated stack area
and additional stack space is required.
To correctly annotate these situations with CFI, the .cfi_val_offset
directive is required to create relative offsets on the value of the
stack register %r15. Because the .cfi_val_offset directive is
available with recent GNU assembler versions only, additional checks
are necessary.
Note that if the vDSO is assembled with an older assembler version,
stack unwinding and debugging from within the vDSO code might not
be possible.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Using perf probe and libdw on kernel modules failed to find CFI
data for symbols. The CFI data is stored in the .eh_frame section.
The elfutils libdw is not able to extract the CFI data correctly,
because the .eh_frame section requires "non-simple" relocations
for kernel modules.
The suggestion is to avoid these "non-simple" relocations by emitting
the CFI data in the .debug_frame section. Let gcc emit respective
directives by specifying the -fno-asynchronous-unwind-tables option.
Using the .debug_frame section for CFI data, the .eh_frame section
becomes unused and, thus, discard it for kernel and modules builds
The vDSO requires the .eh_frame section and, hence, emit the CFI data
in both, the .eh_frame and .debug_frame sections.
See also discussion on elfutils/libdw bugzilla:
https://sourceware.org/bugzilla/show_bug.cgi?id=22452
Suggested-by: Mark Wielaard <mark@klomp.org>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In a virtualized setup lazy flushing can lead to the hypervisor
running out of resources when lots of guest pages need to be
pinned. In this situation simply trigger a global flush to give
the hypervisor a chance to free some of these resources.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
ARM:
* A number of issues in the vgic discovered using SMATCH
* A bit one-off calculation in out stage base address mask (32-bit and
64-bit)
* Fixes to single-step debugging instructions that trap for other
reasons such as MMMIO aborts
* Printing unavailable hyp mode as error
* Potential spinlock deadlock in the vgic
* Avoid calling vgic vcpu free more than once
* Broken bit calculation for big endian systems
s390:
* SPDX tags
* Fence storage key accesses from problem state
* Make sure that irq_state.flags is not used in the future
x86:
* Intercept port 0x80 accesses to prevent host instability (CVE)
* Use userspace FPU context for guest FPU (mainly an optimization that
fixes a double use of kernel FPU)
* Do not leak one page per module load
* Flush APIC page address cache from MMU invalidation notifiers
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJaLA93AAoJEED/6hsPKofo9msH/2DrqT2FOKfLuxNR2FeUGWr3
lqFoBRUXrVDMINGStnWrV36h/xYzlgJl9jtSDS8dr3VxLqtrNLlDg9NmGeogoZ+k
/xewr/jFYoSRfffsvrbkzORUfvu6zqvJwufiwBEJwAfcswiLqPizdFXcxtUL4eZE
9s9sIweo5zp2Xjg5yLOEkyanePKMEht/81zPkHyM+g0ZMoaPam3qZHA0lLzdyRgd
G9LpSyiMFHguYYgbwipaVue3zgMY1EdmKQ8C2hEPmZd8nVau26YDwRnAwwLrmVkW
sFhGO1Xi18TzQPokzALC25c9v0fqgxL5+fNyFNgWwTc2n9PSwO+IHcy699UH+3A=
=Qcqd
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Radim Krčmář:
"ARM:
- A number of issues in the vgic discovered using SMATCH
- A bit one-off calculation in out stage base address mask (32-bit
and 64-bit)
- Fixes to single-step debugging instructions that trap for other
reasons such as MMMIO aborts
- Printing unavailable hyp mode as error
- Potential spinlock deadlock in the vgic
- Avoid calling vgic vcpu free more than once
- Broken bit calculation for big endian systems
s390:
- SPDX tags
- Fence storage key accesses from problem state
- Make sure that irq_state.flags is not used in the future
x86:
- Intercept port 0x80 accesses to prevent host instability (CVE)
- Use userspace FPU context for guest FPU (mainly an optimization
that fixes a double use of kernel FPU)
- Do not leak one page per module load
- Flush APIC page address cache from MMU invalidation notifiers"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits)
KVM: x86: fix APIC page invalidation
KVM: s390: Fix skey emulation permission check
KVM: s390: mark irq_state.flags as non-usable
KVM: s390: Remove redundant license text
KVM: s390: add SPDX identifiers to the remaining files
KVM: VMX: fix page leak in hardware_setup()
KVM: VMX: remove I/O port 0x80 bypass on Intel hosts
x86,kvm: remove KVM emulator get_fpu / put_fpu
x86,kvm: move qemu/guest FPU switching out to vcpu_run
KVM: arm/arm64: Fix broken GICH_ELRSR big endian conversion
KVM: arm/arm64: kvm_arch_destroy_vm cleanups
KVM: arm/arm64: Fix spinlock acquisition in vgic_set_owner
kvm: arm: don't treat unavailable HYP mode as an error
KVM: arm/arm64: Avoid attempting to load timer vgic state without a vgic
kvm: arm64: handle single-step of hyp emulated mmio instructions
kvm: arm64: handle single-step during SError exceptions
kvm: arm64: handle single-step of userspace mmio instructions
kvm: arm64: handle single-stepping trapped instructions
KVM: arm/arm64: debug: Introduce helper for single-step
arm: KVM: Fix VTTBR_BADDR_MASK BUG_ON off-by-one
...
Pull networking fixes from David Miller:
1) CAN fixes from Martin Kelly (cancel URBs properly in all the CAN usb
drivers).
2) Revert returning -EEXIST from __dev_alloc_name() as this propagates
to userspace and broke some apps. From Johannes Berg.
3) Fix conn memory leaks and crashes in TIPC, from Jon Malloc and Cong
Wang.
4) Gianfar MAC can't do EEE so don't advertise it by default, from
Claudiu Manoil.
5) Relax strict netlink attribute validation, but emit a warning. From
David Ahern.
6) Fix regression in checksum offload of thunderx driver, from Florian
Westphal.
7) Fix UAPI bpf issues on s390, from Hendrik Brueckner.
8) New card support in iwlwifi, from Ihab Zhaika.
9) BBR congestion control bug fixes from Neal Cardwell.
10) Fix port stats in nfp driver, from Pieter Jansen van Vuuren.
11) Fix leaks in qualcomm rmnet, from Subash Abhinov Kasiviswanathan.
12) Fix DMA API handling in sh_eth driver, from Thomas Petazzoni.
13) Fix spurious netpoll warnings in bnxt_en, from Calvin Owens.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits)
net: mvpp2: fix the RSS table entry offset
tcp: evaluate packet losses upon RTT change
tcp: fix off-by-one bug in RACK
tcp: always evaluate losses in RACK upon undo
tcp: correctly test congestion state in RACK
bnxt_en: Fix sources of spurious netpoll warnings
tcp_bbr: reset long-term bandwidth sampling on loss recovery undo
tcp_bbr: reset full pipe detection on loss recovery undo
tcp_bbr: record "full bw reached" decision in new full_bw_reached bit
sfc: pass valid pointers from efx_enqueue_unwind
gianfar: Disable EEE autoneg by default
tcp: invalidate rate samples during SACK reneging
can: peak/pcie_fd: fix potential bug in restarting tx queue
can: usb_8dev: cancel urb on -EPIPE and -EPROTO
can: kvaser_usb: cancel urb on -EPIPE and -EPROTO
can: esd_usb2: cancel urb on -EPIPE and -EPROTO
can: ems_usb: cancel urb on -EPIPE and -EPROTO
can: mcba_usb: cancel urb on -EPROTO
usbnet: fix alignment for frames with no ethernet header
tcp: use current time in tcp_rcv_space_adjust()
...
When wiring up the socket system calls the compat entries were
incorrectly set. Not all of them point to the corresponding compat
wrapper functions, which clear the upper 33 bits of user space
pointers, like it is required.
Fixes: 977108f89c ("s390: wire up separate socketcalls system calls")
Cc: <stable@vger.kernel.org> # v4.3+
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
- SPDX tags
- Fence storage key accesses from problem state
- Make sure that irq_state.flags is not used in the future
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJaJ6rwAAoJEBF7vIC1phx8QDIP/jUgH9OpLpg+bhrEUvB7e83X
sAQuKv7jVTTEpZ6mTkqjPihdiRC72x1spzz5ACw5XaZ9NoxMiOtABFBtiaTfoYz+
l6eu+qOqk6lIn7n/WR8VQFSuizHa1VjnQiuco/GEUE+3FOZQPVE/u8gpNsjvWwfV
gB+45oTCF24LZEgPAotPglMWOtbxjauMmqHkUh3jDgsk0bFCGWe+MR+T3ljIZ45M
/6JBDchEibCsfkg4/ck0HjnQ3p9J4gfictAKJWeYgNh/4oB2krId9FNYbxXkOFNX
1+zeurttmRuFjFwVdCD6SoxE0PQTYXnL/hisITxRfX5otoXQ/x5PffiBrXIZucWK
fZZvPX0MBNNzIx1UvCaJ8bKEmtzXdGuy5mpzX84kJNqCIkqft/bFrOPAf/p/Nrrv
4RoF00FH6ZZdxPD3rLkBYSs//P6lTEivkrMHGFndHrJc844pVTEN45lTg0ngOcmF
aOBbpZQl6etRwobWJdye76OuVszadoECYrnLPP+fFgWjFqp0F3b9Ki1WkSPsZ4E1
isXp/tYRA+/0tZPBQT297tuUXv7c0ID2SROIUvgQt2yC2EdrizWvXsl2QCGsfbxL
8jT5AsPg0U3qUeBAUZP6gtdIAIuN5lj75uOM83CEcPTo4fOkGuvCnHI0LSHz6Ooc
oz2Z8aZAENKd+3193FnO
=uodY
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-master-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux
KVM: s390: Fixes for 4.15
- SPDX tags
- Fence storage key accesses from problem state
- Make sure that irq_state.flags is not used in the future
All skey functions call skey_check_enable at their start, which checks
if we are in the PSTATE and injects a privileged operation exception
if we are.
Unfortunately they continue processing afterwards and perform the
operation anyhow as skey_check_enable does not deliver an error if the
exception injection was successful.
Let's move the PSTATE check into the skey functions and exit them on
such an occasion, also we now do not enable skey handling anymore in
such a case.
Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: a7e19ab ("KVM: s390: handle missing storage-key facility")
Cc: <stable@vger.kernel.org> # v4.8+
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Old kernels did not check for zero in the irq_state.flags field and old
QEMUs did not zero the flag/reserved fields when calling
KVM_S390_*_IRQ_STATE. Let's add comments to prevent future uses of
these fields.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Now that the SPDX tag is in all arch/s390/kvm/ files, that identifies
the license in a specific and legally-defined manner. So the extra GPL
text wording can be removed as it is no longer needed at all.
This is done on a quest to remove the 700+ different ways that files in
the kernel describe the GPL license text. And there's unneeded stuff
like the address (sometimes incorrect) for the FSF which is never
needed.
No copyright headers or other non-license-description text was removed.
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Message-Id: <20171124140043.10062-9-gregkh@linuxfoundation.org>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
It's good to have SPDX identifiers in all files to make it easier to
audit the kernel tree for correct licenses.
Update the arch/s390/kvm/ files with the correct SPDX license
identifier based on the license text in the file itself. The SPDX
identifier is a legally binding shorthand, which can be used instead of
the full boiler plate text.
This work is based on a script and data from Thomas Gleixner, Philippe
Ombredanne, and Kate Stewart.
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Message-Id: <20171124140043.10062-3-gregkh@linuxfoundation.org>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Correct whitespace and coding style issues in the s390 asm/ptrace.h
uapi header file. This is preparatory work to copy it to the tools/
directory for inclusion by selftests and perf.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
To mitigate and correct the broken uapi for the BPF_PROG_TYPE_PERF_EVENT
program type, introduce a user_pt_regs structure (similar to arm64) that
exports parts from the beginnig of the pt_regs structure.
The export must start with the beginning of the pt_regs structure because
to correctly calculate BPF prologues for perf (regs_query_register_offset()).
For BPF_PROG_TYPE_PERF_EVENT program types, the BPF program is then passed
a user_pt_regs structure.
Note: Depending on future changes to the s390 pt_regs structure, consider
the user_pt_regs structure to be stable for a particular kernel version
only. (Of course, s390 tries to ensure keep it stable as much as possible.)
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Martin Cermak reported that setting a uprobe doesn't work. Reason for
this is that the common uprobes code tries to get an unmapped area at
the last possible page within an address space.
This broke with commit 1aea9b3f92 ("s390/mm: implement 5 level pages
tables") which introduced an off-by-one bug which prevents to map
anything at the last possible page within an address space.
The check with the off-by-one bug however can be removed since with
commit 8ab867cb08 ("s390/mm: fix BUG_ON in crst_table_upgrade") the
necessary check is done at both call sites.
Reported-by: Martin Cermak <mcermak@redhat.com>
Bisected-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Fixes: 1aea9b3f92 ("s390/mm: implement 5 level pages tables")
Cc: <stable@vger.kernel.org> # v4.13+
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
More files under arch/s390 have been tagged with the SPDX identifier,
a few of those files have a GPL license text. Remove the GPL text
as it is no longer needed.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add the correct SPDX license to a few more files under arch/s390 and
drivers/s390 which have been missed to far.
The SPDX identifier is a legally binding shorthand, which can be used
instead of the full boiler plate text.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The switch_to() macro has an optimization to avoid saving and
restoring register contents that aren't needed for kernel threads.
There is however the possibility that a kernel thread execve's a user
space program. In such a case the execve'd process can partially see
the contents of the previous process, which shouldn't be allowed.
To avoid this, simply always save and restore register contents on
context switch.
Cc: <stable@vger.kernel.org> # v2.6.37+
Fixes: fdb6d070ef ("switch_to: dont restore/save access & fpu regs for kernel threads")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The original intent of the virtio header relicensing
from 2008 was to make sure anyone can implement compatible
devices/drivers. The virtio-ccw was omitted by mistake.
We have an ack from the only contributor as well as the
maintainer from IBM, so it's not too late to fix that.
Make it dual-licensed with GPLv2, as the whole kernel is GPL2.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* PPC bugfix: HPT guests on a POWER9 radix host
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJaICi1AAoJEL/70l94x66DjvEIAIML/e9YX1YrJZi0rsB9cbm0
Le3o5b3wKxPrlZdnpOZQ2mVWubUQdiHMPGX6BkpgyiJWUchnbj5ql1gUf5S0i3jk
TOk6nae6DU94xBuboeqZJlmx2VfPY/fqzLWsX3HFHpnzRl4XvXL5o7cWguIxVcVO
yU6bPgbAXyXSBennLWZxC3aQ2Ojikr3uxZQpUZTAPOW5hFINpCKCpqJBMxsb67wq
rwI0cJhRl92mHpbe8qeNJhavqY5eviy9iPUaZrOW9P4yw1uqjTAjgsUc1ydiaZSV
rOHeKBOgVfY/KBaNJKyKySfuL1MJ+DLcQqm9RlGpKNpFIeB0vvSf0gtmmqIAXIk=
=kh2y
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
- x86 bugfixes: APIC, nested virtualization, IOAPIC
- PPC bugfix: HPT guests on a POWER9 radix host
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits)
KVM: Let KVM_SET_SIGNAL_MASK work as advertised
KVM: VMX: Fix vmx->nested freeing when no SMI handler
KVM: VMX: Fix rflags cache during vCPU reset
KVM: X86: Fix softlockup when get the current kvmclock
KVM: lapic: Fixup LDR on load in x2apic
KVM: lapic: Split out x2apic ldr calculation
KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT guests on radix hosts
KVM: vmx: use X86_CR4_UMIP and X86_FEATURE_UMIP
KVM: x86: Fix CPUID function for word 6 (80000001_ECX)
KVM: nVMX: Fix vmx_check_nested_events() return value in case an event was reinjected to L2
KVM: x86: ioapic: Preserve read-only values in the redirection table
KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered
KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq
KVM: x86: ioapic: Don't fire level irq when Remote IRR set
KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race
KVM: x86: inject exceptions produced by x86_decode_insn
KVM: x86: Allow suppressing prints on RDMSR/WRMSR of unhandled MSRs
KVM: x86: fix em_fxstor() sleeping while in atomic
KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure
KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry
...
Pull s390 fixes from Martin Schwidefsky:
- SPDX identifiers are added to more of the s390 specific files.
- The ELF_ET_DYN_BASE base patch from Kees is reverted, with the change
some old 31-bit programs crash.
- Bug fixes and cleanups.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (29 commits)
s390/gs: add compat regset for the guarded storage broadcast control block
s390: revert ELF_ET_DYN_BASE base changes
s390: Remove redundant license text
s390: crypto: Remove redundant license text
s390: include: Remove redundant license text
s390: kernel: Remove redundant license text
s390: add SPDX identifiers to the remaining files
s390: appldata: add SPDX identifiers to the remaining files
s390: pci: add SPDX identifiers to the remaining files
s390: mm: add SPDX identifiers to the remaining files
s390: crypto: add SPDX identifiers to the remaining files
s390: kernel: add SPDX identifiers to the remaining files
s390: sthyi: add SPDX identifiers to the remaining files
s390: drivers: Remove redundant license text
s390: crypto: Remove redundant license text
s390: virtio: add SPDX identifiers to the remaining files
s390: scsi: zfcp_aux: add SPDX identifier
s390: net: add SPDX identifiers to the remaining files
s390: char: add SPDX identifiers to the remaining files
s390: cio: add SPDX identifiers to the remaining files
...
The 'access_permitted' helper is used in the gup-fast path and goes
beyond the simple _PAGE_RW check to also:
- validate that the mapping is writable from a protection keys
standpoint
- validate that the pte has _PAGE_USER set since all fault paths where
pud_write is must be referencing user-memory.
[dan.j.williams@intel.com: fix powerpc compile error]
Link: http://lkml.kernel.org/r/151129127237.37405.16073414520854722485.stgit@dwillia2-desk3.amr.corp.intel.com
Link: http://lkml.kernel.org/r/151043110453.2842.2166049702068628177.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In response to compile breakage introduced by a series that added the
pud_write helper to x86, Stephen notes:
did you consider using the other paradigm:
In arch include files:
#define pud_write pud_write
static inline int pud_write(pud_t pud)
.....
Then in include/asm-generic/pgtable.h:
#ifndef pud_write
tatic inline int pud_write(pud_t pud)
{
....
}
#endif
If you had, then the powerpc code would have worked ... ;-) and many
of the other interfaces in include/asm-generic/pgtable.h are
protected that way ...
Given that some architecture already define pmd_write() as a macro, it's
a net reduction to drop the definition of __HAVE_ARCH_PMD_WRITE.
Link: http://lkml.kernel.org/r/151129126721.37405.13339850900081557813.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
git commit e525f8a6e6
"s390/gs: add regset for the guarded storage broadcast control block"
added the missing regset to the s390_regsets array but failed to add it
to the s390_compat_regsets array.
Fixes: e525f8a6e6 ("add compat regset for the guarded storage broadcast control block")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
KVM API says for the signal mask you set via KVM_SET_SIGNAL_MASK, that
"any unblocked signal received [...] will cause KVM_RUN to return with
-EINTR" and that "the signal will only be delivered if not blocked by
the original signal mask".
This, however, is only true, when the calling task has a signal handler
registered for a signal. If not, signal evaluation is short-circuited for
SIG_IGN and SIG_DFL, and the signal is either ignored without KVM_RUN
returning or the whole process is terminated.
Make KVM_SET_SIGNAL_MASK behave as advertised by utilizing logic similar
to that in do_sigtimedwait() to avoid short-circuiting of signals.
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This reverts commit a73dc5370e.
Reducing the base address for 31-bit PIE executables from
(STACK_TOP/3)*2 to 4MB broke several compat programs which
use -fpie to move the executable out of the lower 16MB.
Cc: <stable@vger.kernel.org> # 4.13+
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Now that the SPDX tag is in all arch/s390/ files, that identifies the
license in a specific and legally-defined manner. So the extra GPL text
wording in the remaining files can be removed as it is no longer needed
at all.
This is done on a quest to remove the 700+ different ways that files in
the kernel describe the GPL license text. And there's unneeded stuff
like the address (sometimes incorrect) for the FSF which is never
needed.
No copyright headers or other non-license-description text was removed.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Now that the SPDX tag is in all arch/s390/crypto/ files, that identifies
the license in a specific and legally-defined manner. So the extra GPL
text wording can be removed as it is no longer needed at all.
This is done on a quest to remove the 700+ different ways that files in
the kernel describe the GPL license text. And there's unneeded stuff
like the address (sometimes incorrect) for the FSF which is never
needed.
No copyright headers or other non-license-description text was removed.
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Now that the SPDX tag is in all arch/s390/include/ files, that
identifies the license in a specific and legally-defined manner. So the
extra GPL text wording can be removed as it is no longer needed at all.
This is done on a quest to remove the 700+ different ways that files in
the kernel describe the GPL license text. And there's unneeded stuff
like the address (sometimes incorrect) for the FSF which is never
needed.
No copyright headers or other non-license-description text was removed.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Halil Pasic <pasic@linux.vnet.ibm.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Now that the SPDX tag is in all arch/s390/kernel/ files, that identifies
the license in a specific and legally-defined manner. So the extra GPL
text wording can be removed as it is no longer needed at all.
This is done on a quest to remove the 700+ different ways that files in
the kernel describe the GPL license text. And there's unneeded stuff
like the address (sometimes incorrect) for the FSF which is never
needed.
No copyright headers or other non-license-description text was removed.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
It's good to have SPDX identifiers in all files to make it easier to
audit the kernel tree for correct licenses.
Update the remaining arch/s390/ files with the correct SPDX license
identifier based on the license text in the file itself. The SPDX
identifier is a legally binding shorthand, which can be used instead of
the full boiler plate text.
This work is based on a script and data from Thomas Gleixner, Philippe
Ombredanne, and Kate Stewart.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
It's good to have SPDX identifiers in all files to make it easier to
audit the kernel tree for correct licenses.
Update the arch/s390/appldata/ files with the correct SPDX license
identifier based on the license text in the file itself. The SPDX
identifier is a legally binding shorthand, which can be used instead of
the full boiler plate text.
This work is based on a script and data from Thomas Gleixner, Philippe
Ombredanne, and Kate Stewart.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
It's good to have SPDX identifiers in all files to make it easier to
audit the kernel tree for correct licenses.
Update the arch/s390/pci/ files with the correct SPDX license
identifier based on the license text in the file itself. The SPDX
identifier is a legally binding shorthand, which can be used instead of
the full boiler plate text.
This work is based on a script and data from Thomas Gleixner, Philippe
Ombredanne, and Kate Stewart.
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
It's good to have SPDX identifiers in all files to make it easier to
audit the kernel tree for correct licenses.
Update the arch/s390/mm/ files with the correct SPDX license
identifier based on the license text in the file itself. The SPDX
identifier is a legally binding shorthand, which can be used instead of
the full boiler plate text.
This work is based on a script and data from Thomas Gleixner, Philippe
Ombredanne, and Kate Stewart.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
It's good to have SPDX identifiers in all files to make it easier to
audit the kernel tree for correct licenses.
Update the arch/s390/crypto/ files with the correct SPDX license
identifier based on the license text in the file itself. The SPDX
identifier is a legally binding shorthand, which can be used instead of
the full boiler plate text.
This work is based on a script and data from Thomas Gleixner, Philippe
Ombredanne, and Kate Stewart.
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
It's good to have SPDX identifiers in all files to make it easier to
audit the kernel tree for correct licenses.
Update the arch/s390/kernel/ files with the correct SPDX license
identifier based on the license text in the file itself. The SPDX
identifier is a legally binding shorthand, which can be used instead of
the full boiler plate text.
This work is based on a script and data from Thomas Gleixner, Philippe
Ombredanne, and Kate Stewart.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
It's good to have SPDX identifiers in all files to make it easier to
audit the kernel tree for correct licenses.
Update the arch/s390/kernel/sthyi file with the correct SPDX license
identifier based on the license text in the file itself. The SPDX
identifier is a legally binding shorthand, which can be used instead of
the full boiler plate text.
This work is based on a script and data from Thomas Gleixner, Philippe
Ombredanne, and Kate Stewart.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
git commit badb8bb983 "fix alloc_pgste check in init_new_context" fixed
the problem of 'current->mm == NULL' in init_new_context back in 2011.
git commit 3eabaee998 "KVM: s390: allow sie enablement for multi-
threaded programs" completely removed the check against alloc_pgste.
git commit 23fefe119c "s390/kvm: avoid global config of vm.alloc_pgste=1"
re-added a check against the alloc_pgste flag but without the required
check for current->mm != NULL.
For execve() called by a kernel thread init_new_context() reads from
((struct mm_struct *) NULL)->context.alloc_pgste to decide between
2K vs 4K page tables. If the bit happens to be set for the init process
it will be created with large page tables. This decision is inherited by
all the children of init, this waste quite some memory.
Re-add the check for 'current->mm != NULL'.
Fixes: 23fefe119c ("s390/kvm: avoid global config of vm.alloc_pgste=1")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
176.718956 Krnl Code: 00000000004d38b0: a54c0018 llihh %r4,24
176.718956 00000000004d38b4: b9080014 agr %r1,%r4
^
Using a tab to align disassembly lines which follow the first line with
"Krnl Code: " doesn't always work, e.g. if there is a prefix (timestamp
or syslog prefix) which is not 8 chars aligned. Go back to alignment
with spaces.
Fixes: b192571d1a ("s390/disassembler: increase show_code buffer size")
Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: linux-s390@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
When searching the opcode offset table within find_insn() the check
"entry->opcode == 0" was intended to clarify that 1-byte opcodes, the
first one being 0, are special.
However there is no mnemonic for an illegal opcode starting with 0.
Therefore there is also no opcode offset table entry that matches,
which again means that the check never is true. Therefore just remove
the confusing check, and add a comment which hopefully explains how
this works.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If GCC_PLUGIN_RANDSTRUCT is enabled the members of task_struct will be
shuffled around. The offsets of the "pid" and "stack" members within
task_struct may not necessarily fit into 12 bits anymore, which causes
compile errors within __switch_to, since instructions are used, which
only have a 12 bit displacement field.
Therefore rework __switch_to, to allow for larger offsets.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Commit 1887aa07b6
("s390/topology: add detection of dedicated vs shared CPUs")
introduced following compiler error when CONFIG_SCHED_TOPOLOGY is not set.
CC arch/s390/kernel/smp.o
...
arch/s390/kernel/smp.c: In function ‘smp_start_secondary’:
arch/s390/kernel/smp.c:812:6: error: implicit declaration of function
‘topology_cpu_dedicated’; did you mean ‘topology_cpu_init’?
This patch fixes the compiler error by adding function
topology_cpu_dedicated() to return false when this config option is
not defined.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pull second round of s390 updates from Martin Schwidefsky:
- rework of the vdso code to avoid the use of the access register mode
- use perf AUX buffers for the transport of diagnostic sample data
- add perf_regs and user stack dump support
- enable perf call graphs for user space programs
- add perf register support for floating-point registers
- all remaining s390 related timer_setup conversions
- bug fixes and cleanups
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (30 commits)
s390: remove unused parameter from Makefile
zfcp: purely mechanical update using timer API, plus blank lines
s390/scsi: Convert timers to use timer_setup()
s390/cpum_sf: correctly set the PID and TID in perf samples
s390/cpum_sf: load program parameter at sampler enablement
s390/perf: add perf register support for floating-point registers
s390/perf: extend perf_regs support to include floating-point registers
s390/perf: define common DWARF register string table
s390/perf: add support for perf_regs and libdw
s390/perf: add perf_regs support and user stack dump
s390/cpum_sf: do not register PMU if no sampling mode is authorized
s390/cpumf: remove raw event support in basic-only sampling mode
s390/perf: add callback to perf to enable using AUX buffer
s390/cpumf: enable using AUX buffer
s390/cpumf: introduce AUX buffer for dump diagnostic sample data
s390/disassembler: increase show_code buffer size
s390: Remove CONFIG_HARDENED_USERCOPY
s390: enable CPU alternatives unconditionally
s390/nmi: remove unused code
s390/mm: remove unused code
...
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaDuoWAAoJEAAOaEEZVoIVXEQP/jQYoU9hgvEj8j3ZIgi56SDJ
pR45w2zcJz2/uU43DEKyShyLgsuoBbJQ3l/gGBH/tl+xGm9NzB0gatoEu9GmKNYz
/IN6/vUFnoIAUyD+iMZbpmsYKIkz0z2YJo261IfspAwIft/cvHJnYYGQrP9YXg9F
c7bdDuANTKocdQigc4BQyOe3OfIBGfTwJhuakO+1yuZmGOVNyxEcdYbMM8FiTfc8
+62kvQQ3t7WMqSbM8M0QdGcYQjG0EwcVAuV7COurLJIva7hUkVel32MVUjoFcf28
BnRu2ztFJCubm1HA85twlJDtpeXbcMqrUl/CcwRMpwDaePd5GVB1h5iKqbZ51BZ1
fWT2STmt+8hY2B5eiXoYEaG3B7ZRr+r0oroxqOxpiZ/m4AVeouF+gPGv+NV5zgvD
NGWC0MdklIJ4xaC99NEeP6kBhz0M74VKymFCTeHkVg9m4TqDepNvitKed0qagw19
uw8seei7TOTm4o117+l55NHmyfTHXFO4U0WLTJyeZcoEnUs0rOcHeqyy0RwCBMrK
W2fJtdBLFr+tBIIrID4TnPhhYtSvIPjz+FpiRDobqhgvMva/PIvLGTWK4unrgIjG
ZQ7YGnwWda8GjqKhgZacn/BSXyJzOAF9hJp0mz2ORaOxaMarEV55duiZufCvGuZw
uUQWRCKuQX7Oi05i9jXp
=fCeF
-----END PGP SIGNATURE-----
Merge tag 'locks-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull file locking update from Jeff Layton:
"A couple of fixes for a patch that went into v4.14, and the bug report
just came in a few days ago.. It passes my (minimal) testing, and has
been in linux-next for a few days now.
I also would like to get my address changed in MAINTAINERS to clear
that hurdle"
* tag 'locks-v4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscall
fcntl: don't leak fd reference when fixup_compat_flock fails
MAINTAINERS: s/jlayton@poochiereds.net/jlayton@kernel.org/
Pull compat and uaccess updates from Al Viro:
- {get,put}_compat_sigset() series
- assorted compat ioctl stuff
- more set_fs() elimination
- a few more timespec64 conversions
- several removals of pointless access_ok() in places where it was
followed only by non-__ variants of primitives
* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (24 commits)
coredump: call do_unlinkat directly instead of sys_unlink
fs: expose do_unlinkat for built-in callers
ext4: take handling of EXT4_IOC_GROUP_ADD into a helper, get rid of set_fs()
ipmi: get rid of pointless access_ok()
pi433: sanitize ioctl
cxlflash: get rid of pointless access_ok()
mtdchar: get rid of pointless access_ok()
r128: switch compat ioctls to drm_ioctl_kernel()
selection: get rid of field-by-field copyin
VT_RESIZEX: get rid of field-by-field copyin
i2c compat ioctls: move to ->compat_ioctl()
sched_rr_get_interval(): move compat to native, get rid of set_fs()
mips: switch to {get,put}_compat_sigset()
sparc: switch to {get,put}_compat_sigset()
s390: switch to {get,put}_compat_sigset()
ppc: switch to {get,put}_compat_sigset()
parisc: switch to {get,put}_compat_sigset()
get_compat_sigset()
get rid of {get,put}_compat_itimerspec()
io_getevents: Use timespec64 to represent timeouts
...
Common:
- Python 3 support in kvm_stat
- Accounting of slabs to kmemcg
ARM:
- Optimized arch timer handling for KVM/ARM
- Improvements to the VGIC ITS code and introduction of an ITS reset
ioctl
- Unification of the 32-bit fault injection logic
- More exact external abort matching logic
PPC:
- Support for running hashed page table (HPT) MMU mode on a host that
is using the radix MMU mode; single threaded mode on POWER 9 is
added as a pre-requisite
- Resolution of merge conflicts with the last second 4.14 HPT fixes
- Fixes and cleanups
s390:
- Some initial preparation patches for exitless interrupts and crypto
- New capability for AIS migration
- Fixes
x86:
- Improved emulation of LAPIC timer mode changes, MCi_STATUS MSRs, and
after-reset state
- Refined dependencies for VMX features
- Fixes for nested SMI injection
- A lot of cleanups
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJaDayXAAoJEED/6hsPKofo/3UH/3HvlcHt+ADTkCU1/iiKAs+i
0zngIOXIxgHDnV0ww6bV+Znww0BzTYgKCAXX76z603jdpDwG/pzQQcbLDF5ZoJnD
sQtF10gZinWaRsHlfbLqjrHGL2pGDHO1UKBKLJ0bAIyORPZBxs7i+VmrY/blnr9c
0wsybJ8RbvwAxjsDL5jeX/z4NehPupmKUc4Lf0eZdSHwVOf9sjn+MP6jJ0r2JcIb
D+zddPBiLStzN97t4gZpQsrlj3LKrDS+6hY+1TjSvlh+yHKFVFh58VhLm4DuDeb5
bYOAlWJ/gAWEzfvr5Ld+Nd7SqWWn/14logPkQ4gcU4BI/neAOzk4c6hJfCHl1nk=
=593n
-----END PGP SIGNATURE-----
Merge tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Radim Krčmář:
"First batch of KVM changes for 4.15
Common:
- Python 3 support in kvm_stat
- Accounting of slabs to kmemcg
ARM:
- Optimized arch timer handling for KVM/ARM
- Improvements to the VGIC ITS code and introduction of an ITS reset
ioctl
- Unification of the 32-bit fault injection logic
- More exact external abort matching logic
PPC:
- Support for running hashed page table (HPT) MMU mode on a host that
is using the radix MMU mode; single threaded mode on POWER 9 is
added as a pre-requisite
- Resolution of merge conflicts with the last second 4.14 HPT fixes
- Fixes and cleanups
s390:
- Some initial preparation patches for exitless interrupts and crypto
- New capability for AIS migration
- Fixes
x86:
- Improved emulation of LAPIC timer mode changes, MCi_STATUS MSRs,
and after-reset state
- Refined dependencies for VMX features
- Fixes for nested SMI injection
- A lot of cleanups"
* tag 'kvm-4.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (89 commits)
KVM: s390: provide a capability for AIS state migration
KVM: s390: clear_io_irq() requests are not expected for adapter interrupts
KVM: s390: abstract conversion between isc and enum irq_types
KVM: s390: vsie: use common code functions for pinning
KVM: s390: SIE considerations for AP Queue virtualization
KVM: s390: document memory ordering for kvm_s390_vcpu_wakeup
KVM: PPC: Book3S HV: Cosmetic post-merge cleanups
KVM: arm/arm64: fix the incompatible matching for external abort
KVM: arm/arm64: Unify 32bit fault injection
KVM: arm/arm64: vgic-its: Implement KVM_DEV_ARM_ITS_CTRL_RESET
KVM: arm/arm64: Document KVM_DEV_ARM_ITS_CTRL_RESET
KVM: arm/arm64: vgic-its: Free caches when GITS_BASER Valid bit is cleared
KVM: arm/arm64: vgic-its: New helper functions to free the caches
KVM: arm/arm64: vgic-its: Remove kvm_its_unmap_device
arm/arm64: KVM: Load the timer state when enabling the timer
KVM: arm/arm64: Rework kvm_timer_should_fire
KVM: arm/arm64: Get rid of kvm_timer_flush_hwstate
KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit
KVM: arm/arm64: Move phys_timer_emulate function
KVM: arm/arm64: Use kvm_arm_timer_set/get_reg for guest register traps
...
Remove unused parameter from the call function,
which I accidentally added.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The hardware sampler creates samples that are processed at a later
point in time. The PID and TID values of the perf samples that are
created for hardware samples are initialized with values from the
current task. Hence, the PID and TID values are not correct and
perf samples are associated with wrong processes.
The PID and TID values are obtained from the Host Program Parameter
(HPP) field in the basic-sampling data entries. These PIDs are
valid in the init PID namespace. Ensure that the PIDs in the perf
samples are resolved considering the PID namespace in which the
perf event was created.
To correct the PID and TID values in the created perf samples,
a special overflow handler is installed. It replaces the default
overflow handler and does not become effective if any other
overflow handler is used. With the special overflow handler most
of the perf samples are associated with the right processes.
For processes, that are no longer exist, the association might
still be wrong.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The lpp instruction is used to place the PID of the current
task in the program-parameter (PP) register. The register
contents is then included in the sampling data entries.
The lpp instruction loads the PP register only when at least
one sampling function is enabled. Otherwise it is executed
as a no-op.
Linux calls lpp at context switch. If the context switch
happens before the sampler is enabled, the PP register is
empty. That means, the PID of the task that is sampled is
not stored in sampling data until the next context switch.
Hence, always call lpp when enabling the sampler.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Extend the perf register support to also export floating-point register
contents for user space tasks. Floating-point registers might be used
in leaf functions to contain the return address. Hence, they are required
for proper DWARF unwinding.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add s390 support to dump user stack to user space for DWARF
stack unwinding.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Previously, the cpum_sf PMU was registered even if there is no
sampling mode authorized. Add a check and register cpum_sf only
at least one sampling mode is authorized.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Raw sample was implemented to export the diagnostic samples.
With having this achieved with AUX buffers, there is no requirement
for basic samples to export raw data. In particular, most basic
sampling information are consumed for creating the perf event sample.
Signed-off-by: Pu Hou <bjhoupu@linux.vnet.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Modify PMU callback to use AUX buffer for diagnostic mode sampling.
Basic-mode sampling still use orignal way.
Signed-off-by: Pu Hou <bjhoupu@linux.vnet.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Current implementation uses a private buffer for cpumf to dump samples.
Samples first go to this buffer. Then copy to ring buffer allocated
by perf core. With AUX buffer, this copy is not needed. AUX buffer is
shared and zero-copy mapped to user space. The trailer information at
the end of each SDB(sample data block) is also exported to user space.
AUX buffer is used when diagnostic sampling mode is enabled.
This patch contains functions to setup/free AUX buffer or to begin/end
sampling per-cpu. Also include function called in interrupt to
collect samples.
Signed-off-by: Pu Hou <bjhoupu@linux.vnet.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
- Some initial preparation patches for exitless interrupts and crypto
- New capability for AIS migration
- Fixes
- merge of the sthyi tree from the base s390 team, which moves the sthyi
out of KVM into a shared function also for non-KVM
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJaBHkvAAoJEBF7vIC1phx84nwP/AvR7yRRqdeWvpZ+T9hiwscR
p7AY5jnbVun7QtqR3yK+Z0IuZzU3gWheDNB4ZPegLLgzxN+ge4C45cZbpKJZUYXf
Fef8kdXs7Agi6oRU+xXKgYipot4g3VBdRGrfktUMiYD/LC7WpDlJybF0UW45FCKk
ECBumYXQ+6Jo5pplF3VH8XUwZb1IK3+//WaMLOToYyYuijCpcE0KfoKqCMrc39CU
GMQVq87IXnhKFeIAt4upiaXXHK/0mGBHdkOG6ILwdNMDfCDiBgynU4HALz+wf/IE
cvukxsTbHWDcgHMznBgDmnmb4DiWqahtatqGpXEpzzabgjIfkHKagYlaRb+VwwNm
vIDhm18Wq66nnmrNkpwbn1SB2OoOh7ug+YLrc3XTnGnSWWfZGNmrgOj7K9xTE9yZ
VmQReOT70KB/DYqZDVAlgNFqWA6MyQxcdxKDaLAqGWt2TN/uXrbdBd8Aa4AF1mtb
V8aFchFPauuj60cqG91NnKVVdVQ1qB+ZG7AzKo9BPa7iCH8IW5UlPL4FXz3lKdDu
/KWfL/nUFu8hPSmZijdHRKwdekXHfadHXsvGLVSsLxe/DYWaWTAgnwWceUAXUnvK
ZUDVKUT9S30dIZwbxSLoou03Bu8WZVefxd6/wOi3g2BYRrqZN+w0lhkWIQwKdMo4
VUP8fkMR9p6cAnitHh3i
=9mW5
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux
KVM: s390: fixes and improvements for 4.15
- Some initial preparation patches for exitless interrupts and crypto
- New capability for AIS migration
- Fixes
- merge of the sthyi tree from the base s390 team, which moves the sthyi
out of KVM into a shared function also for non-KVM
On a machine with 5-level paging support a process can allocate
significant amount of memory and stay unnoticed by oom-killer and memory
cgroup. The trick is to allocate a lot of PUD page tables. We don't
account PUD page tables, only PMD and PTE.
We already addressed the same issue for PMD page tables, see commit
dc6c9a35b6 ("mm: account pmd page tables to the process").
Introduction of 5-level paging brings the same issue for PUD page
tables.
The patch expands accounting to PUD level.
[kirill.shutemov@linux.intel.com: s/pmd_t/pud_t/]
Link: http://lkml.kernel.org/r/20171004074305.x35eh5u7ybbt5kar@black.fi.intel.com
[heiko.carstens@de.ibm.com: s390/mm: fix pud table accounting]
Link: http://lkml.kernel.org/r/20171103090551.18231-1-heiko.carstens@de.ibm.com
Link: http://lkml.kernel.org/r/20171002080427.3320-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, we're capping the values too low in the F_GETLK64 case. The
fields in that structure are 64-bit values, so we shouldn't need to do
any sort of fixup there.
Make sure we check that assumption at build time in the future however
by ensuring that the sizes we're copying will fit.
With this, we no longer need COMPAT_LOFF_T_MAX either, so remove it.
Fixes: 94073ad77f (fs/locks: don't mess with the address limit in compat_fcntl64)
Reported-by: Vitaly Lipatov <lav@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
- turn dma_cache_sync into a dma_map_ops instance and remove
implementation that purely are dead because the architecture
doesn't support noncoherent allocations
- add a flag for busses that need DMA configuration (Robin Murphy)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAloLSrYLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYOMuQ//XXD94uNPYavrgXzGsAtg+I+LEm+xyk4T0dX5fxfj
amXX49MHoGemjsBgzJlkQMMFqwDEdkKyEuFnEuy6OeowYCyD6zW0MJ3MwP9OosNJ
PNTdGZIfSvxPYEW8cR9AdK3iQ2loMBZnYhd+O/oVjSugULLW2DNa7r2VRktcCKoh
8Ob/8gL6Y9xEYJBRszhrBwKTa/hU8IThxxozBFzN7I3LIKyFboSTcwXGLAHow43g
4anCTjWTaDcoU2JwY6UTRKRRTV+gD0ZRcsZfd8lNNb5rtMVZkBVOHbF14SMAmw1r
kSgRcU3+WIFPhK/8wBYqtGZZGnOgFBTHVeqow3AdS728pBWlWl8niTK0DiIgCd3m
qzScF6SqfN1bCZkZAy8FUV2l0DPYKS6lvyNkf00Eb2W/f6LEqAcjCi2QDDxRfaw+
Vm97nPUiM+uXNy/6KtAy6ChdprSqx12/edXPp7Y3H2rS/+Dmr6exeix+wb7QUN8W
JI7ZRHo4JLaJZk/XrZtGX/6jnN1Jo7vfApQOmYDY7kE1iGtOU/LQQj8gcZRVQxML
4soN6ivSmZX2k03LabWHpYQ8QiyCSYChLC+Az7rQH47LDLeu1IdTJu6orpXpaxyo
ymzEWlHbmF7mE66X4g/Up/eAYk2YLUA3rKLGVjAIaWDBzHftSFg5EaAnqMADC1G2
hSo=
=ALJf
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-4.15' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- turn dma_cache_sync into a dma_map_ops instance and remove
implementation that purely are dead because the architecture doesn't
support noncoherent allocations
- add a flag for busses that need DMA configuration (Robin Murphy)
* tag 'dma-mapping-4.15' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: turn dma_cache_sync into a dma_map_ops method
sh: make dma_cache_sync a no-op
xtensa: make dma_cache_sync a no-op
unicore32: make dma_cache_sync a no-op
powerpc: make dma_cache_sync a no-op
mn10300: make dma_cache_sync a no-op
microblaze: make dma_cache_sync a no-op
ia64: make dma_cache_sync a no-op
frv: make dma_cache_sync a no-op
x86: make dma_cache_sync a no-op
floppy: consolidate the dummy fd_cacheflush definition
drivers: flag buses which demand DMA configuration
Remove the CPU_ALTERNATIVES config option and enable the code
unconditionally. The config option was only added to avoid a conflict
with the named saved segment support. Since that code is gone there is
no reason to keep the CPU_ALTERNATIVES config option.
Just enable it unconditionally to also reduce the number of config
options and make it less likely that something breaks.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
sparse says:
arch/s390/kernel/vdso.c:150:18:
warning: symbol 'boot_vdso_data' was not declared. Should it be static?
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
This change fixes the following warning:
warning: (KCOV) selects GCC_PLUGINS which has unmet direct dependencies
(HAVE_GCC_PLUGINS && !COMPILE_TEST)
Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Inline assembly code changed in this patch should really use "Q"
constraint "Memory reference without index register and with short
displacement". The kernel does not compile with kasan support enabled
otherwise (due to stack instrumentation).
Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The vdso code for the getcpu() and the clock_gettime() call use the access
register mode to access the per-CPU vdso data page with the current code.
An alternative to the complicated AR mode is to use the secondary space
mode. This makes the vdso faster and quite a bit simpler. The downside is
that the uaccess code has to be changed quite a bit.
Which instructions are used depends on the machine and what kind of uaccess
operation is requested. The instruction dictates which ASCE value needs
to be loaded into %cr1 and %cr7.
The different cases:
* User copy with MVCOS for z10 and newer machines
The MVCOS instruction can copy between the primary space (aka user) and
the home space (aka kernel) directly. For set_fs(KERNEL_DS) the kernel
ASCE is loaded into %cr1. For set_fs(USER_DS) the user space is already
loaded in %cr1.
* User copy with MVCP/MVCS for older machines
To be able to execute the MVCP/MVCS instructions the kernel needs to
switch to primary mode. The control register %cr1 has to be set to the
kernel ASCE and %cr7 to either the kernel ASCE or the user ASCE dependent
on set_fs(KERNEL_DS) vs set_fs(USER_DS).
* Data access in the user address space for strnlen / futex
To use "normal" instruction with data from the user address space the
secondary space mode is used. The kernel needs to switch to primary mode,
%cr1 has to contain the kernel ASCE and %cr7 either the user ASCE or the
kernel ASCE, dependent on set_fs.
To load a new value into %cr1 or %cr7 is an expensive operation, the kernel
tries to be lazy about it. E.g. for multiple user copies in a row with
MVCP/MVCS the replacement of the vdso ASCE in %cr7 with the user ASCE is
done only once. On return to user space a CPU bit is checked that loads the
vdso ASCE again.
To enable and disable the data access via the secondary space two new
functions are added, enable_sacf_uaccess and disable_sacf_uaccess. The fact
that a context is in secondary space uaccess mode is stored in the
mm_segment_t value for the task. The code of an interrupt may use set_fs
as long as it returns to the previous state it got with get_fs with another
call to set_fs. The code in finish_arch_post_lock_switch simply has to do a
set_fs with the current mm_segment_t value for the task.
For CPUs with MVCOS:
CPU running in | %cr1 ASCE | %cr7 ASCE |
--------------------------------------|-----------|-----------|
user space | user | vdso |
kernel, USER_DS, normal-mode | user | vdso |
kernel, USER_DS, normal-mode, lazy | user | user |
kernel, USER_DS, sacf-mode | kernel | user |
kernel, KERNEL_DS, normal-mode | kernel | vdso |
kernel, KERNEL_DS, normal-mode, lazy | kernel | kernel |
kernel, KERNEL_DS, sacf-mode | kernel | kernel |
For CPUs without MVCOS:
CPU running in | %cr1 ASCE | %cr7 ASCE |
--------------------------------------|-----------|-----------|
user space | user | vdso |
kernel, USER_DS, normal-mode | user | vdso |
kernel, USER_DS, normal-mode lazy | kernel | user |
kernel, USER_DS, sacf-mode | kernel | user |
kernel, KERNEL_DS, normal-mode | kernel | vdso |
kernel, KERNEL_DS, normal-mode, lazy | kernel | kernel |
kernel, KERNEL_DS, sacf-mode | kernel | kernel |
The lines with "lazy" refer to the state after a copy via the secondary
space with a delayed reload of %cr1 and %cr7.
There are three hardware address spaces that can cause a DAT exception,
primary, secondary and home space. The exception can be related to
four different fault types: user space fault, vdso fault, kernel fault,
and the gmap faults.
Dependent on the set_fs state and normal vs. sacf mode there are a number
of fault combinations:
1) user address space fault via the primary ASCE
2) gmap address space fault via the primary ASCE
3) kernel address space fault via the primary ASCE for machines with
MVCOS and set_fs(KERNEL_DS)
4) vdso address space faults via the secondary ASCE with an invalid
address while running in secondary space in problem state
5) user address space fault via the secondary ASCE for user-copy
based on the secondary space mode, e.g. futex_ops or strnlen_user
6) kernel address space fault via the secondary ASCE for user-copy
with secondary space mode with set_fs(KERNEL_DS)
7) kernel address space fault via the primary ASCE for user-copy
with secondary space mode with set_fs(USER_DS) on machines without
MVCOS.
8) kernel address space fault via the home space ASCE
Replace user_space_fault() with a new function get_fault_type() that
can distinguish all four different fault types.
With these changes the futex atomic ops from the kernel and the
strnlen_user will get a little bit slower, as well as the old style
uaccess with MVCP/MVCS. All user accesses based on MVCOS will be as
fast as before. On the positive side, the user space vdso code is a
lot faster and Linux ceases to use the complicated AR mode.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The identification of guest fault currently relies on the PF_VCPU flag.
This is set in guest_entry_irqoff and cleared in guest_exit_irqoff.
Both functions are called by __vcpu_run, the PF_VCPU flag is set for
quite a lot of kernel code outside of the guest execution.
Replace the PF_VCPU scheme with the PIF_GUEST_FAULT in the pt_regs and
make the program check handler code in entry.S set the bit only for
exception that occurred between the .Lsie_gmap and .Lsie_done labels.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Pull timer updates from Thomas Gleixner:
"Yet another big pile of changes:
- More year 2038 work from Arnd slowly reaching the point where we
need to think about the syscalls themself.
- A new timer function which allows to conditionally (re)arm a timer
only when it's either not running or the new expiry time is sooner
than the armed expiry time. This allows to use a single timer for
multiple timeout requirements w/o caring about the first expiry
time at the call site.
- A new NMI safe accessor to clock real time for the printk timestamp
work. Can be used by tracing, perf as well if required.
- A large number of timer setup conversions from Kees which got
collected here because either maintainers requested so or they
simply got ignored. As Kees pointed out already there are a few
trivial merge conflicts and some redundant commits which was
unavoidable due to the size of this conversion effort.
- Avoid a redundant iteration in the timer wheel softirq processing.
- Provide a mechanism to treat RTC implementations depending on their
hardware properties, i.e. don't inflict the write at the 0.5
seconds boundary which originates from the PC CMOS RTC to all RTCs.
No functional change as drivers need to be updated separately.
- The usual small updates to core code clocksource drivers. Nothing
really exciting"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits)
timers: Add a function to start/reduce a timer
pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday()
timer: Prepare to change all DEFINE_TIMER() callbacks
netfilter: ipvs: Convert timers to use timer_setup()
scsi: qla2xxx: Convert timers to use timer_setup()
block/aoe: discover_timer: Convert timers to use timer_setup()
ide: Convert timers to use timer_setup()
drbd: Convert timers to use timer_setup()
mailbox: Convert timers to use timer_setup()
crypto: Convert timers to use timer_setup()
drivers/pcmcia: omap1: Fix error in automated timer conversion
ARM: footbridge: Fix typo in timer conversion
drivers/sgi-xp: Convert timers to use timer_setup()
drivers/pcmcia: Convert timers to use timer_setup()
drivers/memstick: Convert timers to use timer_setup()
drivers/macintosh: Convert timers to use timer_setup()
hwrng/xgene-rng: Convert timers to use timer_setup()
auxdisplay: Convert timers to use timer_setup()
sparc/led: Convert timers to use timer_setup()
mips: ip22/32: Convert timers to use timer_setup()
...
Pull core locking updates from Ingo Molnar:
"The main changes in this cycle are:
- Another attempt at enabling cross-release lockdep dependency
tracking (automatically part of CONFIG_PROVE_LOCKING=y), this time
with better performance and fewer false positives. (Byungchul Park)
- Introduce lockdep_assert_irqs_enabled()/disabled() and convert
open-coded equivalents to lockdep variants. (Frederic Weisbecker)
- Add down_read_killable() and use it in the VFS's iterate_dir()
method. (Kirill Tkhai)
- Convert remaining uses of ACCESS_ONCE() to
READ_ONCE()/WRITE_ONCE(). Most of the conversion was Coccinelle
driven. (Mark Rutland, Paul E. McKenney)
- Get rid of lockless_dereference(), by strengthening Alpha atomics,
strengthening READ_ONCE() with smp_read_barrier_depends() and thus
being able to convert users of lockless_dereference() to
READ_ONCE(). (Will Deacon)
- Various micro-optimizations:
- better PV qspinlocks (Waiman Long),
- better x86 barriers (Michael S. Tsirkin)
- better x86 refcounts (Kees Cook)
- ... plus other fixes and enhancements. (Borislav Petkov, Juergen
Gross, Miguel Bernal Marin)"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE
rcu: Use lockdep to assert IRQs are disabled/enabled
netpoll: Use lockdep to assert IRQs are disabled/enabled
timers/posix-cpu-timers: Use lockdep to assert IRQs are disabled/enabled
sched/clock, sched/cputime: Use lockdep to assert IRQs are disabled/enabled
irq_work: Use lockdep to assert IRQs are disabled/enabled
irq/timings: Use lockdep to assert IRQs are disabled/enabled
perf/core: Use lockdep to assert IRQs are disabled/enabled
x86: Use lockdep to assert IRQs are disabled/enabled
smp/core: Use lockdep to assert IRQs are disabled/enabled
timers/hrtimer: Use lockdep to assert IRQs are disabled/enabled
timers/nohz: Use lockdep to assert IRQs are disabled/enabled
workqueue: Use lockdep to assert IRQs are disabled/enabled
irq/softirqs: Use lockdep to assert IRQs are disabled/enabled
locking/lockdep: Add IRQs disabled/enabled assertion APIs: lockdep_assert_irqs_enabled()/disabled()
locking/pvqspinlock: Implement hybrid PV queued/unfair locks
locking/rwlocks: Fix comments
x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized
block, locking/lockdep: Assign a lock_class per gendisk used for wait_for_completion()
workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes
...
Pull s390 updates from Heiko Carstens:
"Since Martin is on vacation you get the s390 pull request for the
v4.15 merge window this time from me.
Besides a lot of cleanups and bug fixes these are the most important
changes:
- a new regset for runtime instrumentation registers
- hardware accelerated AES-GCM support for the aes_s390 module
- support for the new CEX6S crypto cards
- support for FORTIFY_SOURCE
- addition of missing z13 and new z14 instructions to the in-kernel
disassembler
- generate opcode tables for the in-kernel disassembler out of a
simple text file instead of having to manually maintain those
tables
- fast memset16, memset32 and memset64 implementations
- removal of named saved segment support
- hardware counter support for z14
- queued spinlocks and queued rwlocks implementations for s390
- use the stack_depth tracking feature for s390 BPF JIT
- a new s390_sthyi system call which emulates the sthyi (store
hypervisor information) instruction
- removal of the old KVM virtio transport
- an s390 specific CPU alternatives implementation which is used in
the new spinlock code"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (88 commits)
MAINTAINERS: add virtio-ccw.h to virtio/s390 section
s390/noexec: execute kexec datamover without DAT
s390: fix transactional execution control register handling
s390/bpf: take advantage of stack_depth tracking
s390: simplify transactional execution elf hwcap handling
s390/zcrypt: Rework struct ap_qact_ap_info.
s390/virtio: remove unused header file kvm_virtio.h
s390: avoid undefined behaviour
s390/disassembler: generate opcode tables from text file
s390/disassembler: remove insn_to_mnemonic()
s390/dasd: avoid calling do_gettimeofday()
s390: vfio-ccw: Do not attempt to free no-op, test and tic cda.
s390: remove named saved segment support
s390/archrandom: Reconsider s390 arch random implementation
s390/pci: do not require AIS facility
s390/qdio: sanitize put_indicator
s390/qdio: use atomic_cmpxchg
s390/nmi: avoid using long-displacement facility
s390: pass endianness info to sparse
s390/decompressor: remove informational messages
...
Rebooting into a new kernel with kexec fails (system dies) if tried on
a machine that has no-execute support. Reason for this is that the so
called datamover code gets executed with DAT on (MMU is active) and
the page that contains the datamover is marked as non-executable.
Therefore when branching into the datamover an unexpected program
check happens and afterwards the machine is dead.
This can be simply avoided by disabling DAT, which also disables any
no-execute checks, just before the datamover gets executed.
In fact the first thing done by the datamover is to disable DAT. The
code in the datamover that disables DAT can be removed as well.
Thanks to Michael Holzheu and Gerald Schaefer for tracking this down.
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fixes: 57d7f939e7 ("s390: add no-execute support")
Cc: <stable@vger.kernel.org> # v4.11+
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Dan Horák reported the following crash related to transactional execution:
User process fault: interruption code 0013 ilc:3 in libpthread-2.26.so[3ff93c00000+1b000]
CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1
Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
task: 00000000fafc8000 task.stack: 00000000fafc4000
User PSW : 0705200180000000 000003ff93c14e70
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
User GPRS: 0000000000000077 000003ff00000000 000003ff93144d48 000003ff93144d5e
0000000000000000 0000000000000002 0000000000000000 000003ff00000000
0000000000000000 0000000000000418 0000000000000000 000003ffcc9fe770
000003ff93d28f50 000003ff9310acf0 000003ff92b0319a 000003ffcc9fe6d0
User Code: 000003ff93c14e62: 60e0b030 std %f14,48(%r11)
000003ff93c14e66: 60f0b038 std %f15,56(%r11)
#000003ff93c14e6a: e5600000ff0e tbegin 0,65294
>000003ff93c14e70: a7740006 brc 7,3ff93c14e7c
000003ff93c14e74: a7080000 lhi %r0,0
000003ff93c14e78: a7f40023 brc 15,3ff93c14ebe
000003ff93c14e7c: b2220000 ipm %r0
000003ff93c14e80: 8800001c srl %r0,28
There are several bugs with control register handling with respect to
transactional execution:
- on task switch update_per_regs() is only called if the next task has
an mm (is not a kernel thread). This however is incorrect. This
breaks e.g. for user mode helper handling, where the kernel creates
a kernel thread and then execve's a user space program. Control
register contents related to transactional execution won't be
updated on execve. If the previous task ran with transactional
execution disabled then the new task will also run with
transactional execution disabled, which is incorrect. Therefore call
update_per_regs() unconditionally within switch_to().
- on startup the transactional execution facility is not enabled for
the idle thread. This is not really a bug, but an inconsistency to
other facilities. Therefore enable the facility if it is available.
- on fork the new thread's per_flags field is not cleared. This means
that a child process inherits the PER_FLAG_NO_TE flag. This flag can
be set with a ptrace request to disable transactional execution for
the current process. It should not be inherited by new child
processes in order to be consistent with the handling of all other
PER related debugging options. Therefore clear the per_flags field in
copy_thread_tls().
Reported-and-tested-by: Dan Horák <dan@danny.cz>
Fixes: d35339a42d ("s390: add support for transactional memory")
Cc: <stable@vger.kernel.org> # v3.7+
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Make use of the "stack_depth" tracking feature introduced with
commit 8726679a0f ("bpf: teach verifier to track stack depth") for the
s390 JIT, so that stack usage can be reduced.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Just use MACHINE_HAS_TE to decide if HWCAP_S390_TE needs
to be added to elf_hwcap.
Suggested-by: Dan Horák <dan@danny.cz>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The AIS capability was introduced in 4.12, while the interface to
migrate the state was added in 4.13. Unfortunately it is not possible
for userspace to detect the migration capability without creating a flic
kvm device. As in QEMU the cpu model detection runs on the "none"
machine this will result in cpu model issues regarding the "ais"
capability.
To get the "ais" capability properly let's add a new KVM capability that
tells userspace that AIS states can be migrated.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com>
With commit 7fb2b2d512 ("s390/virtio: remove the old KVM virtio
transport") the pre-ccw virtio transport for s390 was removed. To
complete the removal the uapi header file that contains the related data
structures must also be removed.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
There is a chance to delete not yet delivered I/O interrupts if an
exploiter uses the subsystem identification word 0x0000 while
processing a KVM_DEV_FLIC_CLEAR_IO_IRQ ioctl. -EINVAL will be returned
now instead in that case.
Classic interrupts will always have bit 0x10000 set in the schid while
adapter interrupts have a zero schid. The clear_io_irq interface is
only useful for classic interrupts (as adapter interrupts belong to
many devices). Let's make this interface more strict and forbid a schid
of 0.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The abstraction of the conversion between an isc value and an irq_type
by means of functions isc_to_irq_type() and irq_type_to_isc() allows
to clarify the respective operations where used.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
We will not see -ENOMEM (gfn_to_hva() will return KVM_ERR_PTR_BAD_PAGE
for all errors). So we can also get rid of special handling in the
callers of pin_guest_page() and always assume that it is a g2 error.
As also kvm_s390_inject_program_int() should never fail, we can
simplify pin_scb(), too.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170901151143.22714-1-david@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The Crypto Control Block (CRYCB) is referenced by the SIE state
description and controls KVM guest access to the Adjunct
Processor (AP) adapters, usage domains and control domains.
This patch defines the AP control blocks to be used for
controlling guest access to the AP adapters and domains.
Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Message-Id: <1507916344-3896-2-git-send-email-akrowiak@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
swait_active does not enforce any ordering and it can therefore trigger
some subtle races when the CPU moves the read for the check before a
previous store and that store is then used on another CPU that is
preparing the swait.
On s390 there is a call to swait_active in kvm_s390_vcpu_wakeup. The
good thing is, on s390 all potential races cannot happen because all
callers of kvm_s390_vcpu_wakeup do not store (no race) or use an atomic
operation, which handles memory ordering. Since this is not guaranteed
by the Linux semantics (but by the implementation on s390) let's add
smp_mb_after_atomic to make this obvious and document the ordering.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
At a couple of places smatch emits warnings like this:
arch/s390/mm/vmem.c:409 vmem_map_init() warn:
right shifting more than type allows
In fact shifting a signed type right is undefined. Avoid this and add
an unsigned long cast. The shifted values are always positive.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The current way of adding new instructions to the opcode tables is
painful and error prone. Therefore add, similar to binutils, a text
file which contains all opcodes and the corresponding mnemonics and
instruction formats.
A small gen_opcode_table tool then generates a header file with the
required enums and opcode table initializers at the prepare step of
the kernel build.
This way only a simple text file has to be maintained, which can be
rather easily extended.
Unlike before where there were plenty of opcode tables and a large
switch statement to find the correct opcode table, there is now only
one opcode table left which contains all instructions. A second opcode
offset table now contains offsets within the opcode table to find
instructions which have the same opcode prefix. In order to save space
all 1-byte opcode instructions are grouped together at the end of the
opcode table. This is also quite similar to like it was before.
In addition also move and change code and definitions within the
disassembler. As a side effect this reduces the size required for the
code and opcode tables by ~1.5k.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
insn_to_mnemonic() was introduced ages ago for KVM debugging, but is
unused in the meantime. Therefore remove it.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Remove the support to create a z/VM named saved segment (NSS). This
feature is not supported since quite a while in favour of jump labels,
function tracing and (now) CPU alternatives. All of these features
require to write to the kernel text section which is not possible if
the kernel is contained within an NSS.
Given that memory savings are minimal if kernel images are shared and
in addition updates of shared images are painful, the NSS feature can
be removed.
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The reworked version of the random device driver now calls
the arch_get_random_* functions on a very high frequency.
It does about 100.000 calls to arch_get_random_long for
providing 10 MB via /dev/urandom. Each invocation was
fetching entropy from the hardware random generator which
has a rate limit of about 4 MB/s. As the trng invocation
waits until enough entropy is gathered, the random device
driver is slowed down dramatically.
The s390 true random generator is not designed for such
a high rate. The TRNG is more designed to be used together
with the arch_get_random_seed_* functions. This is similar
to the way how powerpc has implemented their arch random
functionality.
This patch removes the invocations of the s390 TRNG for
arch_get_random_long() and arch_get_random_int() but leaving
the invocations for arch_get_random_seed_long() and
arch_get_random_seed_int(). So the s390 arch random
implementation now contributes high quality entropy to
the kernel random device for reseeding.
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
As of today QEMU does not provide the AIS facility to its guest. This
prevents Linux guests from using PCI devices as the ais facility is
checked during init. As this is just a performance optimization, we can
move the ais check into the code where we need it (calling the SIC
instruction). This is used at initialization and on interrupt. Both
places do not require any serialization, so we can simply skip the
instruction.
Since we will now get all interrupts, we can also avoid the 2nd scan.
As we can have multiple interrupts in parallel we might trigger spurious
irqs more often for the non-AIS case but the core code can handle that.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWfswbQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykvEwCfXU1MuYFQGgMdDmAZXEc+xFXZvqgAoKEcHDNA
6dVh26uchcEQLN/XqUDt
=x306
-----END PGP SIGNATURE-----
Merge tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull initial SPDX identifiers from Greg KH:
"License cleanup: add SPDX license identifiers to some files
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the
'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally
binding shorthand, which can be used instead of the full boiler plate
text.
This patch is based on work done by Thomas Gleixner and Kate Stewart
and Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset
of the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to
license had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied
to a file was done in a spreadsheet of side by side results from of
the output of two independent scanners (ScanCode & Windriver)
producing SPDX tag:value files created by Philippe Ombredanne.
Philippe prepared the base worksheet, and did an initial spot review
of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537
files assessed. Kate Stewart did a file by file comparison of the
scanner results in the spreadsheet to determine which SPDX license
identifier(s) to be applied to the file. She confirmed any
determination that was not immediately clear with lawyers working with
the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained
>5 lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that
was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that
became the concluded license(s).
- when there was disagreement between the two scanners (one detected
a license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply
(and which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases,
confirmation by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.
The Windriver scanner is based on an older version of FOSSology in
part, so they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot
checks in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect
the correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial
patch version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch
license was not GPL-2.0 WITH Linux-syscall-note to ensure that the
applied SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
License cleanup: add SPDX license identifier to uapi header files with a license
License cleanup: add SPDX license identifier to uapi header files with no license
License cleanup: add SPDX GPL-2.0 license identifier to files with no license
__LC_MCESAD is currently 4528 /* offsetof(struct lowcore, mcesad) */
that would require long-displacement facility for lg, which we don't
have on z900.
Fixes: 3037a52f98 ("s390/nmi: do register validation as early as possible")
Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Many user space API headers have licensing information, which is either
incomplete, badly formatted or just a shorthand for referring to the
license under which the file is supposed to be. This makes it hard for
compliance tools to determine the correct license.
Update these files with an SPDX license identifier. The identifier was
chosen based on the license information in the file.
GPL/LGPL licensed headers get the matching GPL/LGPL SPDX license
identifier with the added 'WITH Linux-syscall-note' exception, which is
the officially assigned exception identifier for the kernel syscall
exception:
NOTE! This copyright does *not* cover user programs that use kernel
services by normal system calls - this is merely considered normal use
of the kernel, and does *not* fall under the heading of "derived work".
This exception makes it possible to include GPL headers into non GPL
code, without confusing license compliance tools.
Headers which have either explicit dual licensing or are just licensed
under a non GPL license are updated with the corresponding SPDX
identifier and the GPLv2 with syscall exception identifier. The format
is:
((GPL-2.0 WITH Linux-syscall-note) OR SPDX-ID-OF-OTHER-LICENSE)
SPDX license identifiers are a legally binding shorthand, which can be
used instead of the full boiler plate text. The update does not remove
existing license information as this has to be done on a case by case
basis and the copyright holders might have to be consulted. This will
happen in a separate step.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne. See the previous patch in this series for the
methodology of how this patch was researched.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Many user space API headers are missing licensing information, which
makes it hard for compliance tools to determine the correct license.
By default are files without license information under the default
license of the kernel, which is GPLV2. Marking them GPLV2 would exclude
them from being included in non GPLV2 code, which is obviously not
intended. The user space API headers fall under the syscall exception
which is in the kernels COPYING file:
NOTE! This copyright does *not* cover user programs that use kernel
services by normal system calls - this is merely considered normal use
of the kernel, and does *not* fall under the heading of "derived work".
otherwise syscall usage would not be possible.
Update the files which contain no license information with an SPDX
license identifier. The chosen identifier is 'GPL-2.0 WITH
Linux-syscall-note' which is the officially assigned identifier for the
Linux syscall exception. SPDX license identifiers are a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne. See the previous patch in this series for the
methodology of how this patch was researched.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
s390 is big-endian only but sparse assumes the same endianness
as the building machine.
This is problematic for code which expect __BYTE_ORDER__ being
correctly predefined by the compiler which sparse can then
pre-process differently from what gcc would, depending on the
building machine endianness.
Fix this by letting sparse know about the architecture endianness.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The decompressor for bzImage prints two informational messages which are
not really helpful. The decompression step is fast and if something bad
happens an error message will be printed. Remove the noise.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add the hardware counters that are available with z14. With z14,
the number of problem-state counters is reduced. The initialization
is updated respectively.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.
However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:
----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()
// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
virtual patch
@ depends on patch @
expression E1, E2;
@@
- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)
@ depends on patch @
expression E;
@@
- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The new detection code for guest machine checks added a check based
on %r11 to .Lcleanup_sie to distinguish between normal asynchronous
interrupts and machine checks. But the funtion is called from the
program check handler as well with an undefined value in %r11.
The effect is that all program exceptions pointing to the SIE instruction
will set the CIF_MCCK_GUEST bit. The bit stays set for the CPU until the
next machine check comes in which will incorrectly be interpreted as a
guest machine check.
The simplest fix is to stop using .Lcleanup_sie in the program check
handler and duplicate a few instructions.
Fixes: c929500d7a ("s390/nmi: s390: New low level handling for machine check happening in guest")
Cc: <stable@vger.kernel.org> # v4.13+
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The validation of the CPU registers in the machine check handler is
currently split into two parts. The first part is done at the start
of the low level mcck_int_handler function, this includes the CPU
timer register and the general purpose registers.
The second part is done a bit later in s390_do_machine_check for all
the other registers, including the control registers, floating pointer
control, vector or floating pointer registers, the access registers,
the guarded storage registers, the TOD programmable registers and the
clock comparator.
This is working fine to far but in theory a future extensions could
cause the C code to use registers that are not validated yet. A better
approach is to validate all CPU registers in "safe" assembler code
before any C function is called.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The machine check extended save area is needed to store the vector
registers and the guarded storage control block when a CPU is
interrupted by a machine check.
Move the slab cache allocation of the full save area to nmi.c,
for early boot use a static __initdata block.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The nmi.h header has some constant defines for control register bits.
These definitions should really be located in ctl_reg.h. Move and
rename the defines.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add a decoding union for the bits in control registers 2 and use
'union ctlreg0' and 'union ctlreg2' in update_cr_regs to improve
readability.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The smp_send_stop() function can be called from s390_handle_damage
while DAT is off. This happens if a machine check indicates that
kernel gprs or control registers can not be restored. The function
smp_send_stop reenables DAT via __load_psw_mask. That should work
for the case of lost kernel gprs and the system will do the expected
stop of all CPUs. But if control registers are lost, in particular
CR13 with the home space ASCE, interesting secondary crashes may
occur.
Make smp_emergency_stop callable from nmi.c and remove the cpumask
argument. Replace the smp_send_stop call with smp_emergency_stop in
the s390_handle_damage function.
In addition add notrace and NOKPROBE_SYMBOL annotations for all
functions required for the emergency shutdown.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
After we removed all the dead wood it turns out only two architectures
actually implement dma_cache_sync as a real op: mips and parisc. Add
a cache_sync method to struct dma_map_ops and implement it for the
mips defualt DMA ops, and the parisc pa11 ops.
Note that arm, arc and openrisc support DMA_ATTR_NON_CONSISTENT, but
never provided a functional dma_cache_sync implementations, which
seems somewhat odd.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
The boot_vdso_data variable is related to the vdso code, the magic of the
initial vdso area for the early boot and the replacement of it in vdso_init
should all be put into vdso.c.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Enable niai instruction in the spinlock code at run-time for machines
on which facility 49 is available (zEC12 and newer).
Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Implement CPU alternatives, which allows to optionally patch newer
instructions at runtime, based on CPU facilities availability.
A new kernel boot parameter "noaltinstr" disables patching.
Current implementation is derived from x86 alternatives. Although
ideal instructions padding (when altinstr is longer then oldinstr)
is added at compile time, and no oldinstr nops optimization has to be
done at runtime. Also couple of compile time sanity checks are done:
1. oldinstr and altinstr must be <= 254 bytes long,
2. oldinstr and altinstr must not have an odd length.
alternative(oldinstr, altinstr, facility);
alternative_2(oldinstr, altinstr1, facility1, altinstr2, facility2);
Both compile time and runtime padding consists of either 6/4/2 bytes nop
or a jump (brcl) + 2 bytes nop filler if padding is longer then 6 bytes.
.altinstructions and .altinstr_replacement sections are part of
__init_begin : __init_end region and are freed after initialization.
Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
debug_event_common memsets the active debug entry with zeros to
prevent stale data leakage. This is overwritten with the actual
debug data in the next step. Only write zeros to that part of the
debug entry that's not used by new debug data.
Micro benchmarks show a 2-10% reduction of cpu cycles with this
approach.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
debug_event currently truncates the data if used with a size larger than
the buf_size of the debug feature. For lots of callers of this function,
wrappers have been implemented that loop until all data is handled.
Move that functionality into debug_event_common and get rid of the wrappers.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Before kexec boots to a crash kernel it checks whether the image in memory
changed after load. This is done by the function kdump_csum_valid, which
returns true, i.e. an int != 0, on success and 0 otherwise. In other words
when kdump_csum_valid returns an error code it means that the validation
succeeded. This is not only counterintuitive but also produces the wrong
result if the kernel was build without CONFIG_CRASH_DUMP. Fix this by
making kdump_csum_valid return a bool.
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The debug feature code hasn't been touched in ages and the code also
looks like this. Therefore clean up the code so it looks a bit more
like current coding style.
There is no functional change - actually I made also sure that the
generated code with performance_defconfig is identical.
A diff of old vs new with "objdump -d" is empty.
The code is still not checkpatch clean, but that was not the goal.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
zipl from s390-tools generates root=/dev/ram0 kernel cmdline for
zfcpdump, thus BLK_DEV_RAM is required.
zfcpdump initrd mounts DEBUG_FS, thus is also required.
Bug-Ubuntu: https://launchpad.net/bugs/1722735
Bug-Ubuntu: https://launchpad.net/bugs/1719290
Signed-off-by: Dimitri John Ledkov <xnox@ubuntu.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For an unknown reason the s390 kprobes instruction replacement
function modifies the kprobe_status of the current CPU to
KPROBE_SWAP_INST. This was supposed to catch traps that happened
during instruction patching. Such a fault is not supposed to happen,
and silently discarding such a fault is certainly also not what we
want. In fact s390 is the only architecture which has this odd piece
of code.
Just remove this and behave like all other architectures. This was
pointed out by Jens Remus.
Reported-by: Jens Remus <jremus@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The arch_{read,spin,write}_lock_flags() macros are simply mapped to the
non-flags versions by the majority of architectures, so do this in core
code and remove the dummy implementations. Also remove the implementation
in spinlock_up.h, since all callers of do_raw_spin_lock_flags() call
local_irq_save(flags) anyway.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1507055129-12300-4-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
arch_{read,spin,write}_relax() are defined as cpu_relax() by the core
code, so architectures that can't do better (i.e. most of them) don't
need to bother with the dummy definitions.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1507055129-12300-3-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Outside of the locking code itself, {read,spin,write}_can_lock() have no
users in tree. Apparmor (the last remaining user of write_can_lock()) got
moved over to lockdep by the previous patch.
This patch removes the use of {read,spin,write}_can_lock() from the
BUILD_LOCK_OPS macro, deferring to the trylock operation for testing the
lock status, and subsequently removes the unused macros altogether. They
aren't guaranteed to work in a concurrent environment and can give
incorrect results in the case of qrwlock.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1507055129-12300-2-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Just some trivial changes like removing the extern keyword from the
header file, renaming arguments to match the man pages, and whitespace
removal.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Like for the memset16/32/64 variants avoid that subsequent mvc
instructions depend on each other since that might have negative
performance impacts.
This patch is currently hardly relevant since at least gcc 7.1
generates only inline memset code and not a single memset call.
However there is no reason to not provide an optimized version
just in case gcc generates memset calls again, like it did in
the past.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use memset64 instead of the (now) open-coded variant clear_table.
Performance wise there is no difference.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Provide fast versions of the new memset variants. E.g. the generic
memset64 is ten times slower than the optimized version if used on a
whole page.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add a syscall of s390_sthyi to implement STHYI instruction in LPAR
which reuses the implementation for KVM by Janosch Frank -
commit 95ca2cb579 ("KVM: s390: Add sthyi emulation").
STHYI(Store Hypervisor Information) is an emulated z/VM instruction that
provides a guest with basic information about the layers it is running
on. This includes information about the cpu configuration of both the
machine and the lpar, as well as their names, machine model and
machine type. This information enables an application to determine the
maximum capacity of CPs and IFLs available to software.
For the arguments of s390_sthyi, code shall be 0 and flags is reserved for
future use, info is the output argument to store the required hypervisor
info.
Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
STHYI requires extensive locking in the higher hypervisors and is
very computational/memory expensive. Therefore we cache the retrieved
hypervisor info whose valid period is 1s with mutex to allow concurrent
access. rw semaphore can't benefit here due to cache line bounce.
Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
As we need to support sthyi instruction on LPAR too, move the common code
to kernel part and kvm related code to intercept.c for better reuse.
Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We never optimized our rwsem inline assemblies to make use of the new
atomic instructions. The generic rwsem implementation implicitly makes
use of the new instructions, since it implements the required rwsem
primitives with atomic operations, which we did optimize.
However even when compiling for old architectures the generic variant
still generates better code. So it's time to simply remove our old
code and switch to the generic implementation.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This instruction came with a z/VM extension and not with a specific
machine generation.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove a couple of instructions that are listed twice.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The e7 opcode table does not have an end marker. Hence when trying to
find an unknown e7 instruction the code will access memory behind the
table until it finds something that matches the opcode, or the kernel
crashes, whatever comes first.
This affects not only the in-kernel disassembler but also uprobes and
kprobes which refuse to set a probe on unknown instructions, and
therefore search the opcode tables to figure out if instructions are
known or not.
Cc: <stable@vger.kernel.org> # v3.18+
Fixes: 3585cb0280 ("s390/disassembler: add vector instructions")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There is no recent user space application available anymore which still
supports this old virtio transport. Additionally, commit 3b2fbb3f06
("virtio/s390: deprecate old transport") introduced a deprecation message
in the driver, and apparently nobody complained so far that it is still
required. So let's simply remove it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When grouping devices, the ccwgroup core only checks whether all of the
devices are bound to the same ccw_driver. It has no means of checking
if the requesting ccwgroup driver actually supports this device type.
qeth implements its own device matching in qeth_core_probe_device(),
while ctcm and lcs currently have no sanity-checking at all.
Enable ccwgroup drivers to optionally defer the device type checking to
the ccwgroup core, by specifying their supported ccw_driver.
This allows us drop the device type matching from qeth, and improves
the robustness of ctcm and lcs.
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch introduces gcm(aes) support into the aes_s390 kernel module.
Signed-off-by: Patrick Steuer <patrick.steuer@de.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Like the common queued rwlock code the s390 implementation uses the
queued spinlock code on a spinlock_t embedded in the rwlock_t to achieve
the queueing. The encoding of the rwlock_t differs though, the counter
field in the rwlock_t is split into two parts. The upper two bytes hold
the write bit and the write wait counter, the lower two bytes hold the
read counter.
The arch_read_lock operation works exactly like the common qrwlock but
the enqueue operation for a writer follows a diffent logic. After the
failed inline try to get the rwlock in write, the writer first increases
the write wait counter, acquires the wait spin_lock for the queueing,
and then loops until there are no readers and the write bit is zero.
Without the write wait counter a CPU that just released the rwlock
could immediately reacquire the lock in the inline code, bypassing all
outstanding read and write waiters. For s390 this would cause massive
imbalances in favour of writers in case of a contended rwlock.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The queued spinlock code for s390 follows the principles of the common
code qspinlock implementation but with a few notable differences.
The format of the spinlock_t locking word differs, s390 needs to store
the logical CPU number of the lock holder in the spinlock_t to be able
to use the diagnose 9c directed yield hypervisor call.
The inline code sequences for spin_lock and spin_unlock are nice and
short. The inline portion of a spin_lock now typically looks like this:
lhi %r0,0 # 0 indicates an empty lock
l %r1,0x3a0 # CPU number + 1 from lowcore
cs %r0,%r1,<some_lock> # lock operation
jnz call_wait # on failure call wait function
locked:
...
call_wait:
la %r2,<some_lock>
brasl %r14,arch_spin_lock_wait
j locked
A spin_unlock is as simple as before:
lhi %r0,0
sth %r0,2(%r2) # unlock operation
After a CPU has queued itself it may not enable interrupts again for the
arch_spin_lock_flags() variant. The arch_spin_lock_wait_flags wait function
is removed.
To improve performance the code implements opportunistic lock stealing.
If the wait function finds a spinlock_t that indicates that the lock is
free but there are queued waiters, the CPU may steal the lock up to three
times without queueing itself. The lock stealing update the steal counter
in the lock word to prevent more than 3 steals. The counter is reset at
the time the CPU next in the queue successfully takes the lock.
While the queued spinlocks improve performance in a system with dedicated
CPUs, in a virtualized environment with continuously overcommitted CPUs
the queued spinlocks can have a negative effect on performance. This
is due to the fact that a queued CPU that is preempted by the hypervisor
will block the queue at some point even without holding the lock. With
the classic spinlock it does not matter if a CPU is preempted that waits
for the lock. Therefore use the queued spinlock code only if the system
runs with dedicated CPUs and fall back to classic spinlocks when running
with shared CPUs.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The queued spinlock code will come out simpler if the encoding of
the CPU that holds the spinlock is (cpu+1) instead of (~cpu).
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The topology information returned by STSI 15.x.x contains a flag
if the CPUs of a topology-list are dedicated or shared. Make this
information available if the machine provides topology information.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Paul Burton reported that the nr_cpumask_bits check
within cpumsf_pmu_event_init() is not necessary.
Actually there is already a prior check within
perf_event_alloc(). Therefore remove the check.
Reported-by: Paul Burton <paul.burton@imgtec.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add runtime instrumention register get and set which allows to read
and modify the runtime instrumention control block.
Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Update runtime_instr_cb structure to be consistent with the runtime
instrumentation documentation.
Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This is the quite trivial backend for s390 which is required to enable
FORTIFY_SOURCE support.
See commit 6974f0c455 ("include/linux/string.h: add the option of
fortified string.h functions") for more details.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
exit_thread() is empty now. Therefore remove it and get rid of a
pointless branch.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Free data structures required for guarded storage from
arch_release_task_struct(). This allows to simplify the code a bit,
and also makes the semantics a bit easier: arch_release_task_struct()
is never called from the task that is being removed.
In addition this allows to get rid of exit_thread() in a later patch.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If the guarded storage regset for current is supposed to be changed,
the regset from user space is copied directly into the guarded storage
control block.
If then the process gets scheduled away while the control block is
being copied and before the new control block has been loaded, the
result is random: the process can be scheduled away due to a page
fault or preemption. If that happens the already copied parts will be
overwritten by save_gs_cb(), called from switch_to().
Avoid this by copying the data to a temporary buffer on the stack and
do the actual update with preemption disabled.
Fixes: f5bbd72198 ("s390/ptrace: guarded storage regset for the current task")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For PREEMPT enabled kernels the guarded storage (GS) code contains a
possible use-after-free bug. If a task that makes use of GS exits, it
will execute do_exit() while still enabled for preemption.
That function will call exit_thread_runtime_instr() via exit_thread().
If exit_thread_gs() gets preempted after the GS control block of the
task has been freed but before the pointer to it is set to NULL, then
save_gs_cb(), called from switch_to(), will write to already freed
memory.
Avoid this and simply disable preemption while freeing the control
block and setting the pointer to NULL.
Fixes: 916cda1aa1 ("s390: add a system call for guarded storage")
Cc: <stable@vger.kernel.org> # v4.12+
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Free data structures required for runtime instrumentation from
arch_release_task_struct(). This allows to simplify the code a bit,
and also makes the semantics a bit easier: arch_release_task_struct()
is never called from the task that is being removed.
In addition this allows to get rid of exit_thread() in a later patch.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For PREEMPT enabled kernels the runtime instrumentation (RI) code
contains a possible use-after-free bug. If a task that makes use of RI
exits, it will execute do_exit() while still enabled for preemption.
That function will call exit_thread_runtime_instr() via
exit_thread(). If exit_thread_runtime_instr() gets preempted after the
RI control block of the task has been freed but before the pointer to
it is set to NULL, then save_ri_cb(), called from switch_to(), will
write to already freed memory.
Avoid this and simply disable preemption while freeing the control
block and setting the pointer to NULL.
Fixes: e4b8b3f33f ("s390: add support for runtime instrumentation")
Cc: <stable@vger.kernel.org> # v3.7+
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
release_thread() is an empty function that gets called on every task
exit. Move the function to a header file and force inlining of it, so
that the compiler can optimize it away instead of generating a
pointless function call.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add a new sysctl file /proc/sys/s390/topology which displays if
topology is on (1) or off (0) as specified by the "topology=" kernel
parameter.
This allows to change topology information during runtime and
configuring it via /etc/sysctl.conf instead of using the kernel line
parameter.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If running on machines that do not provide topology information we
currently generate a "fake" topology which defines the maximum
distance between each cpu: each cpu will be put into an own drawer.
Historically this used to be the best option for (virtual) machines in
overcommited hypervisors.
For some workloads however it is better to generate a different
topology where all cpus are siblings within a package (all cpus are
core siblings). This shows performance improvements of up to 10%,
depending on the workload.
In order to keep the current behaviour, but also allow to switch to
the different core sibling topology use the existing "topology="
kernel parameter:
Specifying "topology=on" on machines without topology information will
generate the core siblings (fake) topology information, instead of the
default topology information where all cpus have the maximum distance.
On machines which provide topology information specifying
"topology=on" does not have any effect.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The check for the _SEGMENT_ENTRY_PROTECT bit in gup_huge_pmd() is the
wrong way around. It must not be set for write==1, and not be checked for
write==0. Fix this similar to how it was fixed for ptes long time ago in
commit 25591b0703 ("[S390] fix get_user_pages_fast").
One impact of this bug would be unnecessarily using the gup slow path for
write==0 on r/w mappings. A potentially more severe impact would be that
gup_huge_pmd() will succeed for write==1 on r/o mappings.
Cc: <stable@vger.kernel.org>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Commit 227be799c3 ("s390/mm: uninline pmdp_xxx functions from pgtable.h")
inadvertently changed the behavior of pmdp_invalidate(), so that it now
clears the pmd instead of just marking it as invalid. Fix this by restoring
the original behavior.
A possible impact of the misbehaving pmdp_invalidate() would be the
MADV_DONTNEED races (see commits ced10803 and 58ceeb6b), although we
should not have any negative impact on the related dirty/young flags,
since those flags are not set by the hardware on s390.
Fixes: 227be799c3 ("s390/mm: uninline pmdp_xxx functions from pgtable.h")
Cc: <stable@vger.kernel.org> # v4.6+
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
A per-thread event could not be created correctly like below:
perf record --per-thread -e rB0000 -- sleep 1
Error:
The sys_perf_event_open() syscall returned with 19 (No such device) for event (rB0000).
/bin/dmesg may provide additional information.
No CONFIG_PERF_EVENTS=y kernel support configured?
This bug was introduced by:
commit c311c79799
Author: Alexey Dobriyan <adobriyan@gmail.com>
Date: Mon May 8 15:56:15 2017 -0700
cpumask: make "nr_cpumask_bits" unsigned
If a per-thread event is not attached to any CPU, the cpu field
in struct perf_event is -1. The above commit converts the CPU number
to unsigned int, which result in an illegal CPU number.
Fixes: c311c79799 ("cpumask: make "nr_cpumask_bits" unsigned")
Cc: <stable@vger.kernel.org> # v4.12+
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Pu Hou <bjhoupu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pull more s390 updates from Martin Schwidefsky:
"The second patch set for the 4.14 merge window:
- Convert the dasd device driver to the blk-mq interface.
- Provide three zcrypt interfaces for vfio_ap. These will be required
for KVM guest access to the crypto cards attached via the AP bus.
- A couple of memory management bug fixes."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/dasd: blk-mq conversion
s390/mm: use a single lock for the fields in mm_context_t
s390/mm: fix race on mm->context.flush_mm
s390/mm: fix local TLB flushing vs. detach of an mm address space
s390/zcrypt: externalize AP queue interrupt control
s390/zcrypt: externalize AP config info query
s390/zcrypt: externalize test AP queue
s390/mm: use VM_BUG_ON in crst_table_[upgrade|downgrade]
Pull namespace updates from Eric Biederman:
"Life has been busy and I have not gotten half as much done this round
as I would have liked. I delayed it so that a minor conflict
resolution with the mips tree could spend a little time in linux-next
before I sent this pull request.
This includes two long delayed user namespace changes from Kirill
Tkhai. It also includes a very useful change from Serge Hallyn that
allows the security capability attribute to be used inside of user
namespaces. The practical effect of this is people can now untar
tarballs and install rpms in user namespaces. It had been suggested to
generalize this and encode some of the namespace information
information in the xattr name. Upon close inspection that makes the
things that should be hard easy and the things that should be easy
more expensive.
Then there is my bugfix/cleanup for signal injection that removes the
magic encoding of the siginfo union member from the kernel internal
si_code. The mips folks reported the case where I had used FPE_FIXME
me is impossible so I have remove FPE_FIXME from mips, while at the
same time including a return statement in that case to keep gcc from
complaining about unitialized variables.
I almost finished the work to get make copy_siginfo_to_user a trivial
copy to user. The code is available at:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3
But I did not have time/energy to get the code posted and reviewed
before the merge window opened.
I was able to see that the security excuse for just copying fields
that we know are initialized doesn't work in practice there are buggy
initializations that don't initialize the proper fields in siginfo. So
we still sometimes copy unitialized data to userspace"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
Introduce v3 namespaced file capabilities
mips/signal: In force_fcr31_sig return in the impossible case
signal: Remove kernel interal si_code magic
fcntl: Don't use ambiguous SIG_POLL si_codes
prctl: Allow local CAP_SYS_ADMIN changing exe_file
security: Use user_namespace::level to avoid redundant iterations in cap_capable()
userns,pidns: Verify the userns for new pid namespaces
signal/testing: Don't look for __SI_FAULT in userspace
signal/mips: Document a conflict with SI_USER with SIGFPE
signal/sparc: Document a conflict with SI_USER with SIGFPE
signal/ia64: Document a conflict with SI_USER with SIGFPE
signal/alpha: Document a conflict with SI_USER for SIGTRAP
Slightly more changes than usual this time:
- KDump Kernel IOMMU take-over code for AMD IOMMU. The code now
tries to preserve the mappings of the kernel so that master
aborts for devices are avoided. Master aborts cause some
devices to fail in the kdump kernel, so this code makes the
dump more likely to succeed when AMD IOMMU is enabled.
- Common flush queue implementation for IOVA code users. The
code is still optional, but AMD and Intel IOMMU drivers had
their own implementation which is now unified.
- Finish support for iommu-groups. All drivers implement this
feature now so that IOMMU core code can rely on it.
- Finish support for 'struct iommu_device' in iommu drivers. All
drivers now use the interface.
- New functions in the IOMMU-API for explicit IO/TLB flushing.
This will help to reduce the number of IO/TLB flushes when
IOMMU drivers support this interface.
- Support for mt2712 in the Mediatek IOMMU driver
- New IOMMU driver for QCOM hardware
- System PM support for ARM-SMMU
- Shutdown method for ARM-SMMU-v3
- Some constification patches
- Various other small improvements and fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJZtCFNAAoJECvwRC2XARrjZnQP/AxC/ezQpq82HbegF4sM/cVE
Ep7TeTqodEl75FS/6txe2wU0pwodqk/LB9ajfQZUbE1w8vKsNEqi5qf4ZYHGoxYI
5bWyjJBzKIlwENH5lsBpQNt6XLevrYmRsFy7F0tRYy+qPQq8k+js2i7/XkCL3q7L
3xklF847RRoITaTOhhaROx1pF23dSMEsS2XGuWHcZfjORtep4wcFKzd/2SvlCWCo
P2bRU7jBzfJuuGSA80gaiUbDmrULTUfYuZNp7njASzCgsDmagERtvDEpdoXPNNSp
u6s4LjU1Dp3fgr6g6cFRO7B6JUbWd619nwo9so/c/wZN54yEngBF9EyeeF3mv2O5
ZbM2mOW3RlZcjxFT/AC8G4cZwwP6MpCEQOdqknoqc6ZQwcDqwN0o9I4+po0wsiAU
89ijZZe9Mx0p9lNpihaBEB1erAUWPo5Obh62zo80W3h6x9WzkGQWM+PyFK2DYoaC
8biEZzcc21sLEHvXQkcEGJSKrihHr9sluOqvxmCw5QAkYIFAeZRoeH7JtZWjVCnr
T3XvaG1G1Aw6tS7ErxufdKawREAGki0Rm9i1baiH9sqNj5rllM01Y+PgU6E21Nbg
iZp9gJLjfwM4vhYLlovvQK5PRoOBsCkyCpEI4GJqjLeam5p/WN06CFFf0ifQofYr
qDPCVDkWHWV8nugFFKE7
=EVh9
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"Slightly more changes than usual this time:
- KDump Kernel IOMMU take-over code for AMD IOMMU. The code now tries
to preserve the mappings of the kernel so that master aborts for
devices are avoided. Master aborts cause some devices to fail in
the kdump kernel, so this code makes the dump more likely to
succeed when AMD IOMMU is enabled.
- common flush queue implementation for IOVA code users. The code is
still optional, but AMD and Intel IOMMU drivers had their own
implementation which is now unified.
- finish support for iommu-groups. All drivers implement this feature
now so that IOMMU core code can rely on it.
- finish support for 'struct iommu_device' in iommu drivers. All
drivers now use the interface.
- new functions in the IOMMU-API for explicit IO/TLB flushing. This
will help to reduce the number of IO/TLB flushes when IOMMU drivers
support this interface.
- support for mt2712 in the Mediatek IOMMU driver
- new IOMMU driver for QCOM hardware
- system PM support for ARM-SMMU
- shutdown method for ARM-SMMU-v3
- some constification patches
- various other small improvements and fixes"
* tag 'iommu-updates-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (87 commits)
iommu/vt-d: Don't be too aggressive when clearing one context entry
iommu: Introduce Interface for IOMMU TLB Flushing
iommu/s390: Constify iommu_ops
iommu/vt-d: Avoid calling virt_to_phys() on null pointer
iommu/vt-d: IOMMU Page Request needs to check if address is canonical.
arm/tegra: Call bus_set_iommu() after iommu_device_register()
iommu/exynos: Constify iommu_ops
iommu/ipmmu-vmsa: Make ipmmu_gather_ops const
iommu/ipmmu-vmsa: Rereserving a free context before setting up a pagetable
iommu/amd: Rename a few flush functions
iommu/amd: Check if domain is NULL in get_domain() and return -EBUSY
iommu/mediatek: Fix a build warning of BIT(32) in ARM
iommu/mediatek: Fix a build fail of m4u_type
iommu: qcom: annotate PM functions as __maybe_unused
iommu/pamu: Fix PAMU boot crash
memory: mtk-smi: Degrade SMI init to module_init
iommu/mediatek: Enlarge the validate PA range for 4GB mode
iommu/mediatek: Disable iommu clock when system suspend
iommu/mediatek: Move pgtable allocation into domain_alloc
iommu/mediatek: Merge 2 M4U HWs into one iommu domain
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZsr8cAAoJEFmIoMA60/r8lXYQAKViYIRMJDD4n3NhjMeLOsnJ
vwaBmWlLRjSFIEpag5kMjS1RJE17qAvmkBZnDvSNZ6cT28INkkZnVM2IW96WECVq
64MIvDijVPcvqGuWePCfWdDiSXApiDWwJuw55BOhmvV996wGy0gYgzpPY+1g0Knh
XzH9IOzDL79hZleLfsxX0MLV6FGBVtOsr0jvQ04k4IgEMIxEDTlbw85rnrvzQUtc
0Vj2koaxWIESZsq7G/wiZb2n6ekaFdXO/VlVvvhmTSDLCBaJ63Hb/gfOhwMuVkS6
B3cVprNrCT0dSzWmU4ZXf+wpOyDpBexlemW/OR/6CQUkC6AUS6kQ5si1X44dbGmJ
nBPh414tdlm/6V4h/A3UFPOajSGa/ZWZ/uQZPfvKs1R6WfjUerWVBfUpAzPbgjam
c/mhJ19HYT1J7vFBfhekBMeY2Px3JgSJ9rNsrFl48ynAALaX5GEwdpo4aqBfscKz
4/f9fU4ysumopvCEuKD2SsJvsPKd5gMQGGtvAhXM1TxvAoQ5V4cc99qEetAPXXPf
h2EqWm4ph7YP4a+n/OZBjzluHCmZJn1CntH5+//6wpUk6HnmzsftGELuO9n12cLE
GGkreI3T9ctV1eOkzVVa0l0QTE1X/VLyEyKCtb9obXsDaG4Ud7uKQoZgB19DwyTJ
EG76ridTolUFVV+wzJD9
=9cLP
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- add enhanced Downstream Port Containment support, which prints more
details about Root Port Programmed I/O errors (Dongdong Liu)
- add Layerscape ls1088a and ls2088a support (Hou Zhiqiang)
- add MediaTek MT2712 and MT7622 support (Ryder Lee)
- add MediaTek MT2712 and MT7622 MSI support (Honghui Zhang)
- add Qualcom IPQ8074 support (Varadarajan Narayanan)
- add R-Car r8a7743/5 device tree support (Biju Das)
- add Rockchip per-lane PHY support for better power management (Shawn
Lin)
- fix IRQ mapping for hot-added devices by replacing the
pci_fixup_irqs() boot-time design with a host bridge hook called at
probe-time (Lorenzo Pieralisi, Matthew Minter)
- fix race when enabling two devices that results in upstream bridge
not being enabled correctly (Srinath Mannam)
- fix pciehp power fault infinite loop (Keith Busch)
- fix SHPC bridge MSI hotplug events by enabling bus mastering
(Aleksandr Bezzubikov)
- fix a VFIO issue by correcting PCIe capability sizes (Alex
Williamson)
- fix an INTD issue on Xilinx and possibly other drivers by unifying
INTx IRQ domain support (Paul Burton)
- avoid IOMMU stalls by marking AMD Stoney GPU ATS as broken (Joerg
Roedel)
- allow APM X-Gene device assignment to guests by adding an ACS quirk
(Feng Kan)
- fix driver crashes by disabling Extended Tags on Broadcom HT2100
(Extended Tags support is required for PCIe Receivers but not
Requesters, and we now enable them by default when Requesters support
them) (Sinan Kaya)
- fix MSIs for devices that use phantom RIDs for DMA by assuming MSIs
use the real Requester ID (not a phantom RID) (Robin Murphy)
- prevent assignment of Intel VMD children to guests (which may be
supported eventually, but isn't yet) by not associating an IOMMU with
them (Jon Derrick)
- fix Intel VMD suspend/resume by releasing IRQs on suspend (Scott
Bauer)
- fix a Function-Level Reset issue with Intel 750 NVMe by waiting
longer (up to 60sec instead of 1sec) for device to become ready
(Sinan Kaya)
- fix a Function-Level Reset issue on iProc Stingray by working around
hardware defects in the CRS implementation (Oza Pawandeep)
- fix an issue with Intel NVMe P3700 after an iProc reset by adding a
delay during shutdown (Oza Pawandeep)
- fix a Microsoft Hyper-V lockdep issue by polling instead of blocking
in compose_msi_msg() (Stephen Hemminger)
- fix a wireless LAN driver timeout by clearing DesignWare MSI
interrupt status after it is handled, not before (Faiz Abbas)
- fix DesignWare ATU enable checking (Jisheng Zhang)
- reduce Layerscape dependencies on the bootloader by doing more
initialization in the driver (Hou Zhiqiang)
- improve Intel VMD performance allowing allocation of more IRQ vectors
than present CPUs (Keith Busch)
- improve endpoint framework support for initial DMA mask, different
BAR sizes, configurable page sizes, MSI, test driver, etc (Kishon
Vijay Abraham I, Stan Drozd)
- rework CRS support to add periodic messages while we poll during
enumeration and after Function-Level Reset and prepare for possible
other uses of CRS (Sinan Kaya)
- clean up Root Port AER handling by removing unnecessary code and
moving error handler methods to struct pcie_port_service_driver
(Christoph Hellwig)
- clean up error handling paths in various drivers (Bjorn Andersson,
Fabio Estevam, Gustavo A. R. Silva, Harunobu Kurokawa, Jeffy Chen,
Lorenzo Pieralisi, Sergei Shtylyov)
- clean up SR-IOV resource handling by disabling VF decoding before
updating the corresponding resource structs (Gavin Shan)
- clean up DesignWare-based drivers by unifying quirks to update Class
Code and Interrupt Pin and related handling of write-protected
registers (Hou Zhiqiang)
- clean up by adding empty generic pcibios_align_resource() and
pcibios_fixup_bus() and removing empty arch-specific implementations
(Palmer Dabbelt)
- request exclusive reset control for several drivers to allow cleanup
elsewhere (Philipp Zabel)
- constify various structures (Arvind Yadav, Bhumika Goyal)
- convert from full_name() to %pOF (Rob Herring)
- remove unused variables from iProc, HiSi, Altera, Keystone (Shawn
Lin)
* tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (170 commits)
PCI: xgene: Clean up whitespace
PCI: xgene: Define XGENE_PCI_EXP_CAP and use generic PCI_EXP_RTCTL offset
PCI: xgene: Fix platform_get_irq() error handling
PCI: xilinx-nwl: Fix platform_get_irq() error handling
PCI: rockchip: Fix platform_get_irq() error handling
PCI: altera: Fix platform_get_irq() error handling
PCI: spear13xx: Fix platform_get_irq() error handling
PCI: artpec6: Fix platform_get_irq() error handling
PCI: armada8k: Fix platform_get_irq() error handling
PCI: dra7xx: Fix platform_get_irq() error handling
PCI: exynos: Fix platform_get_irq() error handling
PCI: iproc: Clean up whitespace
PCI: iproc: Rename PCI_EXP_CAP to IPROC_PCI_EXP_CAP
PCI: iproc: Add 500ms delay during device shutdown
PCI: Fix typos and whitespace errors
PCI: Remove unused "res" variable from pci_resource_io()
PCI: Correct kernel-doc of pci_vpd_srdt_size(), pci_vpd_srdt_tag()
PCI/AER: Reformat AER register definitions
iommu/vt-d: Prevent VMD child devices from being remapping targets
x86/PCI: Use is_vmd() rather than relying on the domain number
...
Common:
- improve heuristic for boosting preempted spinlocks by ignoring VCPUs
in user mode
ARM:
- fix for decoding external abort types from guests
- added support for migrating the active priority of interrupts when
running a GICv2 guest on a GICv3 host
- minor cleanup
PPC:
- expose storage keys to userspace
- merge powerpc/topic/ppc-kvm branch that contains
find_linux_pte_or_hugepte and POWER9 thread management cleanup
- merge kvm-ppc-fixes with a fix that missed 4.13 because of vacations
- fixes
s390:
- merge of topic branch tlb-flushing from the s390 tree to get the
no-dat base features
- merge of kvm/master to avoid conflicts with additional sthyi fixes
- wire up the no-dat enhancements in KVM
- multiple epoch facility (z14 feature)
- Configuration z/Architecture Mode
- more sthyi fixes
- gdb server range checking fix
- small code cleanups
x86:
- emulate Hyper-V TSC frequency MSRs
- add nested INVPCID
- emulate EPTP switching VMFUNC
- support Virtual GIF
- support 5 level page tables
- speedup nested VM exits by packing byte operations
- speedup MMIO by using hardware provided physical address
- a lot of fixes and cleanups, especially nested
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJZspE1AAoJEED/6hsPKofoDcMIALT11n+LKV50QGwQdg2W1GOt
aChbgnj/Kegit3hQlDhVNb8kmdZEOZzSL81Lh0VPEr7zXU8QiWn2snbizDPv8sde
MpHhcZYZZ0YrpoiZKjl8yiwcu88OWGn2qtJ7OpuTS5hvEGAfxMncp0AMZho6fnz/
ySTwJ9GK2MTgBw39OAzCeDOeoYn4NKYMwjJGqBXRhNX8PG/1wmfqv0vPrd6wfg31
KJ58BumavwJjr8YbQ1xELm9rpQrAmaayIsG0R1dEUqCbt5a1+t2gt4h2uY7tWcIv
ACt2bIze7eF3xA+OpRs+eT+yemiH3t9btIVmhCfzUpnQ+V5Z55VMSwASLtTuJRQ=
=R8Ry
-----END PGP SIGNATURE-----
Merge tag 'kvm-4.14-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Radim Krčmář:
"First batch of KVM changes for 4.14
Common:
- improve heuristic for boosting preempted spinlocks by ignoring
VCPUs in user mode
ARM:
- fix for decoding external abort types from guests
- added support for migrating the active priority of interrupts when
running a GICv2 guest on a GICv3 host
- minor cleanup
PPC:
- expose storage keys to userspace
- merge kvm-ppc-fixes with a fix that missed 4.13 because of
vacations
- fixes
s390:
- merge of kvm/master to avoid conflicts with additional sthyi fixes
- wire up the no-dat enhancements in KVM
- multiple epoch facility (z14 feature)
- Configuration z/Architecture Mode
- more sthyi fixes
- gdb server range checking fix
- small code cleanups
x86:
- emulate Hyper-V TSC frequency MSRs
- add nested INVPCID
- emulate EPTP switching VMFUNC
- support Virtual GIF
- support 5 level page tables
- speedup nested VM exits by packing byte operations
- speedup MMIO by using hardware provided physical address
- a lot of fixes and cleanups, especially nested"
* tag 'kvm-4.14-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits)
KVM: arm/arm64: Support uaccess of GICC_APRn
KVM: arm/arm64: Extract GICv3 max APRn index calculation
KVM: arm/arm64: vITS: Drop its_ite->lpi field
KVM: arm/arm64: vgic: constify seq_operations and file_operations
KVM: arm/arm64: Fix guest external abort matching
KVM: PPC: Book3S HV: Fix memory leak in kvm_vm_ioctl_get_htab_fd
KVM: s390: vsie: cleanup mcck reinjection
KVM: s390: use WARN_ON_ONCE only for checking
KVM: s390: guestdbg: fix range check
KVM: PPC: Book3S HV: Report storage key support to userspace
KVM: PPC: Book3S HV: Fix case where HDEC is treated as 32-bit on POWER9
KVM: PPC: Book3S HV: Fix invalid use of register expression
KVM: PPC: Book3S HV: Fix H_REGISTER_VPA VPA size validation
KVM: PPC: Book3S HV: Fix setting of storage key in H_ENTER
KVM: PPC: e500mc: Fix a NULL dereference
KVM: PPC: e500: Fix some NULL dereferences on error
KVM: PPC: Book3S HV: Protect updates to spapr_tce_tables list
KVM: s390: we are always in czam mode
KVM: s390: expose no-DAT to guest and migration support
KVM: s390: sthyi: remove invalid guest write access
...
This fix was intended for 4.13, but didn't get in because both
maintainers were on vacation.
Paul Mackerras:
"It adds mutual exclusion between list_add_rcu and list_del_rcu calls
on the kvm->arch.spapr_tce_tables list. Without this, userspace could
potentially trigger corruption of the list and cause a host crash or
worse."
- merge of topic branch tlb-flushing from the s390 tree to get the
no-dat base features
- merge of kvm/master to avoid conflicts with additional sthyi fixes
- wire up the no-dat enhancements in KVM
- multiple epoch facility (z14 feature)
- Configuration z/Architecture Mode
- more sthyi fixes
- gdb server range checking fix
- small code cleanups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJZqQt+AAoJEBF7vIC1phx82jcP/0PgDUFpyaq50R1LhoLoXfsv
55x8hZFJzkBTapv6ckb9MP1J90htbmYXUDbgk/JS7ElenLvLw6r9QenHJnLYRoJ4
s4ApooG5/wpbQWdxfqZaNHZEV1/Fgvv+R348k8cFfhrzJdOd0p7I+qkd5FOX2VTl
NGk+cR03yhtPOYeUb90k40k0kGmY7acdnYe2txUoh7qC38vPVwMJA6S01mWbB3sm
ApQN9hzGvXhT833dVTNMS33uCds/9nCHtMn0viqTfs9AWKIs62nK4L54c5dgZWpH
u7+ji5OUEbV+8QQH5G1OmWUIo3sKO5MVS0P9jbh0bIIdUZvUKdchFP8bQcmde02E
IgYgjbgmKc8Zec+Z0PHs5Gy+VbGXpqg0nvwyzK96fh/KM7aepxHUfDOdWWj8qbgI
4sc2fe5JFkP5IldYgJlMPSm4GqLNDkbGFnLralkaqhRvjS1sNjgXxcCclLQQaI5H
kMklzkNpMQ9yC7pYLjq7AKOzmMdHqq90rAIq0zQSYQjMoiC2RLYFyjFwsK3KlmeJ
GRgYrIZOVkYONZhT4F81piGcuRsbCGSXZ5I5i/QxJ/EgNGO8EcGrrONAk1NW9Dsp
q9qu/aKXOzGe57jjP+MonH4bcrHLCmcpkxS9sgiECu0PtutzbGdZPUXA9Pj9YMON
7kQXrwPgEHhn62+l/JdO
=N/6G
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-4.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux
KVM: s390: Fixes and features for 4.14
- merge of topic branch tlb-flushing from the s390 tree to get the
no-dat base features
- merge of kvm/master to avoid conflicts with additional sthyi fixes
- wire up the no-dat enhancements in KVM
- multiple epoch facility (z14 feature)
- Configuration z/Architecture Mode
- more sthyi fixes
- gdb server range checking fix
- small code cleanups
Pull networking updates from David Miller:
1) Support ipv6 checksum offload in sunvnet driver, from Shannon
Nelson.
2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric
Dumazet.
3) Allow generic XDP to work on virtual devices, from John Fastabend.
4) Add bpf device maps and XDP_REDIRECT, which can be used to build
arbitrary switching frameworks using XDP. From John Fastabend.
5) Remove UFO offloads from the tree, gave us little other than bugs.
6) Remove the IPSEC flow cache, from Florian Westphal.
7) Support ipv6 route offload in mlxsw driver.
8) Support VF representors in bnxt_en, from Sathya Perla.
9) Add support for forward error correction modes to ethtool, from
Vidya Sagar Ravipati.
10) Add time filter for packet scheduler action dumping, from Jamal Hadi
Salim.
11) Extend the zerocopy sendmsg() used by virtio and tap to regular
sockets via MSG_ZEROCOPY. From Willem de Bruijn.
12) Significantly rework value tracking in the BPF verifier, from Edward
Cree.
13) Add new jump instructions to eBPF, from Daniel Borkmann.
14) Rework rtnetlink plumbing so that operations can be run without
taking the RTNL semaphore. From Florian Westphal.
15) Support XDP in tap driver, from Jason Wang.
16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal.
17) Add Huawei hinic ethernet driver.
18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan
Delalande.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits)
i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq
i40e: avoid NVM acquire deadlock during NVM update
drivers: net: xgene: Remove return statement from void function
drivers: net: xgene: Configure tx/rx delay for ACPI
drivers: net: xgene: Read tx/rx delay for ACPI
rocker: fix kcalloc parameter order
rds: Fix non-atomic operation on shared flag variable
net: sched: don't use GFP_KERNEL under spin lock
vhost_net: correctly check tx avail during rx busy polling
net: mdio-mux: add mdio_mux parameter to mdio_mux_init()
rxrpc: Make service connection lookup always check for retry
net: stmmac: Delete dead code for MDIO registration
gianfar: Fix Tx flow control deactivation
cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6
cxgb4: Fix pause frame count in t4_get_port_stats
cxgb4: fix memory leak
tun: rename generic_xdp to skb_xdp
tun: reserve extra headroom only when XDP is set
net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping
net: dsa: bcm_sf2: Advertise number of egress queues
...
The three locks 'lock', 'pgtable_lock' and 'gmap_lock' in the
mm_context_t can be reduced to a single lock.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The order in __tlb_flush_mm_lazy is to flush TLB first and then clear
the mm->context.flush_mm bit. This can lead to missed flushes as the
bit can be set anytime, the order needs to be the other way aronud.
But this leads to a different race, __tlb_flush_mm_lazy may be called
on two CPUs concurrently. If mm->context.flush_mm is cleared first then
another CPU can bypass __tlb_flush_mm_lazy although the first CPU has
not done the flush yet. In a virtualized environment the time until the
flush is finally completed can be arbitrarily long.
Add a spinlock to serialize __tlb_flush_mm_lazy and use the function
in finish_arch_post_lock_switch as well.
Cc: <stable@vger.kernel.org>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The local TLB flushing code keeps an additional mask in the mm.context,
the cpu_attach_mask. At the time a global flush of an address space is
done the cpu_attach_mask is copied to the mm_cpumask in order to avoid
future global flushes in case the mm is used by a single CPU only after
the flush.
Trouble is that the reset of the mm_cpumask is racy against the detach
of an mm address space by switch_mm. The current order is first the
global TLB flush and then the copy of the cpu_attach_mask to the
mm_cpumask. The order needs to be the other way around.
Cc: <stable@vger.kernel.org>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
KVM has a need to control the interrupts on real and virtualized
AP queue devices. This fix provides a new function to control
the interrupt facilities of an AP queue device.
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
KVM has a need to fetch the crypto configuration information
as it is returned by the PQAP(QCI) instruction. This patch
introduces a new API ap_query_configuration() which provides
this info in a handy way for the caller.
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Under certain specified conditions, the Test AP Queue (TAPQ)
subfunction of the Process Adjunct Processor Queue (PQAP) instruction
will be intercepted by a guest VM. The guest VM must have a means for
executing the intercepted instruction.
The vfio_ap driver will provide an interface to execute the
PQAP(TAPQ) instruction subfunction on behalf of a guest VM.
The code for executing the AP instructions currently resides in the
AP bus. This patch refactors the AP bus code to externalize access
to the PQAP(TAPQ) instruction subfunction to make it available to
the vfio_ap driver.
Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pull s390 updates from Martin Schwidefsky:
"The first part of the s390 updates for 4.14:
- Add machine type 0x3906 for IBM z14
- Add IBM z14 TLB flushing improvements for KVM guests
- Exploit the TOD clock epoch extension to provide a continuous TOD
clock afer 2042/09/17
- Add NIAI spinlock hints for IBM z14
- Rework the vmcp driver and use CMA for the respone buffer of z/VM
CP commands
- Drop some s390 specific asm headers and use the generic version
- Add block discard for DASD-FBA devices under z/VM
- Add average request times to DASD statistics
- A few of those constify patches which seem to be in vogue right now
- Cleanup and bug fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (50 commits)
s390/mm: avoid empty zero pages for KVM guests to avoid postcopy hangs
s390/dasd: Add discard support for FBA devices
s390/zcrypt: make CPRBX const
s390/uaccess: avoid mvcos jump label
s390/mm: use generic mm_hooks
s390/facilities: fix typo
s390/vmcp: simplify vmcp_response_free()
s390/topology: Remove the unused parent_node() macro
s390/dasd: Change unsigned long long to unsigned long
s390/smp: convert cpuhp_setup_state() return code to zero on success
s390: fix 'novx' early parameter handling
s390/dasd: add average request times to dasd statistics
s390/scm: use common completion path
s390/pci: log changes to uid checking
s390/vmcp: simplify vmcp_ioctl()
s390/vmcp: return -ENOTTY for unknown ioctl commands
s390/vmcp: split vmcp header file and move to uapi
s390/vmcp: make use of contiguous memory allocator
s390/cpcmd,vmcp: avoid GFP_DMA allocations
s390/vmcp: fix uaccess check and avoid undefined behavior
...
Pull locking updates from Ingo Molnar:
- Add 'cross-release' support to lockdep, which allows APIs like
completions, where it's not the 'owner' who releases the lock, to be
tracked. It's all activated automatically under
CONFIG_PROVE_LOCKING=y.
- Clean up (restructure) the x86 atomics op implementation to be more
readable, in preparation of KASAN annotations. (Dmitry Vyukov)
- Fix static keys (Paolo Bonzini)
- Add killable versions of down_read() et al (Kirill Tkhai)
- Rework and fix jump_label locking (Marc Zyngier, Paolo Bonzini)
- Rework (and fix) tlb_flush_pending() barriers (Peter Zijlstra)
- Remove smp_mb__before_spinlock() and convert its usages, introduce
smp_mb__after_spinlock() (Peter Zijlstra)
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
locking/lockdep/selftests: Fix mixed read-write ABBA tests
sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK()
acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse
locking/pvqspinlock: Relax cmpxchg's to improve performance on some architectures
smp: Avoid using two cache lines for struct call_single_data
locking/lockdep: Untangle xhlock history save/restore from task independence
locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being
futex: Remove duplicated code and fix undefined behaviour
Documentation/locking/atomic: Finish the document...
locking/lockdep: Fix workqueue crossrelease annotation
workqueue/lockdep: 'Fix' flush_work() annotation
locking/lockdep/selftests: Add mixed read-write ABBA tests
mm, locking/barriers: Clarify tlb_flush_pending() barriers
locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive
locking/lockdep: Explicitly initialize wq_barrier::done::map
locking/lockdep: Rename CONFIG_LOCKDEP_COMPLETE to CONFIG_LOCKDEP_COMPLETIONS
locking/lockdep: Reword title of LOCKDEP_CROSSRELEASE config
locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING
locking/refcounts, x86/asm: Implement fast refcount overflow protection
locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease
...
Pull RCU updates from Ingo Molnad:
"The main RCU related changes in this cycle were:
- Removal of spin_unlock_wait()
- SRCU updates
- RCU torture-test updates
- RCU Documentation updates
- Extend the sys_membarrier() ABI with the MEMBARRIER_CMD_PRIVATE_EXPEDITED variant
- Miscellaneous RCU fixes
- CPU-hotplug fixes"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
arch: Remove spin_unlock_wait() arch-specific definitions
locking: Remove spin_unlock_wait() generic definitions
drivers/ata: Replace spin_unlock_wait() with lock/unlock pair
ipc: Replace spin_unlock_wait() with lock/unlock pair
exit: Replace spin_unlock_wait() with lock/unlock pair
completion: Replace spin_unlock_wait() with lock/unlock pair
doc: Set down RCU's scheduling-clock-interrupt needs
doc: No longer allowed to use rcu_dereference on non-pointers
doc: Add RCU files to docbook-generation files
doc: Update memory-barriers.txt for read-to-write dependencies
doc: Update RCU documentation
membarrier: Provide expedited private command
rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter()
rcu: Add warning to rcu_idle_enter() for irqs enabled
rcu: Make rcu_idle_enter() rely on callers disabling irqs
rcu: Add assertions verifying blocked-tasks list
rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit()
rcu: Add TPS() protection for _rcu_barrier_trace strings
rcu: Use idle versions of swait to make idle-hack clear
swait: Add idle variants which don't contribute to load average
...
Pull misc fixes from Al Viro:
"Loose ends and regressions from the last merge window.
Strictly speaking, only binfmt_flat thing is a build regression per
se - the rest is 'only sparse cares about that' stuff"
[ This came in before the 4.13 release and could have gone there, but it
was late in the release and nothing seemed critical enough to care, so
I'm pulling it in the 4.14 merge window instead - Linus ]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
binfmt_flat: fix arch/m32r and arch/microblaze flat_put_addr_at_rp()
compat_hdio_ioctl: Fix a declaration
<linux/uaccess.h>: Fix copy_in_user() declaration
annotate RWF_... flags
teach SYSCALL_DEFINE/COMPAT_SYSCALL_DEFINE to handle __bitwise arguments
A 31-bit compat process can force a BUG_ON in crst_table_upgrade
with specific, invalid mmap calls, e.g.
mmap((void*) 0x7fff8000, 0x10000, 3, 32, -1, 0)
The arch_get_unmapped_area[_topdown] functions miss an if condition
in the decision to do a page table upgrade.
Fixes: 9b11c7912d ("s390/mm: simplify arch_get_unmapped_area[_topdown]")
Cc: <stable@vger.kernel.org> # v4.12+
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The mm->context.asce field of a new process is not set up correctly
in case of a fork with a 5 level page table.
Add the missing case to init_new_context().
Fixes: 1aea9b3f92 ("s390/mm: implement 5 level pages tables")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The machine check information is part of the vsie_page.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170830160603.5452-4-david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Move the real logic that always has to be executed out of the
WARN_ON_ONCE.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170830160603.5452-3-david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Looks like the "overflowing" range check is wrong.
|=======b-------a=======|
addr >= a || addr <= b
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170830160603.5452-2-david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Independent of the underlying hardware, kvm will now always handle
SIGP SET ARCHITECTURE as if czam were enabled. Therefore, let's not
only forward that bit but always set it.
While at it, add a comment regarding STHYI.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170829143108.14703-1-david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Right now there is a potential hang situation for postcopy migrations,
if the guest is enabling storage keys on the target system during the
postcopy process.
For storage key virtualization, we have to forbid the empty zero page as
the storage key is a property of the physical page frame. As we enable
storage key handling lazily we then drop all mappings for empty zero
pages for lazy refaulting later on.
This does not work with the postcopy migration, which relies on the
empty zero page never triggering a fault again in the future. The reason
is that postcopy migration will simply read a page on the target system
if that page is a known zero page to fault in an empty zero page. At
the same time postcopy remembers that this page was already transferred
- so any future userfault on that page will NOT be retransmitted again
to avoid races.
If now the guest enters the storage key mode while in postcopy, we will
break this assumption of postcopy.
The solution is to disable the empty zero page for KVM guests early on
and not during storage key enablement. With this change, the postcopy
migration process is guaranteed to start after no zero pages are left.
As guest pages are very likely not empty zero pages anyway the memory
overhead is also pretty small.
While at it this also adds proper page table locking to the zero page
removal.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The z/VM hypervisor provides virtual disks (VDISK) which are backed by
main memory of the hypervisor. Those devices are seen as DASD FBA disks
within the Linux guest.
Whenever data is written to such a device, memory is allocated
on-the-fly by z/VM accordingly. This memory, however, is not being freed
if data on the device is deleted by the guest OS.
In order to make memory usable after deletion again, add discard support
to the FBA discipline.
While at it, update comments regarding the DASD_FEATURE_* flags.
Reviewed-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If the kernel is compiled for z10 or later machines the uaccess
code inlines the mvcos instruction. The facility bit 27 which
indicates the availability of MVCOS has to be set. The have_mvcos
jump label will always be true.
Make the generation of the have_mvcos jump label conditional on
!CONFIG_HAVE_MARCH_Z10_FEATURES.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With git commit 3446c13b26
"s390/mm: four page table levels vs. fork"
s390 dropped its architecture specific version of arch_dup_mmap.
Now all functions defined by include/asm-generic/mm_hooks.h are
identical to the s390 versions. Use the generic header.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There really is no "general extension" facility available. Use the
correct name "general instructions extension" facility instead.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The STFLE bit 147 indicates whether the ESSA no-DAT operation code is
valid, the bit is not normally provided to the host; the host is
instead provided with an SCLP bit that indicates whether guests can
support the feature.
This patch:
* enables the STFLE bit in the guest if the corresponding SCLP bit is
present in the host.
* adds support for migrating the no-DAT bit in the PGSTEs
* fixes the software interpretation of the ESSA instruction that is
used when migrating, both for the new operation code and for the old
"set stable", as per specifications.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
handle_sthyi() always writes to guest memory if the sthyi function
code is zero in order to fault in the page that later is written to.
However a function code of zero does not necessarily mean that a write
to guest memory happens: if the KVM host is running as a second level
guest under z/VM 6.2 the sthyi instruction is indicated to be
available to the KVM host, however if the instruction is executed it
will always return with a return code that indicates "unsupported
function code".
In such a case handle_sthyi() must not write to guest memory. This
means that the prior write access to fault in the guest page may
result in invalid guest exceptions, and/or invalid data modification.
In order to be architecture compliant simply remove the write_guest()
call.
Given that the guest assumed a write access anyway, this fix does not
qualify for -stable. This just makes sure the sthyi handler is
architecture compliant.
Fixes: 95ca2cb579 ("KVM: s390: Add sthyi emulation")
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Allow for the enablement of MEF and the support for the extended
epoch in SIE and VSIE for the extended guest TOD-Clock.
A new interface is used for getting/setting a guest's extended TOD-Clock
that uses a single ioctl invocation, KVM_S390_VM_TOD_EXT. Since the
host time is a moving target that might see an epoch switch or STP sync
checks we need an atomic ioctl and cannot use the exisiting two
interfaces. The old method of getting and setting the guest TOD-Clock is
still retained and is used when the old ioctls are called.
Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
kvm has always supported the concept of starting in z/Arch mode so let's
reflect the feature bit to the guest.
Also, we change sigp set architecture to reject any request to change
architecture modes.
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
There is code duplicated over all architecture's headers for
futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr,
and comparison of the result.
Remove this duplication and leave up to the arches only the needed
assembly which is now in arch_futex_atomic_op_inuser.
This effectively distributes the Will Deacon's arm64 fix for undefined
behaviour reported by UBSAN to all architectures. The fix was done in
commit 5f16a046f8 (arm64: futex: Fix undefined behaviour with
FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump.
And as suggested by Thomas, check for negative oparg too, because it was
also reported to cause undefined behaviour report.
Note that s390 removed access_ok check in d12a29703 ("s390/uaccess:
remove pointless access_ok() checks") as access_ok there returns true.
We introduce it back to the helper for the sake of simplicity (it gets
optimized away anyway).
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390]
Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Reviewed-by: Darren Hart (VMware) <dvhart@infradead.org>
Reviewed-by: Will Deacon <will.deacon@arm.com> [core/arm64]
Cc: linux-mips@linux-mips.org
Cc: Rich Felker <dalias@libc.org>
Cc: linux-ia64@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: peterz@infradead.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: sparclinux@vger.kernel.org
Cc: Jonas Bonn <jonas@southpole.se>
Cc: linux-s390@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: linux-hexagon@vger.kernel.org
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-xtensa@linux-xtensa.org
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: openrisc@lists.librecores.org
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Stafford Horne <shorne@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Richard Henderson <rth@twiddle.net>
Cc: Chris Zankel <chris@zankel.net>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-parisc@vger.kernel.org
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: linux-alpha@vger.kernel.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20170824073105.3901-1-jslaby@suse.cz
Commit a7be6e5a7f ("mm: drop useless local parameters of
__register_one_node()") removes the last user of parent_node().
The parent_node() macro in S390 platform is unnecessary.
Remove it for cleanup.
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Unsigned long long and unsigned long were different in size for 31-bit.
For 64-bit the size for both datatypes is 8 Bytes and since the support
for 31-bit is long gone we can clean up a little and change everything
to unsigned long.
Change get_phys_clock() along the way to accept unsigned long as well so
that the DASD code can be consistent.
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
cpuhp_setup_state() returns a state number on CPUHP_AP_ONLINE_DYN if
no error occurred. Therefore convert the return code to zero before
using it as exit code for s390_smp_init().
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Specifying the 'novx' kernel parameter always results in a warning:
Malformed early option 'novx'
The reason for this is that the novx early parameter handling function
always returns a non-zero value which means that an error occurred.
Fix this and return the correct zero value instead.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
sthyi should only generate a specification exception if the function
code is zero and the response buffer is not on a 4k boundary.
The current code would also test for unknown function codes if the
response buffer, that is currently only defined for function code 0,
is not on a 4k boundary and incorrectly inject a specification
exception instead of returning with condition code 3 and return code 4
(unsupported function code).
Fix this by moving the boundary check.
Fixes: 95ca2cb579 ("KVM: s390: Add sthyi emulation")
Cc: <stable@vger.kernel.org> # 4.8+
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The sthyi inline assembly misses register r3 within the clobber
list. The sthyi instruction will always write a return code to
register "R2+1", which in this case would be r3. Due to that we may
have register corruption and see host crashes or data corruption
depending on how gcc decided to allocate and use registers during
compile time.
Fixes: 95ca2cb579 ("KVM: s390: Add sthyi emulation")
Cc: <stable@vger.kernel.org> # 4.8+
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair. This commit therefore removes the underlying arch-specific
arch_spin_unlock_wait() for all architectures providing them.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
Add support for the iommu_device_register interface to make
the s390 hardware iommus visible to the iommu core and in
sysfs.
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Nadav reported parallel MADV_DONTNEED on same range has a stale TLB
problem and Mel fixed it[1] and found same problem on MADV_FREE[2].
Quote from Mel Gorman:
"The race in question is CPU 0 running madv_free and updating some PTEs
while CPU 1 is also running madv_free and looking at the same PTEs.
CPU 1 may have writable TLB entries for a page but fail the pte_dirty
check (because CPU 0 has updated it already) and potentially fail to
flush.
Hence, when madv_free on CPU 1 returns, there are still potentially
writable TLB entries and the underlying PTE is still present so that a
subsequent write does not necessarily propagate the dirty bit to the
underlying PTE any more. Reclaim at some unknown time at the future
may then see that the PTE is still clean and discard the page even
though a write has happened in the meantime. I think this is possible
but I could have missed some protection in madv_free that prevents it
happening."
This patch aims for solving both problems all at once and is ready for
other problem with KSM, MADV_FREE and soft-dirty story[3].
TLB batch API(tlb_[gather|finish]_mmu] uses [inc|dec]_tlb_flush_pending
and mmu_tlb_flush_pending so that when tlb_finish_mmu is called, we can
catch there are parallel threads going on. In that case, forcefully,
flush TLB to prevent for user to access memory via stale TLB entry
although it fail to gather page table entry.
I confirmed this patch works with [4] test program Nadav gave so this
patch supersedes "mm: Always flush VMA ranges affected by zap_page_range
v2" in current mmotm.
NOTE:
This patch modifies arch-specific TLB gathering interface(x86, ia64,
s390, sh, um). It seems most of architecture are straightforward but
s390 need to be careful because tlb_flush_mmu works only if
mm->context.flush_mm is set to non-zero which happens only a pte entry
really is cleared by ptep_get_and_clear and friends. However, this
problem never changes the pte entries but need to flush to prevent
memory access from stale tlb.
[1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net
[2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de
[3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com
[4] https://patchwork.kernel.org/patch/9861621/
[minchan@kernel.org: decrease tlb flush pending count in tlb_finish_mmu]
Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox
Link: http://lkml.kernel.org/r/20170802000818.4760-7-namit@vmware.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Reported-by: Nadav Amit <namit@vmware.com>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch is a preparatory patch for solving race problems caused by
TLB batch. For that, we will increase/decrease TLB flush pending count
of mm_struct whenever tlb_[gather|finish]_mmu is called.
Before making it simple, this patch separates architecture specific part
and rename it to arch_tlb_[gather|finish]_mmu and generic part just
calls it.
It shouldn't change any behavior.
Link: http://lkml.kernel.org/r/20170802000818.4760-5-namit@vmware.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This work implements jiting of BPF_J{LT,LE,SLT,SLE} instructions
with BPF_X/BPF_K variants for the s390x eBPF JIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The UDP offload conflict is dealt with by simply taking what is
in net-next where we have removed all of the UFO handling code
entirely.
The TCP conflict was a case of local variables in a function
being removed from both net and net-next.
In netvsc we had an assignment right next to where a missing
set of u64 stats sync object inits were added.
Signed-off-by: David S. Miller <davem@davemloft.net>
Some hypervisors allow to toggle uid checking at runtime. Log
changes to uid checking in s390dbf.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Split the vmcp header file and move the device driver internal
structure to the C file, and move the ioctl definitions to the uapi
directory.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If memory is fragmented it is unlikely that large order memory
allocations succeed. This has been an issue with the vmcp device
driver since a long time, since it requires large physical contiguous
memory ares for large responses.
To hopefully resolve this issue make use of the contiguous memory
allocator (cma). This patch adds a vmcp specific vmcp cma area with a
default size of 4MB. The size can be changed either via the
VMCP_CMA_SIZE config option at compile time or with the "vmcp_cma"
kernel parameter (e.g. "vmcp_cma=16m").
For any vmcp response buffers larger than 16k memory from the cma area
will be allocated. If such an allocation fails, there is a fallback to
the buddy allocator.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
According to the CP Programming Services manual Diagnose Code 8
"Virtual Console Function" can be used in all addressing modes. Also
the input and output buffers do not have a limitation which specifies
they need to be below the 2GB line.
This is true at least since z/VM 5.4.
Therefore remove the sam31/64 instructions and allow for simple
GFP_KERNEL allocations. This makes it easier to allocate a 1MB page
if the user requested such a large return buffer.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Memory blocks that contain areas for the contiguous memory allocator
(cma) should not be allowed to go offline. Otherwise this would render
cma completely useless.
This might make sense on other architectures where memory might be
taken offline due to hardware errors, but not on architectures which
support memory hotplug for load balancing.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add all the missing little helper functions like virt_to_pfn(),
phys_to_page(), etc. While we had a couple of these helper functions
like e.g. page_to_phys() the other functions were missing, which is
quite annoying if one is looking for exactly such a function.
Therefore finally add all those little helper functions.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This implements kvm_arch_vcpu_in_kernel() for s390. DIAG is a privileged
operation, so it cannot be called from problem state (user mode).
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If a vcpu exits due to request a user mode spinlock, then
the spinlock-holder may be preempted in user mode or kernel mode.
(Note that not all architectures trap spin loops in user mode,
only AMD x86 and ARM/ARM64 currently do).
But if a vcpu exits in kernel mode, then the holder must be
preempted in kernel mode, so we should choose a vcpu in kernel mode
as a more likely candidate for the lock holder.
This introduces kvm_arch_vcpu_in_kernel() to decide whether the
vcpu is in kernel-mode when it's preempted. kvm_vcpu_on_spin's
new argument says the same of the spinning VCPU.
Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
While testing some other work that required JIT modifications, I
run into test_bpf causing a hang when JIT enabled on s390. The
problematic test case was the one from ddc665a4bb (bpf, arm64:
fix jit branch offset related to ldimm64), and turns out that we
do have a similar issue on s390 as well. In bpf_jit_prog() we
update next instruction address after returning from bpf_jit_insn()
with an insn_count. bpf_jit_insn() returns either -1 in case of
error (e.g. unsupported insn), 1 or 2. The latter is only the
case for ldimm64 due to spanning 2 insns, however, next address
is only set to i + 1 not taking actual insn_count into account,
thus fix is to use insn_count instead of 1. bpf_jit_enable in
mode 2 provides also disasm on s390:
Before fix:
000003ff800349b6: a7f40003 brc 15,3ff800349bc ; target
000003ff800349ba: 0000 unknown
000003ff800349bc: e3b0f0700024 stg %r11,112(%r15)
000003ff800349c2: e3e0f0880024 stg %r14,136(%r15)
000003ff800349c8: 0db0 basr %r11,%r0
000003ff800349ca: c0ef00000000 llilf %r14,0
000003ff800349d0: e320b0360004 lg %r2,54(%r11)
000003ff800349d6: e330b03e0004 lg %r3,62(%r11)
000003ff800349dc: ec23ffeda065 clgrj %r2,%r3,10,3ff800349b6 ; jmp
000003ff800349e2: e3e0b0460004 lg %r14,70(%r11)
000003ff800349e8: e3e0b04e0004 lg %r14,78(%r11)
000003ff800349ee: b904002e lgr %r2,%r14
000003ff800349f2: e3b0f0700004 lg %r11,112(%r15)
000003ff800349f8: e3e0f0880004 lg %r14,136(%r15)
000003ff800349fe: 07fe bcr 15,%r14
After fix:
000003ff80ef3db4: a7f40003 brc 15,3ff80ef3dba
000003ff80ef3db8: 0000 unknown
000003ff80ef3dba: e3b0f0700024 stg %r11,112(%r15)
000003ff80ef3dc0: e3e0f0880024 stg %r14,136(%r15)
000003ff80ef3dc6: 0db0 basr %r11,%r0
000003ff80ef3dc8: c0ef00000000 llilf %r14,0
000003ff80ef3dce: e320b0360004 lg %r2,54(%r11)
000003ff80ef3dd4: e330b03e0004 lg %r3,62(%r11)
000003ff80ef3dda: ec230006a065 clgrj %r2,%r3,10,3ff80ef3de6 ; jmp
000003ff80ef3de0: e3e0b0460004 lg %r14,70(%r11)
000003ff80ef3de6: e3e0b04e0004 lg %r14,78(%r11) ; target
000003ff80ef3dec: b904002e lgr %r2,%r14
000003ff80ef3df0: e3b0f0700004 lg %r11,112(%r15)
000003ff80ef3df6: e3e0f0880004 lg %r14,136(%r15)
000003ff80ef3dfc: 07fe bcr 15,%r14
test_bpf.ko suite runs fine after the fix.
Fixes: 0546231057 ("s390/bpf: Add s390x eBPF JIT compiler backend")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The send call ignores unknown flags. Legacy applications may already
unwittingly pass MSG_ZEROCOPY. Continue to ignore this flag unless a
socket opts in to zerocopy.
Introduce socket option SO_ZEROCOPY to enable MSG_ZEROCOPY processing.
Processes can also query this socket option to detect kernel support
for the feature. Older kernels will return ENOPROTOOPT.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And another header file for which we can use the generic variant,
even though it doesn't look obvious at first glance.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
clang doesn't like s390 specific inline assembler constraints. These
are present in our arch specific uapi/asm/swab.h which again is
required by some ebpf test cases.
For current compiler versions the generic swab.h already makes use of
gcc's builtin functions. Therefore we can simply remove our own header
file and use the generic one.
This will generate worse code if used with compilers before gcc 4.8,
which has no __builtin_bswap16(); or before gcc v4.4, which has no
__builtin_bswap[32|64](). For these cases a C implementation fallback
would be used which generates more code, but is still correct (170KB
extra code for gcc 4.3 with performance_defconfig).
However given that we need (and want) to get rid of the inline
assemblies anyway in order to be able to use clang, the above is just
a minor drawback if old gcc compilers are used.
With current compilers there is close to zero difference, except for
three btrfs bit functions which generate more out-of-line code. The
generated code looks still correct and also uses the s390 specific
byteswap instructions.
Reported-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Multiple architectures define this as an empty function, and I'm adding
another one as part of the RISC-V port. Add a __weak version of
pcibios_fixup_bus() and delete the now-obselete ones in a handful of
ports.
The only functional change should be that microblaze used to export
pcibios_fixup_bus(). None of the other architectures exports this, so I
just dropped it.
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Both header files only include the corresponding uapi header file and
therefore can be removed.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The name of bit 36 of the machine check interruption code is "guarded
storage registers validity". Add the missing "validity" part in order
to be consistent with all other comments, which include this piece of
information.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
PPC: host crash fixes.
x86: bugfixes, including making nested posted interrupts really work.
Generic: tweaks to kvm_stat and to uevents
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJZe2EYAAoJEL/70l94x66D4JYH/AnvioKWTsplhUKt4Y4JlpJX
EXYjQd/CIZ+MHNNUH+U+XEj6tKQymKrz4TeZSs1o0nyxCeyparR3gK27OYVpPspN
GkPSit3hyRgW9r5uXp6pZCJuFCAMpMZ6z4sKbT1FxDhnWnpWayV9w8KA+yQT/UUX
dNQ9JJPUxApcM4NCaj2OCQ8K1koNIDCc52+jATf0iK/Heiaf6UGqCcHXUIy5I5wM
OWk05Qm32VBAYb6P6FfoyGdLMNAAkJtr1fyOJDkxX730CYgwpjIP0zifnJ1bt8V2
YRnjvPO5QciDHbZ8VynwAkKi0ZAd8psjwXh0KbyahPL/2/sA2xCztMH25qweriI=
=fsfr
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"s390:
- SRCU fix
PPC:
- host crash fixes
x86:
- bugfixes, including making nested posted interrupts really work
Generic:
- tweaks to kvm_stat and to uevents"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: LAPIC: Fix reentrancy issues with preempt notifiers
tools/kvm_stat: add '-f help' to get the available event list
tools/kvm_stat: use variables instead of hard paths in help output
KVM: nVMX: Fix loss of L2's NMI blocking state
KVM: nVMX: Fix posted intr delivery when vcpu is in guest mode
x86: irq: Define a global vector for nested posted interrupts
KVM: x86: do mask out upper bits of PAE CR3
KVM: make pid available for uevents without debugfs
KVM: s390: take srcu lock when getting/setting storage keys
KVM: VMX: remove unused field
KVM: PPC: Book3S HV: Fix host crash on changing HPT size
KVM: PPC: Book3S HV: Enable TM before accessing TM registers
We need to hold the srcu lock when accessing memory slots
during migration
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJZd0O1AAoJEBF7vIC1phx8WOIQAKLLmnqmKfD5sy3hIxkRUHG7
WlN33M9tNCjDa1gL6iurBqKbOHt7phBhZdp/Kzuy/UxwPtM5vMaajQv1mLKl7b2y
1GKEt2aTMJ3e/ql3Z3kVOASNguU2NHjVuzu+BObLhfzBXmNYp4hkyrMSkegb4Q2v
j6Kfhpz/ltrzCYhL1/WYquKwG/aAV0fDefbfSrZZ0yx1TCnbtAEfbqBO6LXfys2t
M3LW2OQwavV0YYrrwF4o7Seg0YfpQVYVH1j1/0dLd/Mb1w1D5/CB+gTr6zYhZ6z2
MzZAa5ADRJ1y+YEoB6aB8T95/4i9Aw3fYhPYfw4ijUJcHUwSbzllSt2brIUSkEZo
rajClCRjAMaMfeg2q5SZ6Mbg/wt5uo9O0hRG9weqZQR+NCnYkcrfgTe2kYSrorAs
Msp6CUSfP9cOL9JETD+f3hEWl4uqA1P21n6Ga/v5ERPDxqIrqU49mlgjxBKRcPK4
wHKaBrA4h53yOjNQ5U/Bdr130W2dQnRHmYAQXuUM9OjbBOliTbtgf8ND0U15S9TY
HZomb3kEkL1o8J5rH6NBxxntoU48SEtuppBDSIAYBnp9h02F+RvNPqMr4orFHfD/
OJe0zJHYobK44kz59DGmN5ed5wZl+Vzvdf7KAa62MotKmDMykbkWbcb9j+gcUFHf
Yal+Mk51xCKPUcadGD/e
=PjH2
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-master-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: fixup missing srcu lock
We need to hold the srcu lock when accessing memory slots
during migration
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The z14 machine introduces new mode of the next-instruction-access-intent
NIAI instruction. With NIAI-8 it is possible to pin a cache-line on a
CPU for a small amount of time, NIAI-7 releases the cache-line again.
Finally NIAI-4 can be used to prevent the CPU to speculatively access
memory beyond the compare-and-swap instruction to get the lock.
Use these instruction in the spinlock code.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add detection for machine type 0x3906 and set the ELF platform name
to z14. Add the miscellaneous-instruction-extension 2 facility to
the list of facilities for z14.
And allow to generate code that only runs on a z14 machine.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The TOD epoch extension adds 8 epoch bits to the TOD clock to provide
a continuous clock after 2042/09/17. The store-clock-extended (STCKE)
instruction will store the epoch index in the first byte of the
16 bytes stored by the instruction. The read_boot_clock64 and the
read_presistent_clock64 functions need to take the additional bits
into account to give the correct result after 2042/09/17.
The clock-comparator register will stay 64 bit wide. The comparison
of the clock-comparator with the TOD clock is limited to bytes
1 to 8 of the extended TOD format. To deal with the overflow problem
due to an epoch change the clock-comparator sign control in CR0 can
be used to switch the comparison of the 64-bit TOD clock with the
clock-comparator to a signed comparison.
The decision between the signed vs. unsigned clock-comparator
comparisons is done at boot time. Only if the TOD clock is in the
second half of a 142 year epoch the signed comparison is used.
This solves the epoch overflow issue as long as the machine is
booted at least once in an epoch.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>