Add CONFIG_EXPOLINE to enable the use of the new -mindirect-branch= and
-mfunction_return= compiler options to create a kernel fortified against
the specte v2 attack.
With CONFIG_EXPOLINE=y all indirect branches will be issued with an
execute type instruction. For z10 or newer the EXRL instruction will
be used, for older machines the EX instruction. The typical indirect
call
basr %r14,%r1
is replaced with a PC relative call to a new thunk
brasl %r14,__s390x_indirect_jump_r1
The thunk contains the EXRL/EX instruction to the indirect branch
__s390x_indirect_jump_r1:
exrl 0,0f
j .
0: br %r1
The detour via the execute type instruction has a performance impact.
To get rid of the detour the new kernel parameter "nospectre_v2" and
"spectre_v2=[on,off,auto]" can be used. If the parameter is specified
the kernel and module code will be patched at runtime.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
To be able to switch off specific CPU alternatives with kernel parameters
make a copy of the facility bit mask provided by STFLE and use the copy
for the decision to apply an alternative.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
It's good to have SPDX identifiers in all files to make it easier to
audit the kernel tree for correct licenses.
Update the arch/s390/kernel/ files with the correct SPDX license
identifier based on the license text in the file itself. The SPDX
identifier is a legally binding shorthand, which can be used instead of
the full boiler plate text.
This work is based on a script and data from Thomas Gleixner, Philippe
Ombredanne, and Kate Stewart.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Just use MACHINE_HAS_TE to decide if HWCAP_S390_TE needs
to be added to elf_hwcap.
Suggested-by: Dan Horák <dan@danny.cz>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
With commit 7fb2b2d512 ("s390/virtio: remove the old KVM virtio
transport") the pre-ccw virtio transport for s390 was removed. To
complete the removal the uapi header file that contains the related data
structures must also be removed.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The machine check extended save area is needed to store the vector
registers and the guarded storage control block when a CPU is
interrupted by a machine check.
Move the slab cache allocation of the full save area to nmi.c,
for early boot use a static __initdata block.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The boot_vdso_data variable is related to the vdso code, the magic of the
initial vdso area for the early boot and the replacement of it in vdso_init
should all be put into vdso.c.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Implement CPU alternatives, which allows to optionally patch newer
instructions at runtime, based on CPU facilities availability.
A new kernel boot parameter "noaltinstr" disables patching.
Current implementation is derived from x86 alternatives. Although
ideal instructions padding (when altinstr is longer then oldinstr)
is added at compile time, and no oldinstr nops optimization has to be
done at runtime. Also couple of compile time sanity checks are done:
1. oldinstr and altinstr must be <= 254 bytes long,
2. oldinstr and altinstr must not have an odd length.
alternative(oldinstr, altinstr, facility);
alternative_2(oldinstr, altinstr1, facility1, altinstr2, facility2);
Both compile time and runtime padding consists of either 6/4/2 bytes nop
or a jump (brcl) + 2 bytes nop filler if padding is longer then 6 bytes.
.altinstructions and .altinstr_replacement sections are part of
__init_begin : __init_end region and are freed after initialization.
Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The queued spinlock code for s390 follows the principles of the common
code qspinlock implementation but with a few notable differences.
The format of the spinlock_t locking word differs, s390 needs to store
the logical CPU number of the lock holder in the spinlock_t to be able
to use the diagnose 9c directed yield hypervisor call.
The inline code sequences for spin_lock and spin_unlock are nice and
short. The inline portion of a spin_lock now typically looks like this:
lhi %r0,0 # 0 indicates an empty lock
l %r1,0x3a0 # CPU number + 1 from lowcore
cs %r0,%r1,<some_lock> # lock operation
jnz call_wait # on failure call wait function
locked:
...
call_wait:
la %r2,<some_lock>
brasl %r14,arch_spin_lock_wait
j locked
A spin_unlock is as simple as before:
lhi %r0,0
sth %r0,2(%r2) # unlock operation
After a CPU has queued itself it may not enable interrupts again for the
arch_spin_lock_flags() variant. The arch_spin_lock_wait_flags wait function
is removed.
To improve performance the code implements opportunistic lock stealing.
If the wait function finds a spinlock_t that indicates that the lock is
free but there are queued waiters, the CPU may steal the lock up to three
times without queueing itself. The lock stealing update the steal counter
in the lock word to prevent more than 3 steals. The counter is reset at
the time the CPU next in the queue successfully takes the lock.
While the queued spinlocks improve performance in a system with dedicated
CPUs, in a virtualized environment with continuously overcommitted CPUs
the queued spinlocks can have a negative effect on performance. This
is due to the fact that a queued CPU that is preempted by the hypervisor
will block the queue at some point even without holding the lock. With
the classic spinlock it does not matter if a CPU is preempted that waits
for the lock. Therefore use the queued spinlock code only if the system
runs with dedicated CPUs and fall back to classic spinlocks when running
with shared CPUs.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If memory is fragmented it is unlikely that large order memory
allocations succeed. This has been an issue with the vmcp device
driver since a long time, since it requires large physical contiguous
memory ares for large responses.
To hopefully resolve this issue make use of the contiguous memory
allocator (cma). This patch adds a vmcp specific vmcp cma area with a
default size of 4MB. The size can be changed either via the
VMCP_CMA_SIZE config option at compile time or with the "vmcp_cma"
kernel parameter (e.g. "vmcp_cma=16m").
For any vmcp response buffers larger than 16k memory from the cma area
will be allocated. If such an allocation fails, there is a fallback to
the buddy allocator.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add detection for machine type 0x3906 and set the ELF platform name
to z14. Add the miscellaneous-instruction-extension 2 facility to
the list of facilities for z14.
And allow to generate code that only runs on a z14 machine.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The TOD epoch extension adds 8 epoch bits to the TOD clock to provide
a continuous clock after 2042/09/17. The store-clock-extended (STCKE)
instruction will store the epoch index in the first byte of the
16 bytes stored by the instruction. The read_boot_clock64 and the
read_presistent_clock64 functions need to take the additional bits
into account to give the correct result after 2042/09/17.
The clock-comparator register will stay 64 bit wide. The comparison
of the clock-comparator with the TOD clock is limited to bytes
1 to 8 of the extended TOD format. To deal with the overflow problem
due to an epoch change the clock-comparator sign control in CR0 can
be used to switch the comparison of the 64-bit TOD clock with the
clock-comparator to a signed comparison.
The decision between the signed vs. unsigned clock-comparator
comparisons is done at boot time. Only if the TOD clock is in the
second half of a 142 year epoch the signed comparison is used.
This solves the epoch overflow issue as long as the machine is
booted at least once in an epoch.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
As Eric said,
"what we need to do is move the variable vmcoreinfo_note out of the
kernel's .bss section. And modify the code to regenerate and keep this
information in something like the control page.
Definitely something like this needs a page all to itself, and ideally
far away from any other kernel data structures. I clearly was not
watching closely the data someone decided to keep this silly thing in
the kernel's .bss section."
This patch allocates extra pages for these vmcoreinfo_XXX variables, one
advantage is that it enhances some safety of vmcoreinfo, because
vmcoreinfo now is kept far away from other kernel data structures.
Link: http://lkml.kernel.org/r/1493281021-20737-1-git-send-email-xlpang@redhat.com
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Suggested-by: Eric Biederman <ebiederm@xmission.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds a new system call to enable the use of guarded storage for
user space processes. The system call takes two arguments, a command
and pointer to a guarded storage control block:
s390_guarded_storage(int command, struct gs_cb *gs_cb);
The second argument is relevant only for the GS_SET_BC_CB command.
The commands in detail:
0 - GS_ENABLE
Enable the guarded storage facility for the current task. The
initial content of the guarded storage control block will be
all zeros. After the enablement the user space code can use
load-guarded-storage-controls instruction (LGSC) to load an
arbitrary control block. While a task is enabled the kernel
will save and restore the current content of the guarded
storage registers on context switch.
1 - GS_DISABLE
Disables the use of the guarded storage facility for the current
task. The kernel will cease to save and restore the content of
the guarded storage registers, the task specific content of
these registers is lost.
2 - GS_SET_BC_CB
Set a broadcast guarded storage control block. This is called
per thread and stores a specific guarded storage control block
in the task struct of the current task. This control block will
be used for the broadcast event GS_BROADCAST.
3 - GS_CLEAR_BC_CB
Clears the broadcast guarded storage control block. The guarded-
storage control block is removed from the task struct that was
established by GS_SET_BC_CB.
4 - GS_BROADCAST
Sends a broadcast to all thread siblings of the current task.
Every sibling that has established a broadcast guarded storage
control block will load this control block and will be enabled
for guarded storage. The broadcast guarded storage control block
is used up, a second broadcast without a refresh of the stored
control block with GS_SET_BC_CB will not have any effect.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
But first introduce a trivial header and update usage sites.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We are going to split <linux/sched/task.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.
Create a trivial placeholder <linux/sched/task.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.
Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Both MACHINE_HAS_PFMF and MACHINE_HAS_HPAGE are just an alias for
MACHINE_HAS_EDAT1. So simply use MACHINE_HAS_EDAT1 instead.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add hardware capability bits and feature tags to /proc/cpuinfo for
the "Vector Packed Decimal Facility" (tag "vxd" / hwcap bit 2^12)
and the "Vector Enhancements Facility 1" (tag "vxe" / hwcap bit 2^13).
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The current implementation of setup_randomness uses the stack address
and therefore the pointer to the SYSIB 3.2.2 block as input data
address. Furthermore the length of the input data is the number of
virtual-machine description blocks which is typically one.
This means that typically a single zero byte is fed to
add_device_randomness.
Fix both of these and use the address of the first virtual machine
description block as input data address and also use the correct
length.
Fixes: bcfcbb6bae ("s390: add system information as device randomness")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Commit bcfcbb6bae ("s390: add system information as device
randomness") intended to add some virtual machine specific information
to the randomness pool.
Unfortunately it uses the page allocator before it is ready to use. In
result the page allocator always returns NULL and the setup_randomness
function never adds anything to the randomness pool.
To fix this use memblock_alloc and memblock_free instead.
Fixes: bcfcbb6bae ("s390: add system information as device randomness")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
reserve_initrd currently calls memblock_reserve even if the to be
reserved size is zero. Even though the memblock core code can handle
this correctly, it still yields confusing debug messages if
memblock debugging is enabled.
Therefore make sure to not call memblock_reserve with a size of zero.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In order to be able to setup the cpu to node mappings early it is a
prerequisite to know which cpus are present. Therefore cpus must be
detected much earlier than before.
For sclp based cpu detection this requires yet another early sclp
call, since the system is not ready to use the regular interrupt and
memory allocations.
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When converting from bootmem to memblock I missed a subtle difference:
the memblock_alloc() functions return uninitialized memory, while the
memblock_virt_alloc() functions return zeroed memory.
This led to quite random early boot crashes.
Therefore use the correct version everywhere now.
Hopefully.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When re-adding crash kernel memory within setup_resources() the
function memblock_add() is used. That function will add memory by
default to node "MAX_NUMNODES" instead of node 0, like the memory
detection code does. In case of !NUMA this will trigger this warning
when the kernel generates the vmemmap:
Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead
WARNING: CPU: 0 PID: 0 at mm/memblock.c:1261 memblock_virt_alloc_internal+0x76/0x220
CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.0-rc6 #16
Call Trace:
[<0000000000d0b2e8>] memblock_virt_alloc_try_nid+0x88/0xc8
[<000000000083c8ea>] __earlyonly_bootmem_alloc.constprop.1+0x42/0x50
[<000000000083e7f4>] vmemmap_populate+0x1ac/0x1e0
[<0000000000840136>] sparse_mem_map_populate+0x46/0x68
[<0000000000d0c59c>] sparse_init+0x184/0x238
[<0000000000cf45f6>] paging_init+0xbe/0xf8
[<0000000000cf1d4a>] setup_arch+0xa02/0xae0
[<0000000000ced75a>] start_kernel+0x72/0x450
[<0000000000100020>] _stext+0x20/0x80
If NUMA is selected numa_setup_memory() will fix the node assignments
before the vmemmap will be populated; so this warning will only appear
if NUMA is not selected.
To fix this simply use memblock_add_node() and re-add crash kernel
memory explicitly to node 0.
Reported-and-tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 4e042af463 ("s390/kexec: fix crash on resize of reserved memory")
Cc: <stable@vger.kernel.org> # v4.8+
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Get rid of all remaining alloc_bootmem calls and use memblock_alloc
instead everywhere. This way we get rid of the inconsistent mixture
of alloc_bootmem and memblock_alloc usages.
Two of the alloc_bootmem_low calls within arch/s390/kernel/setup.c are
replaced with memblock_alloc calls that don't enforce that the
allocated memory is below 2GB. This restriction was never necessary.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In order to make the cma infrastructure usable we need to add a small
architecture backend which calls dma_contiguous_reserve.
Otherwise we would end up with the cma allocator enabled, but no pool
where memory can be allocated from.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This is the s390 variant of commit 15f4eae70d ("x86: Move
thread_info into task_struct").
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Convert s390 to use a field in the struct lowcore for the CPU
preemption count. It is a bit cheaper to access a lowcore field
compared to a thread_info variable and it removes the depencency
on a task related structure.
bloat-o-meter on the vmlinux image for the default configuration
(CONFIG_PREEMPT_NONE=y) reports a small reduction in text size:
add/remove: 0/0 grow/shrink: 18/578 up/down: 228/-5448 (-5220)
A larger improvement is achieved with the default configuration
but with CONFIG_PREEMPT=y and CONFIG_DEBUG_PREEMPT=n:
add/remove: 2/6 grow/shrink: 59/4477 up/down: 1618/-228762 (-227144)
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Commit 97f2645f35 ("tree-wide: replace config_enabled() with
IS_ENABLED()") mostly killed config_enabled(), but some new users have
appeared for v4.8-rc1. They are all used for a boolean option, so can
be replaced with IS_ENABLED() safely.
Link: http://lkml.kernel.org/r/1471970749-24867-1-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use the same code structure when determining preferred consoles for
Linux running as KVM guest as with Linux running in LPAR and z/VM
guest:
- Extend the console_mode variable to cover vt220 and hvc consoles
- Determine sensible console defaults in conmode_default()
- Remove KVM-special handling in set_preferred_console()
Ensure that the sclp line mode console is also registered when the
vt220 console was selected to not change existing behavior that
someone might be relying on.
As an externally visible change, KVM guest users can now select
the 3270 or 3215 console devices using the conmode= kernel parameter,
provided that support for the corresponding driver was compiled into
the kernel.
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Jing Liu <liujbjl@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Small cleanup patch to use the shorter __section macro everywhere.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reducing the size of reserved memory for the crash kernel will result
in an immediate crash on s390. Reason for that is that we do not
create struct pages for memory that is reserved. If that memory is
freed any access to struct pages which correspond to this memory will
result in invalid memory accesses and a kernel panic.
Fix this by properly creating struct pages when the system gets
initialized. Change the code also to make use of set_memory_ro() and
set_memory_rw() so page tables will be split if required.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Show the dynamic and static cpu mhz of each cpu. Since these values
are per cpu this requires a fundamental extension of the format of
/proc/cpuinfo.
Historically we had only a single line per cpu and a summary at the
top of the file. This format is hardly extendible if we want to add
more per cpu information.
Therefore this patch adds per cpu blocks at the end of /proc/cpuinfo:
cpu : 0
cpu Mhz dynamic : 5504
cpu Mhz static : 5504
cpu : 1
cpu Mhz dynamic : 5504
cpu Mhz static : 5504
cpu : 2
cpu Mhz dynamic : 5504
cpu Mhz static : 5504
cpu : 3
cpu Mhz dynamic : 5504
cpu Mhz static : 5504
Right now each block contains only the dynamic and static cpu mhz,
but it can be easily extended like on every other architecture.
This extension is supposed to be compatible with the old format.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Analog to git commit 0c8c0f03e3
"x86/fpu, sched: Dynamically allocate 'struct fpu'"
move the struct fpu to the end of the struct thread_struct,
set CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and add the
setup_task_size() function to calculate the correct size
fo the task struct.
For the performance_defconfig this increases the size of
struct task_struct from 7424 bytes to 7936 bytes (MACHINE_HAS_VX==1)
or 7552 bytes (MACHINE_HAS_VX==0). The dynamic allocation of the
struct fpu is removed. The slab cache uses an 8KB block for the
task struct in all cases, there is enough room for the struct fpu.
For MACHINE_HAS_VX==1 each task now needs 512 bytes less memory.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pull s390 updates from Martin Schwidefsky:
- Add the CPU id for the new z13s machine
- Add a s390 specific XOR template for RAID-5 checksumming based on the
XC instruction. Remove all other alternatives, XC is always faster
- The merge of our four different stack tracers into a single one
- Tidy up the code related to page tables, several large inline
functions are now out-of-line. Bloat-o-meter reports ~11K text size
reduction
- A binary interface for the priviledged CLP instruction to retrieve
the hardware view of the installed PCI functions
- Improvements for the dasd format code
- Bug fixes and cleanups
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (31 commits)
s390/pci: enforce fmb page boundary rule
s390: fix floating pointer register corruption (again)
s390/cpumf: add missing lpp magic initialization
s390: Fix misspellings in comments
s390/mm: split arch/s390/mm/pgtable.c
s390/mm: uninline pmdp_xxx functions from pgtable.h
s390/mm: uninline ptep_xxx functions from pgtable.h
s390/pci: add ioctl interface for CLP
s390: Use pr_warn instead of pr_warning
s390/dasd: remove casts to dasd_*_private
s390/dasd: Refactor dasd format functions
s390/dasd: Simplify code in format logic
s390/dasd: Improve dasd format code
s390/percpu: remove this_cpu_cmpxchg_double_4
s390/cpumf: Improve guest detection heuristics
s390/fault: merge report_user_fault implementations
s390/dis: use correct escape sequence for '%' character
s390/kvm: simplify set_guest_storage_key
s390/oprofile: add z13/z13s model numbers
s390: add z13s model number to z13 elf platform
...
Add the missing lpp magic initialization for cpu 0. Without this all
samples on cpu 0 do not have the most significant bit set in the
program parameter field, which we use to distinguish between guest and
host samples if the pid is also 0.
We did initialize the lpp magic in the absolute zero lowcore but
forgot that when switching to the allocated lowcore on cpu 0 only.
Reported-by: Shu Juan Zhang <zhshuj@cn.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: stable@vger.kernel.org # v4.4+
Fixes: e22cf8ca6f ("s390/cpumf: rework program parameter setting to detect guest samples")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Set IORESOURCE_SYSTEM_RAM in flags of resource ranges with
"System RAM", "Kernel code", "Kernel data", and "Kernel bss".
Note that:
- IORESOURCE_SYSRAM (i.e. modifier bit) is set in flags when
IORESOURCE_MEM is already set. IORESOURCE_SYSTEM_RAM is defined
as (IORESOURCE_MEM|IORESOURCE_SYSRAM).
- Some archs do not set 'flags' for children nodes, such as
"Kernel code". This patch does not change 'flags' in this
case.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-mips@linux-mips.org
Cc: linux-mm <linux-mm@kvack.org>
Cc: linux-parisc@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1453841853-11383-7-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This is a leftover from the 31 bit area. For CONFIG_64BIT the usual
operation "y = x | PSW_ADDR_AMODE" is a nop. Therefore remove all
usages of PSW_ADDR_AMODE and make the code a bit less confusing.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Pull s390 updates from Martin Schwidefsky:
"Among the traditional bug fixes and cleanups are some improvements:
- A tool to generated the facility lists, generating the bit fields
by hand has been a source of bugs in the past
- The spinlock loop is reordered to avoid bursts of hypervisor calls
- Add support for the open-for-business interface to the service
element
- The get_cpu call is added to the vdso
- A set of tracepoints is defined for the common I/O layer
- The deprecated sclp_cpi module is removed
- Update default configuration"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (56 commits)
s390/sclp: fix possible control register corruption
s390: fix normalization bug in exception table sorting
s390/configs: update default configurations
s390/vdso: optimize getcpu system call
s390: drop smp_mb in vdso_init
s390: rename struct _lowcore to struct lowcore
s390/mem_detect: use unsigned longs
s390/ptrace: get rid of long longs in psw_bits
s390/sysinfo: add missing SYSIB 1.2.2 multithreading fields
s390: get rid of CONFIG_SCHED_MC and CONFIG_SCHED_BOOK
s390/Kconfig: remove pointless 64 bit dependencies
s390/dasd: fix failfast for disconnected devices
s390/con3270: testing return kzalloc retval
s390/hmcdrv: constify hmcdrv_ftp_ops structs
s390/cio: add NULL test
s390/cio: Change I/O instructions from inline to normal functions
s390/cio: Introduce common I/O layer tracepoints
s390/cio: Consolidate inline assemblies and related data definitions
s390/cio: Fix incorrect xsch opcode specification
s390/cio: Remove unused inline assemblies
...
Finally get rid of the leading underscore. I tried this already two or
three years ago, however Michael Holzheu objected since this would
break the crash utility (again).
However Michael integrated support for the new name into the crash
utility back then, so it doesn't break if the name will be changed
now. So finally get rid of the ever confusing leading underscore.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch exposes the SIE capability (aka virtualization support) via
/proc/cpuinfo -> "features" as "sie".
As we don't want to expose this hwcap via elf, let's add a second,
"internal"/non-elf capability list. The content is simply concatenated
to the existing features when printing /proc/cpuinfo.
We also add the defines to elf.h to keep the hwcap stuff at a common
place.
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
To collect the CPU registers of the crashed system allocated a single
page with memblock_alloc_base and use it as a copy buffer. Replace the
stop-and-store-status sigp with a store-status-at-address sigp in
smp_save_dump_cpus() and smp_store_status(). In both cases the target
CPU is already stopped and store-status-at-address avoids the detour
via the absolute zero page.
For kexec simplify s390_reset_system and call store_status() before
the prefix register of the boot CPU has been set to zero. Use STPX
to store the prefix register and remove dump_prefix_page.
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The s390 architecture can store the CPU registers of the crashed system
after the kdump kernel has been started and this is the preferred way.
Remove the remaining code fragments that deal with storing CPU registers
while the crashed system is still active.
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove dead code, since this could only happen on a 31 bit machine
where the kernel wouldn't IPL.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The novx parameter disables the vector facility but the HWCAP_S390_VXRS
bit in the ELf hardware capabilies is always set if the machine has
the vector facility. If the user space program uses the "vx" string
in the features field of /proc/cpuinfo to utilize vector instruction
it will crash if the novx kernel paramter is set.
Convert setup_hwcaps to an arch_initcall and use MACHINE_HAS_VX to
decide if the HWCAPS_S390_VXRS bit needs to be set.
Cc: stable@vger.kernel.org # 3.18+
Reported-by: Ulrich Weigand <uweigand@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Enable core NUMA support for s390 and add one simple default mode "plain"
that creates one single NUMA node.
This patch contains several changes from Michael Holzheu.
Signed-off-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>