This adds a new system call to enable the use of guarded storage for
user space processes. The system call takes two arguments, a command
and pointer to a guarded storage control block:
s390_guarded_storage(int command, struct gs_cb *gs_cb);
The second argument is relevant only for the GS_SET_BC_CB command.
The commands in detail:
0 - GS_ENABLE
Enable the guarded storage facility for the current task. The
initial content of the guarded storage control block will be
all zeros. After the enablement the user space code can use
load-guarded-storage-controls instruction (LGSC) to load an
arbitrary control block. While a task is enabled the kernel
will save and restore the current content of the guarded
storage registers on context switch.
1 - GS_DISABLE
Disables the use of the guarded storage facility for the current
task. The kernel will cease to save and restore the content of
the guarded storage registers, the task specific content of
these registers is lost.
2 - GS_SET_BC_CB
Set a broadcast guarded storage control block. This is called
per thread and stores a specific guarded storage control block
in the task struct of the current task. This control block will
be used for the broadcast event GS_BROADCAST.
3 - GS_CLEAR_BC_CB
Clears the broadcast guarded storage control block. The guarded-
storage control block is removed from the task struct that was
established by GS_SET_BC_CB.
4 - GS_BROADCAST
Sends a broadcast to all thread siblings of the current task.
Every sibling that has established a broadcast guarded storage
control block will load this control block and will be enabled
for guarded storage. The broadcast guarded storage control block
is used up, a second broadcast without a refresh of the stored
control block with GS_SET_BC_CB will not have any effect.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Split the API and FPU type definitions into separate header files
similar to "x86/fpu: Rename fpu-internal.h to fpu/internal.h" (78f7f1e54b).
The new header files and their meaning are:
asm/fpu/types.h:
FPU related data types, needed for 'struct thread_struct' and
'struct task_struct'.
asm/fpu/api.h:
FPU related 'public' functions for other subsystems and device
drivers.
asm/fpu/internal.h:
FPU internal functions mainly used to convert
FPU register contents in signal handling.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
All calls to save_fpu_regs() specify the fpu structure of the current task
pointer as parameter. The task pointer of the current task can also be
retrieved from the CPU lowcore directly. Remove the parameter definition,
load the __LC_CURRENT task pointer from the CPU lowcore, and rebase the FPU
structure onto the task structure. Apply the same approach for the
load_fpu_regs() function.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Improve the save and restore behavior of FPU register contents to use the
vector extension within the kernel.
The kernel does not use floating-point or vector registers and, therefore,
saving and restoring the FPU register contents are performed for handling
signals or switching processes only. To prepare for using vector
instructions and vector registers within the kernel, enhance the save
behavior and implement a lazy restore at return to user space from a
system call or interrupt.
To implement the lazy restore, the save_fpu_regs() sets a CPU information
flag, CIF_FPU, to indicate that the FPU registers must be restored.
Saving and setting CIF_FPU is performed in an atomic fashion to be
interrupt-safe. When the kernel wants to use the vector extension or
wants to change the FPU register state for a task during signal handling,
the save_fpu_regs() must be called first. The CIF_FPU flag is also set at
process switch. At return to user space, the FPU state is restored. In
particular, the FPU state includes the floating-point or vector register
contents, as well as, vector-enablement and floating-point control. The
FPU state restore and clearing CIF_FPU is also performed in an atomic
fashion.
For KVM, the restore of the FPU register state is performed when restoring
the general-purpose guest registers before the SIE instructions is started.
Because the path towards the SIE instruction is interruptible, the CIF_FPU
flag must be checked again right before going into SIE. If set, the guest
registers must be reloaded again by re-entering the outer SIE loop. This
is the same behavior as if the SIE critical section is interrupted.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce a new structure to manage FP and VX registers. Refactor the
save and restore of floating point and vector registers with a set
of helper functions in fpu-internal.h.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use the test_fp_ctl() to test the floating-point control word
for validity and use restore_fp_ctl() to set it in load_sigregs.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove the 31 bit support in order to reduce maintenance cost and
effectively remove dead code. Since a couple of years there is no
distribution left that comes with a 31 bit kernel.
The 31 bit kernel also has been broken since more than a year before
anybody noticed. In addition I added a removal warning to the kernel
shown at ipl for 5 minutes: a960062e58 ("s390: add 31 bit warning
message") which let everybody know about the plan to remove 31 bit
code. We didn't get any response.
Given that the last 31 bit only machine was introduced in 1999 let's
remove the code.
Anybody with 31 bit user space code can still use the compat mode.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With this patch for kdump the s390 vector registers are stored into the
prepared save areas in the old kernel and into the REGSET_VX_LOW and
REGSET_VX_HIGH ELF notes for /proc/vmcore in the new kernel.
The NT_S390_VXRS_LOW note contains the lower halves of the first 16 vector
registers 0-15. The higher halves are stored in the floating point register
ELF note. The NT_S390_VXRS_HIGH contains the full vector registers 16-31.
The kernel provides a save area for storing vector register in case of
machine checks. A pointer to this save are is stored in the CPU lowcore
at offset 0x11b0. This save area is also used to save the registers for
kdump. In case of a dumped crashed kdump those areas are used to extract
the registers of the production system.
The vector registers for remote CPUs are stored using the "store additional
status at address" SIGP. For the dump CPU the vector registers are stored
with the VSTM instruction.
With this patch also zfcpdump stores the vector registers.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The vector extension introduces 32 128-bit vector registers and a set of
instruction to operate on the vector registers.
The kernel can control the use of vector registers for the problem state
program with a bit in control register 0. Once enabled for a process the
kernel needs to retain the content of the vector registers on context
switch. The signal frame is extended to include the vector registers.
Two new register sets NT_S390_VXRS_LOW and NT_S390_VXRS_HIGH are added
to the regset interface for the debugger and core dumps.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The fixup of the inline assembly to restore the floating-point-control
register needs to check for instruction address *after* the lfcp
instruction as the specification and data exceptions are suppresssing.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch fixes a problem introduced with git commit beef560b4c
"s390/uaccess: simplify control register updates".
The switch_mm function is not called if the next process is a kernel
thread without an attached mm or is a nop if the mm does not change.
But CR1 still needs to be loaded with the kernel ASCE in case the
code returns to a uaccess function that uses the secondary space mode.
In addition move the set_fs call from finish_arch_switch to
finish_arch_post_lock_switch and then remove finish_arch_switch.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Always switch to the kernel ASCE in switch_mm. Load the secondary
space ASCE in finish_arch_post_lock_switch after checking that
any pending page table operations have completed. The primary
ASCE is loaded in entry[64].S. With this the update_primary_asce
call can be removed from the switch_to macro and from the start
of switch_mm function. Remove the load_primary argument from
update_user_asce/clear_user_asce, rename update_user_asce to
set_user_asce and rename update_primary_asce to load_kernel_asce.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The current uaccess code uses a page table walk in some circumstances,
e.g. in case of the in atomic futex operations or if running on old
hardware which doesn't support the mvcos instruction.
However it turned out that the page table walk code does not correctly
lock page tables when accessing page table entries.
In other words: a different cpu may invalidate a page table entry while
the current cpu inspects the pte. This may lead to random data corruption.
Adding correct locking however isn't trivial for all uaccess operations.
Especially copy_in_user() is problematic since that requires to hold at
least two locks, but must be protected against ABBA deadlock when a
different cpu also performs a copy_in_user() operation.
So the solution is a different approach where we change address spaces:
User space runs in primary address mode, or access register mode within
vdso code, like it currently already does.
The kernel usually also runs in home space mode, however when accessing
user space the kernel switches to primary or secondary address mode if
the mvcos instruction is not available or if a compare-and-swap (futex)
instruction on a user space address is performed.
KVM however is special, since that requires the kernel to run in home
address space while implicitly accessing user space with the sie
instruction.
So we end up with:
User space:
- runs in primary or access register mode
- cr1 contains the user asce
- cr7 contains the user asce
- cr13 contains the kernel asce
Kernel space:
- runs in home space mode
- cr1 contains the user or kernel asce
-> the kernel asce is loaded when a uaccess requires primary or
secondary address mode
- cr7 contains the user or kernel asce, (changed with set_fs())
- cr13 contains the kernel asce
In case of uaccess the kernel changes to:
- primary space mode in case of a uaccess (copy_to_user) and uses
e.g. the mvcp instruction to access user space. However the kernel
will stay in home space mode if the mvcos instruction is available
- secondary space mode in case of futex atomic operations, so that the
instructions come from primary address space and data from secondary
space
In case of kvm the kernel runs in home space mode, but cr1 gets switched
to contain the gmap asce before the sie instruction gets executed. When
the sie instruction is finished cr1 will be switched back to contain the
user asce.
A context switch between two processes will always load the kernel asce
for the next process in cr1. So the first exit to user space is a bit
more expensive (one extra load control register instruction) than before,
however keeps the code rather simple.
In sum this means there is no need to perform any error prone page table
walks anymore when accessing user space.
The patch seems to be rather large, however it mainly removes the
the page table walk code and restores the previously deleted "standard"
uaccess code, with a couple of changes.
The uaccess without mvcos mode can be enforced with the "uaccess_primary"
kernel parameter.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The FPC_VALID_MASK has been used to check the validity of the value
to be loaded into the floating-point-control register. With the
introduction of the floating-point extension facility and the
decimal-floating-point additional bits have been defined which need
to be checked in a non straight forward way. So far these bits have
been ignored which can cause an incorrect results for decimal-
floating-point operations, e.g. an incorrect rounding mode to be
set after signal return.
The static check with the FPC_VALID_MASK is replaced with a trial
load of the floating-point-control value, see test_fp_ctl.
In addition an information leak with the padding word between the
floating-point-control word and the floating-point registers in
the s390_fp_regs is fixed.
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix broken contraints for both save_access_regs() and restore_access_regs().
The constraints are incorrect since they tell the compiler that the inline
assemblies only access the first element of an array of 16 elements.
Therefore the compiler could generate incorrect code.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The patch implements a s390 specific ptrace request
PTRACE_TE_ABORT_RAND to modify the randomness of spontaneous
aborts of memory transactions of the transaction execution
facility. The data argument of the ptrace request is used to
specify the levels of randomness, 0 for normal operation, 1 to
abort every transaction at a random instruction, and 2 to abort
a random transaction at a random instruction. The default is 0
for normal operation.
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pull s390 updates from Martin Schwidefsky:
"The main new feature is machine support for System zEC12 including
transactional memory, runtime instrumentation, support for scm block
devices via eadm subchannels, and support for CEX4 crypto cards.
In addition there are some nice improvements: bpf jit compiler, arch
backend for cmpxchg_double, relative exception table entries, dasd
partition detection independent from the dasd driver ioctls, and cpu
cache information in /proc/cpuinfo and /sys/device/cpu.
And last but not least a series of cleanup patches from Heiko."
Fix up trivial add-add conflict in arch/s390/Kconfig due to commit
b952741c80 ("cputime: Generalize CONFIG_VIRT_CPU_ACCOUNTING")
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (76 commits)
s390: update defconfig
s390/jump label,nss: let shared kernel support depend on !JUMP_LABEL
s390/disassembler: fix decoding of risblg instruction
s390/bpf,jit: add support for BPF_S_ANC_ALU_XOR_X instruction
s390/traps: move call to print_modules() out of show_regs()
s390/mm: mark free_initrd_mem() as __init
s390/dasd: check count address during online setting
drivers/s390/char/monreader.c: fix error return code
s390/cmpxchg,percpu: implement cmpxchg_double()
s390/percpu: implement this_cpu_add_return()
s390/percpu: implement this_cpu_xchg()
s390/kexec: remove CONFIG_KEXEC
s390/irq: use designated initializers for irq class array
s390: add uninitialized_var() to suppress false positive compiler warnings
s390/crashdump: move fill_cpu_elf_notes() prototype to header file
s390/process: add missing header include
s390/ptrace: add missing ifdef
s390/ipl,decrompressor: disable branch profiling
s390/perf_events: compile only for CONFIG_64BIT
s390/tape: remove even more tape block leftovers
...
Allow user-space threads to use runtime instrumentation (RI). To enable RI
for a thread there is a new s390 specific system call, sys_s390_runtime_instr,
that takes as parameter a realtime signal number. If the RI facility is
available the system call sets up a control block for the calling thread with
the appropriate permissions for the thread to modify the control block.
The user-space thread can then use the store and modify RI instructions to
alter the control block and start/stop the instrumentation via RION/RIOFF.
If the user specified program buffer runs full RI triggers an external
interrupt. The external interrupt is translated to a real-time signal that
is delivered to the thread that enabled RI on that CPU. The number of
the real-time signal is the number specified in the RI system call. So,
user-space can select any available real-time signal number in case the
application itself uses real-time signals for other purposes.
The kernel saves the RI control blocks on task switch only if the running
thread was enabled for RI. Therefore, the performance impact on task switch
should be negligible if RI is not used.
RI is only enabled for user-space mode and is disabled for the supervisor
state.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The archs that implement virtual cputime accounting all
flush the cputime of a task when it gets descheduled
and sometimes set up some ground initialization for the
next task to account its cputime.
These archs all put their own hooks in their context
switch callbacks and handle the off-case themselves.
Consolidate this by creating a new account_switch_vtime()
callback called in generic code right after a context switch
and that these archs must implement to flush the prev task
cputime and initialize the next task cputime related state.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>