Commit Graph

1578 Commits

Author SHA1 Message Date
David Hildenbrand
db0758b297 KVM: s390: step VCPU cpu timer during kvm_run ioctl
Architecturally we should only provide steal time if we are scheduled
away, and not if the host interprets a guest exit. We have to step
the guest CPU timer in these cases.

In the first shot, we will step the VCPU timer only during the kvm_run
ioctl. Therefore all time spent e.g. in interception handlers or on irq
delivery will be accounted for that VCPU.

We have to take care of a few special cases:
- Other VCPUs can test for pending irqs. We can only report a consistent
  value for the VCPU thread itself when adding the delta.
- We have to take care of STP sync, therefore we have to extend
  kvm_clock_sync() and disable preemption accordingly
- During any call to disable/enable/start/stop we could get premeempted
  and therefore get start/stop calls. Therefore we have to make sure we
  don't get into an inconsistent state.

Whenever a VCPU is scheduled out, sleeping, in user space or just about
to enter the SIE, the guest cpu timer isn't stepped.

Please note that all primitives are prepared to be called from both
environments (cpu timer accounting enabled or not), although not completely
used in this patch yet (e.g. kvm_s390_set_cpu_timer() will never be called
while cpu timer accounting is enabled).

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08 13:57:52 +01:00
Alexander Yarygin
dce382b6c7 KVM: s390: Add diag "watchdog functions" to trace event decoding
DIAG 0x288 may occur now. Let's add its code to the diag table in
sie.h.

Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-03-08 13:57:51 +01:00
Christoph Hellwig
bc4b024a8b PCI: Move pci_dma_* helpers to common code
For a long time all architectures implement the pci_dma_* functions using
the generic DMA API, and they all use the same header to do so.

Move this header, pci-dma-compat.h, to include/linux and include it from
the generic pci.h instead of having each arch duplicate this include.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2016-03-07 10:40:02 -06:00
Martin Schwidefsky
988b86e69d s390/pci: add ioctl interface for CLP
Provide a user space interface to issue call logical-processor instructions.
Only selected CLP commands are allowed, enough to get the full overview of
the installed PCI functions.

Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-07 16:54:32 +01:00
Jiri Kosina
335e073faa klp: remove CONFIG_LIVEPATCH dependency from klp headers
There is no need for livepatch.h (generic and arch-specific) to depend
on CONFIG_LIVEPATCH. Remove that superfluous dependency.

Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-03-06 22:22:10 +01:00
Miroslav Benes
b24b78a113 klp: remove superfluous errors in asm/livepatch.h
There is an #error in asm/livepatch.h for both x86 and s390 in
!CONFIG_LIVEPATCH cases. It does not make much sense as pointed out by
Michael Ellerman. One can happily include asm/livepatch.h with
CONFIG_LIVEPATCH. Remove it as useless.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-03-06 22:19:26 +01:00
Christian Borntraeger
e82becfc18 s390/dma: Allow per device dma ops
As virtio-ccw will have dma ops, we can no longer default to the
zPCI ones. Make use of dev_archdata to keep the dma_ops per device.
The pci devices now use that to override the default, and the
default is changed to use the noop ops for everything that does not
specify a device specific one.
To compile without PCI support we will enable HAS_DMA all the time,
via the default config in lib/Kconfig.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-03-02 17:01:56 +02:00
Heiko Carstens
f369b98ee3 s390/percpu: remove this_cpu_cmpxchg_double_4
git commit 26f15caaf9 ("s390/cmpxchg: simplify cmpxchg_double")
removed support for cmpxchg_double for two consecutive four byte
values, for which it would generate a cds instruction.

However I forgot to remove the corresponding define in our percpu
header file, which means that this_cpu_cmpxchg_double would now
incorrectly generate a cdsg instruction if being used on a double four
byte location. Therefore remove the percpu define as well.

There is currently no user and therefore no bug fixed with
this. Obviously any such user could and should simply use cmpxchg.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-02 06:44:30 -06:00
Heiko Carstens
5d7eccecf8 s390/fault: merge report_user_fault implementations
We have two close to identical report_user_fault functions.
Add a parameter to one and get rid of the other one in order
to reduce code duplication.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-03-02 06:44:27 -06:00
Ingo Molnar
6aa447bcbb Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-29 09:42:07 +01:00
Tom Herbert
a87cb3e48e net: Facility to report route quality of connected sockets
This patch add the SO_CNX_ADVICE socket option (setsockopt only). The
purpose is to allow an application to give feedback to the kernel about
the quality of the network path for a connected socket. The value
argument indicates the type of quality report. For this initial patch
the only supported advice is a value of 1 which indicates "bad path,
please reroute"-- the action taken by the kernel is to call
dst_negative_advice which will attempt to choose a different ECMP route,
reset the TX hash for flow label and UDP source port in encapsulation,
etc.

This facility should be useful for connected UDP sockets where only the
application can provide any feedback about path quality. It could also
be useful for TCP applications that have additional knowledge about the
path outside of the normal TCP control loop.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-25 22:01:22 -05:00
Marcelo Tosatti
8577370fb0 KVM: Use simple waitqueue for vcpu->wq
The problem:

On -rt, an emulated LAPIC timer instances has the following path:

1) hard interrupt
2) ksoftirqd is scheduled
3) ksoftirqd wakes up vcpu thread
4) vcpu thread is scheduled

This extra context switch introduces unnecessary latency in the
LAPIC path for a KVM guest.

The solution:

Allow waking up vcpu thread from hardirq context,
thus avoiding the need for ksoftirqd to be scheduled.

Normal waitqueues make use of spinlocks, which on -RT
are sleepable locks. Therefore, waking up a waitqueue
waiter involves locking a sleeping lock, which
is not allowed from hard interrupt context.

cyclictest command line:

This patch reduces the average latency in my tests from 14us to 11us.

Daniel writes:
Paolo asked for numbers from kvm-unit-tests/tscdeadline_latency
benchmark on mainline. The test was run 1000 times on
tip/sched/core 4.4.0-rc8-01134-g0905f04:

  ./x86-run x86/tscdeadline_latency.flat -cpu host

with idle=poll.

The test seems not to deliver really stable numbers though most of
them are smaller. Paolo write:

"Anything above ~10000 cycles means that the host went to C1 or
lower---the number means more or less nothing in that case.

The mean shows an improvement indeed."

Before:

               min             max         mean           std
count  1000.000000     1000.000000  1000.000000   1000.000000
mean   5162.596000  2019270.084000  5824.491541  20681.645558
std      75.431231   622607.723969    89.575700   6492.272062
min    4466.000000    23928.000000  5537.926500    585.864966
25%    5163.000000  1613252.750000  5790.132275  16683.745433
50%    5175.000000  2281919.000000  5834.654000  23151.990026
75%    5190.000000  2382865.750000  5861.412950  24148.206168
max    5228.000000  4175158.000000  6254.827300  46481.048691

After
               min            max         mean           std
count  1000.000000     1000.00000  1000.000000   1000.000000
mean   5143.511000  2076886.10300  5813.312474  21207.357565
std      77.668322   610413.09583    86.541500   6331.915127
min    4427.000000    25103.00000  5529.756600    559.187707
25%    5148.000000  1691272.75000  5784.889825  17473.518244
50%    5160.000000  2308328.50000  5832.025000  23464.837068
75%    5172.000000  2393037.75000  5853.177675  24223.969976
max    5222.000000  3922458.00000  6186.720500  42520.379830

[Patch was originaly based on the swait implementation found in the -rt
 tree. Daniel ported it to mainline's version and gathered the
 benchmark numbers for tscdeadline_latency test.]

Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-rt-users@vger.kernel.org
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1455871601-27484-4-git-send-email-wagi@monom.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-02-25 11:27:16 +01:00
Martin Schwidefsky
13c6a790d7 s390/mm: correct comment about segment table entries
The comment describing the bit encoding for segment table entries
is incorrect in regard to the read and write bits. The segment
read bit is 0x0002 and write is 0x0001, not the other way around.

Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-23 08:56:20 +01:00
Heiko Carstens
758d39ebd3 s390/dumpstack: merge all four stack tracers
We have four different stack tracers of which three had bugs. So it's
time to merge them to a single stack tracer which allows to specify a
call back function which will be called for each step.

This patch changes behavior a bit:

- the "nosched" and "in_sched_functions" check within
  save_stack_trace_tsk did work only for the last stack frame within a
  context. Now it considers the check for each stack frame like it
  should.

- both the oprofile variant and the perf_events variant did save a
  return address twice if a zero back chain was detected, which
  indicates an interrupt frame. The new dump_trace function will call
  the oprofile and perf_events backends with the psw address that is
  contained within the corresponding pt_regs structure instead.

- the original show_trace and save_context_stack functions did already
  use the psw address of the pt_regs structure if a zero back chain
  was detected. However now we ignore the psw address if it is a user
  space address. After all we trace the kernel stack and not the user
  space stack. This way we also get rid of the garbage user space
  address in case of warnings and / or panic call traces.

So this should make life easier since now there is only one stack
tracer left which we can break.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-23 08:56:20 +01:00
Martin Schwidefsky
3c2c126a67 s390/mm: remove unnecessary indirection with pgste_update_all
The first parameter of pgste_update_all is a pointer to a pte.
Simplify the code by passing the pte value.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-23 08:56:19 +01:00
Heiko Carstens
76737ce17a s390: add current_stack_pointer() helper function
Implement current_stack_pointer() helper function and use it
everywhere, instead of having several different inline assembly
variants.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-23 08:56:18 +01:00
Martin Schwidefsky
2cfc5f9ce7 s390/xor: optimized xor routing using the XC instruction
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-23 08:56:17 +01:00
Sebastian Ott
9a99649f2a s390/pci: remove pdev pointer from arch data
For each PCI function we need to maintain arch specific data in
struct zpci_dev which also contains a pointer to struct pci_dev.

When a function is registered or deregistered (which is triggered by PCI
common code) we need to adjust that pointer which could interfere with
the machine check handler (triggered by FW) using zpci_dev->pdev.

Since multiple instances of the same pdev could exist at a time this can't
be solved with locking.

Fix that by ditching the pdev pointer and use a bus walk to reach
struct pci_dev (only one instance of a pdev can be registered at the bus
at a time).

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-23 08:56:16 +01:00
Martin Schwidefsky
1b17cb796f s390/fpu: signals vs. floating point control register
git commit 904818e2f2
"s390/kernel: introduce fpu-internal.h with fpu helper functions"
introduced the fpregs_store / fp_regs_load helper. These function
fail to save and restore the floating pointer control registers.

The effect is that the FPC is not correctly handled on signal
delivery and signal return.

Cc: stable@vger.kernel.org # 4.4
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-02-22 09:29:35 +01:00
Linus Torvalds
705d43dbe1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching fixes from Jiri Kosina:

 - regression (from 4.4) fix for ordering issue, introduced by an
   earlier ftrace change, that broke live patching of modules.

   The fix replaces the ftrace module notifier by direct call in order
   to make the ordering guaranteed and well-defined.  The patch, from
   Jessica Yu, has been acked both by Steven and Rusty

 - error message fix from Miroslav Benes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  ftrace/module: remove ftrace module notifier
  livepatch: change the error message in asm/livepatch.h header files
2016-02-18 16:34:15 -08:00
Dave Hansen
d61172b4b6 mm/core, x86/mm/pkeys: Differentiate instruction fetches
As discussed earlier, we attempt to enforce protection keys in
software.

However, the code checks all faults to ensure that they are not
violating protection key permissions.  It was assumed that all
faults are either write faults where we check PKRU[key].WD (write
disable) or read faults where we check the AD (access disable)
bit.

But, there is a third category of faults for protection keys:
instruction faults.  Instruction faults never run afoul of
protection keys because they do not affect instruction fetches.

So, plumb the PF_INSTR bit down in to the
arch_vma_access_permitted() function where we do the protection
key checks.

We also add a new FAULT_FLAG_INSTRUCTION.  This is because
handle_mm_fault() is not passed the architecture-specific
error_code where we keep PF_INSTR, so we need to encode the
instruction fetch information in to the arch-generic fault
flags.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210224.96928009@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 19:46:29 +01:00
Dave Hansen
1b2ee1266e mm/core: Do not enforce PKEY permissions on remote mm access
We try to enforce protection keys in software the same way that we
do in hardware.  (See long example below).

But, we only want to do this when accessing our *own* process's
memory.  If GDB set PKRU[6].AD=1 (disable access to PKEY 6), then
tried to PTRACE_POKE a target process which just happened to have
some mprotect_pkey(pkey=6) memory, we do *not* want to deny the
debugger access to that memory.  PKRU is fundamentally a
thread-local structure and we do not want to enforce it on access
to _another_ thread's data.

This gets especially tricky when we have workqueues or other
delayed-work mechanisms that might run in a random process's context.
We can check that we only enforce pkeys when operating on our *own* mm,
but delayed work gets performed when a random user context is active.
We might end up with a situation where a delayed-work gup fails when
running randomly under its "own" task but succeeds when running under
another process.  We want to avoid that.

To avoid that, we use the new GUP flag: FOLL_REMOTE and add a
fault flag: FAULT_FLAG_REMOTE.  They indicate that we are
walking an mm which is not guranteed to be the same as
current->mm and should not be subject to protection key
enforcement.

Thanks to Jerome Glisse for pointing out this scenario.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
Cc: Eric B Munson <emunson@akamai.com>
Cc: Geliang Tang <geliangtang@163.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Low <jason.low2@hp.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Cc: iommu@lists.linux-foundation.org
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 19:46:28 +01:00
Dave Hansen
33a709b25a mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
Today, for normal faults and page table walks, we check the VMA
and/or PTE to ensure that it is compatible with the action.  For
instance, if we get a write fault on a non-writeable VMA, we
SIGSEGV.

We try to do the same thing for protection keys.  Basically, we
try to make sure that if a user does this:

	mprotect(ptr, size, PROT_NONE);
	*ptr = foo;

they see the same effects with protection keys when they do this:

	mprotect(ptr, size, PROT_READ|PROT_WRITE);
	set_pkey(ptr, size, 4);
	wrpkru(0xffffff3f); // access disable pkey 4
	*ptr = foo;

The state to do that checking is in the VMA, but we also
sometimes have to do it on the page tables only, like when doing
a get_user_pages_fast() where we have no VMA.

We add two functions and expose them to generic code:

	arch_pte_access_permitted(pte_flags, write)
	arch_vma_access_permitted(vma, write)

These are, of course, backed up in x86 arch code with checks
against the PTE or VMA's protection key.

But, there are also cases where we do not want to respect
protection keys.  When we ptrace(), for instance, we do not want
to apply the tracer's PKRU permissions to the PTEs from the
process being traced.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-s390@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20160212210219.14D5D715@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 09:32:44 +01:00
David Hildenbrand
efa48163b8 KVM: s390: remove old fragment of vector registers
Since commit 9977e886cb ("s390/kernel: lazy restore fpu registers"),
vregs in struct sie_page is unsed. We can safely remove the field and
the definition.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10 13:12:54 +01:00
David Hildenbrand
6fd8e67dd8 KVM: s390: sync of fp registers via kvm_run
As we already store the floating point registers in the vector save area
in floating point register format when we don't have MACHINE_HAS_VX, we can
directly expose them to user space using a new sync flag.

The floating point registers will be valid when KVM_SYNC_FPRS is set. The
fpc will also be valid when KVM_SYNC_FPRS is set.

Either KVM_SYNC_FPRS or KVM_SYNC_VRS will be enabled, never both.

Let's also change two positions where we access vrs, making the code easier
to read and one comment superfluous.

Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-02-10 13:12:49 +01:00
Linus Torvalds
6b292a8abd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "An optimization for irq-restore, the SSM instruction is quite a bit
  slower than an if-statement and a STOSM.

  The copy_file_range system all is added.

  Cleanup for PCI and CIO.

  And a couple of bug fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/cio: update measurement characteristics
  s390/cio: ensure consistent measurement state
  s390/cio: fix measurement characteristics memleak
  s390/zcrypt: Fix cryptographic device id in kernel messages
  s390/pci: remove iomap sanity checks
  s390/pci: set error state for unusable functions
  s390/pci: fix bar check
  s390/pci: resize iomap
  s390/pci: improve ZPCI_* macros
  s390/pci: provide ZPCI_ADDR macro
  s390/pci: adjust IOMAP_MAX_ENTRIES
  s390/numa: move numa_init_late() from device to arch_initcall
  s390: remove all usages of PSW_ADDR_INSN
  s390: remove all usages of PSW_ADDR_AMODE
  s390: wire up copy_file_range syscall
  s390: remove superfluous memblock_alloc() return value checks
  s390/numa: allocate memory with correct alignment
  s390/irqflags: optimize irq restore
  s390/mm: use TASK_MAX_SIZE where applicable
2016-01-29 16:05:18 -08:00
Linus Torvalds
f0ce3ff42e s390 and POWER bug fixes, plus enabling the KVM-VFIO interface on s390.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWp5FIAAoJEL/70l94x66DozoIAKWFLnwi6PQ7Sm+Tvr0Sl/mp
 yGDM+PuVyG4C6bPS66xd4XkXEFIwfpIJQzq1Zt7Xg0m27t50DRtmInLJU4ql7MFD
 vYn3h6lOudtUR0i5kPYSy6SMehFjx8wS0GX+O+iX9tMlYI0vGWEJ+7C06FmnqpGP
 RUUjnVvljAMih8wsAHLOjH8rwZHnO+EfHNi+V7Q+eFBHJnn2R06IqK8z5DmTBeTQ
 ZecNdZslX5qZvMNulTo7nC2P6VShkdRuMJkLEFeSaTlCqx5WcnzgwEiMMavUeYI8
 aIWuE51v0RMxRhsvRXUQOeRpRgU0AF2OYJEKXBRbO3P6IRikFohJq4QH67zN2HA=
 =54tv
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "s390 and POWER bug fixes, plus enabling the KVM-VFIO interface on
  s390"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM doc: Fix KVM_SMI chapter number
  KVM: s390: fix memory overwrites when vx is disabled
  KVM: s390: Enable the KVM-VFIO device
  KVM: s390: fix guest fprs memory leak
  KVM: PPC: Fix ONE_REG AltiVec support
  KVM: PPC: Increase memslots to 512
  KVM: PPC: Book3S PR: Remove unused variable 'vcpu_book3s'
  KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8
  KVM: PPC: Book3S HV: Handle unexpected traps in guest entry/exit code better
2016-01-27 10:50:42 -08:00
David Hildenbrand
9abc2a08a7 KVM: s390: fix memory overwrites when vx is disabled
The kernel now always uses vector registers when available, however KVM
has special logic if support is really enabled for a guest. If support
is disabled, guest_fpregs.fregs will only contain memory for the fpu.
The kernel, however, will store vector registers into that area,
resulting in crazy memory overwrites.

Simply extending that area is not enough, because the format of the
registers also changes. We would have to do additional conversions, making
the code even more complex. Therefore let's directly use one place for
the vector/fpu registers + fpc (in kvm_run). We just have to convert the
data properly when accessing it. This makes current code much easier.

Please note that vector/fpu registers are now always stored to
vcpu->run->s.regs.vrs. Although this data is visible to QEMU and
used for migration, we only guarantee valid values to user space  when
KVM_SYNC_VRS is set. As that is only the case when we have vector
register support, we are on the safe side.

Fixes: b5510d9b68 ("s390/fpu: always enable the vector facility if it is available")
Cc: stable@vger.kernel.org # v4.4 d9a3a09af5 s390/kvm: remove dependency on struct save_area definition
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[adopt to d9a3a09af5]
2016-01-26 15:40:21 +01:00
Sebastian Ott
bf19c94d5c s390/pci: improve ZPCI_* macros
Most of the constants defined in pci_io.h depend on each other
and thus can be calculated.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-26 12:45:49 +01:00
Sebastian Ott
9e00caaea1 s390/pci: provide ZPCI_ADDR macro
Provide and use a ZPCI_ADDR macro as the complement of ZPCI_IDX
to get rid of some constants in the code.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-26 12:45:41 +01:00
Sebastian Ott
c2e1fcf3ec s390/pci: adjust IOMAP_MAX_ENTRIES
ZPCI_IOMAP_MAX_ENTRIES is off by one. Let's adjust this
for the sake of correctness.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-26 12:45:33 +01:00
Christoph Hellwig
e1c7e32453 dma-mapping: always provide the dma_map_ops based implementation
Move the generic implementation to <linux/dma-mapping.h> now that all
architectures support it and remove the HAVE_DMA_ATTR Kconfig symbol now
that everyone supports them.

[valentinrothberg@gmail.com: remove leftovers in Kconfig]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Helge Deller <deller@gmx.de>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-20 17:09:18 -08:00
Heiko Carstens
9cb1ccecb6 s390: remove all usages of PSW_ADDR_INSN
Yet another leftover from the 31 bit era. The usual operation
"y = x & PSW_ADDR_INSN" with the PSW_ADDR_INSN mask is a nop for
CONFIG_64BIT.

Therefore remove all usages and hope the code is a bit less confusing.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
2016-01-19 12:14:03 +01:00
Heiko Carstens
fecc868a66 s390: remove all usages of PSW_ADDR_AMODE
This is a leftover from the 31 bit area. For CONFIG_64BIT the usual
operation "y = x | PSW_ADDR_AMODE" is a nop. Therefore remove all
usages of PSW_ADDR_AMODE and make the code a bit less confusing.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
2016-01-19 12:14:02 +01:00
Heiko Carstens
696aafd369 s390: wire up copy_file_range syscall
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-19 12:14:02 +01:00
Christian Borntraeger
204ee2c564 s390/irqflags: optimize irq restore
The ssm instruction takes longer that stnsm/stosm as it is often
used to modify DAT and PER. We know that irqsave/irqrestore only
deals with external and I/O interrupts and we know that irqrestore
can transition only from disabled->disabled or disabled->enabled,
so we can use the faster stosm.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-01-19 12:14:01 +01:00
Linus Torvalds
a200dcb346 virtio: barrier rework+fixes
This adds a new kind of barrier, and reworks virtio and xen
 to use it.
 Plus some fixes here and there.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWlU2kAAoJECgfDbjSjVRpZ6IH/Ra19ecG8sCQo9zskr4zo22Z
 DZXC3u0sJDBYjjBAiw3IY1FKh7wx2Fr1RhUOj1bteBgcFCMCV1zInP5ITiCyzd1H
 YYh1w9C2tZaj2T4t9L4hIrAdtIF8fGS+oI2IojXPjOuDLEt6pfFBEjHp/sfl3UJq
 ZmZvw4OXviSNej7jBw8Xni3Uv18yfmLGXvMdkvMSPC1/XL29voGDqTVwhqJwxLVz
 k/ZLcKFOzIs9N7Nja0Jl1EiZtC2Y9cpItqweicNAzszlpkSL44vQxmCSefB+WyQ4
 gt0O3+AxYkLfrxzCBhUA4IpRex3/XPW1b+1e/V1XjfR2n/FlyLe+AIa8uPJElFc=
 =ukaV
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio barrier rework+fixes from Michael Tsirkin:
 "This adds a new kind of barrier, and reworks virtio and xen to use it.

  Plus some fixes here and there"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (44 commits)
  checkpatch: add virt barriers
  checkpatch: check for __smp outside barrier.h
  checkpatch.pl: add missing memory barriers
  virtio: make find_vqs() checkpatch.pl-friendly
  virtio_balloon: fix race between migration and ballooning
  virtio_balloon: fix race by fill and leak
  s390: more efficient smp barriers
  s390: use generic memory barriers
  xen/events: use virt_xxx barriers
  xen/io: use virt_xxx barriers
  xenbus: use virt_xxx barriers
  virtio_ring: use virt_store_mb
  sh: move xchg_cmpxchg to a header by itself
  sh: support 1 and 2 byte xchg
  virtio_ring: update weak barriers to use virt_xxx
  Revert "virtio_ring: Update weak barriers to use dma_wmb/rmb"
  asm-generic: implement virt_xxx memory barriers
  x86: define __smp_xxx
  xtensa: define __smp_xxx
  tile: define __smp_xxx
  ...
2016-01-18 16:44:24 -08:00
Miroslav Benes
383bf44d1a livepatch: change the error message in asm/livepatch.h header files
If anyone includes asm/livepatch.h when CONFIG_LIVEPATCH=n the build
fails with the existing error message. Change it to something saner.

[jkosina@suse.cz: fixed changelog typo spotted by Josh]
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2016-01-18 21:35:43 +01:00
Kirill A. Shutemov
fecffad254 s390, thp: remove infrastructure for handling splitting PMDs
With new refcounting we don't need to mark PMDs splitting.  Let's drop
code to handle this.

pmdp_splitting_flush() is not needed too: on splitting PMD we will do
pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
needed for fast_gup.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Linus Torvalds
cbd88cd4c0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "Among the traditional bug fixes and cleanups are some improvements:

   - A tool to generated the facility lists, generating the bit fields
     by hand has been a source of bugs in the past

   - The spinlock loop is reordered to avoid bursts of hypervisor calls

   - Add support for the open-for-business interface to the service
     element

   - The get_cpu call is added to the vdso

   - A set of tracepoints is defined for the common I/O layer

   - The deprecated sclp_cpi module is removed

   - Update default configuration"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (56 commits)
  s390/sclp: fix possible control register corruption
  s390: fix normalization bug in exception table sorting
  s390/configs: update default configurations
  s390/vdso: optimize getcpu system call
  s390: drop smp_mb in vdso_init
  s390: rename struct _lowcore to struct lowcore
  s390/mem_detect: use unsigned longs
  s390/ptrace: get rid of long longs in psw_bits
  s390/sysinfo: add missing SYSIB 1.2.2 multithreading fields
  s390: get rid of CONFIG_SCHED_MC and CONFIG_SCHED_BOOK
  s390/Kconfig: remove pointless 64 bit dependencies
  s390/dasd: fix failfast for disconnected devices
  s390/con3270: testing return kzalloc retval
  s390/hmcdrv: constify hmcdrv_ftp_ops structs
  s390/cio: add NULL test
  s390/cio: Change I/O instructions from inline to normal functions
  s390/cio: Introduce common I/O layer tracepoints
  s390/cio: Consolidate inline assemblies and related data definitions
  s390/cio: Fix incorrect xsch opcode specification
  s390/cio: Remove unused inline assemblies
  ...
2016-01-13 13:16:16 -08:00
Linus Torvalds
aee3bfa330 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from Davic Miller:

 1) Support busy polling generically, for all NAPI drivers.  From Eric
    Dumazet.

 2) Add byte/packet counter support to nft_ct, from Floriani Westphal.

 3) Add RSS/XPS support to mvneta driver, from Gregory Clement.

 4) Implement IPV6_HDRINCL socket option for raw sockets, from Hannes
    Frederic Sowa.

 5) Add support for T6 adapter to cxgb4 driver, from Hariprasad Shenai.

 6) Add support for VLAN device bridging to mlxsw switch driver, from
    Ido Schimmel.

 7) Add driver for Netronome NFP4000/NFP6000, from Jakub Kicinski.

 8) Provide hwmon interface to mlxsw switch driver, from Jiri Pirko.

 9) Reorganize wireless drivers into per-vendor directories just like we
    do for ethernet drivers.  From Kalle Valo.

10) Provide a way for administrators "destroy" connected sockets via the
    SOCK_DESTROY socket netlink diag operation.  From Lorenzo Colitti.

11) Add support to add/remove multicast routes via netlink, from Nikolay
    Aleksandrov.

12) Make TCP keepalive settings per-namespace, from Nikolay Borisov.

13) Add forwarding and packet duplication facilities to nf_tables, from
    Pablo Neira Ayuso.

14) Dead route support in MPLS, from Roopa Prabhu.

15) TSO support for thunderx chips, from Sunil Goutham.

16) Add driver for IBM's System i/p VNIC protocol, from Thomas Falcon.

17) Rationalize, consolidate, and more completely document the checksum
    offloading facilities in the networking stack.  From Tom Herbert.

18) Support aborting an ongoing scan in mac80211/cfg80211, from
    Vidyullatha Kanchanapally.

19) Use per-bucket spinlock for bpf hash facility, from Tom Leiming.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1375 commits)
  net: bnxt: always return values from _bnxt_get_max_rings
  net: bpf: reject invalid shifts
  phonet: properly unshare skbs in phonet_rcv()
  dwc_eth_qos: Fix dma address for multi-fragment skbs
  phy: remove an unneeded condition
  mdio: remove an unneed condition
  mdio_bus: NULL dereference on allocation error
  net: Fix typo in netdev_intersect_features
  net: freescale: mac-fec: Fix build error from phy_device API change
  net: freescale: ucc_geth: Fix build error from phy_device API change
  bonding: Prevent IPv6 link local address on enslaved devices
  IB/mlx5: Add flow steering support
  net/mlx5_core: Export flow steering API
  net/mlx5_core: Make ipv4/ipv6 location more clear
  net/mlx5_core: Enable flow steering support for the IB driver
  net/mlx5_core: Initialize namespaces only when supported by device
  net/mlx5_core: Set priority attributes
  net/mlx5_core: Connect flow tables
  net/mlx5_core: Introduce modify flow table command
  net/mlx5_core: Managing root flow table
  ...
2016-01-12 18:57:02 -08:00
Linus Torvalds
1baa5efbeb * s390: Support for runtime instrumentation within guests,
support of 248 VCPUs.
 
 * ARM: rewrite of the arm64 world switch in C, support for
 16-bit VM identifiers.  Performance counter virtualization
 missed the boat.
 
 * x86: Support for more Hyper-V features (synthetic interrupt
 controller), MMU cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWlSKwAAoJEL/70l94x66DY0UIAK5vp4zfQoQOJC4KP4Xgxwdu
 kpnK2Boz3/74o1b0y5+eJZoUZCsXCVLtmP5uhmMxUYWDgByFG2X8ZDhPFwB5FYLT
 2dN+Lr4tsolgIfRdHZtrT6Svp9SDL039bWTdscnbR6l37/j9FRWvpKdhI3orloFD
 /i4CSW2dVIq1/9Xctwu/rtcOEesEx4Cad+6YV3/530eVAXFzE908nXfmqJNZTocY
 YCGcmrMVCOu0ng5QM4xSzmmYjKMLUcRs+QzZWkVBzdJtTgwZUr09yj7I2dZ1yj/i
 cxYrJy6shSwE74XkXsmvG+au3C5u3vX4tnXjBFErnPJ99oqzHatVnFWNRhj4dLQ=
 =PIj1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "PPC changes will come next week.

   - s390: Support for runtime instrumentation within guests, support of
     248 VCPUs.

   - ARM: rewrite of the arm64 world switch in C, support for 16-bit VM
     identifiers.  Performance counter virtualization missed the boat.

   - x86: Support for more Hyper-V features (synthetic interrupt
     controller), MMU cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (115 commits)
  kvm: x86: Fix vmwrite to SECONDARY_VM_EXEC_CONTROL
  kvm/x86: Hyper-V SynIC timers tracepoints
  kvm/x86: Hyper-V SynIC tracepoints
  kvm/x86: Update SynIC timers on guest entry only
  kvm/x86: Skip SynIC vector check for QEMU side
  kvm/x86: Hyper-V fix SynIC timer disabling condition
  kvm/x86: Reorg stimer_expiration() to better control timer restart
  kvm/x86: Hyper-V unify stimer_start() and stimer_restart()
  kvm/x86: Drop stimer_stop() function
  kvm/x86: Hyper-V timers fix incorrect logical operation
  KVM: move architecture-dependent requests to arch/
  KVM: renumber vcpu->request bits
  KVM: document which architecture uses each request bit
  KVM: Remove unused KVM_REQ_KICK to save a bit in vcpu->requests
  kvm: x86: Check kvm_write_guest return value in kvm_write_wall_clock
  KVM: s390: implement the RI support of guest
  kvm/s390: drop unpaired smp_mb
  kvm: x86: fix comment about {mmu,nested_mmu}.gva_to_gpa
  KVM: x86: MMU: Use clear_page() instead of init_shadow_page_table()
  arm/arm64: KVM: Detect vGIC presence at runtime
  ...
2016-01-12 13:22:12 -08:00
Michael S. Tsirkin
779a6a3696 s390: more efficient smp barriers
As per: lkml.kernel.org/r/20150921112252.3c2937e1@mschwide
atomics imply a barrier on s390, so s390 should change
smp_mb__before_atomic and smp_mb__after_atomic to barrier() instead of
smp_mb() and hence should not use the generic versions.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-01-12 20:47:05 +02:00
Michael S. Tsirkin
a677f48695 s390: use generic memory barriers
The s390 kernel is SMP to 99.99%, we just didn't bother with a
non-smp variant for the memory-barriers. If the generic header
is used we'd get the non-smp version for free. It will save a
small amount of text space for CONFIG_SMP=n.

Suggested-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-01-12 20:47:04 +02:00
Michael S. Tsirkin
82b44496ab s390: define __smp_xxx
This defines __smp_xxx barriers for s390,
for use by virtualization.

Some smp_xxx barriers are removed as they are
defined correctly by asm-generic/barriers.h

Note: smp_mb, smp_rmb and smp_wmb are defined as full barriers
unconditionally on this architecture.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-01-12 20:46:56 +02:00
Michael S. Tsirkin
21535aaed9 s390: reuse asm-generic/barrier.h
On s390 read_barrier_depends, smp_read_barrier_depends
smp_store_mb(), smp_mb__before_atomic and smp_mb__after_atomic match the
asm-generic variants exactly. Drop the local definitions and pull in
asm-generic/barrier.h instead.

This is in preparation to refactoring this code area.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2016-01-12 20:46:48 +02:00
Davidlohr Bueso
5a1b26d7c6 lcoking/barriers, arch: Use smp barriers in smp_store_release()
With commit b92b8b35a2 ("locking/arch: Rename set_mb() to smp_store_mb()")
it was made clear that the context of this call (and thus set_mb)
is strictly for CPU ordering, as opposed to IO. As such all archs
should use the smp variant of mb(), respecting the semantics and
saving a mandatory barrier on UP.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <linux-arch@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/1445975631-17047-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2016-01-12 20:46:46 +02:00
Martin Schwidefsky
249c543b97 s390/vdso: optimize getcpu system call
Add the CPU number to the per-cpu vdso data page and add the
__kernel_getcpu function to the vdso object to retrieve the
CPU number in user space.

Suggested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-11 13:01:24 +01:00
Heiko Carstens
c667aeacc1 s390: rename struct _lowcore to struct lowcore
Finally get rid of the leading underscore. I tried this already two or
three years ago, however Michael Holzheu objected since this would
break the crash utility (again).

However Michael integrated support for the new name into the crash
utility back then, so it doesn't break if the name will be changed
now.  So finally get rid of the ever confusing leading underscore.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-11 12:27:15 +01:00
Heiko Carstens
423d5b364c s390/mem_detect: use unsigned longs
The memory detection code historically had to use unsigned long long
since the machine reported the true memory size (>4GB) even if the
virtual machine was running in ESA/390 mode.

Since the old code is gone use unsigned long everywhere and also get
rid of an unused ADDR2G define.

(this patch converts all long longs within sclp_info to longs)

There are many more possible conversions, however that can be done if
somebody touches the corresponding code.  Since people started to
convert unrelated long types to long longs because of the types within
struct sclp_info convert this now.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-11 12:27:11 +01:00
Heiko Carstens
cb951785a7 s390/ptrace: get rid of long longs in psw_bits
The long longs were introduced by me in order to have a working
definition of the struct psw_bits also in 31 bit mode. Since that is
gone also get rid of the long longs.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-11 12:27:07 +01:00
Heiko Carstens
7022ec4960 s390/sysinfo: add missing SYSIB 1.2.2 multithreading fields
Add missing multithreading fields of SYSIB 1.2.2 (Basic-Machine CPUs)
to the output of /proc/sysinfo.

Also use bitfields for SYSIB 2.2.2 to simplify the C code a bit.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-01-11 12:27:00 +01:00
Paolo Bonzini
2860c4b167 KVM: move architecture-dependent requests to arch/
Since the numbers now overlap, it makes sense to enumerate
them in asm/kvm_host.h rather than linux/kvm_host.h.  Functions
that refer to architecture-specific requests are also moved
to arch/.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-01-08 19:04:36 +01:00
Fan Zhang
c6e5f16637 KVM: s390: implement the RI support of guest
This patch adds runtime instrumentation support for KVM guest. We need to
setup a save area for the runtime instrumentation-controls control block(RICCB)
and implement the necessary interfaces to live migrate the guest settings.

We setup the sie control block in a way, that the runtime
instrumentation instructions of a guest are handled by hardware.

We also add a capability KVM_CAP_S390_RI to make this feature opt-in as
it needs migration support.

Signed-off-by: Fan Zhang <zhangfan@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-01-07 14:48:26 +01:00
Craig Gallek
538950a1b7 soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF
Expose socket options for setting a classic or extended BPF program
for use when selecting sockets in an SO_REUSEPORT group.  These options
can be used on the first socket to belong to a group before bind or
on any socket in the group after bind.

This change includes refactoring of the existing sk_filter code to
allow reuse of the existing BPF filter validation checks.

Signed-off-by: Craig Gallek <kraig@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-04 22:49:59 -05:00
Heiko Carstens
9236b4dd6b s390: get rid of CONFIG_SCHED_MC and CONFIG_SCHED_BOOK
Use CONFIG_TOPOLOGY which selects CONFIG_SCHED_* all over the place to
reduce the random usage of the previous config options.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-30 10:34:57 +01:00
Peter Oberparleiter
2ab59de7c5 s390/cio: Consolidate inline assemblies and related data definitions
Replace the current semi-arbitrary distribution of inline assemblies:
 - Inline assemblies used by CIO go into ioasm.h
 - Data definitions used by inline assemblies go into cio.h

Beyond cleaning up the current structure this is also required for
use of tracepoints in inline assemblies introduced by a follow-on
patch.

Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:34 +01:00
Christian Borntraeger
d7ae65aec0 s390/setup: cleanup machine flags
Over time some machine flags got unused (e.g. MACHINE_FLAG_MVPG)
or are available on all 64bit systems (MACHINE_FLAG_CSP,
MACHINE_FLAG_IEEE) - let's remove them.
Reorder the other ones to match the order of the MACHINE_HAS_*
macros and renumber all bits to avoid holes.
Also fix the comment about where the flags are detected.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:32 +01:00
Heiko Carstens
f3d05f9de7 s390/facilities: add z13 als bit
If configured for z13 assume the kernel makes use of the instructions
that are part of the load-and-zero-rightmost-byte facility and
load/store-on-condition facility 2.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:24 +01:00
Heiko Carstens
6ab6c31a54 s390/facilities: optimize test_facility()
test_facility() can be optimized for bits which must be set anyway,
due to the check in head.S. This removes a couple of superfluous
runtime checks.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:23 +01:00
Heiko Carstens
185edd4431 s390/facilities: remove unneeded facility bits
The facility lists contain a lot of bits which are not necessary to
run the kernel.  Therefore remove them and keep only those bits which
are required for the kernel.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:22 +01:00
Heiko Carstens
c30f6828fe s390/facilities: add helper tool to generate facility lists
Modifying the architecture level set facility lists was always very
error prone. Given the numbering of the facility bits within the
Principles of Operation, where the most significant bit number is 0,
it happened a lot of times that wrong bits were set or cleared.

Therefore this patch adds a tool "gen_facilities" which generates
include/generated/facilites.h.  The definition of the bits to be set
is contained within arch/s390/include/asm/facilities_src.h and can be
easily extended to e.g. also generate such lists for the KVM module.

The generated file looks like this:

 #define FACILITIES_ALS _AC(0xc1006450f0040000,UL)
 #define FACILITIES_ALS_DWORDS 1

The facility bits defined in this patch match 1:1 to the current masks
that can be found in head.S.

That is if the tool gets executed with -march=z990 then the generated
masks will equal the masks in head.S for CONFIG_MARCH_Z990.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:20 +01:00
Heiko Carstens
9552a66fe6 s390/facilities: use stfl mnemonic instead of insn magic
Now that 31 bit support is gone, the assembler always knows about the
stfl instruction. Therefore lets use a readable mnemonic.  Also remove
the not needed extable entry for the inline assembly and fix the
output constraint.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-12-18 14:59:18 +01:00
Dominik Dingel
a3a92c31bf KVM: s390: fix mismatch between user and in-kernel guest limit
While the userspace interface requests the maximum size the gmap code
expects to get a maximum address.

This error resulted in bigger page tables than necessary for some guest
sizes, e.g. a 2GB guest used 3 levels instead of 2.

At the same time we introduce KVM_S390_NO_MEM_LIMIT, which allows in a
bright future that a guest spans the complete 64 bit address space.

We also switch to TASK_MAX_SIZE for the initial memory size, this is a
cosmetic change as the previous size also resulted in a 4 level pagetable
creation.

Reported-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-12-15 17:08:21 +01:00
Davidlohr Bueso
d5a73cadf3 lcoking/barriers, arch: Use smp barriers in smp_store_release()
With commit b92b8b35a2 ("locking/arch: Rename set_mb() to smp_store_mb()")
it was made clear that the context of this call (and thus set_mb)
is strictly for CPU ordering, as opposed to IO. As such all archs
should use the smp variant of mb(), respecting the semantics and
saving a mandatory barrier on UP.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <linux-arch@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/1445975631-17047-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-12-04 11:39:51 +01:00
David Hildenbrand
7f16d7e787 s390: show virtualization support in /proc/cpuinfo
This patch exposes the SIE capability (aka virtualization support) via
/proc/cpuinfo -> "features" as "sie".

As we don't want to expose this hwcap via elf, let's add a second,
"internal"/non-elf capability list. The content is simply concatenated
to the existing features when printing /proc/cpuinfo.

We also add the defines to elf.h to keep the hwcap stuff at a common
place.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:11 +01:00
David Hildenbrand
8dfd523f85 s390/sclp: introduce check for SIE
This patch adds a way to check if the SIE with zArchitecture support is
available.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:11 +01:00
Eugene (jno) Dvurechenski
fe0edcb731 KVM: s390: Enable up to 248 VCPUs per VM
This patch allows s390 to have more than 64 VCPUs for a guest (up to
248 for memory usage considerations), if supported by the underlaying
hardware (sclp.has_esca).

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:08 +01:00
Eugene (jno) Dvurechenski
5e04431523 KVM: s390: Introduce switching code
This patch adds code that performs transparent switch to Extended
SCA on addition of 65th VCPU in a VM. Disposal of ESCA is added too.
The entier ESCA functionality, however, is still not enabled.
The enablement will be provided in a separate patch.

This patch also uses read/write lock protection of SCA and its subfields for
possible disposal at the BSCA-to-ESCA transition. While only Basic SCA needs such
a protection (for the swap), any SCA access is now guarded.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:08 +01:00
Eugene (jno) Dvurechenski
7d43bafcff KVM: s390: Make provisions for ESCA utilization
This patch updates the routines (sca_*) to provide transparent access
to and manipulation on the data for both Basic and Extended SCA in use.
The kvm.arch.sca is generalized to (void *) to handle BSCA/ESCA cases.
Also the kvm.arch.use_esca flag is provided.
The actual functionality is kept the same.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:08 +01:00
Eugene (jno) Dvurechenski
bc784ccee5 KVM: s390: Introduce new structures
This patch adds new structures and updates some existing ones to
provide the base for Extended SCA functionality.

The old sca_* structures were renamed to bsca_* to keep things uniform.

The access to fields of SIGP controls were turned into bitfields instead
of hardcoded bitmasks.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:07 +01:00
Eugene (jno) Dvurechenski
f7ba1d3426 s390/sclp: introduce checks for ESCA and HVS
Introduce sclp.has_hvs and sclp.has_esca to provide a way for kvm to check
whether the extended-SCA and the home-virtual-SCA facilities are available.

Signed-off-by: Eugene (jno) Dvurechenski <jno@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-11-30 12:47:06 +01:00
Martin Schwidefsky
419123f900 s390/spinlock: do not yield to a CPU in udelay/mdelay
It does not make sense to try to relinquish the time slice with diag 0x9c
to a CPU in a state that does not allow to schedule the CPU. The scenario
where this can happen is a CPU waiting in udelay/mdelay while holding a
spin-lock.

Add a CIF bit to tag a CPU in enabled wait and use it to detect that the
yield of a CPU will not be successful and skip the diagnose call.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:18 +01:00
Heiko Carstens
9eb31be33c s390: remove is_32bit_task() helper
is_32bit_task() used to be helpful when we still had CONFIG_32BIT.
Since that is gone, it is nowadays identical to is_compat_task().
So remove it.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:17 +01:00
Sascha Silbe
3f975df69d s390/sclp: Add VT220 support to early sclp console
When running under qemu with the default configuration (-nographic),
there is only a VT220 SCLP console, no line-mode SCLP console. Add
VT220 support to the early SCLP console so the user has a chance to
see critical error messages during early boot.

None of the existing users of _sclp_print_early() check the return
code. Instead of trying to come up with return code semantics when
printing to multiple consoles (any or all of which may fail), we just
drop the return code entirely.

Tested on z/VM (line mode console) and LPAR (VT220 and line mode
console). Tested on qemu/KVM with VT220 console and / or line mode
console.

Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:17 +01:00
Gerald Schaefer
69eea95c48 s390/pci_dma: fix DMA table corruption with > 4 TB main memory
DMA addresses returned from map_page() are calculated by using an iommu
bitmap plus a start_dma offset. The size of this bitmap is based on the main
memory size. If we have more than (4 TB - start_dma) main memory, the DMA
address calculation will also produce addresses > 4 TB. Such addresses
cannot be inserted in the 3-level DMA page table, instead the entries
modulo 4 TB will be overwritten.

Fix this by restricting the iommu bitmap size to (4 TB - start_dma).
Also set zdev->end_dma to the actual end address of the usable
range, instead of the theoretical maximum as reported by the hardware,
which fixes a sanity check in dma_map() and also the IOMMU API domain
geometry aperture calculation.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:15 +01:00
Martin Schwidefsky
1a2c5840ac s390/dump: cleanup CPU save area handling
Introduce save_area_alloc(), save_area_boot_cpu(), save_area_add_regs()
and save_area_add_vxrs to deal with storing the CPU state in case of a
system dump. Remove struct save_area and save_area_ext, and create a new
struct save_area as a local definition to arch/s390/kernel/crash_dump.c.
Copy each individual field from the hardware status area to the save area,
storing the minimum of required data.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:14 +01:00
Martin Schwidefsky
1a36a39e22 s390/dump: rework CPU register dump code
To collect the CPU registers of the crashed system allocated a single
page with memblock_alloc_base and use it as a copy buffer. Replace the
stop-and-store-status sigp with a store-status-at-address sigp in
smp_save_dump_cpus() and smp_store_status(). In both cases the target
CPU is already stopped and store-status-at-address avoids the detour
via the absolute zero page.

For kexec simplify s390_reset_system and call store_status() before
the prefix register of the boot CPU has been set to zero. Use STPX
to store the prefix register and remove dump_prefix_page.

Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:14 +01:00
Martin Schwidefsky
df9694c797 s390/dump: streamline oldmem copy functions
Introduce two copy functions for the memory of the dumped system,
copy_oldmem_kernel() to copy to the virtual kernel address space
and copy_oldmem_user() to copy to user space.

Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:12 +01:00
Martin Schwidefsky
8a07dd02d7 s390/kdump: remove code to create ELF notes in the crashed system
The s390 architecture can store the CPU registers of the crashed system
after the kdump kernel has been started and this is the preferred way.
Remove the remaining code fragments that deal with storing CPU registers
while the crashed system is still active.

Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:12 +01:00
Martin Schwidefsky
ffa52d02c5 s390/zcore: remove /sys/kernel/debug/zcore/mem
New versions of the SCSI dumper use the /dev/vmcore interface instead
of zcore mem. Remove the outdated interface.

Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:12 +01:00
Martin Schwidefsky
bbfed511c2 s390/zcore: copy vector registers into the image data
The /sys/kernel/debug/zcore/mem interface delivers the memory of the
old system with the CPU registers stored to the assigned locations in
each prefix page.

For the vector registers the prefix page of each CPU has an address of
a 1024 byte save area at 0x11b0. But the /sys/kernel/debug/zcore/mem
interface fails copy the vector registers saved at boot of the zfcpdump
kernel into the dump image.

Copy the saved vector registers of a CPU to the outout buffer if the
memory area that is read via /sys/kernel/debug/zcore/mem intersects
with the vector register save area of this CPU.

Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-27 09:24:11 +01:00
Heiko Carstens
932f608193 s390: wire up mlock2 system call
Passes mlock2-tests test case in 64 bit and compat mode.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-16 12:51:07 +01:00
Martin Schwidefsky
c7e8b2c21c s390: avoid cache aliasing under z/VM and KVM
commit 1f6b83e5e4 ("s390: avoid z13 cache aliasing") checks for the
machine type to optimize address space randomization and zero page
allocation to avoid cache aliases.

This check might fail under a hypervisor with migration support.
z/VMs "Single System Image and Live Guest Relocation" facility will
"fake" the machine type of the oldest system in the group. For example
in a group of zEC12 and Z13 the guest appears to run on a zEC12
(architecture fencing within the relocation domain)

Remove the machine type detection and always use cache aliasing
rules that are known to work for all machines. These are the z13
aliasing rules.

Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-16 12:04:18 +01:00
Sebastian Ott
18e22a1772 s390: add support for ipl devices in subchannel sets > 0
Allow to ipl from CCW based devices residing in any subchannel set.

Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-11 13:56:27 +01:00
Sebastian Ott
66728eeea6 s390/pci_dma: handle dma table failures
We use lazy allocation for translation table entries but don't handle
allocation (and other) failures during translation table updates.

Handle these failures and undo translation table updates when it's
meaningful.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-09 09:10:49 +01:00
Heiko Carstens
86b68c3873 s390/syscalls: remove system call number calculation
Explicitly write the system call number for each define instead of
calculating it. This makes it easier to parse the file when generating
system call tables for various tools and libraries.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-09 09:10:48 +01:00
Martin Schwidefsky
230ccb370f s390/diag: add a s390 prefix to the diagnose trace point
Documentation/trace/tracepoints.txt states that the naming scheme
for tracepoints is "subsys_event" to avoid collisions. Rename
the 'diagnose' tracepoint to 's390_diagnose'.

Reported-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-09 09:10:47 +01:00
Linus Torvalds
933425fb00 s390: A bunch of fixes and optimizations for interrupt and time
handling.
 
 PPC: Mostly bug fixes.
 
 ARM: No big features, but many small fixes and prerequisites including:
 - a number of fixes for the arch-timer
 - introducing proper level-triggered semantics for the arch-timers
 - a series of patches to synchronously halt a guest (prerequisite for
   IRQ forwarding)
 - some tracepoint improvements
 - a tweak for the EL2 panic handlers
 - some more VGIC cleanups getting rid of redundant state
 
 x86: quite a few changes:
 
 - support for VT-d posted interrupts (i.e. PCI devices can inject
 interrupts directly into vCPUs).  This introduces a new component (in
 virt/lib/) that connects VFIO and KVM together.  The same infrastructure
 will be used for ARM interrupt forwarding as well.
 
 - more Hyper-V features, though the main one Hyper-V synthetic interrupt
 controller will have to wait for 4.5.  These will let KVM expose Hyper-V
 devices.
 
 - nested virtualization now supports VPID (same as PCID but for vCPUs)
 which makes it quite a bit faster
 
 - for future hardware that supports NVDIMM, there is support for clflushopt,
 clwb, pcommit
 
 - support for "split irqchip", i.e. LAPIC in kernel + IOAPIC/PIC/PIT in
 userspace, which reduces the attack surface of the hypervisor
 
 - obligatory smattering of SMM fixes
 
 - on the guest side, stable scheduler clock support was rewritten to not
 require help from the hypervisor.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWO2IQAAoJEL/70l94x66D/K0H/3AovAgYmJQToZlimsktMk6a
 f2xhdIqfU5lIQQh5uNBCfL3o9o8H9Py1ym7aEw3fmztPHHJYc91oTatt2UEKhmEw
 VtZHp/dFHt3hwaIdXmjRPEXiYctraKCyrhaUYdWmUYkoKi7lW5OL5h+S7frG2U6u
 p/hFKnHRZfXHr6NSgIqvYkKqtnc+C0FWY696IZMzgCksOO8jB1xrxoSN3tANW3oJ
 PDV+4og0fN/Fr1capJUFEc/fejREHneANvlKrLaa8ht0qJQutoczNADUiSFLcMPG
 iHljXeDsv5eyjMtUuIL8+MPzcrIt/y4rY41ZPiKggxULrXc6H+JJL/e/zThZpXc=
 =iv2z
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "First batch of KVM changes for 4.4.

  s390:
     A bunch of fixes and optimizations for interrupt and time handling.

  PPC:
     Mostly bug fixes.

  ARM:
     No big features, but many small fixes and prerequisites including:

      - a number of fixes for the arch-timer

      - introducing proper level-triggered semantics for the arch-timers

      - a series of patches to synchronously halt a guest (prerequisite
        for IRQ forwarding)

      - some tracepoint improvements

      - a tweak for the EL2 panic handlers

      - some more VGIC cleanups getting rid of redundant state

  x86:
     Quite a few changes:

      - support for VT-d posted interrupts (i.e. PCI devices can inject
        interrupts directly into vCPUs).  This introduces a new
        component (in virt/lib/) that connects VFIO and KVM together.
        The same infrastructure will be used for ARM interrupt
        forwarding as well.

      - more Hyper-V features, though the main one Hyper-V synthetic
        interrupt controller will have to wait for 4.5.  These will let
        KVM expose Hyper-V devices.

      - nested virtualization now supports VPID (same as PCID but for
        vCPUs) which makes it quite a bit faster

      - for future hardware that supports NVDIMM, there is support for
        clflushopt, clwb, pcommit

      - support for "split irqchip", i.e.  LAPIC in kernel +
        IOAPIC/PIC/PIT in userspace, which reduces the attack surface of
        the hypervisor

      - obligatory smattering of SMM fixes

      - on the guest side, stable scheduler clock support was rewritten
        to not require help from the hypervisor"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits)
  KVM: VMX: Fix commit which broke PML
  KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()
  KVM: x86: allow RSM from 64-bit mode
  KVM: VMX: fix SMEP and SMAP without EPT
  KVM: x86: move kvm_set_irq_inatomic to legacy device assignment
  KVM: device assignment: remove pointless #ifdefs
  KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic
  KVM: x86: zero apic_arb_prio on reset
  drivers/hv: share Hyper-V SynIC constants with userspace
  KVM: x86: handle SMBASE as physical address in RSM
  KVM: x86: add read_phys to x86_emulate_ops
  KVM: x86: removing unused variable
  KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs
  KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
  KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
  KVM: arm/arm64: Optimize away redundant LR tracking
  KVM: s390: use simple switch statement as multiplexer
  KVM: s390: drop useless newline in debugging data
  KVM: s390: SCA must not cross page boundaries
  KVM: arm: Do not indent the arguments of DECLARE_BITMAP
  ...
2015-11-05 16:26:26 -08:00
Linus Torvalds
39cf7c3981 IOMMU Updates for Linux v4.4
This time including:
 
 	* A new IOMMU driver for s390 pci devices
 
 	* Common dma-ops support based on iommu-api for ARM64. The plan is to
 	  use this as a basis for ARM32 and hopefully other architectures as
 	  well in the future.
 
 	* MSI support for ARM-SMMUv3
 
 	* Cleanups and dead code removal in the AMD IOMMU driver
 
 	* Better RMRR handling for the Intel VT-d driver
 
 	* Various other cleanups and small fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJWOz7hAAoJECvwRC2XARrjbvYQALwtITTA5iTm0y/ApwNMxI7n
 pZpjZVPoBPNsGBc4t/MT8pVhUSdmpBOljbV4Y4CayL1mSSB6Bl2gooZjd66m7Z81
 qMJYEVWhFQqVsIKkCSNOgaO7W5y+xt3rTgqN6vCu86/CCDfKrTPP/+CRl1T/z9bo
 1J8ioM3KnZG9KzG8JuXYFg5wwbKToaBh6swSmj+O4U9hru7zV/ILP7ikcc9pyMji
 12WbzCqchRORsJZD65xMRYAqRaPNN/3IlDejs00TOFhY3qpWgEgFUucyeRJBJ/+q
 K4U8T5vZsnr1a04l7/BeYbLmP7y/9Qv0N0xMGtTyoy/w/BieGqRWu4hHhqf/44NO
 EhCSXcEThMNCGTjP2VWC4dnQ/s7Y8OmSW9nCreUcFVxHoE5LfDoh8RngA2fpeNuS
 ixb3OwP+YXHN9Ck+1BQqQCeBznsPTLuDxlhRjCJsWntIfMSkXebOkz83YxyZ9b0Q
 gFvptfuknU7cotUwWa3dg8RiUB8kNlKJyEEByaVpWEbEOabnONKEMkstvuBx6Ots
 kA63wbe7QcPgbUYuq7g0nijDw6E2aEtf0nx2Xx4ZDL932qjg/xUkiBpmbDXHw4Gu
 nimNXVQtbCzF74SyTvxEtupiijOTm5eHtoKtg0mYnqPZ+V9eOwEvW8IHaFFf8XHD
 SecikoTtH1Q4RVtqOcAQ
 =jLlB
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull iommu updates from Joerg Roedel:
 "This time including:

   - A new IOMMU driver for s390 pci devices

   - Common dma-ops support based on iommu-api for ARM64.  The plan is
     to use this as a basis for ARM32 and hopefully other architectures
     as well in the future.

   - MSI support for ARM-SMMUv3

   - Cleanups and dead code removal in the AMD IOMMU driver

   - Better RMRR handling for the Intel VT-d driver

   - Various other cleanups and small fixes"

* tag 'iommu-updates-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (41 commits)
  iommu/vt-d: Fix return value check of parse_ioapics_under_ir()
  iommu/vt-d: Propagate error-value from ir_parse_ioapic_hpet_scope()
  iommu/vt-d: Adjust the return value of the parse_ioapics_under_ir
  iommu: Move default domain allocation to iommu_group_get_for_dev()
  iommu: Remove is_pci_dev() fall-back from iommu_group_get_for_dev
  iommu/arm-smmu: Switch to device_group call-back
  iommu/fsl: Convert to device_group call-back
  iommu: Add device_group call-back to x86 iommu drivers
  iommu: Add generic_device_group() function
  iommu: Export and rename iommu_group_get_for_pci_dev()
  iommu: Revive device_group iommu-ops call-back
  iommu/amd: Remove find_last_devid_on_pci()
  iommu/amd: Remove first/last_device handling
  iommu/amd: Initialize amd_iommu_last_bdf for DEV_ALL
  iommu/amd: Cleanup buffer allocation
  iommu/amd: Remove cmd_buf_size and evt_buf_size from struct amd_iommu
  iommu/amd: Align DTE flag definitions
  iommu/amd: Remove old alias handling code
  iommu/amd: Set alias DTE in do_attach/do_detach
  iommu/amd: WARN when __[attach|detach]_device are called with irqs enabled
  ...
2015-11-05 16:12:10 -08:00
Paolo Bonzini
197a4f4b06 KVM/ARM Changes for v4.4-rc1
Includes a number of fixes for the arch-timer, introducing proper
 level-triggered semantics for the arch-timers, a series of patches to
 synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
 improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
 getting rid of redundant state, and finally a stylistic change that gets rid of
 some ctags warnings.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWOhgAAAoJEEtpOizt6ddyKS8H/2ZHMTPjo6yChnrusNWy4Qbr
 6laPDlzL+g45oMQRwNL7GnM1deRftaxvT2Wi+X84D/6Y/BD6MPds4HgtBfuWcSZ1
 CyRJ0Ot/zrxenucSuJuOjq+a9gdizdAczkbB1MfYDULJH8fb6D+7RYLo3zgh4Xo4
 pla3L9U6gSWe+YopBjZtZH43m3fwiwSM/v+uHOTIcXrsbR+fEgx/EFSKmA/DUCuo
 P5cFO/ceUGu7nATCexu5V82TgR2hvurrsR7mqfwY8YcF6HRM+NEOoS29xWC77v5S
 u/F08TKuKQLv0YTEFTyLETI/oEeuC0cHtrRQBNf4+9kXEOzKyXaae0wR/I6X2Ss=
 =GMNk
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM Changes for v4.4-rc1

Includes a number of fixes for the arch-timer, introducing proper
level-triggered semantics for the arch-timers, a series of patches to
synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint
improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups
getting rid of redundant state, and finally a stylistic change that gets rid of
some ctags warnings.

Conflicts:
	arch/x86/include/asm/kvm_host.h
2015-11-04 16:24:17 +01:00
Martin Schwidefsky
b38feccd66 s390: remove runtime instrumentation interrupts
The external interrupts for runtime instrumentation buffer-full
and runtime instrumentation halted are unused and have no current
user. Remove the support and ignore the second parameter of the
s390_runtime_instr system call from now on.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-11-03 14:40:51 +01:00
Joerg Roedel
b67ad2f7c7 Merge branches 'x86/vt-d', 'arm/omap', 'arm/smmu', 's390', 'core' and 'x86/amd' into next
Conflicts:
	drivers/iommu/amd_iommu_types.h
2015-11-02 20:03:34 +09:00
Heiko Carstens
dc6e15556a s390/nmi: remove casts
Remove all the casts to and from the machine check interruption code.
This patch changes struct mci to a union, which contains an anonymous
structure with the already known bits and in addition an unsigned
long field, which contains the raw machine check interruption code.

This allows to simply assign and decoce the interruption code value
without the need for all those casts we had all the time.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-27 09:33:55 +01:00
Heiko Carstens
f9e6edfb9c s390: don't store registers on disabled wait anymore
The current disabled wait code stores register contents into their
save areas, however it is (at least) missing the new vector registers.

Given the fact that the whole exercise seems to be rather pointless
simply don't save any registers anymore.

In a "live" system it is always possible to inspect register contents,
and in case of a dump the register contents will be stored by the
dump mechanism.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-27 09:33:48 +01:00
Heiko Carstens
ecbafda853 s390: get rid of __set_psw_mask()
With the removal of 31 bit code we can always assume that the epsw
instruction is available. Therefore use the __extract_psw() function
to disable and enable machine checks.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-27 09:33:44 +01:00
Christoffer Dall
3217f7c25b KVM: Add kvm_arch_vcpu_{un}blocking callbacks
Some times it is useful for architecture implementations of KVM to know
when the VCPU thread is about to block or when it comes back from
blocking (arm/arm64 needs to know this to properly implement timers, for
example).

Therefore provide a generic architecture callback function in line with
what we do elsewhere for KVM generic-arch interactions.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2015-10-22 23:01:41 +02:00
Hendrik Brueckner
b0753902d4 s390/fpu: split fpu-internal.h into fpu internals, api, and type headers
Split the API and FPU type definitions into separate header files
similar to "x86/fpu: Rename fpu-internal.h to fpu/internal.h" (78f7f1e54b).

The new header files and their meaning are:

asm/fpu/types.h:
	FPU related data types, needed for 'struct thread_struct' and
	'struct task_struct'.

asm/fpu/api.h:
	FPU related 'public' functions for other subsystems and device
	drivers.

asm/fpu/internal.h:
	FPU internal functions mainly used to convert
	FPU register contents in signal handling.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-16 09:41:12 +02:00
Christian Borntraeger
fdbbe8e791 s390/spinlock: remove unneeded serializations at unlock
the kernel locks have aqcuire/release semantics. No operation done
after the lock can be "moved" before the lock and no operation before
the unlock can be moved after the unlock. But it is perfectly fine
that memory accesses which happen code wise after unlock are performed
within the critical section.
On s390x, reads are in-order with other reads (PoP section
"Storage-Operand Fetch References") and writes are in-order with
other writes (PoP section "Storage-Operand Store References"). Writes
are also in-order with reads to the same memory location (PoP section
"Storage-Operand Store References"). To other CPUs (and the channel
subsystem), reads additionally appear to be performed prior to reads or
writes that happen after them in the conceptual sequence (PoP section
"Relation between Operand Accesses").
So at least as observed by other CPUs and the channel subsystem, reads
inside the critical sections will not happen after unlock (and writes
are in-order anyway). That's exactly what we need for "RELEASE
operations" (memory-barriers.txt): "It guarantees that all memory
operations before the RELEASE operation will appear to happen before the
RELEASE operation with respect to the other components of the system."

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-By: Sascha Silbe <silbe@linux.vnet.ibm.com>
[cross-reading and lot of improvements for the patch description]
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:25 +02:00
Heiko Carstens
29b0a8250b s390/etr,stp: fix possible deadlock on machine check
The first level machine check handler for etr and stp machine checks may
call queue_work() while in nmi context. This may deadlock e.g. if the
machine check happened when the interrupted context did hold a lock, that
also will be acquired by queue_work().
Therefore split etr and stp machine check handling into first and second
level handling. The second level handling will then issue the queue_work()
call in process context which avoids the potential deadlock.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-10-14 14:32:18 +02:00