Commit Graph

1183 Commits

Author SHA1 Message Date
Michael Mueller
658b6eda20 KVM: s390: add cpu model support
This patch enables cpu model support in kvm/s390 via the vm attribute
interface.

During KVM initialization, the host properties cpuid, IBC value and the
facility list are stored in the architecture specific cpu model structure.

During vcpu setup, these properties are taken to initialize the related SIE
state. This mechanism allows to adjust the properties from user space and thus
to implement different selectable cpu models.

This patch uses the IBC functionality to block instructions that have not
been implemented at the requested CPU type and GA level compared to the
full host capability.

Userspace has to initialize the cpu model before vcpu creation. A cpu model
change of running vcpus is not possible.

Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-02-09 12:44:13 +01:00
Michael Mueller
9d8d578605 KVM: s390: use facilities and cpu_id per KVM
The patch introduces facilities and cpu_ids per virtual machine.
Different virtual machines may want to expose different facilities and
cpu ids to the guest, so let's make them per-vm instead of global.

Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-02-09 12:44:12 +01:00
Tony Krowiak
45c9b47c58 KVM: s390/CPACF: Choose crypto control block format
We need to specify a different format for the crypto control block
depending on whether the APXA facility is installed or not. Let's
test for it by executing the PQAP(QCI) function and use either a
format-1 or a format-2 crypto control block accordingly. This is a
host only change for z13 and does not affect the guest view.

Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-02-09 12:44:12 +01:00
Ekaterina Tumanova
f3d0bd6c7f s390/kernel: Update /proc/sysinfo file with Extended Name and UUID
A new architecture extends STSI 3.2.2 with UUID and long names. KVM
will provide the first implementation. This patch adds the additional
data  fields (Extended Name and UUID) from the 4KB block returned by
the STSI 3.2.2 command and reflect this information in the
/proc/sysinfo file accordingly.

Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-02-09 12:44:11 +01:00
Paolo Bonzini
f781951299 kvm: add halt_poll_ns module parameter
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.

This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest.  KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too.  When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.

With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest.  This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.

Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host.  The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.

The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter.  It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.

While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls.  During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark.  The wasted time is thus very low.  Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.

The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer.  Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns.  For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-06 13:08:37 +01:00
Tony Krowiak
a374e892c3 KVM: s390/cpacf: Enable/disable protected key functions for kvm guest
Created new KVM device attributes for indicating whether the AES and
DES/TDES protected key functions are available for programs running
on the KVM guest.  The attributes are used to set up the controls in
the guest SIE block that specify whether programs running on the
guest will be given access to the protected key functions available
on the s390 hardware.

Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[split MSA4/protected key into two patches]
2015-01-23 13:25:40 +01:00
Jason J. Herne
72f250206f KVM: s390: Provide guest TOD Clock Get/Set Controls
Provide controls for setting/getting the guest TOD clock based on the VM
attribute interface.

Provide TOD and TOD_HIGH vm attributes on s390 for managing guest Time Of
Day clock value.

TOD_HIGH is presently always set to 0. In the future it will contain a high
order expansion of the tod clock value after it overflows the 64-bits of
the TOD.

Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:40 +01:00
David Hildenbrand
2444b352c3 KVM: s390: forward most SIGP orders to user space
Most SIGP orders are handled partially in kernel and partially in
user space. In order to:
- Get a correct SIGP SET PREFIX handler that informs user space
- Avoid race conditions between concurrently executed SIGP orders
- Serialize SIGP orders per VCPU

We need to handle all "slow" SIGP orders in user space. The remaining
ones to be handled completely in kernel are:
- SENSE
- SENSE RUNNING
- EXTERNAL CALL
- EMERGENCY SIGNAL
- CONDITIONAL EMERGENCY SIGNAL
According to the PoP, they have to be fast. They can be executed
without conflicting to the actions of other pending/concurrently
executing orders (e.g. STOP vs. START).

This patch introduces a new capability that will - when enabled -
forward all but the mentioned SIGP orders to user space. The
instruction counters in the kernel are still updated.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:37 +01:00
David Hildenbrand
9fbd80828c KVM: s390: clear the pfault queue if user space sets the invalid token
We need a way to clear the async pfault queue from user space (e.g.
for resets and SIGP SET ARCHITECTURE).

This patch simply clears the queue as soon as user space sets the
invalid pfault token. The definition of the invalid token is moved
to uapi.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:36 +01:00
David Hildenbrand
ea5f496925 KVM: s390: only one external call may be pending at a time
Only one external call may be pending at a vcpu at a time. For this
reason, we have to detect whether the SIGP externcal call interpretation
facility is available. If so, all external calls have to be injected
using this mechanism.

SIGP EXTERNAL CALL orders have to return whether another external
call is already pending. This check was missing until now.

SIGP SENSE hasn't returned yet in all conditions whether an external
call was pending.

If a SIGP EXTERNAL CALL irq is to be injected and one is already
pending, -EBUSY is returned.

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:36 +01:00
David Hildenbrand
d614be05c8 s390/sclp: introduce check for the SIGP Interpretation Facility
This patch introduces the infrastructure to check whether the SIGP
Interpretation Facility is installed on all VCPUs in the configuration.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:35 +01:00
David Hildenbrand
6cddd432e3 KVM: s390: handle stop irqs without action_bits
This patch removes the famous action_bits and moves the handling of
SIGP STOP AND STORE STATUS directly into the SIGP STOP interrupt.

The new local interrupt infrastructure is used to track pending stop
requests.

STOP irqs are the only irqs that don't get actively delivered. They
remain pending until the stop function is executed (=stop intercept).

If another STOP irq is already pending, -EBUSY will now be returned
(needed for the SIGP handling code).

Migration of pending SIGP STOP (AND STORE STATUS) orders should now
be supported out of the box.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:33 +01:00
David Hildenbrand
2822545f9f KVM: s390: new parameter for SIGP STOP irqs
In order to get rid of the action_flags and to properly migrate pending SIGP
STOP irqs triggered e.g. by SIGP STOP AND STORE STATUS, we need to remember
whether to store the status when stopping.

For this reason, a new parameter (flags) for the SIGP STOP irq is introduced.
These flags further define details of the requested STOP and can be easily
migrated.

Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2015-01-23 13:25:33 +01:00
Linus Torvalds
66dcff86ba 3.19 changes for KVM:
- spring cleaning: removed support for IA64, and for hardware-assisted
 virtualization on the PPC970
 - ARM, PPC, s390 all had only small fixes
 
 For x86:
 - small performance improvements (though only on weird guests)
 - usual round of hardware-compliancy fixes from Nadav
 - APICv fixes
 - XSAVES support for hosts and guests.  XSAVES hosts were broken because
 the (non-KVM) XSAVES patches inadvertently changed the KVM userspace
 ABI whenever XSAVES was enabled; hence, this part is going to stable.
 Guest support is just a matter of exposing the feature and CPUID leaves
 support.
 
 Right now KVM is broken for PPC BookE in your tree (doesn't compile).
 I'll reply to the pull request with a patch, please apply it either
 before the pull request or in the merge commit, in order to preserve
 bisectability somewhat.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJUkpg+AAoJEL/70l94x66DUmoH/jzXYkptSW9NGgm79KqxGJlD
 lzLnLBkitVvx++Mz5YBhdJEhKKLUlCtifFT1zPJQ/pthQhIRSaaAwZyNGgUs5w5x
 yMGKHiPQFyZRbmQtZhCInW0BftJoYHHciO3nUfHCZnp34My9MP2D55W7/z+fYFfQ
 DuqBSE9ThyZJtZ4zh8NRA9fCOeuqwVYRyoBs820Wbsh4cpIBoIK63Dg7k+CLE+ZV
 MZa/mRL6bAfsn9W5bnOUAgHJ3SPznnWbO3/g0aV+roL/5pffblprJx9lKNR08xUM
 6hDFLop2gDehDJesDkY/o8Ckp1hEouvfsVpSShry4vcgtn0hgh2O5/6Orbmj6vE=
 =Zwq1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM update from Paolo Bonzini:
 "3.19 changes for KVM:

   - spring cleaning: removed support for IA64, and for hardware-
     assisted virtualization on the PPC970

   - ARM, PPC, s390 all had only small fixes

  For x86:
   - small performance improvements (though only on weird guests)
   - usual round of hardware-compliancy fixes from Nadav
   - APICv fixes
   - XSAVES support for hosts and guests.  XSAVES hosts were broken
     because the (non-KVM) XSAVES patches inadvertently changed the KVM
     userspace ABI whenever XSAVES was enabled; hence, this part is
     going to stable.  Guest support is just a matter of exposing the
     feature and CPUID leaves support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits)
  KVM: move APIC types to arch/x86/
  KVM: PPC: Book3S: Enable in-kernel XICS emulation by default
  KVM: PPC: Book3S HV: Improve H_CONFER implementation
  KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register
  KVM: PPC: Book3S HV: Remove code for PPC970 processors
  KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
  KVM: PPC: Book3S HV: Simplify locking around stolen time calculations
  arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function
  arch: powerpc: kvm: book3s_pr.c: Remove unused function
  arch: powerpc: kvm: book3s.c: Remove some unused functions
  arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function
  KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
  KVM: PPC: Book3S HV: ptes are big endian
  KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI
  KVM: PPC: Book3S HV: Fix KSM memory corruption
  KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI
  KVM: PPC: Book3S HV: Fix computation of tlbie operand
  KVM: PPC: Book3S HV: Add missing HPTE unlock
  KVM: PPC: BookE: Improve irq inject tracepoint
  arm/arm64: KVM: Require in-kernel vgic for the arch timers
  ...
2014-12-18 16:05:28 -08:00
Linus Torvalds
f96fe22567 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull another networking update from David Miller:
 "Small follow-up to the main merge pull from the other day:

  1) Alexander Duyck's DMA memory barrier patch set.

  2) cxgb4 driver fixes from Karen Xie.

  3) Add missing export of fixed_phy_register() to modules, from Mark
     Salter.

  4) DSA bug fixes from Florian Fainelli"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits)
  net/macb: add TX multiqueue support for gem
  linux/interrupt.h: remove the definition of unused tasklet_hi_enable
  jme: replace calls to redundant function
  net: ethernet: davicom: Allow to select DM9000 for nios2
  net: ethernet: smsc: Allow to select SMC91X for nios2
  cxgb4: Add support for QSA modules
  libcxgbi: fix freeing skb prematurely
  cxgb4i: use set_wr_txq() to set tx queues
  cxgb4i: handle non-pdu-aligned rx data
  cxgb4i: additional types of negative advice
  cxgb4/cxgb4i: set the max. pdu length in firmware
  cxgb4i: fix credit check for tx_data_wr
  cxgb4i: fix tx immediate data credit check
  net: phy: export fixed_phy_register()
  fib_trie: Fix trie balancing issue if new node pushes down existing node
  vlan: Add ability to always enable TSO/UFO
  r8169:update rtl8168g pcie ephy parameter
  net: dsa: bcm_sf2: force link for all fixed PHY devices
  fm10k/igb/ixgbe: Use dma_rmb on Rx descriptor reads
  r8169: Use dma_rmb() and dma_wmb() for DescOwn checks
  ...
2014-12-12 16:11:12 -08:00
Alexander Duyck
1077fa36f2 arch: Add lightweight memory barriers dma_rmb() and dma_wmb()
There are a number of situations where the mandatory barriers rmb() and
wmb() are used to order memory/memory operations in the device drivers
and those barriers are much heavier than they actually need to be.  For
example in the case of PowerPC wmb() calls the heavy-weight sync
instruction when for coherent memory operations all that is really needed
is an lsync or eieio instruction.

This commit adds a coherent only version of the mandatory memory barriers
rmb() and wmb().  In most cases this should result in the barrier being the
same as the SMP barriers for the SMP case, however in some cases we use a
barrier that is somewhere in between rmb() and smp_rmb().  For example on
ARM the rmb barriers break down as follows:

  Barrier   Call     Explanation
  --------- -------- ----------------------------------
  rmb()     dsb()    Data synchronization barrier - system
  dma_rmb() dmb(osh) data memory barrier - outer sharable
  smp_rmb() dmb(ish) data memory barrier - inner sharable

These new barriers are not as safe as the standard rmb() and wmb().
Specifically they do not guarantee ordering between coherent and incoherent
memories.  The primary use case for these would be to enforce ordering of
reads and writes when accessing coherent memory that is shared between the
CPU and a device.

It may also be noted that there is no dma_mb().  Most architectures don't
provide a good mechanism for performing a coherent only full barrier without
resorting to the same mechanism used in mb().  As such there isn't much to
be gained in trying to define such a function.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 21:15:06 -05:00
Alexander Duyck
8a44971841 arch: Cleanup read_barrier_depends() and comments
This patch is meant to cleanup the handling of read_barrier_depends and
smp_read_barrier_depends.  In multiple spots in the kernel headers
read_barrier_depends is defined as "do {} while (0)", however we then go
into the SMP vs non-SMP sections and have the SMP version reference
read_barrier_depends, and the non-SMP define it as yet another empty
do/while.

With this commit I went through and cleaned out the duplicate definitions
and reduced the number of definitions down to 2 per header.  In addition I
moved the 50 line comments for the macro from the x86 and mips headers that
defined it as an empty do/while to those that were actually defining the
macro, alpha and blackfin.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 21:15:05 -05:00
Linus Torvalds
27afc5dbda Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "The most notable change for this pull request is the ftrace rework
  from Heiko.  It brings a small performance improvement and the ground
  work to support a new gcc option to replace the mcount blocks with a
  single nop.

  Two new s390 specific system calls are added to emulate user space
  mmio for PCI, an artifact of the how PCI memory is accessed.

  Two patches for the memory management with changes to common code.
  For KVM mm_forbids_zeropage is added which disables the empty zero
  page for an mm that is used by a KVM process.  And an optimization,
  pmdp_get_and_clear_full is added analog to ptep_get_and_clear_full.

  Some micro optimization for the cmpxchg and the spinlock code.

  And as usual bug fixes and cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (46 commits)
  s390/cputime: fix 31-bit compile
  s390/scm_block: make the number of reqs per HW req configurable
  s390/scm_block: handle multiple requests in one HW request
  s390/scm_block: allocate aidaw pages only when necessary
  s390/scm_block: use mempool to manage aidaw requests
  s390/eadm: change timeout value
  s390/mm: fix memory leak of ptlock in pmd_free_tlb
  s390: use local symbol names in entry[64].S
  s390/ptrace: always include vector registers in core files
  s390/simd: clear vector register pointer on fork/clone
  s390: translate cputime magic constants to macros
  s390/idle: convert open coded idle time seqcount
  s390/idle: add missing irq off lockdep annotation
  s390/debug: avoid function call for debug_sprintf_*
  s390/kprobes: fix instruction copy for out of line execution
  s390: remove diag 44 calls from cpu_relax()
  s390/dasd: retry partition detection
  s390/dasd: fix list corruption for sleep_on requests
  s390/dasd: fix infinite term I/O loop
  s390/dasd: remove unused code
  ...
2014-12-11 17:30:55 -08:00
Linus Torvalds
70e71ca0af Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) New offloading infrastructure and example 'rocker' driver for
    offloading of switching and routing to hardware.

    This work was done by a large group of dedicated individuals, not
    limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
    Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu

 2) Start making the networking operate on IOV iterators instead of
    modifying iov objects in-situ during transfers.  Thanks to Al Viro
    and Herbert Xu.

 3) A set of new netlink interfaces for the TIPC stack, from Richard
    Alpe.

 4) Remove unnecessary looping during ipv6 routing lookups, from Martin
    KaFai Lau.

 5) Add PAUSE frame generation support to gianfar driver, from Matei
    Pavaluca.

 6) Allow for larger reordering levels in TCP, which are easily
    achievable in the real world right now, from Eric Dumazet.

 7) Add a variable of napi_schedule that doesn't need to disable cpu
    interrupts, from Eric Dumazet.

 8) Use a doubly linked list to optimize neigh_parms_release(), from
    Nicolas Dichtel.

 9) Various enhancements to the kernel BPF verifier, and allow eBPF
    programs to actually be attached to sockets.  From Alexei
    Starovoitov.

10) Support TSO/LSO in sunvnet driver, from David L Stevens.

11) Allow controlling ECN usage via routing metrics, from Florian
    Westphal.

12) Remote checksum offload, from Tom Herbert.

13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
    driver, from Thomas Lendacky.

14) Add MPLS support to openvswitch, from Simon Horman.

15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
    Klassert.

16) Do gro flushes on a per-device basis using a timer, from Eric
    Dumazet.  This tries to resolve the conflicting goals between the
    desired handling of bulk vs.  RPC-like traffic.

17) Allow userspace to ask for the CPU upon what a packet was
    received/steered, via SO_INCOMING_CPU.  From Eric Dumazet.

18) Limit GSO packets to half the current congestion window, from Eric
    Dumazet.

19) Add a generic helper so that all drivers set their RSS keys in a
    consistent way, from Eric Dumazet.

20) Add xmit_more support to enic driver, from Govindarajulu
    Varadarajan.

21) Add VLAN packet scheduler action, from Jiri Pirko.

22) Support configurable RSS hash functions via ethtool, from Eyal
    Perry.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
  Fix race condition between vxlan_sock_add and vxlan_sock_release
  net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
  net/mlx4: Add support for A0 steering
  net/mlx4: Refactor QUERY_PORT
  net/mlx4_core: Add explicit error message when rule doesn't meet configuration
  net/mlx4: Add A0 hybrid steering
  net/mlx4: Add mlx4_bitmap zone allocator
  net/mlx4: Add a check if there are too many reserved QPs
  net/mlx4: Change QP allocation scheme
  net/mlx4_core: Use tasklet for user-space CQ completion events
  net/mlx4_core: Mask out host side virtualization features for guests
  net/mlx4_en: Set csum level for encapsulated packets
  be2net: Export tunnel offloads only when a VxLAN tunnel is created
  gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
  cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
  net: fec: only enable mdio interrupt before phy device link up
  net: fec: clear all interrupt events to support i.MX6SX
  net: fec: reset fep link status in suspend function
  net: sock: fix access via invalid file descriptor
  net: introduce helper macro for_each_cmsghdr
  ...
2014-12-11 14:27:06 -08:00
Daniel Borkmann
0cb6c969ed net, lib: kill arch_fast_hash library bits
As there are now no remaining users of arch_fast_hash(), lets kill
it entirely.

This basically reverts commit 71ae8aac3e ("lib: introduce arch
optimized hash library") and follow-up work, that is f.e., commit
237217546d ("lib: hash: follow-up fixups for arch hash"),
commit e3fec2f74f ("lib: Add missing arch generic-y entries for
asm-generic/hash.h") and last but not least commit 6a02652df5
("perf tools: Fix include for non x86 architectures").

Cc: Francesco Fusco <fusco@ntop.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:17:46 -05:00
Linus Torvalds
3eb5b893eb Merge branch 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 MPX support from Thomas Gleixner:
 "This enables support for x86 MPX.

  MPX is a new debug feature for bound checking in user space.  It
  requires kernel support to handle the bound tables and decode the
  bound violating instruction in the trap handler"

* 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  asm-generic: Remove asm-generic arch_bprm_mm_init()
  mm: Make arch_unmap()/bprm_mm_init() available to all architectures
  x86: Cleanly separate use of asm-generic/mm_hooks.h
  x86 mpx: Change return type of get_reg_offset()
  fs: Do not include mpx.h in exec.c
  x86, mpx: Add documentation on Intel MPX
  x86, mpx: Cleanup unused bound tables
  x86, mpx: On-demand kernel allocation of bounds tables
  x86, mpx: Decode MPX instruction to get bound violation information
  x86, mpx: Add MPX-specific mmap interface
  x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific
  x86, mpx: Add MPX to disabled features
  ia64: Sync struct siginfo with general version
  mips: Sync struct siginfo with general version
  mpx: Extend siginfo structure to include bound violation information
  x86, mpx: Rename cfg_reg_u and status_reg
  x86: mpx: Give bndX registers actual names
  x86: Remove arbitrary instruction size limit in instruction decoder
2014-12-10 09:34:43 -08:00
Martin Schwidefsky
3519978101 s390/cputime: fix 31-bit compile
git commit 8461b63ca0
"s390: translate cputime magic constants to macros"
introduce a built error for 31-bit:

  kernel/built-in.o: In function `posix_cpu_timer_set':
  posix-cpu-timers.c:(.text+0x2a8cc): undefined reference to `__udivdi3'

The original code is actually broken for 31-bit and has been
corrected by the above commit by forcing the compiler to use
64-bit arithmetic through the CPUTIME_PER_USEC define.

To fix the compile error replace the 64-bit division with
a call to __div().

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-12-08 14:03:43 +01:00
Martin Schwidefsky
9de45f736f s390/mm: fix memory leak of ptlock in pmd_free_tlb
The pmd_free_tlb function fails to call pgtable_pmd_page_dtor.
Without the call the ptlock for the pmd tables will not be freed.
Add the missing call.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-12-08 09:42:40 +01:00
Frederic Weisbecker
8461b63ca0 s390: translate cputime magic constants to macros
Make the code more self-explanatory by naming magic constants.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-12-08 09:42:34 +01:00
Frederic Weisbecker
1ce2180498 s390/idle: convert open coded idle time seqcount
s390 uses open coded seqcount to synchronize idle time accounting.
Lets consolidate it with the standard API.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-12-08 09:42:32 +01:00
Christian Borntraeger
832a771034 s390/debug: avoid function call for debug_sprintf_*
debug_sprintf_event/exception are called even for debug events
with a disabling debug level. All other functions already do
the check in a wrapper function. Lets do the same here.
Due to the var_args the compiler rejects to make this function
inline. So let's wrap this via a macro.
This patch saves around 80 ns on my z196 for a KVM round trip (we
have two debug statements for entry and exit) when KVM is build as
a module.
The savings for built-in drivers is smaller as we then avoid the
PLT overhead for a function call.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-12-08 09:42:29 +01:00
Jens Freimann
383d0b0501 KVM: s390: handle pending local interrupts via bitmap
This patch adapts handling of local interrupts to be more compliant with
the z/Architecture Principles of Operation and introduces a data
structure
which allows more efficient handling of interrupts.

* get rid of li->active flag, use bitmap instead
* Keep interrupts in a bitmap instead of a list
* Deliver interrupts in the order of their priority as defined in the
  PoP
* Use a second bitmap for sigp emergency requests, as a CPU can have
  one request pending from every other CPU in the system.

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28 13:59:04 +01:00
Jens Freimann
c0e6159d51 KVM: s390: add bitmap for handling cpu-local interrupts
Adds a bitmap to the vcpu structure which is used to keep track
of local pending interrupts. Also add enum with all interrupt
types sorted in order of priority (highest to lowest)

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28 13:59:04 +01:00
Jason J. Herne
9fcf93b5de KVM: S390: Create helper function get_guest_storage_key
Define get_guest_storage_key which can be used to get the value of a guest
storage key. This compliments the functionality provided by the helper function
set_guest_storage_key. Both functions are needed for live migration of s390
guests that use storage keys.

Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28 13:58:48 +01:00
Thomas Huth
04b41acd06 KVM: s390: Fix rewinding of the PSW pointing to an EXECUTE instruction
A couple of our interception handlers rewind the PSW to the beginning
of the instruction to run the intercepted instruction again during the
next SIE entry. This normally works fine, but there is also the
possibility that the instruction did not get run directly but via an
EXECUTE instruction.
In this case, the PSW does not point to the instruction that caused the
interception, but to the EXECUTE instruction! So we've got to rewind the
PSW to the beginning of the EXECUTE instruction instead.
This is now accomplished with a new helper function kvm_s390_rewind_psw().

Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-11-28 12:32:56 +01:00
Heiko Carstens
57f2ffe14f s390: remove diag 44 calls from cpu_relax()
Simplify cpu_relax() to a simple barrier(). Performance wise this doesn't
seem to make any big difference anymore, since nearly all lock variants
have directed yield semantics in the meantime.
Also this makes s390 behave like all other architectures.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-11-28 09:47:49 +01:00
Heiko Carstens
7eed2e09ab s390/ftrace: provide working ftrace_return_address()
The common code ftrace_return_address(n), which is just a wrapper for
__builtin_return_address(n), will only work for n > 0 if CONFIG_FRAME_POINTER
is set to 'y'. Otherwise it will return 0.
Since on s390 we will never have that config option set to 'y'
ftrace_return_address() won't work at all for n > 0.

Luckily we always compile the kernel with -mkernel-backchain which
in turn means that __builtin_return_address(n) will always work.

So let ftrace_return_address(n) map to __builtin_return_address(n).

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-11-28 09:47:15 +01:00
Dave Hansen
62e88b1c00 mm: Make arch_unmap()/bprm_mm_init() available to all architectures
The x86 MPX patch set calls arch_unmap() and arch_bprm_mm_init()
from fs/exec.c, so we need at least a stub for them in all
architectures.  They are only called under an #ifdef for
CONFIG_MMU=y, so we can at least restict this to architectures
with MMU support.

blackfin/c6x have no MMU support, so do not call arch_unmap().
They also do not include mm_hooks.h or mmu_context.h at all and
do not need to be touched.

s390, um and unicore32 do not use asm-generic/mm_hooks.h, so got
their own arch_unmap() versions.  (I also moved um's
arch_dup_mmap() to be closer to the other mm_hooks.h functions).

xtensa only includes mm_hooks when MMU=y, which should be fine
since arch_unmap() is called only from MMU=y code.

For the rest, we use the stub copies of these functions in
asm-generic/mm_hook.h.

I cross compiled defconfigs for cris (to check NOMMU) and s390
to make sure that this works.  I also checked a 64-bit build
of UML and all my normal x86 builds including PARAVIRT on and
off.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: linux-arch@vger.kernel.org
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20141118182350.8B4AA2C2@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-19 11:54:13 +01:00
Sebastian Ott
afaa7d29bc s390/irq: use irq 0
Irq 0 is currently unused on s390. Since there is no reason to
do this start counting at the beginning and gain an additional
irq. Also correctly report the smallest usable irq number for
dynamic allocation.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-11-18 18:23:03 +01:00
Frank Blaschka
99e97b7106 s390/io: add ioport_map stubs
add ioport_map stubs to make vfio build on s390.

Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-11-18 18:23:01 +01:00
Arnd Bergmann
1c8d29696f Merge branch 'io' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into asm-generic
* 'io' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux:
  documentation: memory-barriers: clarify relaxed io accessor semantics
  x86: io: implement dummy relaxed accessor macros for writes
  tile: io: implement dummy relaxed accessor macros for writes
  sparc: io: implement dummy relaxed accessor macros for writes
  powerpc: io: implement dummy relaxed accessor macros for writes
  parisc: io: implement dummy relaxed accessor macros for writes
  mn10300: io: implement dummy relaxed accessor macros for writes
  m68k: io: implement dummy relaxed accessor macros for writes
  m32r: io: implement dummy relaxed accessor macros for writes
  ia64: io: implement dummy relaxed accessor macros for writes
  cris: io: implement dummy relaxed accessor macros for writes
  frv: io: implement dummy relaxed accessor macros for writes
  xtensa: io: remove dummy relaxed accessor macros for reads
  s390: io: remove dummy relaxed accessor macros for reads
  microblaze: io: remove dummy relaxed accessor macros
  asm-generic: io: implement relaxed accessor macros as conditional wrappers

Conflicts:
	include/asm-generic/io.h

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2014-11-11 19:55:45 +01:00
Thierry Reding
4707a341b4 /dev/mem: Use more consistent data types
The xlate_dev_{kmem,mem}_ptr() functions take either a physical address
or a kernel virtual address, so data types should be phys_addr_t and
void *. They both return a kernel virtual address which is only ever
used in calls to copy_{from,to}_user(), so make variables that store it
void * rather than char * for consistency.

Also only define a weak unxlate_dev_mem_ptr() function if architectures
haven't overridden them in the asm/io.h header file.

Signed-off-by: Thierry Reding <treding@nvidia.com>
2014-11-10 15:59:21 +01:00
Martin Schwidefsky
5b9f2081e0 s390/pci: add sparse annotations
Fix the following warnings from the sparse code checker:

arch/s390/include/asm/pci_io.h:165:49: warning: cast removes address space of expression
arch/s390/pci/pci.c:476:44: warning: cast removes address space of expression
arch/s390/pci/pci.c:491:36: warning: incorrect type in argument 2 (different address spaces)
arch/s390/pci/pci.c:491:36:    expected void [noderef] <asn:2>*addr
arch/s390/pci/pci.c:491:36:    got void *<noident>

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-11-03 13:30:23 +01:00
Sebastian Ott
b19148f6e2 s390/pci: improve irq number check for msix
s390s arch_setup_msi_irqs function ensures that we don't return with
more irqs than the PCI architecture allows and that a single PCI
function doesn't consume more irqs than the kernel is configured for.

At least the last check doesn't help much and should take the sum of
all irqs into account. Since that's already done by irq_alloc_desc
we can remove this check.

As for the first check we should use the value provided by the
firmware which can be less than what the PCI architecture allows.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-11-03 13:30:12 +01:00
Martin Schwidefsky
f318a1229b s390/cmpxchg: use compiler builtins
The kernel build for s390 fails for gcc compilers with version 3.x,
set the minimum required version of gcc to version 4.3.

As the atomic builtins are available with all gcc 4.x compilers,
use the __sync_val_compare_and_swap and __sync_bool_compare_and_swap
functions to replace the complex macro and inline assembler magic
in include/asm/cmpxchg.h. The compiler can just-do-it and generates
better code with the builtins.

While we are at it use __sync_bool_compare_and_swap for the
_raw_compare_and_swap function in the spinlock code as well.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-11-03 13:29:47 +01:00
David Hildenbrand
42cb0c9ff9 KVM: s390: sigp: instruction counters for all sigp orders
This patch introduces instruction counters for all known sigp orders and also a
separate one for unknown orders that are passed to user space.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-10-28 13:09:13 +01:00
David Hildenbrand
b898383082 KVM: s390: sigp: separate preparation handlers
This patch introduces in preparation for further code changes separate handler
functions for:
- SIGP (RE)START - will not be allowed to terminate pending orders
- SIGP (INITIAL) CPU RESET - will be allowed to terminate certain pending orders
- unknown sigp orders

All sigp orders that require user space intervention are logged.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-10-28 13:09:13 +01:00
Thomas Huth
a6b7e459ff KVM: s390: Make the simple ipte mutex specific to a VM instead of global
The ipte-locking should be done for each VM seperately, not globally.
This way we avoid possible congestions when the simple ipte-lock is used
and multiple VMs are running.

Suggested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-10-28 13:08:59 +01:00
Martin Schwidefsky
fcbe08d66f s390/mm: pmdp_get_and_clear_full optimization
Analog to ptep_get_and_clear_full define a variant of the
pmpd_get_and_clear primitive which gets the full hint from the
mmu_gather struct. This allows s390 to avoid a costly instruction
when destroying an address space.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27 13:27:30 +01:00
Heiko Carstens
c933146a5e s390/ftrace,kprobes: allow to patch first instruction
If the function tracer is enabled, allow to set kprobes on the first
instruction of a function (which is the function trace caller):

If no kprobe is set handling of enabling and disabling function tracing
of a function simply patches the first instruction. Either it is a nop
(right now it's an unconditional branch, which skips the mcount block),
or it's a branch to the ftrace_caller() function.

If a kprobe is being placed on a function tracer calling instruction
we encode if we actually have a nop or branch in the remaining bytes
after the breakpoint instruction (illegal opcode).
This is possible, since the size of the instruction used for the nop
and branch is six bytes, while the size of the breakpoint is only
two bytes.
Therefore the first two bytes contain the illegal opcode and the last
four bytes contain either "0" for nop or "1" for branch. The kprobes
code will then execute/simulate the correct instruction.

Instruction patching for kprobes and function tracer is always done
with stop_machine(). Therefore we don't have any races where an
instruction is patched concurrently on a different cpu.
Besides that also the program check handler which executes the function
trace caller instruction won't be executed concurrently to any
stop_machine() execution.

This allows to keep full fault based kprobes handling which generates
correct pt_regs contents automatically.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27 13:27:27 +01:00
Dominik Dingel
3ac8e38015 s390/mm: disable KSM for storage key enabled pages
When storage keys are enabled unmerge already merged pages and prevent
new pages from being merged.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27 13:27:26 +01:00
Dominik Dingel
2faee8ff9d s390/mm: prevent and break zero page mappings in case of storage keys
As soon as storage keys are enabled we need to stop working on zero page
mappings to prevent inconsistencies between storage keys and pgste.

Otherwise following data corruption could happen:
1) guest enables storage key
2) guest sets storage key for not mapped page X
   -> change goes to PGSTE
3) guest reads from page X
   -> as X was not dirty before, the page will be zero page backed,
      storage key from PGSTE for X will go to storage key for zero page
4) guest sets storage key for not mapped page Y (same logic as above
5) guest reads from page Y
   -> as Y was not dirty before, the page will be zero page backed,
      storage key from PGSTE for Y will got to storage key for zero page
      overwriting storage key for X

While holding the mmap sem, we are safe against changes on entries we
already fixed, as every fault would need to take the mmap_sem (read).

Other vCPUs executing storage key instructions will get a one time interception
and be serialized also with mmap_sem.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27 13:27:25 +01:00
Dominik Dingel
a13cff318c s390/mm: recfactor global pgste updates
Replace the s390 specific page table walker for the pgste updates
with a call to the common code walk_page_range function.
There are now two pte modification functions, one for the reset
of the CMMA state and another one for the initialization of the
storage keys.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-10-27 13:27:23 +01:00
Will Deacon
916136b374 s390: io: remove dummy relaxed accessor macros for reads
These are now defined by asm-generic/io.h, so we don't need the private
definitions anymore.

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2014-10-20 18:49:17 +01:00
Linus Torvalds
0429fbc0bd Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu consistent-ops changes from Tejun Heo:
 "Way back, before the current percpu allocator was implemented, static
  and dynamic percpu memory areas were allocated and handled separately
  and had their own accessors.  The distinction has been gone for many
  years now; however, the now duplicate two sets of accessors remained
  with the pointer based ones - this_cpu_*() - evolving various other
  operations over time.  During the process, we also accumulated other
  inconsistent operations.

  This pull request contains Christoph's patches to clean up the
  duplicate accessor situation.  __get_cpu_var() uses are replaced with
  with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

  Unfortunately, the former sometimes is tricky thanks to C being a bit
  messy with the distinction between lvalues and pointers, which led to
  a rather ugly solution for cpumask_var_t involving the introduction of
  this_cpu_cpumask_var_ptr().

  This converts most of the uses but not all.  Christoph will follow up
  with the remaining conversions in this merge window and hopefully
  remove the obsolete accessors"

* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
  irqchip: Properly fetch the per cpu offset
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
  ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
  percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
  Revert "powerpc: Replace __get_cpu_var uses"
  percpu: Remove __this_cpu_ptr
  clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
  sparc: Replace __get_cpu_var uses
  avr32: Replace __get_cpu_var with __this_cpu_write
  blackfin: Replace __get_cpu_var uses
  tile: Use this_cpu_ptr() for hardware counters
  tile: Replace __get_cpu_var uses
  powerpc: Replace __get_cpu_var uses
  alpha: Replace __get_cpu_var
  ia64: Replace __get_cpu_var uses
  s390: cio driver &__get_cpu_var replacements
  s390: Replace __get_cpu_var uses
  mips: Replace __get_cpu_var uses
  MIPS: Replace __get_cpu_var uses in FPU emulator.
  arm: Replace __this_cpu_ptr with raw_cpu_ptr
  ...
2014-10-15 07:48:18 +02:00