This patch is meant to cleanup the handling of read_barrier_depends and
smp_read_barrier_depends. In multiple spots in the kernel headers
read_barrier_depends is defined as "do {} while (0)", however we then go
into the SMP vs non-SMP sections and have the SMP version reference
read_barrier_depends, and the non-SMP define it as yet another empty
do/while.
With this commit I went through and cleaned out the duplicate definitions
and reduced the number of definitions down to 2 per header. In addition I
moved the 50 line comments for the macro from the x86 and mips headers that
defined it as an empty do/while to those that were actually defining the
macro, alpha and blackfin.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
1) New offloading infrastructure and example 'rocker' driver for
offloading of switching and routing to hardware.
This work was done by a large group of dedicated individuals, not
limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu
2) Start making the networking operate on IOV iterators instead of
modifying iov objects in-situ during transfers. Thanks to Al Viro
and Herbert Xu.
3) A set of new netlink interfaces for the TIPC stack, from Richard
Alpe.
4) Remove unnecessary looping during ipv6 routing lookups, from Martin
KaFai Lau.
5) Add PAUSE frame generation support to gianfar driver, from Matei
Pavaluca.
6) Allow for larger reordering levels in TCP, which are easily
achievable in the real world right now, from Eric Dumazet.
7) Add a variable of napi_schedule that doesn't need to disable cpu
interrupts, from Eric Dumazet.
8) Use a doubly linked list to optimize neigh_parms_release(), from
Nicolas Dichtel.
9) Various enhancements to the kernel BPF verifier, and allow eBPF
programs to actually be attached to sockets. From Alexei
Starovoitov.
10) Support TSO/LSO in sunvnet driver, from David L Stevens.
11) Allow controlling ECN usage via routing metrics, from Florian
Westphal.
12) Remote checksum offload, from Tom Herbert.
13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
driver, from Thomas Lendacky.
14) Add MPLS support to openvswitch, from Simon Horman.
15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
Klassert.
16) Do gro flushes on a per-device basis using a timer, from Eric
Dumazet. This tries to resolve the conflicting goals between the
desired handling of bulk vs. RPC-like traffic.
17) Allow userspace to ask for the CPU upon what a packet was
received/steered, via SO_INCOMING_CPU. From Eric Dumazet.
18) Limit GSO packets to half the current congestion window, from Eric
Dumazet.
19) Add a generic helper so that all drivers set their RSS keys in a
consistent way, from Eric Dumazet.
20) Add xmit_more support to enic driver, from Govindarajulu
Varadarajan.
21) Add VLAN packet scheduler action, from Jiri Pirko.
22) Support configurable RSS hash functions via ethtool, from Eyal
Perry.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
Fix race condition between vxlan_sock_add and vxlan_sock_release
net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
net/mlx4: Add support for A0 steering
net/mlx4: Refactor QUERY_PORT
net/mlx4_core: Add explicit error message when rule doesn't meet configuration
net/mlx4: Add A0 hybrid steering
net/mlx4: Add mlx4_bitmap zone allocator
net/mlx4: Add a check if there are too many reserved QPs
net/mlx4: Change QP allocation scheme
net/mlx4_core: Use tasklet for user-space CQ completion events
net/mlx4_core: Mask out host side virtualization features for guests
net/mlx4_en: Set csum level for encapsulated packets
be2net: Export tunnel offloads only when a VxLAN tunnel is created
gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
net: fec: only enable mdio interrupt before phy device link up
net: fec: clear all interrupt events to support i.MX6SX
net: fec: reset fep link status in suspend function
net: sock: fix access via invalid file descriptor
net: introduce helper macro for_each_cmsghdr
...
As there are now no remaining users of arch_fast_hash(), lets kill
it entirely.
This basically reverts commit 71ae8aac3e ("lib: introduce arch
optimized hash library") and follow-up work, that is f.e., commit
237217546d ("lib: hash: follow-up fixups for arch hash"),
commit e3fec2f74f ("lib: Add missing arch generic-y entries for
asm-generic/hash.h") and last but not least commit 6a02652df5
("perf tools: Fix include for non x86 architectures").
Cc: Francesco Fusco <fusco@ntop.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull x86 MPX support from Thomas Gleixner:
"This enables support for x86 MPX.
MPX is a new debug feature for bound checking in user space. It
requires kernel support to handle the bound tables and decode the
bound violating instruction in the trap handler"
* 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
asm-generic: Remove asm-generic arch_bprm_mm_init()
mm: Make arch_unmap()/bprm_mm_init() available to all architectures
x86: Cleanly separate use of asm-generic/mm_hooks.h
x86 mpx: Change return type of get_reg_offset()
fs: Do not include mpx.h in exec.c
x86, mpx: Add documentation on Intel MPX
x86, mpx: Cleanup unused bound tables
x86, mpx: On-demand kernel allocation of bounds tables
x86, mpx: Decode MPX instruction to get bound violation information
x86, mpx: Add MPX-specific mmap interface
x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific
x86, mpx: Add MPX to disabled features
ia64: Sync struct siginfo with general version
mips: Sync struct siginfo with general version
mpx: Extend siginfo structure to include bound violation information
x86, mpx: Rename cfg_reg_u and status_reg
x86: mpx: Give bndX registers actual names
x86: Remove arbitrary instruction size limit in instruction decoder
The x86 MPX patch set calls arch_unmap() and arch_bprm_mm_init()
from fs/exec.c, so we need at least a stub for them in all
architectures. They are only called under an #ifdef for
CONFIG_MMU=y, so we can at least restict this to architectures
with MMU support.
blackfin/c6x have no MMU support, so do not call arch_unmap().
They also do not include mm_hooks.h or mmu_context.h at all and
do not need to be touched.
s390, um and unicore32 do not use asm-generic/mm_hooks.h, so got
their own arch_unmap() versions. (I also moved um's
arch_dup_mmap() to be closer to the other mm_hooks.h functions).
xtensa only includes mm_hooks when MMU=y, which should be fine
since arch_unmap() is called only from MMU=y code.
For the rest, we use the stub copies of these functions in
asm-generic/mm_hook.h.
I cross compiled defconfigs for cris (to check NOMMU) and s390
to make sure that this works. I also checked a 64-bit build
of UML and all my normal x86 builds including PARAVIRT on and
off.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: linux-arch@vger.kernel.org
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/20141118182350.8B4AA2C2@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The xlate_dev_{kmem,mem}_ptr() functions take either a physical address
or a kernel virtual address, so data types should be phys_addr_t and
void *. They both return a kernel virtual address which is only ever
used in calls to copy_{from,to}_user(), so make variables that store it
void * rather than char * for consistency.
Also only define a weak unxlate_dev_mem_ptr() function if architectures
haven't overridden them in the asm/io.h header file.
Signed-off-by: Thierry Reding <treding@nvidia.com>
These are now defined by asm-generic/io.h, so we don't need the private
definitions anymore.
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Pull percpu consistent-ops changes from Tejun Heo:
"Way back, before the current percpu allocator was implemented, static
and dynamic percpu memory areas were allocated and handled separately
and had their own accessors. The distinction has been gone for many
years now; however, the now duplicate two sets of accessors remained
with the pointer based ones - this_cpu_*() - evolving various other
operations over time. During the process, we also accumulated other
inconsistent operations.
This pull request contains Christoph's patches to clean up the
duplicate accessor situation. __get_cpu_var() uses are replaced with
with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().
Unfortunately, the former sometimes is tricky thanks to C being a bit
messy with the distinction between lvalues and pointers, which led to
a rather ugly solution for cpumask_var_t involving the introduction of
this_cpu_cpumask_var_ptr().
This converts most of the uses but not all. Christoph will follow up
with the remaining conversions in this merge window and hopefully
remove the obsolete accessors"
* 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
irqchip: Properly fetch the per cpu offset
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
Revert "powerpc: Replace __get_cpu_var uses"
percpu: Remove __this_cpu_ptr
clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
sparc: Replace __get_cpu_var uses
avr32: Replace __get_cpu_var with __this_cpu_write
blackfin: Replace __get_cpu_var uses
tile: Use this_cpu_ptr() for hardware counters
tile: Replace __get_cpu_var uses
powerpc: Replace __get_cpu_var uses
alpha: Replace __get_cpu_var
ia64: Replace __get_cpu_var uses
s390: cio driver &__get_cpu_var replacements
s390: Replace __get_cpu_var uses
mips: Replace __get_cpu_var uses
MIPS: Replace __get_cpu_var uses in FPU emulator.
arm: Replace __this_cpu_ptr with raw_cpu_ptr
...
Pull s390 updates from Martin Schwidefsky:
"This patch set contains the main portion of the changes for 3.18 in
regard to the s390 architecture. It is a bit bigger than usual,
mainly because of a new driver and the vector extension patches.
The interesting bits are:
- Quite a bit of work on the tracing front. Uprobes is enabled and
the ftrace code is reworked to get some of the lost performance
back if CONFIG_FTRACE is enabled.
- To improve boot time with CONFIG_DEBIG_PAGEALLOC, support for the
IPTE range facility is added.
- The rwlock code is re-factored to improve writer fairness and to be
able to use the interlocked-access instructions.
- The kernel part for the support of the vector extension is added.
- The device driver to access the CD/DVD on the HMC is added, this
will hopefully come in handy to improve the installation process.
- Add support for control-unit initiated reconfiguration.
- The crypto device driver is enhanced to enable the additional AP
domains and to allow the new crypto hardware to be used.
- Bug fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits)
s390/ftrace: simplify enabling/disabling of ftrace_graph_caller
s390/ftrace: remove 31 bit ftrace support
s390/kdump: add support for vector extension
s390/disassembler: add vector instructions
s390: add support for vector extension
s390/zcrypt: Toleration of new crypto hardware
s390/idle: consolidate idle functions and definitions
s390/nohz: use a per-cpu flag for arch_needs_cpu
s390/vtime: do not reset idle data on CPU hotplug
s390/dasd: add support for control unit initiated reconfiguration
s390/dasd: fix infinite loop during format
s390/mm: make use of ipte range facility
s390/setup: correct 4-level kernel page table detection
s390/topology: call set_sched_topology early
s390/uprobes: architecture backend for uprobes
s390/uprobes: common library for kprobes and uprobes
s390/rwlock: use the interlocked-access facility 1 instructions
s390/rwlock: improve writer fairness
s390/rwlock: remove interrupt-enabling rwlock variant.
s390/mm: remove change bit override support
...
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle were:
- Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave
Hansen)
- Various sched/idle refinements for better idle handling (Nicolas
Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot)
- sched/numa updates and optimizations (Rik van Riel)
- sysbench speedup (Vincent Guittot)
- capacity calculation cleanups/refactoring (Vincent Guittot)
- Various cleanups to thread group iteration (Oleg Nesterov)
- Double-rq-lock removal optimization and various refactorings
(Kirill Tkhai)
- various sched/deadline fixes
... and lots of other changes"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
sched/dl: Use dl_bw_of() under rcu_read_lock_sched()
sched/fair: Delete resched_cpu() from idle_balance()
sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
sched: Improve sysbench performance by fixing spurious active migration
sched/x86: Fix up typo in topology detection
x86, sched: Add new topology for multi-NUMA-node CPUs
sched/rt: Use resched_curr() in task_tick_rt()
sched: Use rq->rd in sched_setaffinity() under RCU read lock
sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
sched: Use dl_bw_of() under RCU read lock
sched/fair: Remove duplicate code from can_migrate_task()
sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
sched: print_rq(): Don't use tasklist_lock
sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()
sched: Fix the task-group check in tg_has_rt_tasks()
sched/fair: Leverage the idle state info when choosing the "idlest" cpu
sched: Let the scheduler see CPU idle states
sched/deadline: Fix inter- exclusive cpusets migrations
sched/deadline: Clear dl_entity params when setscheduling to different class
sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault()
...
Pull dma-mapping update from Marek Szyprowski:
"Provide the dma write coherent api (available previously on ARM
architecture) for all other architectures, which use dma_ops-based dma
mapping implementation.
This lets one to use the same code in the device drivers regardless of
the selected architecture"
* 'for-v3.18' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
dma-mapping: Provide write-combine allocations
s390: Implement dma_{alloc,free}_attrs()
Pull timer fixes from Ingo Molnar:
"Main changes:
- Fix the deadlock reported by Dave Jones et al
- Clean up and fix nohz_full interaction with arch abilities
- nohz init code consolidation/cleanup"
* 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
nohz: nohz full depends on irq work self IPI support
nohz: Consolidate nohz full init code
arm64: Tell irq work about self IPI support
arm: Tell irq work about self IPI support
x86: Tell irq work about self IPI support
irq_work: Force raised irq work to run on irq work interrupt
irq_work: Introduce arch_irq_work_has_interrupt()
nohz: Move nohz full init call to tick init
31 bit and 64 bit diverge more and more and it is rather painful
to keep both parts running.
To make things simpler just remove the 31 bit support which nobody
uses anyway.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
With this patch for kdump the s390 vector registers are stored into the
prepared save areas in the old kernel and into the REGSET_VX_LOW and
REGSET_VX_HIGH ELF notes for /proc/vmcore in the new kernel.
The NT_S390_VXRS_LOW note contains the lower halves of the first 16 vector
registers 0-15. The higher halves are stored in the floating point register
ELF note. The NT_S390_VXRS_HIGH contains the full vector registers 16-31.
The kernel provides a save area for storing vector register in case of
machine checks. A pointer to this save are is stored in the CPU lowcore
at offset 0x11b0. This save area is also used to save the registers for
kdump. In case of a dumped crashed kdump those areas are used to extract
the registers of the production system.
The vector registers for remote CPUs are stored using the "store additional
status at address" SIGP. For the dump CPU the vector registers are stored
with the VSTM instruction.
With this patch also zfcpdump stores the vector registers.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The vector extension introduces 32 128-bit vector registers and a set of
instruction to operate on the vector registers.
The kernel can control the use of vector registers for the problem state
program with a bit in control register 0. Once enabled for a process the
kernel needs to retain the content of the vector registers on context
switch. The signal frame is extended to include the vector registers.
Two new register sets NT_S390_VXRS_LOW and NT_S390_VXRS_HIGH are added
to the regset interface for the debugger and core dumps.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move the C functions and definitions related to the idle state handling
to arch/s390/include/asm/idle.h and arch/s390/kernel/idle.c. The function
s390_get_idle_time is renamed to arch_cpu_idle_time and vtime_stop_cpu to
enabled_wait.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move the nohz_delay bit from the s390_idle data structure to the
per-cpu flags. Clear the nohz delay flag in __cpu_disable and
remove the cpu hotplug notifier that used to do this.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Apart from the usual cleanups, here is the summary of new features:
- s390 moves closer towards host large page support
- PowerPC has improved support for debugging (both inside the guest and
via gdbstub) and support for e6500 processors
- ARM/ARM64 support read-only memory (which is necessary to put firmware
in emulated NOR flash)
- x86 has the usual emulator fixes and nested virtualization improvements
(including improved Windows support on Intel and Jailhouse hypervisor
support on AMD), adaptive PLE which helps overcommitting of huge guests.
Also included are some patches that make KVM more friendly to memory
hot-unplug, and fixes for rare caching bugs.
Two patches have trivial mm/ parts that were acked by Rik and Andrew.
Note: I will soon switch to a subkey for signing purposes. To verify
future signed pull requests from me, please update my key with
"gpg --recv-keys 9B4D86F2". You should see 3 new subkeys---the
one for signing will be a 2048-bit RSA key, 4E6B09D7.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJUL5sPAAoJEBvWZb6bTYbyfkEP/3MNhSyn6HCjPjtjLNPAl9KL
WpExZSUFL2+4CztpdGIsek1BeJYHmqv3+c5S+WvaWVA1aqh2R7FT1D1ErBLjgLQq
lq23IOr+XxmC3dXQUEEk+TlD+283UzypzEG4l4UD3JYg79fE3UrXAz82SeyewJDY
x7aPYhkZG3RHu+wAyMPasG6E3zS5LySdUtGWbiPwz5BejrhBJoJdeb2WIL/RwnUK
7ppSLB5EoFj/uMkuyeAAdAbdfSrhHA6faDZxNdxS9k9wGutrhhfUoQ49ONrKG4dV
sFo1tSPTVgRs8QFYUZ2fJUPBAmUVddsgqh2K9d0NftGTq7b8YszaCsfFrs2/Y4MU
YxssWEhxsfszerCu12bbAJrv6JBZYQ7TwGvI9L7P0iFU6IVw/djmukU4AkM9/e91
YS/cue/PN+9Pn2ccXzL9J7xRtZb8FsOuRsCXTCmbOwDkLmrKPDBN2t3RUbeF+Eam
ABrpWnLKX13kZSo4LKU+/niarzmPMp7odQfHVdr8ea0fiYLp4iN8puA20WaSPIgd
CLvm+RAvXe5Lm91L4mpFotJ2uFyK6QlIYJV4FsgeWv/0D0qppWQi0Utb/aCNHCgy
z8MyUMD48y7EpoQrFYr/7cddXIu0/NegnM8I1coVjIPEk4NfeebGUlCJ/V3D8wMG
BgEfS2x6jRc5zB3hjwDr
=iEVi
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"Fixes and features for 3.18.
Apart from the usual cleanups, here is the summary of new features:
- s390 moves closer towards host large page support
- PowerPC has improved support for debugging (both inside the guest
and via gdbstub) and support for e6500 processors
- ARM/ARM64 support read-only memory (which is necessary to put
firmware in emulated NOR flash)
- x86 has the usual emulator fixes and nested virtualization
improvements (including improved Windows support on Intel and
Jailhouse hypervisor support on AMD), adaptive PLE which helps
overcommitting of huge guests. Also included are some patches that
make KVM more friendly to memory hot-unplug, and fixes for rare
caching bugs.
Two patches have trivial mm/ parts that were acked by Rik and Andrew.
Note: I will soon switch to a subkey for signing purposes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (157 commits)
kvm: do not handle APIC access page if in-kernel irqchip is not in use
KVM: s390: count vcpu wakeups in stat.halt_wakeup
KVM: s390/facilities: allow TOD-CLOCK steering facility bit
KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode
arm/arm64: KVM: Report correct FSC for unsupported fault types
arm/arm64: KVM: Fix VTTBR_BADDR_MASK and pgd alloc
kvm: Fix kvm_get_page_retry_io __gup retval check
arm/arm64: KVM: Fix set_clear_sgi_pend_reg offset
kvm: x86: Unpin and remove kvm_arch->apic_access_page
kvm: vmx: Implement set_apic_access_page_addr
kvm: x86: Add request bit to reload APIC access page address
kvm: Add arch specific mmu notifier for page invalidation
kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static
kvm: Fix page ageing bugs
kvm/x86/mmu: Pass gfn and level to rmapp callback.
x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only
kvm: x86: use macros to compute bank MSRs
KVM: x86: Remove debug assertion of non-PAE reserved bits
kvm: don't take vcpu mutex for obviously invalid vcpu ioctls
kvm: Faults which trigger IO release the mmap_sem
...
On 32 bit systems cmpxchg cannot handle 64 bit values, so
some additional magic is required to allow a 32 bit system
with CONFIG_VIRT_CPU_ACCOUNTING_GEN=y enabled to build.
Make sure the correct cmpxchg function is used when doing
an atomic swap of a cputime_t.
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: umgwanakikbuti@gmail.com
Cc: fweisbec@gmail.com
Cc: srao@redhat.com
Cc: lwoodman@redhat.com
Cc: atheurer@redhat.com
Cc: oleg@redhat.com
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linux390@de.ibm.com
Cc: linux-arch@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Link: http://lkml.kernel.org/r/20140930155947.070cdb1f@annuminas.surriel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch introduces the halt_wakeup counter used by common code and uses it to
count vcpu wakeups done in s390 arch specific code.
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Invalidate several pte entries at once if the ipte range facility
is available. Currently this works only for DEBUG_PAGE_ALLOC where
several up to 2 ^ MAX_ORDER may be invalidated at once.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch moves common functions from kprobes.c to probes.c.
Thus its possible for uprobes to use them without enabling kprobes.
Signed-off-by: Jan Willeke <willeke@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Make use of the load-and-add, load-and-or and load-and-and instructions
to atomically update the read-write lock without a compare-and-swap loop.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add an owner field to the arch_rwlock_t to be able to pass the timeslice
of a virtual CPU with diagnose 0x9c to the lock owner in case the rwlock
is write-locked. The undirected yield in case the rwlock is acquired
writable but the lock is read-locked is removed.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This device driver allows accessing a HMC drive CD/DVD-ROM.
It can be used in a LPAR and z/VM environment.
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Ralf Hoppe <rhoppe@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The nohz full code needs irq work to trigger its own interrupt so that
the subsystem can work even when the tick is stopped.
Lets introduce arch_irq_work_has_interrupt() that archs can override to
tell about their support for this ability.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
We have to provide a per guest crypto block for the CPUs to
enable MSA4 instructions. According to icainfo on z196 or
later this enables CCM-AES-128, CMAC-AES-128, CMAC-AES-192
and CMAC-AES-256.
Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[split MSA4/protected key into two patches]
Use a memory barrier + store sequence instead of a load + compare and swap
sequence to unlock a spinlock and an rw lock.
For the spinlock case this saves us two memory reads and a not needed cpu
serialization after the compare and swap instruction stored the new value.
The kernel size (performance_defconfig) gets reduced by ~14k.
Average execution time of a tight inlined spin_unlock loop drops from
5.8ns to 0.7ns on a zEC12 machine.
An artificial stress test case where several counters are protected with
a single spinlock and which are only incremented while holding the spinlock
shows ~30% improvement on a 4 cpu machine.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reduce the number of executed instructions within the mcount block if
function tracing is enabled. We achieve that by using a non-standard
C function call ABI. Since the called function is also written in
assembler this is not a problem.
This also allows to replace the unconditional store at the beginning
of the mcount block with a larl instruction, which doesn't touch
memory.
In theory we could also patch the first instruction of the mcount block
to enable and disable function tracing. However this would break kprobes.
This could be fixed with implementing the "kprobes_on_ftrace" feature;
however keeping the odd jprobes working seems not to be possible without
a lot of code churn. Therefore keep the code easy and simply accept one
wasted 1-cycle "larl" instruction per function prologue.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This code is based on a patch from Vojtech Pavlik.
http://marc.info/?l=linux-s390&m=140438885114413&w=2
The actual implementation now differs significantly:
Instead of adding a second function "ftrace_regs_caller" which would be nearly
identical to the existing ftrace_caller function, the current ftrace_caller
function is now an alias to ftrace_regs_caller and always passes the needed
pt_regs structure and function_trace_op parameters unconditionally.
Besides that also use asm offsets to correctly allocate and access the new
struct pt_regs on the stack.
While at it we can make use of new instruction to get rid of some indirect
loads if compiled for new machines.
The passed struct pt_regs can be changed by the called function and it's new
contents will replace the current contents.
Note: to change the return address the embedded psw member of the pt_regs
structure must be changed. The psw member is right now incomplete, since
the mask part is missing. For all current use cases this should be sufficent.
Providing and restoring a sane mask would mean we need to add an epsw/lpswe
pair to the mcount code. Only these two instruction would cost us ~120 cycles
which currently seems not necessary.
Cc: Vojtech Pavlik <vojtech@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When the function graph tracer is disabled we can skip three additional
instructions. So let's just do this.
So if function tracing is enabled but function graph tracing is
runtime disabled, we get away with a single unconditional branch.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE optimization to
the 64-bit and 31-bit vdso.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pull s390 fixes from Martin Schwidefsky:
"A bug fix for the vdso code, the loadparm for booting from SCSI is
added and the access permissions for the dasd module parameters are
corrected"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/vdso: remove NULL pointer check from clock_gettime
s390/ipl: Add missing SCSI loadparm attributes to /sys/firmware
s390/dasd: Make module parameter visible in sysfs
commit 0944fe3f4a ("s390/mm: implement software referenced bits")
triggered another paging/storage key corruption. There is an
unhandled invalid->valid pte change where we have to set the real
storage key from the pgste.
When doing paging a guest page might be swapcache or swap and when
faulted in it might be read-only and due to a parallel scan old.
An do_wp_page will make it writeable and young. Due to software
reference tracking this page was invalid and now becomes valid.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: stable@vger.kernel.org # v3.12+
Since 3.12 or more precisely commit 0944fe3f4a ("s390/mm:
implement software referenced bits") guest storage keys get
corrupted during paging. This commit added another valid->invalid
translation for page tables - namely ptep_test_and_clear_young.
We have to transfer the storage key into the pgste in that case.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: stable@vger.kernel.org # v3.12+
Currently the loadparm is only supported for CCW IPL. But also for SCSI
IPL it can be specified either on the HMC load panel respectively
z/VM console or via diagnose 308.
So fix this for SCSI and add the required sysfs attributes for reading the
IPL loadparm and for setting the loadparm for re-IPL.
With this patch the following two sysfs attributes are introduced:
- /sys/firmware/ipl/loadparm (for system that have been IPLed from SCSI)
- /sys/firmware/reipl/fcp/loadparm
Because the loadparm is now available for SCSI and CCW it is moved
now from "struct ipl_block_ccw" to the generic "struct ipl_list_hdr".
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In the beggining was on_each_cpu(), which required an unused argument to
kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten.
Remove unnecessary arguments that stem from this.
Signed-off-by: Radim KrÄmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using static inline is going to save few bytes and cycles.
For example on powerpc, the difference is 700 B after stripping.
(5 kB before)
This patch also deals with two overlooked empty functions:
kvm_arch_flush_shadow was not removed from arch/mips/kvm/mips.c
2df72e9bc KVM: split kvm_arch_flush_shadow
and kvm_arch_sched_in never made it into arch/ia64/kvm/kvm-ia64.c.
e790d9ef6 KVM: add kvm_arch_sched_in
Signed-off-by: Radim KrÄmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Opaque KVM structs are useful for prototypes in asm/kvm_host.h, to avoid
"'struct foo' declared inside parameter list" warnings (and consequent
breakage due to conflicting types).
Move them from individual files to a generic place in linux/kvm_types.h.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x). This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.
Other use cases are for storing and retrieving data from the current
processors percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.
__get_cpu_var() is defined as :
#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.
this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.
This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset. Thereby address calculations are avoided and less registers
are used when code is generated.
At the end of the patch set all uses of __get_cpu_var have been removed so
the macro is removed too.
The patch set includes passes over all arches as well. Once these operations
are used throughout then specialized macros can be defined in non -x86
arches as well in order to optimize per cpu access by f.e. using a global
register that may be set to the per cpu base.
Transformations done to __get_cpu_var()
1. Determine the address of the percpu instance of the current processor.
DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(&y);
2. Same as #1 but this time an array structure is involved.
DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(y);
3. Retrieve the content of the current processors instance of a per cpu
variable.
DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y)
Converts to
int x = __this_cpu_read(y);
4. Retrieve the content of a percpu struct
DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);
Converts to
memcpy(&x, this_cpu_ptr(&y), sizeof(x));
5. Assignment to a per cpu variable
DEFINE_PER_CPU(int, y)
__get_cpu_var(y) = x;
Converts to
this_cpu_write(y, x);
6. Increment/Decrement etc of a per cpu variable
DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++
Converts to
this_cpu_inc(y)
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
CC: linux390@de.ibm.com
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The radix tree rework removed all code that uses the gmap_rmap
and gmap_pgtable data structures. Remove these outdated definitions.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Add an addressing limit to the gmap address spaces and only allocate
the page table levels that are needed for the given limit. The limit
is fixed and can not be changed after a gmap has been created.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Store the target address for the gmap segments in a radix tree
instead of using invalid segment table entries. gmap_translate
becomes a simple radix_tree_lookup, gmap_fault is split into the
address translation with gmap_translate and the part that does
the linking of the gmap shadow page table with the process page
table.
A second radix tree is used to keep the pointers to the segment
table entries for segments that are mapped in the guest address
space. On unmap of a segment the pointer is retrieved from the
radix tree and is used to carry out the segment invalidation in
the gmap shadow page table. As the radix tree can only store one
pointer, each host segment may only be mapped to exactly one
guest location.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The S390 architecture advertises support for HAVE_DMA_ATTRS when PCI is
enabled. Patches to unify some of the DMA API would like to rely on the
dma_alloc_attrs() and dma_free_attrs() functions to be provided when an
architecture supports DMA attributes.
Rename dma_alloc_coherent() and dma_free_coherent() to dma_alloc_attrs()
and dma_free_attrs() since they are functionally equivalent and alias
the former to the latter for compatibility.
For consistency with other architectures, also reuse the existing symbol
HAVE_DMA_ATTRS defined in arch/Kconfig instead of providing a duplicate.
Select it when PCI is enabled.
While at it, drop a redundant 'default n' from the PCI Kconfig symbol.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Acked-By: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>