Conflicts:
kernel/Makefile
There are conflicts in kernel/Makefile due to file moving in the
scheduler tree - resolve them.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Completions already have their own header file: linux/completion.h
Move the implementation out of kernel/sched/core.c and into its own
file: kernel/sched/completion.c.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-x2y49rmxu5dljt66ai2lcfuw@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There are conflicts in lockdep.c due to RCU changes, and also the RCU
tree changes kernel/Makefile - so pre-merge it to ease the moving of
locking related .c files to kernel/locking/.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Negative message lengths make no sense -- so don't do negative queue
lenghts or identifier counts. Prevent them from getting negative.
Also change the underlying data types to be unsigned to avoid hairy
surprises with sign extensions in cases where those variables get
evaluated in unsigned expressions with bigger data types, e.g size_t.
In case a user still wants to have "unlimited" sizes she could just use
INT_MAX instead.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__wait_event_interruptible_lock_irq_timeout() needs the timeout
parameter passed instead of "ret".
This magically compiled since the only user has a local ret
variable. Luckily we got a build warning:
CC drivers/s390/scsi/zfcp_qdio.o
drivers/s390/scsi/zfcp_qdio.c: In function 'zfcp_qdio_sbal_get':
include/linux/wait.h:780:15: warning: 'ret' may be used uninitialized
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20131031114814.GB5551@osiris
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Resolve cherry-picking conflicts:
Conflicts:
mm/huge_memory.c
mm/memory.c
mm/mprotect.c
See this upstream merge commit for more details:
52469b4fcd Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Signed-off-by: Ingo Molnar <mingo@kernel.org>
this_cpu_sub() is implemented as negation and addition.
This patch casts the adjustment to the counter type before negation to
sign extend the adjustment. This helps in cases where the counter type
is wider than an unsigned adjustment. An alternative to this patch is
to declare such operations unsupported, but it seemed useful to avoid
surprises.
This patch specifically helps the following example:
unsigned int delta = 1
preempt_disable()
this_cpu_write(long_counter, 0)
this_cpu_sub(long_counter, delta)
preempt_enable()
Before this change long_counter on a 64 bit machine ends with value
0xffffffff, rather than 0xffffffffffffffff. This is because
this_cpu_sub(pcp, delta) boils down to this_cpu_add(pcp, -delta),
which is basically:
long_counter = 0 + 0xffffffff
Also apply the same cast to:
__this_cpu_sub()
__this_cpu_sub_return()
this_cpu_sub_return()
All percpu_test.ko passes, especially the following cases which
previously failed:
l -= ui_one;
__this_cpu_sub(long_counter, ui_one);
CHECK(l, long_counter, -1);
l -= ui_one;
this_cpu_sub(long_counter, ui_one);
CHECK(l, long_counter, -1);
CHECK(l, long_counter, 0xffffffffffffffff);
ul -= ui_one;
__this_cpu_sub(ulong_counter, ui_one);
CHECK(ul, ulong_counter, -1);
CHECK(ul, ulong_counter, 0xffffffffffffffff);
ul = this_cpu_sub_return(ulong_counter, ui_one);
CHECK(ul, ulong_counter, 2);
ul = __this_cpu_sub_return(ulong_counter, ui_one);
CHECK(ul, ulong_counter, 1);
Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The wait_event_interruptible_lock_irq() macro is missing a
semi-colon which causes a build failure in the i915 DRM driver.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1382528455-29911-1-git-send-email-treding@nvidia.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull networking fixes from David Miller:
"Sorry I let so much accumulate, I was in Buffalo and wanted a few
things to cook in my tree for a while before sending to you. Anyways,
it's a lot of little things as usual at this stage in the game"
1) Make bonding MAINTAINERS entry reflect reality, from Andy
Gospodarek.
2) Fix accidental sock_put() on timewait mini sockets, from Eric
Dumazet.
3) Fix crashes in l2tp due to mis-handling of ipv4 mapped ipv6
addresses, from François CACHEREUL.
4) Fix heap overflow in __audit_sockaddr(), from the eagle eyed Dan
Carpenter.
5) tcp_shifted_skb() doesn't take handle FINs properly, from Eric
Dumazet.
6) SFC driver bug fixes from Ben Hutchings.
7) Fix TX packet scheduling wedge after channel change in ath9k driver,
from Felix Fietkau.
8) Fix user after free in BPF JIT code, from Alexei Starovoitov.
9) Source address selection test is reversed in
__ip_route_output_key(), fix from Jiri Benc.
10) VLAN and CAN layer mis-size netlink attributes, from Marc
Kleine-Budde.
11) Fix permission checks in sysctls to use current_euid() instead of
current_uid(). From Eric W Biederman.
12) IPSEC policies can go away while a timer is still pending for them,
add appropriate ref-counting to fix, from Steffen Klassert.
13) Fix mis-programming of FDR and RMCR registers on R8A7740 sh_eth
chips, from Nguyen Hong Ky and Simon Horman.
14) MLX4 forgets to DMA unmap pages on RX, fix from Amir Vadai.
15) IPV6 GRE tunnel MTU upper limit is miscalculated, from Oussama
Ghorbel.
16) Fix typo in fq_change(), we were assigning "initial quantum" to
"quantum". From Eric Dumazet.
17) Set a more appropriate sk_pacing_rate for non-TCP sockets, otherwise
FQ packet scheduler does not pace those flows properly. Also from
Eric Dumazet.
18) rtlwifi miscalculates packet pointers, from Mark Cave-Ayland.
19) l2tp_xmit_skb() can be called from process context, not just softirq
context, so we must always make sure to BH disable around it. From
Eric Dumazet.
20) On qdisc reset, we forget to purge the RB tree of SKBs in netem
packet scheduler. From Stephen Hemminger.
21) Fix info leak in farsync WAN driver ioctl() handler, from Dan
Carpenter and Salva Peiró.
22) Fix PHY reset and other issues in dm9000 driver, from Nikita
Kiryanov and Michael Abbott.
23) When hardware can do SCTP crc32 checksums, we accidently don't
disable the csum offload when IPSEC transformations have been
applied. From Fan Du and Vlad Yasevich.
24) Tail loss probing in TCP leaves the socket in the wrong congestion
avoidance state. From Yuchung Cheng.
25) In CPSW driver, enable NAPI before interrupts are turned on, from
Markus Pargmann.
26) Integer underflow and dual-assignment in YAM hamradio driver, from
Dan Carpenter.
27) If we are going to mangle a packet in tcp_set_skb_tso_segs() we must
unclone it. This fixes various hard to track down crashes in
drivers where the SKBs ->gso_segs was changing right from underneath
the driver during TX queueing. From Eric Dumazet.
28) Fix the handling of VLAN IDs, and in particular the special IDs 0
and 4095, in the bridging layer. From Toshiaki Makita.
29) Another info leak, this time in wanxl WAN driver, from Salva Peiró.
30) Fix race in socket credential passing, from Daniel Borkmann.
31) WHen NETLABEL is disabled, we don't validate CIPSO packets properly,
from Seif Mazareeb.
32) Fix identification of fragmented frames in ipv4/ipv6 UDP
Fragmentation Offload output paths, from Jiri Pirko.
33) Virtual Function fixes in bnx2x driver from Yuval Mintz and Ariel
Elior.
34) When we removed the explicit neighbour pointer from ipv6 routes a
slight regression was introduced for users such as IPVS, xt_TEE, and
raw sockets. We mix up the users requested destination address with
the routes assigned nexthop/gateway. From Julian Anastasov and
Simon Horman.
35) Fix stack overruns in rt6_probe(), the issue is that can end up
doing two full packet xmit paths at the same time when emitting
neighbour discovery messages. From Hannes Frederic Sowa.
36) davinci_emac driver doesn't handle IFF_ALLMULTI correctly, from
Mariusz Ceier.
37) Make sure to set TCP sk_pacing_rate after the first legitimate RTT
sample, from Neal Cardwell.
38) Wrong netlink attribute passed to xfrm_replay_verify_len(), from
Steffen Klassert.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (152 commits)
ax88179_178a: Add VID:DID for Samsung USB Ethernet Adapter
ax88179_178a: Correct the RX error definition in RX header
Revert "bridge: only expire the mdb entry when query is received"
tcp: initialize passive-side sk_pacing_rate after 3WHS
davinci_emac.c: Fix IFF_ALLMULTI setup
mac802154: correct a typo in ieee802154_alloc_device() prototype
ipv6: probe routes asynchronous in rt6_probe
netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helper
ipv6: fill rt6i_gateway with nexthop address
ipv6: always prefer rt6i_gateway if present
bnx2x: Set NETIF_F_HIGHDMA unconditionally
bnx2x: Don't pretend during register dump
bnx2x: Lock DMAE when used by statistic flow
bnx2x: Prevent null pointer dereference on error flow
bnx2x: Fix config when SR-IOV and iSCSI are enabled
bnx2x: Fix Coalescing configuration
bnx2x: Unlock VF-PF channel on MAC/VLAN config error
bnx2x: Prevent an illegal pointer dereference during panic
bnx2x: Fix Maximum CoS estimation for VFs
drivers: net: cpsw: fix kernel warn during iperf test with interrupt pacing
...
We cap bitrate at YAM_MAXBITRATE in yam_ioctl(), but it could also be
negative. I don't know the impact of using a negative bitrate but let's
prevent it.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here are some USB fixes and new device ids for 3.12-rc6
The largest change here is a bunch of new device ids for the option USB
serial driver for new Huawei devices. Other than that, just some small
bug fixes for issues that people have reported (run-time and
build-time), nothing major.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlJgHZMACgkQMUfUDdst+yl5xgCePinICkasywmkWO0CgndCelCx
nyMAoIQ51ot0igMbmXoSAyq5u0EgVHmG
=/iU9
-----END PGP SIGNATURE-----
Merge tag 'usb-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some USB fixes and new device ids for 3.12-rc6
The largest change here is a bunch of new device ids for the option
USB serial driver for new Huawei devices. Other than that, just some
small bug fixes for issues that people have reported (run-time and
build-time), nothing major"
* tag 'usb-3.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: usb_phy_gen: refine conditional declaration of usb_nop_xceiv_register
usb: misc: usb3503: Fix compile error due to incorrect regmap depedency
usb/chipidea: fix oops on memory allocation failure
usb-storage: add quirk for mandatory READ_CAPACITY_16
usb: serial: option: blacklist Olivetti Olicard200
USB: quirks: add touchscreen that is dazzeled by remote wakeup
Revert "usb: musb: gadget: fix otg active status flag"
USB: quirks.c: add one device that cannot deal with suspension
USB: serial: option: add support for Inovia SEW858 device
USB: serial: ti_usb_3410_5052: add Abbott strip port ID to combined table as well.
USB: support new huawei devices in option.c
usb: musb: start musb on the udc side, too
xhci: Fix spurious wakeups after S5 on Haswell
xhci: fix write to USB3_PSSEN and XUSB2PRM pci config registers
xhci: quirk for extra long delay for S4
xhci: Don't enable/disable RWE on bus suspend/resume.
Commit 3fa4d734 (usb: phy: rename nop_usb_xceiv => usb_phy_gen_xceiv)
changed the conditional around the declaration of usb_nop_xceiv_register
from
#if defined(CONFIG_NOP_USB_XCEIV) ||
(defined(CONFIG_NOP_USB_XCEIV_MODULE) && defined(MODULE))
to
#if IS_ENABLED(CONFIG_NOP_USB_XCEIV)
While that looks the same, it is semantically different. The first expression
is true if CONFIG_NOP_USB_XCEIV is built as module and if the including
code is built as module. The second expression is true if code depending on
CONFIG_NOP_USB_XCEIV if built as module or into the kernel.
As a result, the arm:allmodconfig build fails with
arch/arm/mach-omap2/built-in.o: In function `omap3_evm_init':
arch/arm/mach-omap2/board-omap3evm.c:703: undefined reference to
`usb_nop_xceiv_register'
Fix the problem by reverting to the old conditional.
Cc: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 3812c8c8f3 ("mm: memcg: do not trap chargers with full
callstack on OOM") assumed that only a few places that can trigger a
memcg OOM situation do not return VM_FAULT_OOM, like optional page cache
readahead. But there are many more and it's impractical to annotate
them all.
First of all, we don't want to invoke the OOM killer when the failed
allocation is gracefully handled, so defer the actual kill to the end of
the fault handling as well. This simplifies the code quite a bit for
added bonus.
Second, since a failed allocation might not be the abrupt end of the
fault, the memcg OOM handler needs to be re-entrant until the fault
finishes for subsequent allocation attempts. If an allocation is
attempted after the task already OOMed, allow it to bypass the limit so
that it can quickly finish the fault and invoke the OOM killer.
Reported-by: azurIt <azurit@pobox.sk>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some USB drive enclosures do not correctly report an
overflow condition if they hold a drive with a capacity
over 2TB and are confronted with a READ_CAPACITY_10.
They answer with their capacity modulo 2TB.
The generic layer cannot cope with that. It must be told
to use READ_CAPACITY_16 from the beginning.
Signed-off-by: Oliver Neukum <oneukum@suse.de>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add the new helper, prepare_to_wait_event() which should only be used
by ___wait_event().
prepare_to_wait_event() returns -ERESTARTSYS if signal_pending_state()
is true, otherwise it does prepare_to_wait/exclusive. This allows to
uninline the signal-pending checks in wait_event*() macros.
Also, it can initialize wait->private/func. We do not care if they were
already initialized, the values are the same. This also shaves a couple
of insns from the inlined code.
This obviously makes prepare_*() path a little bit slower, but we are
likely going to sleep anyway, so I think it makes sense to shrink .text:
text data bss dec hex filename
===================================================
before: 5126092 2959248 10117120 18202460 115bf5c vmlinux
after: 5124618 2955152 10117120 18196890 115a99a vmlinux
on my build.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131007161824.GA29757@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 4c663cfc ("wait: fix false timeouts when using
wait_event_timeout()") introduced the additional condition checks
after a timeout but only in the "slow" __wait*() paths.
wait_event_timeout(wq, CONDITION, 0) still returns 0 if CONDITION
is already true and we do not call __wait*().
Now that we have ___wait_cond_timeout() we can use it instead to
ensure that __ret will be properly updated.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131007183106.GA10973@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
One bug fix and three reverts. The reverts back out the slightly
controversial feeding the entire device tree into the random pool and
the reserved-memory binding which isn't fully baked yet. Expect the
reserved-memory patches at least to resurface for v3.13. The bug fixes
removes a scary but harmless warning on SPARC that was introduced in the
v3.12 merge window. v3.13 will contain a proper fix that makes the new
code work on SPARC.
On the plus side, the diffstat looks *awesome*. I love removing lines of code.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJSXbO9AAoJEJZcAETA+ZUxuH4IAI/KF71fq0/O7en7GHo6ofcZ
3nA3fUyYpnjehaqmKDXxix4TUgKTYU1ZP6tn8VfM6fn0QthR2YXPYAqiIaRot3ab
arCPvJ9N52CU04Ug8dqMPEuFlSqRfTYc0EVPTbdgv8GYWX+rjP4qmOWS13exorai
hJSRFmbyvwOVvRAl7KtkBPEZ3ri40mfkTrMs61v55GajhaZFyoTQgMMNhboUskI7
qztWiecw1stlvbfQEoN+BA11ohp5kDf4d5jeTCMNFx81liBbZHYfwWbmwYPCH7/z
6s4gX6PQeCuygdhQK65q1tYebTmRbFxkuf8P/tO7lEpmR3fKIVBr+oMJff0q9ww=
=8W6e
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux
Pull device tree fixes and reverts from Grant Likely:
"One bug fix and three reverts. The reverts back out the slightly
controversial feeding the entire device tree into the random pool and
the reserved-memory binding which isn't fully baked yet. Expect the
reserved-memory patches at least to resurface for v3.13.
The bug fixes removes a scary but harmless warning on SPARC that was
introduced in the v3.12 merge window. v3.13 will contain a proper fix
that makes the new code work on SPARC.
On the plus side, the diffstat looks *awesome*. I love removing lines
of code"
* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux:
Revert "drivers: of: add initialization code for dma reserved memory"
Revert "ARM: init: add support for reserved memory defined by device tree"
Revert "of: Feed entire flattened device tree into the random pool"
of: fix unnecessary warning on missing /cpus node
This reverts commit 9d8eab7af7. There is
still no consensus on the bindings for the reserved memory and various
drawbacks of the proposed solution has been shown, so the best now is to
revert it completely and start again from scratch later.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
one trivial semicolon cleanup.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABCAAGBQJSXCYCAAoJEENa44ZhAt0hWKgP/3R5GUceKeDn9R0ENGPSpQ1/
dvsVmMccMGuJnyZSfUgoGb++6YY5rEjjj6epFlqXtkfoqUvNzDmw8nRO/Pkx+IAT
e/FyrDpcpbuPyOQeGZLIAC2hoQRUPsmayfBOIN+mW8Qu3vUYTKjs12QRqDi3EP6m
itJ07CfAX09LoiZ1S5QxSnEhPvR5MA7zy5ebgdk0QC+6tNcBWx7tOtCY7/BX4MnO
zZL2ZVzxZbHIT7HY+gYID4QxGHFf7JvGX9ATLh9HUzOom3c1XLtdDhH/6mONsTTL
BWTUJIa86DGJwY4fc6wDrOsC8DBo3o3YB98DUWUb6FQswQtx+PcyFg1dAhJuYFTQ
Risjpty4y/EVfUTjBCirf2R8BLCKZyUIFL40ZJvgwhKsH569hS5sVTXMPrQNmsuY
x7C17KJ1iabmtAswJCtM/aoeoodqZnAUg63aV+mbwQXQu9l06fx4UOo/TfG3tH1+
FxVVD3ord98nh77Nv+sGB7ek7x0d3XxEaP7pZscDqRTUx7TT7dXXQY9GC5qAjnfr
YhE8Exxmey+oZ3y6QTYI6scF5x8j0CJlSURfzDgOpKxYnSsdhgujGaQI++e9VF+W
pHAWRqAGsf3wkoMJKZI6DC3lZka81yiByeROSmk08FSVNp7SkVNjl6VC8cAxkVfM
nfNjy6fP/UQp/tcHp68R
=0XuA
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband updates from Roland Dreier:
"Last batch of IB changes for 3.12: many mlx5 hardware driver fixes
plus one trivial semicolon cleanup"
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB: Remove unnecessary semicolons
IB/mlx5: Ensure proper synchronization accessing memory
IB/mlx5: Fix alignment of reg umr gather buffers
IB/mlx5: Fix eq names to display nicely in /proc/interrupts
mlx5: Fix error code translation from firmware to driver
IB/mlx5: Fix opt param mask according to firmware spec
mlx5: Fix opt param mask for sq err to rts transition
IB/mlx5: Disable atomic operations
mlx5: Fix layout of struct mlx5_init_seg
mlx5: Keep polling to reclaim pages while any returned
IB/mlx5: Avoid async events on invalid port number
IB/mlx5: Decrease memory consumption of mr caches
mlx5: Remove checksum on command interface commands
IB/mlx5: Fix memory leak in mlx5_ib_create_srq
IB/mlx5: Flush cache workqueue before destroying it
IB/mlx5: Fix send work queue size calculation
Pull gcc "asm goto" miscompilation workaround from Ingo Molnar:
"This is the fix for the GCC miscompilation discussed in the following
lkml thread:
[x86] BUG: unable to handle kernel paging request at 00740060
The bug in GCC has been fixed by Jakub and the fix will be part of the
GCC 4.8.2 release expected to be released next week - so the quirk's
version test checks for <= 4.8.1.
The quirk is only added to compiler-gcc4.h and not to the higher level
compiler.h because all asm goto uses are behind a feature check"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
compiler/gcc4: Add quirk for 'asm goto' miscompilation bug
Pull drm fixes from Dave Airlie:
"All over the map..
- nouveau:
disable MSI, needs more work, will try again next merge window
- radeon:
audio + uvd regression fixes, dpm fixes, reset fixes
- i915:
the dpms fix might fix your haswell
And one pain in the ass revert, so we have VGA arbitration that when
implemented 4-5 years ago really hoped that GPUs could remove
themselves from arbitration completely once they had a kernel driver.
It seems Intel hw designers decided that was too nice a facility to
allow us to have so they removed it when they went on-die (so since
Ironlake at least). Now Alex Williamson added support for VGA
arbitration for newer GPUs however this now exposes itself to
userspace as requireing arbitration of GPU VGA regions and the X
server gets involved and disables things that it can't handle when VGA
access is possibly required around every operation.
So in order to not break userspace we just reverted things back to the
old known broken status so maybe we can try and design out way out.
Ville also had a patch to use stop machine for the two times Intel
needs to access VGA space, that might be acceptable with some rework,
but for now myself and Daniel agreed to just go back"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (23 commits)
Revert "i915: Update VGA arbiter support for newer devices"
Revert "drm/i915: Delay disabling of VGA memory until vgacon->fbcon handoff is done"
drm/radeon: re-enable sw ACR support on pre-DCE4
drm/radeon/dpm: disable bapm on TN asics
drm/radeon: improve soft reset on CIK
drm/radeon: improve soft reset on SI
drm/radeon/dpm: off by one in si_set_mc_special_registers()
drm/radeon/dpm/btc: off by one in btc_set_mc_special_registers()
drm/radeon: forever loop on error in radeon_do_test_moves()
drm/radeon: fix hw contexts for SUMO2 asics
drm/radeon: fix typo in CP DMA register headers
drm/radeon/dpm: disable multiple UVD states
drm/radeon: use hw generated CTS/N values for audio
drm/radeon: fix N/CTS clock matching for audio
drm/radeon: use 64-bit math to calculate CTS values for audio (v2)
drm/edid: catch kmalloc failure in drm_edid_to_speaker_allocation
Revert "drm/fb-helper: don't sleep for screen unblank when an oops is in progress"
drm/gma500: fix things after get/put page helpers
drm/nouveau/mc: disable msi support by default, it's busted in tons of places
drm/i915: Only apply DPMS to the encoder if enabled
...
Fengguang Wu, Oleg Nesterov and Peter Zijlstra tracked down
a kernel crash to a GCC bug: GCC miscompiles certain 'asm goto'
constructs, as outlined here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670
Implement a workaround suggested by Jakub Jelinek.
Reported-and-tested-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Suggested-by: Jakub Jelinek <jakub@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This reverts commit 81b5c7bc8d.
Adding drm/i915 into the vga arbiter chain means that X (in a piece of
well-meant paranoia) will do a get/put on the vga decoding around
_every_ accel call down into the ddx. Which results in some nice
performance disasters [1]. This really breaks userspace, by disabling
DRI for everyone, and stops OpenGL from working, this isn't limited
to just the i915 but both the integrated and discrete GPUs on
multi-gpu systems, in other words this causes untold worlds of pain,
Ville tried to come up with a Great Hack to fiddle the required VGA
I/O ops behind everyone's back using stop_machine, but that didn't
really work out [2]. Given that we're fairly late in the -rc stage for
such games let's just revert this all.
One thing we might want to keep is to delay the disabling of the vga
decoding until the fbdev emulation and the fbcon screen is set up. If
we kill vga mem decoding beforehand fbcon can end up with a white
square in the top-left corner it tried to save from the vga memory for
a seamless transition. And we have bug reports on older platforms
which seem to match these symptoms.
But again that's something to play around with in -next.
References: [1] http://lists.x.org/archives/xorg-devel/2013-September/037763.html
References: [2] http://www.spinics.net/lists/intel-gfx/msg34062.html
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
non-x86 platforms, in particular MIPS and ARM.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABCAAGBQJSVvO/AAoJENNvdpvBGATwNZ4P+wadRWY/Gdz/p9332qdVrGYs
nP4DPWSg+n3RH/fOnacEwHF5vqapTe03G82NriCaVGFP8O9j7bo6ByMKKkIR7yvr
4sHUX4YMc/DwchaIHH2xp8fQoMc3Mv7mn8bodTtPXgveeldEvtuUQM0q+j4DXZUT
qSLMGElgJYrpIf2Cm8JAIBkt2QuzpZPPX7Z6glZunpvfLSMmgn3Vj2ilNEx1YCFH
v+Rk1ZYLjg2LzUYqaO7HOXlRJqmE10I7ZmNvPXJZ9fuPmGYD9FU6WeHhmIAFYdFw
V6bAzou+LbnuNVoW6yiDvrKcOXgh2Spbk6SaKVSrcjVPfc87ocNzGWI4OTfNy1xI
Kv9u4YfU3pIUWPDGx0mvT/KXAXl/PGVfu7bYXDcN2I2tqlrbBPdIWqpFB2eTn7/j
//XbatoT6gGZTuseCKhYXWpG8AE5pCfbjGnd9il21fvlUDdkIq42wAs96qjc6Ruj
tPCi5yYzLiHsn4eau+SJqI1KxPLf6YJw9Qo+f70FGl63wXJU9Vr07ID2rGTwXm1m
Qf1joTtx900PvfzUaD0ODbQZaTbX6ebSOkriKpKWYwg+26Gdc7JAxIVI3HDOlOR+
++r1M4ERwDic/xdVsB6Mngmop3d1BeNU2IAoiRDZwcJpS1+MLivlIbd1PjBAt0bU
+oOm+wseHEzSnlgucQ0g
=qnTe
-----END PGP SIGNATURE-----
Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull /dev/random changes from Ted Ts'o:
"These patches are designed to enable improvements to /dev/random for
non-x86 platforms, in particular MIPS and ARM"
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: allow architectures to optionally define random_get_entropy()
random: run random_int_secret_init() run after all late_initcalls
Allow architectures which have a disabled get_cycles() function to
provide a random_get_entropy() function which provides a fine-grained,
rapidly changing counter that can be used by the /dev/random driver.
For example, an architecture might have a rapidly changing register
used to control random TLB cache eviction, or DRAM refresh that
doesn't meet the requirements of get_cycles(), but which is good
enough for the needs of the random driver.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
It's helpful for a driver to put the pci slot name in its interrupt
names, so /proc/interrupts will show the pci slot of the device.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
The layout of struct health_buffer was not according to firmware
specification. Fix it to comply.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Checksum calculations consume CPU resources and can be significant to
the rate of resource creation/destruction.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Shared faults can lead to lots of unnecessary page migrations,
slowing down the system, and causing private faults to hit the
per-pgdat migration ratelimit.
This patch adds sysctl numa_balancing_migrate_deferred, which specifies
how many shared page migrations to skip unconditionally, after each page
migration that is skipped because it is a shared fault.
This reduces the number of page migrations back and forth in
shared fault situations. It also gives a strong preference to
the tasks that are already running where most of the memory is,
and to moving the other tasks to near the memory.
Testing this with a much higher scan rate than the default
still seems to result in fewer page migrations than before.
Memory seems to be somewhat better consolidated than previously,
with multi-instance specjbb runs on a 4 node system.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-62-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With the scan rate code working (at least for multi-instance specjbb),
the large hammer that is "sched: Do not migrate memory immediately after
switching node" can be replaced with something smarter. Revert temporarily
migration disabling and all traces of numa_migrate_seq.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-61-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
With scan rate adaptions based on whether the workload has properly
converged or not there should be no need for the scan period reset
hammer. Get rid of it.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-60-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Adjust numa_scan_period in task_numa_placement, depending on how much
useful work the numa code can do. The more local faults there are in a
given scan window the longer the period (and hence the slower the scan rate)
during the next window. If there are excessive shared faults then the scan
period will decrease with the amount of scaling depending on whether the
ratio of shared/private faults. If the preferred node changes then the
scan rate is reset to recheck if the task is properly placed.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-59-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Due to the way the pid is truncated, and tasks are moved between
CPUs by the scheduler, it is possible for the current task_numa_fault
to group together tasks that do not actually share memory together.
This patch adds a few easy sanity checks to task_numa_fault, joining
tasks together if they share the same tsk->mm, or if the fault was on
a page with an elevated mapcount, in a shared VMA.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-57-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It is possible for a task in a numa group to call exec, and
have the new (unrelated) executable inherit the numa group
association from its former self.
This has the potential to break numa grouping, and is trivial
to fix.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-51-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch uses the fraction of faults on a particular node for both task
and group, to figure out the best node to place a task. If the task and
group statistics disagree on what the preferred node should be then a full
rescan will select the node with the best combined weight.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-50-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A newly spawned thread inside a process should stay on the same
NUMA node as its parent. This prevents processes from being "torn"
across multiple NUMA nodes every time they spawn a new thread.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-49-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
And here's a little something to make sure not the whole world ends up
in a single group.
As while we don't migrate shared executable pages, we do scan/fault on
them. And since everybody links to libc, everybody ends up in the same
group.
Suggested-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1381141781-10992-47-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It is desirable to model from userspace how the scheduler groups tasks
over time. This patch adds an ID to the numa_group and reports it via
/proc/PID/status.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-45-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While parallel applications tend to align their data on the cache
boundary, they tend not to align on the page or THP boundary.
Consequently tasks that partition their data can still "false-share"
pages presenting a problem for optimal NUMA placement.
This patch uses NUMA hinting faults to chain tasks together into
numa_groups. As well as storing the NID a task was running on when
accessing a page a truncated representation of the faulting PID is
stored. If subsequent faults are from different PIDs it is reasonable
to assume that those two tasks share a page and are candidates for
being grouped together. Note that this patch makes no scheduling
decisions based on the grouping information.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1381141781-10992-44-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change the per page last fault tracking to use cpu,pid instead of
nid,pid. This will allow us to try and lookup the alternate task more
easily. Note that even though it is the cpu that is store in the page
flags that the mpol_misplaced decision is still based on the node.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1381141781-10992-43-git-send-email-mgorman@suse.de
[ Fixed build failure on 32-bit systems. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Use the new stop_two_cpus() to implement migrate_swap(), a function that
flips two tasks between their respective cpus.
I'm fairly sure there's a less crude way than employing the stop_two_cpus()
method, but everything I tried either got horribly fragile and/or complex. So
keep it simple for now.
The notable detail is how we 'migrate' tasks that aren't runnable
anymore. We'll make it appear like we migrated them before they went to
sleep. The sole difference is the previous cpu in the wakeup path, so we
override this.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Link: http://lkml.kernel.org/r/1381141781-10992-39-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Introduce stop_two_cpus() in order to allow controlled swapping of two
tasks. It repurposes the stop_machine() state machine but only stops
the two cpus which we can do with on-stack structures and avoid
machine wide synchronization issues.
The ordering of CPUs is important to avoid deadlocks. If unordered then
two cpus calling stop_two_cpus on each other simultaneously would attempt
to queue in the opposite order on each CPU causing an AB-BA style deadlock.
By always having the lowest number CPU doing the queueing of works, we can
guarantee that works are always queued in the same order, and deadlocks
are avoided.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
[ Implemented deadlock avoidance. ]
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Link: http://lkml.kernel.org/r/1381141781-10992-38-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When a preferred node is selected for a tasks there is an attempt to migrate
the task to a CPU there. This may fail in which case the task will only
migrate if the active load balancer takes action. This may never happen if
the conditions are not right. This patch will check at NUMA hinting fault
time if another attempt should be made to migrate the task. It will only
make an attempt once every five seconds.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-34-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is a 90% regression observed with a large Oracle performance test
on a 4 node system. Profiles indicated that the overhead was due to
contention on sp_lock when looking up shared memory policies. These
policies do not have the appropriate flags to allow them to be
automatically balanced so trapping faults on them is pointless. This
patch skips VMAs that do not have MPOL_F_MOF set.
[riel@redhat.com: Initial patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-and-tested-by: Joe Mario <jmario@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-32-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ideally it would be possible to distinguish between NUMA hinting faults that
are private to a task and those that are shared. If treated identically
there is a risk that shared pages bounce between nodes depending on
the order they are referenced by tasks. Ultimately what is desirable is
that task private pages remain local to the task while shared pages are
interleaved between sharing tasks running on different nodes to give good
average performance. This is further complicated by THP as even
applications that partition their data may not be partitioning on a huge
page boundary.
To start with, this patch assumes that multi-threaded or multi-process
applications partition their data and that in general the private accesses
are more important for cpu->memory locality in the general case. Also,
no new infrastructure is required to treat private pages properly but
interleaving for shared pages requires additional infrastructure.
To detect private accesses the pid of the last accessing task is required
but the storage requirements are a high. This patch borrows heavily from
Ingo Molnar's patch "numa, mm, sched: Implement last-CPU+PID hash tracking"
to encode some bits from the last accessing task in the page flags as
well as the node information. Collisions will occur but it is better than
just depending on the node information. Node information is then used to
determine if a page needs to migrate. The PID information is used to detect
private/shared accesses. The preferred NUMA node is selected based on where
the maximum number of approximately private faults were measured. Shared
faults are not taken into consideration for a few reasons.
First, if there are many tasks sharing the page then they'll all move
towards the same node. The node will be compute overloaded and then
scheduled away later only to bounce back again. Alternatively the shared
tasks would just bounce around nodes because the fault information is
effectively noise. Either way accounting for shared faults the same as
private faults can result in lower performance overall.
The second reason is based on a hypothetical workload that has a small
number of very important, heavily accessed private pages but a large shared
array. The shared array would dominate the number of faults and be selected
as a preferred node even though it's the wrong decision.
The third reason is that multiple threads in a process will race each
other to fault the shared page making the fault information unreliable.
Signed-off-by: Mel Gorman <mgorman@suse.de>
[ Fix complication error when !NUMA_BALANCING. ]
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-30-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>