A quiet release for regmap, some cleanups, fixes and:
- Improved node coalescing for rbtree, reducing memory usage and
improving performance during syncs.
- Support for registering multiple register patches.
- A quirk for handling interrupts that need to be cleared when masked
in regmap-irq.
Merge tag 'regmap-v3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"A quiet release for regmap, some cleanups, fixes and:
- Improved node coalescing for rbtree, reducing memory usage and
improving performance during syncs.
- Support for registering multiple register patches.
- A quirk for handling interrupts that need to be cleared when masked
in regmap-irq"
* tag 'regmap-v3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: rbtree: Make cache_present bitmap per node
regmap: rbtree: Reduce number of nodes, take 2
regmap: rbtree: Simplify adjacent node look-up
regmap: debugfs: Fix continued read from registers file
regcache-rbtree: Fix reg_stride != 1
regmap: Allow multiple patches to be registered
regmap: regcache: allow read-only regs to be cached
regmap: fix regcache_reg_present() for empty cache
regmap: core: allow a virtual range to cover its own data window
regmap: irq: document mask/wake_invert flags
regmap: irq: make flags bool and put them in a bitfield
regmap: irq: Allow to acknowledge masked interrupts during initialization
regmap: Provide __acquires/__releases annotations
Merge lockref infrastructure code by me and Waiman Long.
I already merged some of the preparatory patches that didn't actually do
any semantic changes earlier, but this merges the actual _reason_ for
those preparatory patches.
The "lockref" structure is a combination "spinlock and reference count"
that allows optimized reference count accesses. In particular, it
guarantees that the reference count will be updated AS IF the spinlock
was held, but using atomic accesses that cover both the reference count
and the spinlock words, we can often do the update without actually
having to take the lock.
This allows us to avoid the nastiest cases of spinlock contention on
large machines under heavy pathname lookup loads. When updating the
dentry reference counts on a large system, we'll still end up with the
cache line bouncing around, but that's much less noticeable than
actually having to spin waiting for the lock.
* lockref:
lockref: implement lockless reference count updates using cmpxchg()
lockref: uninline lockref helper functions
vfs: reimplement d_rcu_to_refcount() using lockref_get_or_lock()
vfs: use lockref_get_not_zero() for optimistic lockless dget_parent()
lockref: add 'lockref_get_or_lock()' helper
Instead of taking the spinlock, the lockless versions atomically check
that the lock is not taken, and do the reference count update using a
cmpxchg() loop. This is semantically identical to doing the reference
count update protected by the lock, but avoids the "wait for lock"
contention that you get when accesses to the reference count are
contended.
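
The loop described above can be sketched in userspace with C11 atomics. This is a minimal model and not the kernel implementation: the names (lockref_model, model_get_lockless) and the layout (low 32 bits as the lock word, high 32 bits as the count) are assumptions made for illustration only.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model only: the low 32 bits stand in for the spinlock word
 * (0 = unlocked) and the high 32 bits hold the reference count.
 */
struct lockref_model {
    _Atomic uint64_t lock_count;
};

static uint32_t model_lock(uint64_t v)  { return (uint32_t)v; }
static uint32_t model_count(uint64_t v) { return (uint32_t)(v >> 32); }

static uint64_t model_pack(uint32_t lock, uint32_t count)
{
    return ((uint64_t)count << 32) | lock;
}

/*
 * Try to take a reference without taking the lock. Returns true if the
 * count was incremented, false if the lock was seen held (a real caller
 * would then fall back to the locked slow path).
 */
static bool model_get_lockless(struct lockref_model *ref)
{
    uint64_t old = atomic_load(&ref->lock_count);

    while (model_lock(old) == 0) {
        uint64_t next = model_pack(0, model_count(old) + 1);

        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&ref->lock_count, &old, next))
            return true;
    }
    return false;
}
```

Because the compare-and-exchange covers both halves of the word, an update computed from an "unlocked" snapshot cannot succeed once somebody has taken the lock.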
Note that a "lockref" is absolutely _not_ equivalent to an atomic_t.
Even when the lockref reference counts are updated atomically with
cmpxchg, the fact that they also verify the state of the spinlock means
that the lockless updates can never happen while somebody else holds the
spinlock.
So while "lockref_put_or_lock()" looks a lot like just another name for
"atomic_dec_and_lock()", and both optimize to lockless updates, they are
fundamentally different: the decrement done by atomic_dec_and_lock() is
truly independent of any lock (as long as it doesn't decrement to zero),
so a locked region can still see the count change.
The lockref structure, in contrast, really is a *locked* reference
count. If you hold the spinlock, the reference count will be stable and
you can modify the reference count without using atomics, because even
the lockless updates will see and respect the state of the lock.
In order to enable the cmpxchg lockless code, the architecture needs to
do three things:
(1) Make sure that the "arch_spinlock_t" and an "unsigned int" can fit
in an aligned u64, and have a "cmpxchg()" implementation that works
on such a u64 data type.
(2) define a helper function to test for a spinlock being unlocked
("arch_spin_value_unlocked()")
(3) select the "ARCH_USE_CMPXCHG_LOCKREF" config variable in its
Kconfig file.
This enables it for x86-64 (but not 32-bit, we'd need to make sure
cmpxchg() turns into the proper cmpxchg8b in order to enable it for
32-bit mode).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
They aren't very good to inline, since they already call external
functions (the spinlock code), and we're going to create rather more
complicated versions of them that can do the reference count updates
locklessly.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This moves __d_rcu_to_refcount() from <linux/dcache.h> into fs/namei.c
and re-implements it using the lockref infrastructure instead. It also
adds a lot of comments about what is actually going on, because turning
a dentry that was looked up using RCU into a long-lived reference
counted entry is one of the more subtle parts of the rcu walk.
We also used to be _particularly_ subtle in unlazy_walk() where we
re-validate both the dentry and its parent using the same sequence
count. We used to do it by nesting the locks and then verifying the
sequence count just once.
That was silly, because nested locking is expensive, but the sequence
count check is not. So this just re-validates the dentry and the parent
separately, avoiding the nested locking, and making the lockref lookup
possible.
Acked-by: Waiman Long <waiman.long@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A valid parent pointer is always going to have a non-zero reference
count, but if we look up the parent optimistically without locking, we
have to protect against the (very unlikely) race against renaming
changing the parent from under us.
We do that by using lockref_get_not_zero(), and then re-checking the
parent pointer after getting a valid reference.
[ This is a re-implementation of a chunk from the original patch by
Waiman Long: "dcache: Enable lockless update of dentry's refcount".
I've completely rewritten the patch-series and split it up, but I'm
attributing this part to Waiman as it's close enough to his earlier
patch - Linus ]
Signed-off-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This behaves like "lockref_get_not_zero()", but instead of doing nothing
if the count was zero, it returns with the lock held.
This allows callers to revalidate the lockref-protected data structure
if required even if the count was zero to begin with, and possibly
increment the count if it passes muster.
In particular, the dentry code wants this when it wants to turn an
RCU-protected dentry into a stable refcounted one: if the dentry count
is zero, but the sequence number still validates the dentry, we can take
a reference to it.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a bug fix for the pm80xx driver. It turns out that when the new
hardware support was added in 3.10 the IO command size was kept at the old
hard coded value. This means that the driver attaches to some new cards and
then simply hangs the system.
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fix from James Bottomley:
"This is a bug fix for the pm80xx driver. It turns out that when the
new hardware support was added in 3.10 the IO command size was kept at
the old hard coded value. This means that the driver attaches to some
new cards and then simply hangs the system"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
[SCSI] pm80xx: fix Adaptec 71605H hang
Pull x86 boot fix from Peter Anvin:
"A single very small boot fix for very large memory systems (> 0.5T)"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Fix boot crash with DEBUG_PAGE_ALLOC=y and more than 512G RAM
Pull slave-dma fix from Vinod Koul:
"A fix for resolving TI_EDMA driver's build error in allmodconfig to
have filter function built in"
* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
dma/Kconfig: TI_EDMA needs to be boolean
Pull networking fixes from David Miller:
1) There was a simplification in the ipv6 ndisc packet sending
attempted here, which avoided using memory accounting on the
per-netns ndisc socket for sending NDISC packets. It did fix some
important issues, but it causes regressions so it gets reverted here
too. Specifically, the problem with this change is that the IPV6
output path really depends upon there being a valid skb->sk
attached.
The reason we want to do this change in some form when we figure out
how to do it right, is that if a device goes down the ndisc_sk
socket send queue will fill up and block NDISC packets that we want
to send to other devices too. That's really bad behavior.
Hopefully Thomas can come up with a better version of this change.
2) Fix a severe TCP performance regression by reverting a change made
to dev_pick_tx() quite some time ago. From Eric Dumazet.
3) TIPC returns wrongly signed error codes, fix from Erik Hugne.
4) Fix OOPS when doing IPSEC over ipv4 tunnels due to orphaning the
skb->sk too early. Fix from Li Hongjun.
5) RAW ipv4 sockets can use the wrong routing key during lookup, from
Chris Clark.
6) Similar to #1 revert an older change that tried to use plain
alloc_skb() for SYN/ACK TCP packets, this broke the netfilter owner
mark which needs to see the skb->sk for such frames. From Phil
Oester.
7) BNX2x driver bug fixes from Ariel Elior and Yuval Mintz,
specifically in the handling of virtual functions.
8) IPSEC path error propagations to sockets is not done properly when
we have v4 in v6, and v6 in v4 type rules. Fix from Hannes Frederic
Sowa.
9) Fix missing channel context release in mac80211, from Johannes Berg.
10) Fix network namespace handling wrt. SCM_RIGHTS, from Andy
Lutomirski.
11) Fix usage of bogus NAPI weight in jme, netxen, and ps3_gelic
drivers. From Michal Schmidt.
12) Hopefully a complete and correct fix for the genetlink dump locking
and module reference counting. From Pravin B Shelar.
13) sk_busy_loop() must do a cpu_relax(), from Eliezer Tamir.
14) Fix handling of timestamp offset when restoring a snapshotted TCP
socket. From Andrew Vagin.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
net: fec: fix time stamping logic after napi conversion
net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay
mISDN: return -EINVAL on error in dsp_control_req()
net: revert 8728c544a9 ("net: dev_pick_tx() fix")
Revert "ipv6: Don't depend on per socket memory for neighbour discovery messages"
ipv4 tunnels: fix an oops when using ipip/sit with IPsec
tipc: set sk_err correctly when connection fails
tcp: tcp_make_synack() should use sock_wmalloc
bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 ones
ipv6: Don't depend on per socket memory for neighbour discovery messages
ipv4: sendto/hdrincl: don't use destination address found in header
tcp: don't apply tsoffset if rcv_tsecr is zero
tcp: initialize rcv_tstamp for restored sockets
net: xilinx: fix memleak
net: usb: Add HP hs2434 device to ZLP exception table
net: add cpu_relax to busy poll loop
net: stmmac: fixed the pbl setting with DT
genl: Hold reference on correct module while netlink-dump.
genl: Fix genl dumpit() locking.
xfrm: Fix potential null pointer dereference in xdst_queue_output
...
Filtering capabilities on my work email are pretty much non-existent and this
has turned out to be something of a firehose...
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Pawel Moll <pawel.moll@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This contains two Oops fixes (opti9xx and HD-audio) and a simple
fixup for an Acer laptop. All marked as stable patches.
Merge tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This contains two Oops fixes (opti9xx and HD-audio) and a simple fixup
for an Acer laptop. All marked as stable patches"
* tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: opti9xx: Fix conflicting driver object name
ALSA: hda - Fix NULL dereference with CONFIG_SND_DYNAMIC_MINORS=n
ALSA: hda - Add inverted digital mic fixup for Acer Aspire One
Two straggling fixes that I had missed as they were posted a couple of
weeks ago, causing problems with interrupts (breaking them completely)
on the CSR SiRF platforms.
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Two straggling fixes that I had missed as they were posted a couple of
weeks ago, causing problems with interrupts (breaking them completely)
on the CSR SiRF platforms"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
arm: prima2: drop nr_irqs in mach as we moved to linear irqdomain
irqchip: sirf: move from legacy mode to linear irqdomain
Pull drm fixes from Dave Airlie:
"Since we are getting to the pointy end, one i915 black screen on some
machines, and one vmwgfx fix to stop userspace from nuking the VM.
There might be one or two ati or nouveau fixes that trickle in before
final, but I think this should pretty much be it"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/vmwgfx: Split GMR2_REMAP commands if they are to large
drm/i915: ivb: fix edp voltage swing reg val
Pull input layer updates from Dmitry Torokhov:
"Just a couple of new IDs in Wacom and xpad drivers, i8042 is now
disabled on ARC, and data checks in Elantech driver that were overly
relaxed by the previous patch are now tightened"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: i8042 - disable the driver on ARC platforms
Input: xpad - add signature for Razer Onza Classic Edition
Input: elantech - fix packet check for v3 and v4 hardware
Input: wacom - add support for 0x300 and 0x301
Commit dc975382 "net: fec: add napi support to improve proformance"
converted the fec driver to the napi model. However, that commit
forgot to remove the call to skb_defer_rx_timestamp which is only
needed in non-napi drivers.
(The function napi_gro_receive eventually calls netif_receive_skb,
which in turn calls skb_defer_rx_timestamp.)
This patch should also be applied to the 3.9 and 3.10 kernels.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While looking into MLDv1/v2 code, I noticed that bridging code does
not convert its max delay into jiffies for MLDv2 messages as we do
in the core IPv6 multicast code.
RFC3810, 5.1.3. Maximum Response Code says:
The Maximum Response Code field specifies the maximum time allowed
before sending a responding Report. The actual time allowed, called
the Maximum Response Delay, is represented in units of milliseconds,
and is derived from the Maximum Response Code as follows: [...]
As we update timers that work with jiffies, we need to convert it.
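
As a rough illustration of the conversion involved, the sketch below decodes the 16-bit Maximum Response Code into milliseconds following the RFC 3810 rule (values below 32768 are taken literally, larger values use a 3-bit exponent and 12-bit mantissa) and then turns milliseconds into timer ticks. The function names and the SKETCH_HZ value are assumptions for this example; the tick conversion is a simplified stand-in for msecs_to_jiffies(), not the kernel code.

```c
#include <stdint.h>

/* Hypothetical tick rate for this sketch; the kernel's HZ is configurable. */
#define SKETCH_HZ 250u

/* Decode the Maximum Response Code into milliseconds (RFC 3810, 5.1.3). */
static uint32_t mldv2_mrc_to_ms(uint16_t mrc)
{
    if (mrc < 32768)
        return mrc;

    uint32_t exp  = (mrc >> 12) & 0x7;   /* 3-bit exponent */
    uint32_t mant = mrc & 0x0fff;        /* 12-bit mantissa */

    return (mant | 0x1000) << (exp + 3);
}

/* Simplified stand-in for msecs_to_jiffies(): round up to whole ticks. */
static unsigned long sketch_msecs_to_jiffies(uint32_t ms)
{
    return ((unsigned long)ms * SKETCH_HZ + 999) / 1000;
}
```

With these assumptions a delay of 1000 ms corresponds to 250 ticks, which shows why feeding the raw millisecond value to a jiffies-based timer gives the wrong timeout whenever HZ is not 1000.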
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Linus Lüssing <linus.luessing@web.de>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If skb->len is too short then we should return an error. Otherwise we
read beyond the end of skb->data for several bytes.
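
The general pattern is a length check before the first read. A minimal sketch follows, using a hypothetical struct pkt in place of struct sk_buff and a hypothetical read_control_code() helper; it is not the mISDN code itself.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the sk_buff fields that matter here. */
struct pkt {
    const unsigned char *data;
    size_t len;
};

/* Read a leading int "control code" only after checking the length. */
static int read_control_code(const struct pkt *p, int *code)
{
    if (p->len < sizeof(*code))
        return -EINVAL;   /* too short: refuse rather than over-read */

    memcpy(code, p->data, sizeof(*code));
    return 0;
}
```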
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 8728c544a9 ("net: dev_pick_tx() fix") and commit
b6fe83e952 ("bonding: refine IFF_XMIT_DST_RELEASE capability")
are quite incompatible: queue selection is disabled because the skb
dst was dropped before entering the bonding device.
This causes major performance regression, mainly because TCP packets
for a given flow can be sent to multiple queues.
This is particularly visible when using the new FQ packet scheduler
with MQ + FQ setup on the slaves.
We can safely revert the first commit now that 416186fbf8
("net: Split core bits of netdev_pick_tx into __netdev_pick_tx")
properly caps the queue_index.
Reported-by: Xi Wang <xii@google.com>
Diagnosed-by: Xi Wang <xii@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Denys Fedorysychenko <nuclearcat@nuclearcat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 1f324e3887.
It seems to cause regressions, and in particular the output path
really depends upon there being a socket attached to skb->sk for
checks such as sk_mc_loop(skb->sk) for example. See ip6_output_finish2().
Reported-by: Stephen Warren <swarren@wwwdotorg.org>
Reported-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 3d7b46cd20 (ip_tunnel: push generic protocol handling to
ip_tunnel module.), an Oops is triggered when an xfrm policy is configured on
an IPv4 over IPv4 tunnel.
xfrm4_policy_check() calls __xfrm_policy_check2(), which uses skb_dst(skb). But
this field is NULL because iptunnel_pull_header() calls skb_dst_drop(skb).
Signed-off-by: Li Hongjun <hongjun.li@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Should a connect fail, if the publication/server is unavailable or
due to some other error, a positive value will be returned and errno
is never set. If the application code checks for an explicit zero
return from connect (success) or a negative return (failure), it
will not catch the error and subsequent send() calls will fail as
shown from the strace snippet below.
socket(0x1e /* PF_??? */, SOCK_SEQPACKET, 0) = 3
connect(3, {sa_family=0x1e /* AF_??? */, sa_data="\2\1\322\4\0\0\322\4\0\0\0\0\0\0"}, 16) = 111
sendto(3, "test", 4, 0, NULL, 0) = -1 EPIPE (Broken pipe)
The reason for this behaviour is that TIPC wrongly inverts error
codes set in sk_err.
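
The sign convention can be illustrated with a small model. It assumes the usual pattern in which sk_err holds a positive errno value and the reporting helper returns its negation; the struct and function names below are hypothetical and only mirror that pattern.

```c
#include <errno.h>

/* Hypothetical model of the one socket field that matters here. */
struct sock_model {
    int sk_err;   /* convention: a positive errno value, 0 if none */
};

/* "Report and clear": returns 0 or a negative errno. */
static int model_sock_error(struct sock_model *sk)
{
    int err = sk->sk_err;

    sk->sk_err = 0;
    return -err;
}

static int connect_result_buggy(struct sock_model *sk)
{
    sk->sk_err = -ECONNREFUSED;      /* wrongly inverted */
    return model_sock_error(sk);     /* comes out as +ECONNREFUSED */
}

static int connect_result_fixed(struct sock_model *sk)
{
    sk->sk_err = ECONNREFUSED;
    return model_sock_error(sk);     /* -ECONNREFUSED, as callers expect */
}
```

The positive return value in the buggy variant corresponds to the 111 seen in the strace output above, assuming that value is ECONNREFUSED on the traced system.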
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 90ba9b19 (tcp: tcp_make_synack() can use alloc_skb()), Eric changed
the call to sock_wmalloc in tcp_make_synack to alloc_skb. In doing so,
the netfilter owner match lost its ability to block the SYNACK packet on
outbound listening sockets. Revert the change, restoring the owner match
functionality.
This closes netfilter bugzilla #847.
Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we would still potentially suffer multicast packet loss if there
is just either an IGMP or an MLD querier: For the former case, we would
possibly drop IPv6 multicast packets, for the latter IPv4 ones. This is
because we are currently assuming that if either an IGMP or MLD querier
is present that the other one is present, too.
This patch makes the behaviour and fix added in
"bridge: disable snooping if there is no querier" (b00589af3b)
to also work if there is either just an IGMP or an MLD querier on the
link: It refines the deactivation of the snooping to be protocol
specific by using separate timers for the snooped IGMP and MLD queries
as well as separate timers for our internal IGMP and MLD queriers.
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull cgroup fix from Tejun Heo:
"During the percpu reference counting update which was merged during
v3.11-rc1, the cgroup destruction path was updated so that a cgroup in
the process of dying may linger on the children list, which was
necessary as the cgroup should still be included in child/descendant
iteration while percpu ref is being killed.
Unfortunately, I forgot to update cgroup destruction path accordingly
and cgroup destruction may fail spuriously with -EBUSY due to
lingering dying children even when there's no live child left - e.g.
"rmdir parent/child parent" will usually fail.
This can be easily fixed by iterating through the children list to
verify that there's no live child left. While this is very late in
the release cycle, this bug is very visible to userland and I believe
the fix is relatively safe.
Thanks Hugh for spotting and providing fix for the issue"
* 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix rmdir EBUSY regression in 3.11
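
The fix described in the pull message, iterating the children list and ignoring children that are already dying, can be sketched as below. This is a simplified model under assumed names (cg_model, cg_has_live_child, cg_rmdir_check); the real cgroup structures and locking are not shown.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a cgroup and its children list. */
struct cg_model {
    bool dead;                  /* destruction has started */
    struct cg_model *sibling;   /* next entry on the parent's list */
    struct cg_model *children;  /* head of this group's children list */
};

/* True if at least one child is still live; dying children are ignored. */
static bool cg_has_live_child(const struct cg_model *cg)
{
    for (const struct cg_model *c = cg->children; c; c = c->sibling)
        if (!c->dead)
            return true;
    return false;
}

/* rmdir-style check: busy only when a live child remains. */
static int cg_rmdir_check(const struct cg_model *cg)
{
    return cg_has_live_child(cg) ? -EBUSY : 0;
}
```

A check that only tests whether the children list is empty would return -EBUSY for a parent whose sole child is dead but still linked, which is the regression being fixed.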
Pull workqueue fix from Tejun Heo:
"This contains one fix which could lead to system-wide lockup on
!PREEMPT kernels. It's very late in the cycle but this definitely is
-stable material.
The problem is that workqueue worker tasks may process an unlimited
number of work items back-to-back without ever yielding in between.
This usually isn't noticeable but a work item which re-queues itself
waiting for someone else to do something can deadlock with
stop_machine. stop_machine will ensure nothing else happens on all
other cpus and the requeueing work item will requeue itself
indefinitely without ever yielding and thus preventing the CPU from
entering stop_machine.
Kudos to Jamie Liu for spotting and diagnosing the problem. This can
be trivially fixed by adding cond_resched() after processing each work
item"
* 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: cond_resched() after processing each work item
- Stable patch to fix a highmem-related data corruption issue on 32-bit
ARM platforms
Merge tag 'nfs-for-3.11-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfix from Trond Myklebust:
"Stable patch to fix a highmem-related data corruption issue on 32-bit
ARM platforms"
* tag 'nfs-for-3.11-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Fix memory corruption issue on 32-bit highmem systems
This fixes the piglit test texturing/max-texture-size
causing the VM to die due to a too large SVGA command.
Signed-off-by: Jakob Bornecrantz <jakob@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@gmail.com>
Just a one-line patch to fix a black screen issue on rare ivb machines,
cc: stable. Normally I'd just shovel this into the -next pull request this
late in the -rc cycle, but Linus was making noises about not getting real
fixes which are cc: stable. So here we go ;-)
* tag 'drm-intel-fixes-2013-08-30' of git://people.freedesktop.org/~danvet/drm-intel:
drm/i915: ivb: fix edp voltage swing reg val
Fix the typo introduced in
commit 1a2eb4604b
Author: Keith Packard <keithp@keithp.com>
Date: Wed Nov 16 16:26:07 2011 -0800
drm/i915: Hook up Ivybridge eDP
This fixes eDP link-training failures and cases where all voltage swing
/pre-emphasis levels were tried and failed during clock recovery and -
as a fallback - we go on to do channel equalization with the last voltage
swing/pre-emphasis level which will succeed. Both issues can lead to a
blank screen.
v2:
- improve commit message
CC: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64880
Tested-by: Jeremy Moles <cubicool@gmail.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Steffen Klassert says:
====================
This pull request fixes some issues that arise when 6in4 or 4in6 tunnels
are used in combination with IPsec, all from Hannes Frederic Sowa and a
null pointer dereference when queueing packets to the policy hold queue.
1) We might access the local error handler of the wrong address family if
a 6in4 or 4in6 tunnel is protected by ipsec. Fix this by adding a pointer
to the correct local_error to xfrm_state_afinet.
2) Add a helper function to always refer to the correct interpretation
of skb->sk.
3) Call skb_reset_inner_headers to record the position of the inner headers
when adding a new one in various ipv6 tunnels. This is needed to identify
the addresses where to send back errors in the xfrm layer.
4) Dereference inner ipv6 header if encapsulated to always call the
right error handler.
5) Choose protocol family by skb protocol to not call the wrong
xfrm{4,6}_local_error handler in case an ipv6 socket is used
in ipv4 mode.
6) Partly revert "xfrm: introduce helper for safe determination of mtu"
because this introduced pmtu discovery problems.
7) Set skb->protocol on tcp, raw and ip6_append_data generated skbs.
We need this to get the correct mtu information in xfrm.
8) Fix null pointer dereference in xdst_queue_output.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocating skbs when sending out neighbour discovery messages
currently uses sock_alloc_send_skb() based on a per net namespace
socket and thus shares a socket wmem buffer space.
If a netdevice is temporarily unable to transmit due to carrier
loss or for other reasons, the queued up ndisc messages will consume
all of the wmem space and will thus prevent any more skbs from
being allocated, even for netdevices that are able to transmit packets.
The number of neighbour discovery messages sent is very limited, so
simply use alloc_skb() and don't depend on any socket wmem space any
longer.
This patch was originally posted by Eric Dumazet in a modified
form.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv4: raw_sendmsg: don't use header's destination address
A sendto() regression was bisected and found to start with commit
f8126f1d51 (ipv4: Adjust semantics of rt->rt_gateway.)
The problem is that it tries to ARP-lookup the constructed packet's
destination address rather than the explicitly provided address.
Fix this using FLOWI_FLAG_KNOWN_NH so that given nexthop is used.
cf. commit 2ad5b9e4bd
Reported-by: Chris Clark <chris.clark@alcatel-lucent.com>
Bisected-by: Chris Clark <chris.clark@alcatel-lucent.com>
Tested-by: Chris Clark <chris.clark@alcatel-lucent.com>
Suggested-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Chris Clark <chris.clark@alcatel-lucent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The zero value means that tsecr is not valid, so it's a special case.
tsoffset is used to customize tcp_time_stamp for one socket.
tsoffset is usually zero, it's used when a socket was moved from one
host to another host.
Currently this issue affects logic of tcp_rcv_rtt_measure_ts. Due to
incorrect value of rcv_tsecr, tcp_rcv_rtt_measure_ts sets rto to
TCP_RTO_MAX.
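
The special case can be sketched as a guard around the offset adjustment. The struct and function below are hypothetical and only model the two values discussed in this message.

```c
#include <stdint.h>

/* Hypothetical subset of the per-socket timestamp state. */
struct ts_model {
    uint32_t rcv_tsecr;  /* echoed timestamp from the peer; 0 = not valid */
    uint32_t tsoffset;   /* per-socket offset, non-zero after a restore */
};

/* Apply the offset only when the echoed timestamp is valid (non-zero). */
static void apply_tsoffset(struct ts_model *ts)
{
    if (ts->rcv_tsecr)
        ts->rcv_tsecr -= ts->tsoffset;
}
```

Without the guard, a zero rcv_tsecr on a restored socket would turn into a large bogus value after the subtraction, and later code could no longer tell that it was invalid.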
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Reported-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
u32 rcv_tstamp; /* timestamp of last received ACK */
Its value is used in tcp_retransmit_timer, which closes the socket
if the last ack was received more than TCP_RTO_MAX ago.
Currently rcv_tstamp is initialized to zero and if tcp_retransmit_timer
is called before receiving a first ack, the connection is closed.
This patch initializes rcv_tstamp to the current timestamp when a
socket is restored.
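
A small model of the check shows why a zero rcv_tstamp is a problem. The helper names and the SKETCH_RTO_MAX value are assumptions for illustration; the real timer code performs more checks than this.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_RTO_MAX 120000u  /* hypothetical limit, in ticks */

/* True if the last ACK is older than the limit. */
static bool ack_too_old(uint32_t now, uint32_t rcv_tstamp)
{
    /* Unsigned subtraction keeps the comparison correct across wraparound. */
    return (uint32_t)(now - rcv_tstamp) > SKETCH_RTO_MAX;
}

/* What the restore path should store: the time of the restore itself. */
static uint32_t restored_rcv_tstamp(uint32_t now)
{
    return now;
}
```

On a host whose clock is already far past SKETCH_RTO_MAX, a restored socket with rcv_tstamp left at zero looks as if its last ACK arrived long ago, so the first retransmit timer run would close it.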
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Reported-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We no longer need nr_irqs in the machine definitions now that the sirfsoc
irqchip has moved to a linear irqdomain, so drop them.
Signed-off-by: Barry Song <Baohua.Song@csr.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
The series of irqdomain core patches in 3.11 broke the sirf irqchip,
which uses legacy mapping: all users fail to set up their irqs in the
new kernel.
This patch moves to a linear irqdomain and drops the old legacy irqdomain
code since we no longer need it, which also fixes the broken sirfsoc
interrupts in 3.11.
In addition, prima2 and atlas6 actually have only 64 interrupt sources,
while marco, which uses the GIC, has 128. In the legacy code, sirf gpio
also used a legacy irqdomain, so to keep the gpio interrupt mapping
independent of prima2/atlas6/marco and use a unified macro, we enlarged
the prima2/atlas6 interrupt number to 128. This workaround is no longer
needed because sirf gpio has already moved to linear mode, so we move
SIRFSOC_NUM_IRQS back to 64 as well.
Signed-off-by: Barry Song <Baohua.Song@csr.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
It causes crashes when enabled, and we don't have such a peripheral
anyway on ARC platforms.
Signed-off-by: Mischa Jonker <mjonker@synopsys.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
On 3.11-rc we are seeing cgroup directories left behind when they should
have been removed. Here's a trivial reproducer:
cd /sys/fs/cgroup/memory
mkdir parent parent/child; rmdir parent/child parent
rmdir: failed to remove `parent': Device or resource busy
It's because cgroup_destroy_locked() (step 1 of destruction) leaves
cgroup on parent's children list, letting cgroup_offline_fn() (step 2 of
destruction) remove it; but step 2 is run by work queue, which may not
yet have removed the children when parent destruction checks the list.
Fix that by checking through a non-empty list of children: if every one
of them has already been marked CGRP_DEAD, then it's safe to proceed:
those children are invisible to userspace, and should not obstruct rmdir.
(I didn't see any reason to keep the cgrp->children checks under the
unrelated css_set_lock, so moved them out.)
tj: Flattened nested ifs a bit and updated comment so that it's
correct on both for-3.11-fixes and for-3.12.
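The check described above can be sketched in userspace C as follows. This is
an illustration only: struct cg, its fields and has_live_children() are
hypothetical simplifications, with a plain singly linked list standing in for
the kernel's children list and a bool standing in for CGRP_DEAD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified cgroup node. */
struct cg {
	bool dead;           /* stands in for the CGRP_DEAD flag */
	struct cg *sibling;  /* next entry on the parent's children list */
	struct cg *children; /* head of this cgroup's children list */
};

/*
 * A non-empty children list does not block removal by itself: only a
 * child that has not yet been marked dead does.
 */
static inline bool has_live_children(const struct cg *parent)
{
	for (const struct cg *c = parent->children; c != NULL; c = c->sibling)
		if (!c->dead)
			return true;
	return false;
}
```

A parent whose children are all dead but not yet unlinked by step 2 then
passes the check, so rmdir can proceed instead of failing with EBUSY.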
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
If !PREEMPT, a kworker running work items back to back can hog CPU.
This becomes dangerous when a self-requeueing work item which is
waiting for something to happen races against stop_machine. Such
self-requeueing work item would requeue itself indefinitely hogging
the kworker and CPU it's running on while stop_machine would wait for
that CPU to enter stop_machine while preventing anything else from
happening on all other CPUs. The two would deadlock.
Jamie Liu reports that this deadlock scenario exists around
scsi_requeue_run_queue() and libata port multiplier support, where one
port may exclude command processing from other ports. With the right
timing, scsi_requeue_run_queue() can end up requeueing itself trying
to execute an IO which is asked to be retried while another device has
an exclusive access, which in turn can't make forward progress due to
stop_machine.
Fix it by invoking cond_resched() after executing each work item.
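As a rough userspace analogue of the fix, the loop below yields the CPU after
every work item. It is a sketch under stated assumptions: sched_yield() stands
in for cond_resched(), and struct work_item, count_work() and
run_work_items() are hypothetical names, not the workqueue API.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

typedef void (*work_fn)(void *arg);

/* Hypothetical, simplified work item. */
struct work_item {
	work_fn fn;
	void *arg;
};

/* Example work function: bump the counter passed in through arg. */
static inline void count_work(void *arg)
{
	++*(int *)arg;
}

/*
 * Run items back to back, but give other runnable tasks a chance after
 * each one, so a self-requeueing item cannot monopolise the CPU.
 * Returns the number of items executed.
 */
static inline size_t run_work_items(struct work_item *items, size_t n)
{
	size_t done = 0;

	for (size_t i = 0; i < n; i++) {
		items[i].fn(items[i].arg);
		sched_yield(); /* userspace stand-in for cond_resched() */
		done++;
	}
	return done;
}
```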
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jamie Liu <jamieliu@google.com>
References: http://thread.gmane.org/gmane.linux.kernel/1552567
Cc: stable@vger.kernel.org
--
kernel/workqueue.c | 9 +++++++++
1 file changed, 9 insertions(+)
For devices which have a small, dense register map placed at a large offset,
the global cache_present bitmap imposes a huge memory overhead. Making
the cache_present per rbtree node avoids the issue and easily reduces the memory
footprint by a factor of ten. For devices with a more sparse map or without a
large base register offset the memory usage might increase slightly by a few
bytes, but not significantly. E.g. for a device which has ~50 registers at
offset 0x4000 the memory footprint of the register cache goes down from 2496
bytes to 175 bytes.
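The bitmap part of that saving can be estimated with a short calculation. The
sketch below assumes bitmaps are allocated in whole unsigned longs; the helper
names are hypothetical, and the results cover the bitmap only, so they are not
expected to match the total footprint figures quoted above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Bytes needed for a bitmap of nbits bits, rounded up to whole longs. */
static inline size_t bitmap_bytes(size_t nbits)
{
	return (nbits + BITS_PER_ULONG - 1) / BITS_PER_ULONG
		* sizeof(unsigned long);
}

/* A global bitmap has to cover every register from 0 up to max_reg. */
static inline size_t global_present_bytes(unsigned int max_reg)
{
	return bitmap_bytes((size_t)max_reg + 1);
}

/* A per-node bitmap only covers the registers held by that node. */
static inline size_t node_present_bytes(unsigned int base_reg,
					unsigned int top_reg)
{
	assert(top_reg >= base_reg);
	return bitmap_bytes((size_t)(top_reg - base_reg) + 1);
}
```

For 50 registers at 0x4000..0x4031, global_present_bytes(0x4031) comes to
2056 bytes while node_present_bytes(0x4000, 0x4031) is 8 bytes, which is in
line with the order-of-magnitude reduction described in the text.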
Moving the bitmap to a per node basis means that the handling of the bitmap is
now cache implementation specific and can no longer be managed by the core. The
regcache_sync_block() function is extended by an additional parameter so that the
cache implementation can tell the core which registers in the block are set and
which are not. The parameter is optional and if NULL the core assumes that all
registers are set. The rbtree cache also needs to implement its own drop
callback instead of relying on the core to handle this.
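The "NULL means all registers are set" convention can be sketched as below.
The helper names and signatures are hypothetical and do not reproduce the
actual regcache_sync_block() prototype; they only show how an optional
per-block bitmap might be interpreted.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Is register number idx within the block marked as present?
 * A NULL bitmap means the cache implementation tracks nothing, so the
 * caller treats every register in the block as set.
 */
static inline bool block_reg_present(const unsigned long *present, size_t idx)
{
	if (present == NULL)
		return true;
	return (present[idx / BITS_PER_ULONG] >> (idx % BITS_PER_ULONG)) & 1UL;
}

/* Count how many of the block's nregs registers would be synced. */
static inline size_t count_regs_to_sync(const unsigned long *present,
					size_t nregs)
{
	size_t n = 0;

	for (size_t i = 0; i < nregs; i++)
		if (block_reg_present(present, i))
			n++;
	return n;
}
```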
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@linaro.org>
Support for reducing the number of nodes and memory consumption of the rbtree
cache by allowing for small unused holes in the node's register cache block was
initially added in commit 0c7ed856 ("regmap: Cut down on the average # of nodes
in the rbtree cache"). But the commit had problems and so its effect was
reverted again in commit 4e67fb5 ("regmap: rbtree: Fix overlapping rbnodes.").
This patch brings back the feature of reducing the average number of nodes,
which will speed up node look-up, while at the same time also reducing the memory
usage of the rbtree cache. This patch takes a slightly different approach than
the original patch though. It modifies the adjacent node look-up to not only
consider nodes that are just one to the left or the right of the register but
any node that falls in a certain range around the register. The range is
calculated based on how much memory it would take to allocate a new node
compared to how much memory it takes adding a set of unused registers to an
existing node. E.g. if a node takes up 24 bytes and each register in a block
uses 1 byte the range will be from the register address - 24 to the register
address + 24. If we find a node that falls within this range it is cheaper or as
expensive to add the register to the existing node and have a couple of unused
registers in the node's cache compared to allocating a new node.
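The range calculation from the example above can be written out as a small
sketch. It assumes the distance is simply the node size divided by the
per-register size; clamping at the ends of the address space is an added
assumption, and struct reg_range and adjacent_range() are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Inclusive register range searched for an existing node. */
struct reg_range {
	unsigned int min;
	unsigned int max;
};

/*
 * Compute the window around reg in which extending an existing node is
 * no more expensive than allocating a new node of node_bytes bytes,
 * given that each cached register costs bytes_per_reg bytes.
 */
static inline struct reg_range adjacent_range(unsigned int reg,
					      size_t node_bytes,
					      size_t bytes_per_reg)
{
	struct reg_range r;
	unsigned int max_dist;

	assert(bytes_per_reg > 0);
	max_dist = (unsigned int)(node_bytes / bytes_per_reg);

	/* Clamp so the range never wraps around the address space. */
	r.min = reg < max_dist ? 0 : reg - max_dist;
	r.max = reg > UINT_MAX - max_dist ? UINT_MAX : reg + max_dist;
	return r;
}
```

With a 24 byte node and 1 byte per register, adjacent_range(100, 24, 1)
yields the range 76 to 124, matching the "register address - 24 to register
address + 24" example.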
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@linaro.org>