commit 8728c544a9 ("net: dev_pick_tx() fix") and commit
b6fe83e952 ("bonding: refine IFF_XMIT_DST_RELEASE capability")
are quite incompatible : Queue selection is disabled because skb
dst was dropped before entering bonding device.
This causes major performance regression, mainly because TCP packets
for a given flow can be sent to multiple queues.
This is particularly visible when using the new FQ packet scheduler
with MQ + FQ setup on the slaves.
We can safely revert the first commit now that 416186fbf8
("net: Split core bits of netdev_pick_tx into __netdev_pick_tx")
properly caps the queue_index.
Reported-by: Xi Wang <xii@google.com>
Diagnosed-by: Xi Wang <xii@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Denys Fedorysychenko <nuclearcat@nuclearcat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 1f324e3887.
It seems to cause regressions, and in particular the output path
really depends upon there being a socket attached to skb->sk for
checks such as sk_mc_loop(skb->sk) for example. See ip6_output_finish2().
Reported-by: Stephen Warren <swarren@wwwdotorg.org>
Reported-by: Fabio Estevam <festevam@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 3d7b46cd20 (ip_tunnel: push generic protocol handling to
ip_tunnel module.), an Oops is triggered when an xfrm policy is configured on
an IPv4 over IPv4 tunnel.
xfrm4_policy_check() calls __xfrm_policy_check2(), which uses skb_dst(skb). But
this field is NULL because iptunnel_pull_header() calls skb_dst_drop(skb).
Signed-off-by: Li Hongjun <hongjun.li@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Should a connect fail, if the publication/server is unavailable or
due to some other error, a positive value will be returned and errno
is never set. If the application code checks for an explicit zero
return from connect (success) or a negative return (failure), it
will not catch the error and subsequent send() calls will fail as
shown from the strace snippet below.
socket(0x1e /* PF_??? */, SOCK_SEQPACKET, 0) = 3
connect(3, {sa_family=0x1e /* AF_??? */, sa_data="\2\1\322\4\0\0\322\4\0\0\0\0\0\0"}, 16) = 111
sendto(3, "test", 4, 0, NULL, 0) = -1 EPIPE (Broken pipe)
The reason for this behaviour is that TIPC wrongly inverts error
codes set in sk_err.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 90ba9b19 (tcp: tcp_make_synack() can use alloc_skb()), Eric changed
the call to sock_wmalloc in tcp_make_synack to alloc_skb. In doing so,
the netfilter owner match lost its ability to block the SYNACK packet on
outbound listening sockets. Revert the change, restoring the owner match
functionality.
This closes netfilter bugzilla #847.
Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we would still potentially suffer multicast packet loss if there
is just either an IGMP or an MLD querier: For the former case, we would
possibly drop IPv6 multicast packets, for the latter IPv4 ones. This is
because we are currently assuming that if either an IGMP or MLD querier
is present that the other one is present, too.
This patch makes the behaviour and fix added in
"bridge: disable snooping if there is no querier" (b00589af3b)
to also work if there is either just an IGMP or an MLD querier on the
link: It refines the deactivation of the snooping to be protocol
specific by using separate timers for the snooped IGMP and MLD queries
as well as separate timers for our internal IGMP and MLD queriers.
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
This pull request fixes some issues that arise when 6in4 or 4in6 tunnels
are used in combination with IPsec, all from Hannes Frederic Sowa and a
null pointer dereference when queueing packets to the policy hold queue.
1) We might access the local error handler of the wrong address family if
6in4 or 4in6 tunnel is protected by ipsec. Fix this by addind a pointer
to the correct local_error to xfrm_state_afinet.
2) Add a helper function to always refer to the correct interpretation
of skb->sk.
3) Call skb_reset_inner_headers to record the position of the inner headers
when adding a new one in various ipv6 tunnels. This is needed to identify
the addresses where to send back errors in the xfrm layer.
4) Dereference inner ipv6 header if encapsulated to always call the
right error handler.
5) Choose protocol family by skb protocol to not call the wrong
xfrm{4,6}_local_error handler in case an ipv6 sockets is used
in ipv4 mode.
6) Partly revert "xfrm: introduce helper for safe determination of mtu"
because this introduced pmtu discovery problems.
7) Set skb->protocol on tcp, raw and ip6_append_data genereated skbs.
We need this to get the correct mtu informations in xfrm.
8) Fix null pointer dereference in xdst_queue_output.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocating skbs when sending out neighbour discovery messages
currently uses sock_alloc_send_skb() based on a per net namespace
socket and thus share a socket wmem buffer space.
If a netdevice is temporarily unable to transmit due to carrier
loss or for other reasons, the queued up ndisc messages will cosnume
all of the wmem space and will thus prevent from any more skbs to
be allocated even for netdevices that are able to transmit packets.
The number of neighbour discovery messages sent is very limited,
simply use alloc_skb() and don't depend on any socket wmem space any
longer.
This patch has orginally been posted by Eric Dumazet in a modified
form.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipv4: raw_sendmsg: don't use header's destination address
A sendto() regression was bisected and found to start with commit
f8126f1d51 (ipv4: Adjust semantics of rt->rt_gateway.)
The problem is that it tries to ARP-lookup the constructed packet's
destination address rather than the explicitly provided address.
Fix this using FLOWI_FLAG_KNOWN_NH so that given nexthop is used.
cf. commit 2ad5b9e4bd
Reported-by: Chris Clark <chris.clark@alcatel-lucent.com>
Bisected-by: Chris Clark <chris.clark@alcatel-lucent.com>
Tested-by: Chris Clark <chris.clark@alcatel-lucent.com>
Suggested-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Chris Clark <chris.clark@alcatel-lucent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The zero value means that tsecr is not valid, so it's a special case.
tsoffset is used to customize tcp_time_stamp for one socket.
tsoffset is usually zero, it's used when a socket was moved from one
host to another host.
Currently this issue affects logic of tcp_rcv_rtt_measure_ts. Due to
incorrect value of rcv_tsecr, tcp_rcv_rtt_measure_ts sets rto to
TCP_RTO_MAX.
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Reported-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
u32 rcv_tstamp; /* timestamp of last received ACK */
Its value used in tcp_retransmit_timer, which closes socket
if the last ack was received more then TCP_RTO_MAX ago.
Currently rcv_tstamp is initialized to zero and if tcp_retransmit_timer
is called before receiving a first ack, the connection is closed.
This patch initializes rcv_tstamp to a timestamp, when a socket was
restored.
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Reported-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
netlink dump operations take module as parameter to hold
reference for entire netlink dump duration.
Currently it holds ref only on genl module which is not correct
when we use ops registered to genl from another module.
Following patch adds module pointer to genl_ops so that netlink
can hold ref count on it.
CC: Jesse Gross <jesse@nicira.com>
CC: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of genl-family with parallel ops off, dumpif() callback
is expected to run under genl_lock, But commit def3117493
(genl: Allow concurrent genl callbacks.) changed this behaviour
where only first dumpit() op was called under genl-lock.
For subsequent dump, only nlk->cb_lock was taken.
Following patch fixes it by defining locked dumpit() and done()
callback which takes care of genl-locking.
CC: Jesse Gross <jesse@nicira.com>
CC: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The net_device might be not set on the skb when we try refcounting.
This leads to a null pointer dereference in xdst_queue_output().
It turned out that the refcount to the net_device is not needed
after all. The dst_entry has a refcount to the net_device before
we queue the skb, so it can't go away. Therefore we can remove the
refcount on queueing to fix the null pointer dereference.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
John W. Linville says:
====================
This is one more set of fixes intended for the 3.11 stream...
For the mac80211 bits, Johannes says:
"I have three more patches for the 3.11 stream: Felix's fix for the
fairly visible brcmsmac crash, a fix from Simon for an IBSS join bug I
found and a fix for a channel context bug in IBSS I'd introduced."
Along with those...
Sujith Manoharan makes a minor change to not use a PLL hang workaroun
for AR9550. This one-liner fixes a couple of bugs reported in the Red Hat
bugzilla.
Helmut Schaa addresses an ath9k_htc bug that mangles frame headers
during Tx. This fix is small, tested by the bug reported and isolated
to ath9k_htc.
Stanislaw Gruszka reverts a recent iwl4965 change that broke rfkill
notification to user space.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a security bug.
The follow-up will fix nsproxy to discourage this type of issue from
happening again.
Cc: stable@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we don't initialize skb->protocol when transmitting data via
tcp, raw(with and without inclhdr) or udp+ufo or appending data directly
to the socket transmit queue (via ip6_append_data). This needs to be
done so that we can get the correct mtu in the xfrm layer.
Setting of skb->protocol happens only in functions where we also have
a transmitting socket and a new skb, so we don't overwrite old values.
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In commit 0ea9d5e3e0 ("xfrm: introduce
helper for safe determination of mtu") I switched the determination of
ipv4 mtus from dst_mtu to ip_skb_dst_mtu. This was an error because in
case of IP_PMTUDISC_PROBE we fall back to the interface mtu, which is
never correct for ipv4 ipsec.
This patch partly reverts 0ea9d5e3e0
("xfrm: introduce helper for safe determination of mtu").
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
rfc 4861 says the Redirected Header option is optional, so
the kernel should not drop the Redirect Message that has no
Redirected Header option. In this patch, the function
ip6_redirect_no_header() is introduced to deal with that
condition.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
This reverts commit 58ad436fcf.
It turns out that the change introduced a potential deadlock
by causing a locking dependency with netlink's cb_mutex. I
can't seem to find a way to resolve this without doing major
changes to the locking, so revert this.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
my earlier patch "mac80211: change IBSS channel state to chandef"
created a regression by ignoring the channel parameter in
__ieee80211_sta_join_ibss, which breaks IBSS channel selection. This
patch fixes this situation by using the right channel and adopting the
selected bandwidth mode.
Cc: stable@vger.kernel.org
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
brcm80211 cannot handle sending frames with CCK rates as part of an
A-MPDU session. Other drivers may have issues too. Set the flag in all
drivers that have been tested with CCK rates.
This fixes a reported brcmsmac regression introduced in
commit ef47a5e4f1aaf1d0e2e6875e34b2c9595897bef6
"mac80211/minstrel_ht: fix cck rate sampling"
Cc: stable@vger.kernel.org # 3.10
Reported-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
IBSS needs to release the channel context when leaving
but I evidently missed that. Fix it.
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The VLAN code needs to know the length of the per-port VLAN bitmap to
perform its most basic operations (retrieving VLAN informations, removing
VLANs, forwarding database manipulation, etc). Unfortunately, in the
current implementation we are using a macro that indicates the bitmap
size in longs in places where the size in bits is expected, which in
some cases can cause what appear to be random failures.
Use the correct macro.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
Regarding the iwlwifi bits, Johannes says:
"We revert an rfkill bugfix that unfortunately caused more bugs, shuffle
some code to avoid touching the PCIe device before it's enabled and
disconnect if firmware fails to do our bidding. I also have Stanislaw's
fix to not crash in some channel switch scenarios."
As for the mac80211 bits, Johannes says:
"This time, I have one fix from Dan Carpenter for users of
nl80211hdr_put(), and one fix from myself fixing a regression with the
libertas driver."
Along with the above...
Dan Carpenter fixes some incorrectly placed "address of" operators
in hostap that caused copying of junk data.
Jussi Kivilinna corrects zd1201 to use an allocated buffer rather
than the stack for a URB operation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
getsockopt PACKET_STATISTICS returns tp_packets + tp_drops. Commit
ee80fbf301 ("packet: account statistics only in tpacket_stats_u")
cleaned up the getsockopt PACKET_STATISTICS code.
This also changed semantics. Historically, tp_packets included
tp_drops on return. The commit removed the line that adds tp_drops
into tp_packets.
This patch reinstates the old semantics.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Check if the skb has been correctly prepared before going on
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.20 (GNU/Linux)
iQIcBAABCAAGBQJSEkvHAAoJEADl0hg6qKeOyTQP/ifIXk5t26Tu8GTCH+lQnF36
1HY4nkEhLBkrEaKv0RXXEwLCDe1Gk8INewSXhtgDe7v696287zvxSDiftxXOwSn8
EkrP3jakxqNgyEstVUMxXuHQMxn8YsOnU+u4L4MZvcsWNmh1V8FzNLxPDWF3Z0bi
ycXhFI+BR+waWzFd8rVZ5sJ00ZhgSuM5vJ/uQ28kT8DyDZXz0I0mvve7ZUh5fczc
L7vvnju9VRq84RxV6bQwf9hXDk54fCLz22WSMolrqaHCl0XF4OAu6OVcYBLA0bp7
GUU7fS8IUiqAuC02FS5HEYPy1VErCok8hP/fzvjz8Bxuzz0I5SdYPurFTZQzAx73
U0GCLtNOE7zkwIsRbKhMdUcB6DoFZJVUaLo8YS9E1tl/nn1oRILFGTFb2T9/WzQ8
lbCkmYm+WhLdKeZbLkf8PPs9PDrhRQKr/QRHrCHVKh4rqzP1BUm4FqIBXo71Kiiz
no4GJoern8vW0CzoR59P5++/iFOCVTIx4ZJWvnYjWbqsYRazKjjCFtHffpz6mz+1
pHlrdYAZo+DOvme/2putfe6ViR+bA3lPxPkM7k3gADMifJcCAl3D7OM53QaDSKAk
Gw3yHafxBaFPXFdqxQkkC6ks7T6qoTsPI2lLqUG6srU3XA399bWVcLq7X9JEcR+9
ODwzHPj9fD5CZCe22lIO
=gllO
-----END PGP SIGNATURE-----
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Included change:
- Check if the skb has been correctly prepared before going on
When the repair mode is turned off, the write queue seqs are
updated so that the whole queue is considered to be 'already sent.
The "when" field must be set for such skb. It's used in tcp_rearm_rto
for example. If the "when" field isn't set, the retransmit timeout can
be calculated incorrectly and a tcp connected can stop for two minutes
(TCP_RTO_MAX).
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not allowed for an ipv6 packet to contain multiple fragmentation
headers. So discard packets which were already reassembled by
fragmentation logic and send back a parameter problem icmp.
The updates for RFC 6980 will come in later, I have to do a bit more
research here.
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because of the max_addresses check attackers were able to disable privacy
extensions on an interface by creating enough autoconfigured addresses:
<http://seclists.org/oss-sec/2012/q4/292>
But the check is not actually needed: max_addresses protects the
kernel to install too many ipv6 addresses on an interface and guards
addrconf_prefix_rcv to install further addresses as soon as this limit
is reached. We only generate temporary addresses in direct response of
a new address showing up. As soon as we filled up the maximum number of
addresses of an interface, we stop installing more addresses and thus
also stop generating more temp addresses.
Even if the attacker tries to generate a lot of temporary addresses
by announcing a prefix and removing it again (lifetime == 0) we won't
install more temp addresses, because the temporary addresses do count
to the maximum number of addresses, thus we would stop installing new
autoconfigured addresses when the limit is reached.
This patch fixes CVE-2013-0343 (but other layer-2 attacks are still
possible).
Thanks to Ding Tianhong to bring this topic up again.
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: George Kargiotakis <kargig@void.gr>
Cc: P J P <ppandit@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to choose the protocol family by skb->protocol. Otherwise we
call the wrong xfrm{4,6}_local_error handler in case an ipv6 sockets is
used in ipv4 mode, in which case we should call down to xfrm4_local_error
(ip6 sockets are a superset of ip4 ones).
We are called before before ip_output functions, so skb->protocol is
not reset.
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In xfrm6_local_error use inner_header if the packet was encapsulated.
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
When pushing a new header before current one call skb_reset_inner_headers
to record the position of the inner headers in the various ipv6 tunnel
protocols.
We later need this to correctly identify the addresses needed to send
back an error in the xfrm layer.
This change is safe, because skb->protocol is always checked before
dereferencing data from the inner protocol.
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
batadv_unicast(_4addr)_prepare_skb might reallocate the skb's data.
And if it tries to do so then this can potentially fail.
We shouldn't continue working on this skb in such a case.
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Pull networking fixes from David Miller:
1) Fix SKB leak in 8139cp, from Dave Jones.
2) Fix use of *_PAGES interfaces with mlx5 firmware, from Moshe Lazar.
3) RCU conversion of macvtap introduced two races, fixes by Eric
Dumazet
4) Synchronize statistic flows in bnx2x driver to prevent corruption,
from Dmitry Kravkov
5) Undo optimization in IP tunneling, we were using the inner IP header
in some cases to inherit the IP ID, but that isn't correct in some
circumstances. From Pravin B Shelar
6) Use correct struct size when parsing netlink attributes in
rtnl_bridge_getlink(). From Asbjoern Sloth Toennesen
7) Length verifications in tun_get_user() are bogus, from Weiping Pan
and Dan Carpenter
8) Fix bad merge resolution during 3.11 networking development in
openvswitch, albeit a harmless one which added some unreachable
code. From Jesse Gross
9) Wrong size used in flexible array allocation in openvswitch, from
Pravin B Shelar
10) Clear out firmware capability flags the be2net driver isn't ready to
handle yet, from Sarveshwar Bandi
11) Revert DMA mapping error checking addition to cxgb3 driver, it's
buggy. From Alexey Kardashevskiy
12) Fix regression in packet scheduler rate limiting when working with a
link layer of ATM. From Jesper Dangaard Brouer
13) Fix several errors in TCP Cubic congestion control, in particular
overflow errors in timestamp calculations. From Eric Dumazet and
Van Jacobson
14) In ipv6 routing lookups, we need to backtrack if subtree traversal
don't result in a match. From Hannes Frederic Sowa
15) ipgre_header() returns incorrect packet offset. Fix from Timo Teräs
16) Get "low latency" out of the new MIB counter names. From Eliezer
Tamir
17) State check in ndo_dflt_fdb_del() is inverted, from Sridhar
Samudrala
18) Handle TCP Fast Open properly in netfilter conntrack, from Yuchung
Cheng
19) Wrong memcpy length in pcan_usb driver, from Stephane Grosjean
20) Fix dealock in TIPC, from Wang Weidong and Ding Tianhong
21) call_rcu() call to destroy SCTP transport is done too early and
might result in an oops. From Daniel Borkmann
22) Fix races in genetlink family dumps, from Johannes Berg
23) Flags passed into macvlan by the user need to be validated properly,
from Michael S Tsirkin
24) Fix skge build on 32-bit, from Stephen Hemminger
25) Handle malformed TCP headers properly in xt_TCPMSS, from Pablo Neira
Ayuso
26) Fix handling of stacked vlans in vlan_dev_real_dev(), from Nikolay
Aleksandrov
27) Eliminate MTU calculation overflows in esp{4,6}, from Daniel
Borkmann
28) neigh_parms need to be setup before calling the ->ndo_neigh_setup()
method. From Veaceslav Falico
29) Kill out-of-bounds prefetch in fib_trie, from Eric Dumazet
30) Don't dereference MLD query message if the length isn't value in the
bridge multicast code, from Linus Lüssing
31) Fix VXLAN IGMP join regression due to an inverted check, from Cong
Wang
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (70 commits)
net/mlx5_core: Support MANAGE_PAGES and QUERY_PAGES firmware command changes
tun: signedness bug in tun_get_user()
qlcnic: Fix diagnostic interrupt test for 83xx adapters
qlcnic: Fix beacon state return status handling
qlcnic: Fix set driver version command
net: tg3: fix NULL pointer dereference in tg3_io_error_detected and tg3_io_slot_reset
net_sched: restore "linklayer atm" handling
drivers/net/ethernet/via/via-velocity.c: update napi implementation
Revert "cxgb3: Check and handle the dma mapping errors"
be2net: Clear any capability flags that driver is not interested in.
openvswitch: Reset tunnel key between input and output.
openvswitch: Use correct type while allocating flex array.
openvswitch: Fix bad merge resolution.
tun: compare with 0 instead of total_len
rtnetlink: rtnl_bridge_getlink: Call nlmsg_find_attr() with ifinfomsg header
ethernet/arc/arc_emac - fix NAPI "work > weight" warning
ip_tunnel: Do not use inner ip-header-id for tunnel ip-header-id.
bnx2x: prevent crash in shutdown flow with CNIC
bnx2x: fix PTE write access error
bnx2x: fix memory leak in VF
...
commit 56b765b79 ("htb: improved accuracy at high rates")
broke the "linklayer atm" handling.
tc class add ... htb rate X ceil Y linklayer atm
The linklayer setting is implemented by modifying the rate table
which is send to the kernel. No direct parameter were
transferred to the kernel indicating the linklayer setting.
The commit 56b765b79 ("htb: improved accuracy at high rates")
removed the use of the rate table system.
To keep compatible with older iproute2 utils, this patch detects
the linklayer by parsing the rate table. It also supports future
versions of iproute2 to send this linklayer parameter to the
kernel directly. This is done by using the __reserved field in
struct tc_ratespec, to convey the choosen linklayer option, but
only using the lower 4 bits of this field.
Linklayer detection is limited to speeds below 100Mbit/s, because
at high rates the rtab is gets too inaccurate, so bad that
several fields contain the same values, this resembling the ATM
detect. Fields even start to contain "0" time to send, e.g. at
1000Mbit/s sending a 96 bytes packet cost "0", thus the rtab have
been more broken than we first realized.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross says:
====================
Three bug fixes that are fairly small either way but resolve obviously
incorrect code. For net/3.11.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
It doesn't make sense to output a tunnel packet using the same
parameters that it was received with since that will generally
just result in the packet going back to us. As a result, userspace
assumes that the tunnel key is cleared when transitioning through
the switch. In the majority of cases this doesn't matter since a
packet is either going to a tunnel port (in which the key is
overwritten with new values) or to a non-tunnel port (in which
case the key is ignored). However, it's theoreticaly possible that
userspace could rely on the documented behavior, so this corrects
it.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Flex array is used to allocate hash buckets which is type struct
hlist_head, but we use `struct hlist_head *` to calculate
array size. Since hlist_head is of size pointer it works fine.
Following patch use correct type.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
git silently included an extra hunk in vport_cmd_set() during
automatic merging. This code is unreachable so it does not actually
introduce a problem but it is clearly incorrect.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Neil Brown reports that with libertas, my recent cfg80211
SME changes in commit ceca7b7121
("cfg80211: separate internal SME implementation") broke
libertas suspend because it we now asked it to disconnect
while already disconnected.
The problematic change is in cfg80211_disconnect() as it
previously checked the SME state and now calls the driver
disconnect operation unconditionally.
Fix this by checking if there's a current_bss indicating
a connection, and do nothing if not.
Reported-and-tested-by: Neil Brown <neilb@suse.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There are a few places which check nl80211hdr_put() for an ERR_PTR
but actually it returns NULL on error and never error values. In
nl80211_testmode_dump() the return wasn't checked at all so I have
added one.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
[some whitespace changes]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
skb->sk socket can be of AF_INET or AF_INET6 address family. Thus we
always have to make sure we a referring to the correct interpretation
of skb->sk.
We only depend on header defines to query the mtu, so we don't introduce
a new dependency to ipv6 by this change.
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
In xfrm4 and xfrm6 we need to take care about sockets of the other
address family. This could happen because a 6in4 or 4in6 tunnel could
get protected by ipsec.
Because we don't want to have a run-time dependency on ipv6 when only
using ipv4 xfrm we have to embed a pointer to the correct local_error
function in xfrm_state_afinet and look it up when returning an error
depending on the socket address family.
Thanks to vi0ss for the great bug report:
<https://bugzilla.kernel.org/show_bug.cgi?id=58691>
v2:
a) fix two more unsafe interpretations of skb->sk as ipv6 socket
(xfrm6_local_dontfrag and __xfrm6_output)
v3:
a) add an EXPORT_SYMBOL_GPL(xfrm_local_error) to fix a link error when
building ipv6 as a module (thanks to Steffen Klassert)
Reported-by: <vi0oss@gmail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Fix the iproute2 command `bridge vlan show`, after switching from
rtgenmsg to ifinfomsg.
Let's start with a little history:
Feb 20: Vlad Yasevich got his VLAN-aware bridge patchset included in
the 3.9 merge window.
In the kernel commit 6cbdceeb, he added attribute support to
bridge GETLINK requests sent with rtgenmsg.
Mar 6th: Vlad got this iproute2 reference implementation of the bridge
vlan netlink interface accepted (iproute2 9eff0e5c)
Apr 25th: iproute2 switched from using rtgenmsg to ifinfomsg (63338dca)
http://patchwork.ozlabs.org/patch/239602/http://marc.info/?t=136680900700007
Apr 28th: Linus released 3.9
Apr 30th: Stephen released iproute2 3.9.0
The `bridge vlan show` command haven't been working since the switch to
ifinfomsg, or in a released version of iproute2. Since the kernel side
only supports rtgenmsg, which iproute2 switched away from just prior to
the iproute2 3.9.0 release.
I haven't been able to find any documentation, about neither rtgenmsg
nor ifinfomsg, and in which situation to use which, but kernel commit
88c5b5ce seams to suggest that ifinfomsg should be used.
Fixing this in kernel will break compatibility, but I doubt that anybody
have been using it due to this bug in the user space reference
implementation, at least not without noticing this bug. That said the
functionality is still fully functional in 3.9, when reversing iproute2
commit 63338dca.
This could also be fixed in iproute2, but thats an ugly patch that would
reintroduce rtgenmsg in iproute2, and from searching in netdev it seams
like rtgenmsg usage is discouraged. I'm assuming that the only reason
that Vlad implemented the kernel side to use rtgenmsg, was because
iproute2 was using it at the time.
Signed-off-by: Asbjoern Sloth Toennesen <ast@fiberby.net>
Reviewed-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using inner-id for tunnel id is not safe in some rare cases.
E.g. packets coming from multiple sources entering same tunnel
can have same id. Therefore on tunnel packet receive we
could have packets from two different stream but with same
source and dst IP with same ip-id which could confuse ip packet
reassembly.
Following patch reverts optimization from commit
490ab08127 (IP_GRE: Fix IP-Identification.)
CC: Jarno Rajahalme <jrajahalme@nicira.com>
CC: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>