Commit Graph

27555 Commits

Author SHA1 Message Date
Eric Dumazet
00cfec3748 net: add a synchronize_net() in netdev_rx_handler_unregister()
commit 35d48903e9 (bonding: fix rx_handler locking) added a race
in bonding driver, reported by Steven Rostedt who did a very good
diagnosis :

<quoting Steven>

I'm currently debugging a crash in an old 3.0-rt kernel that one of our
customers is seeing. The bug happens with a stress test that loads and
unloads the bonding module in a loop (I don't know all the details as
I'm not the one that is directly interacting with the customer). But the
bug looks to be something that may still be present and possibly present
in mainline too. It will just be much harder to trigger it in mainline.

In -rt, interrupts are threads, and can schedule in and out just like
any other thread. Note, mainline now supports interrupt threads so this
may be easily reproducible in mainline as well. I don't have the ability
to tell the customer to try mainline or other kernels, so my hands are
somewhat tied to what I can do.

But according to a core dump, I tracked down that the eth irq thread
crashed in bond_handle_frame() here:

        slave = bond_slave_get_rcu(skb->dev);
        bond = slave->bond; <--- BUG

the slave returned was NULL and accessing slave->bond caused a NULL
pointer dereference.

Looking at the code that unregisters the handler:

void netdev_rx_handler_unregister(struct net_device *dev)
{

        ASSERT_RTNL();
        RCU_INIT_POINTER(dev->rx_handler, NULL);
        RCU_INIT_POINTER(dev->rx_handler_data, NULL);
}

Which is basically:
        dev->rx_handler = NULL;
        dev->rx_handler_data = NULL;

And looking at __netif_receive_skb() we have:

        rx_handler = rcu_dereference(skb->dev->rx_handler);
        if (rx_handler) {
                if (pt_prev) {
                        ret = deliver_skb(skb, pt_prev, orig_dev);
                        pt_prev = NULL;
                }
                switch (rx_handler(&skb)) {

My question to all of you is, what stops this interrupt from happening
while the bonding module is unloading?  What happens if the interrupt
triggers and we have this:

        CPU0                    CPU1
        ----                    ----
  rx_handler = skb->dev->rx_handler

                        netdev_rx_handler_unregister() {
                           dev->rx_handler = NULL;
                           dev->rx_handler_data = NULL;

  rx_handler()
   bond_handle_frame() {
    slave = skb->dev->rx_handler;
    bond = slave->bond; <-- NULL pointer dereference!!!

What protection am I missing in the bond release handler that would
prevent the above from happening?

</quoting Steven>

We can fix bug this in two ways. First is adding a test in
bond_handle_frame() and others to check if rx_handler_data is NULL.

A second way is adding a synchronize_net() in
netdev_rx_handler_unregister() to make sure that a rcu protected reader
has the guarantee to see a non NULL rx_handler_data.

The second way is better as it avoids an extra test in fast path.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29 15:38:15 -04:00
Vijay Subramanian
cd68ddd4c2 net: fq_codel: Fix off-by-one error
Currently, we hold a max of sch->limit -1 number of packets instead of
sch->limit packets. Fix this off-by-one error.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29 15:32:23 -04:00
John Fastabend
91f3e7b174 net: rtnetlink: fdb dflt dump must set idx used for cb->arg[0]
In rtnl_fdb_dump() when the fdb_dump ndo op is not populated
we never set the idx value so that cb->arg[0] is always 0.
Resulting in a endless loop of messages.

Introduced with this commit,

commit 090096bf3d
Author: Vlad Yasevich <vyasevic@redhat.com>
Date:   Wed Mar 6 15:39:42 2013 +0000

    net: generic fdb support for drivers without ndo_fdb_<op>

CC: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29 15:26:27 -04:00
Shmulik Ladkani
a561cf7edf net: core: Remove redundant call to 'nf_reset' in 'dev_forward_skb'
'nf_reset' is called just prior calling 'netif_rx'.
No need to call it twice.

Reported-by: Igor Michailov <rgohita@gmail.com>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29 15:25:28 -04:00
Pravin B Shelar
54a5d38289 ip_tunnel: Fix off-by-one error in forming dev name.
As Ben pointed out following patch fixes bug in checking device
name length limits while forming tunnel device name.

CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29 15:24:28 -04:00
Li RongQing
278150321a net: simplify the getting percpu of flow_cache
replace per_cpu with per_cpu_ptr to save conversion between address and pointer

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29 15:14:46 -04:00
Li RongQing
50eab0503a net: fix the use of this_cpu_ptr
flush_tasklet is not percpu var, and percpu is percpu var, and
	this_cpu_ptr(&info->cache->percpu->flush_tasklet)
is not equal to
	&this_cpu_ptr(info->cache->percpu)->flush_tasklet

1f743b076(use this_cpu_ptr per-cpu helper) introduced this bug.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29 15:13:27 -04:00
Hannes Frederic Sowa
1c4a154e52 ipv6: don't accept node local multicast traffic from the wire
Erik Hugne's errata proposal (Errata ID: 3480) to RFC4291 has been
verified: http://www.rfc-editor.org/errata_search.php?eid=3480

We have to check for pkt_type and loopback flag because either the
packets are allowed to travel over the loopback interface (in which case
pkt_type is PACKET_HOST and IFF_LOOPBACK flag is set) or they travel
over a non-loopback interface back to us (in which case PACKET_TYPE is
PACKET_LOOPBACK and IFF_LOOPBACK flag is not set).

Cc: Erik Hugne <erik.hugne@ericsson.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29 14:57:33 -04:00
Hong zhi guo
10c9cbb10f netlink: fix the warning introduced by netlink API replacement
Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-29 14:44:37 -04:00
Linus Torvalds
2c3de1c2d7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull userns fixes from Eric W Biederman:
 "The bulk of the changes are fixing the worst consequences of the user
  namespace design oversight in not considering what happens when one
  namespace starts off as a clone of another namespace, as happens with
  the mount namespace.

  The rest of the changes are just plain bug fixes.

  Many thanks to Andy Lutomirski for pointing out many of these issues."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  userns: Restrict when proc and sysfs can be mounted
  ipc: Restrict mounting the mqueue filesystem
  vfs: Carefully propogate mounts across user namespaces
  vfs: Add a mount flag to lock read only bind mounts
  userns:  Don't allow creation if the user is chrooted
  yama:  Better permission check for ptraceme
  pid: Handle the exit of a multi-threaded init.
  scm: Require CAP_SYS_ADMIN over the current pidns to spoof pids.
2013-03-28 13:43:46 -07:00
Hong zhi guo
c60ee67f45 bridge: remove unused variable ifm
Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-28 14:41:19 -04:00
John W. Linville
630a216da6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-03-28 14:40:06 -04:00
David S. Miller
ea407f0b97 Included changes:
- A fix for the network coding component which has been added within the last
   pull request (so it is in linux-3.10). The problem has been spotted thanks to
   Fengguang Wu's automated daily checks on our tree.
 - Implementation of the RTNL API for virtual interface creation/deletion and slave
   manipulation
 - substitution of seq_printf with seq_puts when possible
 - minor cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iQIcBAABCAAGBQJRUsb7AAoJEADl0hg6qKeOf/sQAJ4VRwpB5WLIjtLPQ1Dvd/qu
 thPAn4vpv02Is8H7NRULRN9aqUIzWrWtXkhBTAOh0EHq3mki6a+QT4TUQ4qswTPv
 BaVS70hoTtN1OdcntWTATCUuX2Y2HD/D3Cw2IZIG7X9oqTEpoTpnuQygMsG/c1+s
 MtNO57/WUKlwLv0pAu9IHh4qCj00eF5YHsPmsWzoWAvi/4aCgESQfXGy4Dm6HqLB
 E8XuVLOLtw7NSw0V7n//KtEE2llTWkfslm06dSsAmqEtzOfOCCuA4Z0asnMPJt7s
 4HknqNE0D0aR0sfS+W6G24gFQbIV8R4SYWnUfMFCvvT7Ds96RONdgkfWw39nM7XB
 y70kXUbfDBIeHiiV2uqbTZnbuAa61p8Y8jCJvWz0nujI1/QPnUA1bqEgC2AMTc1J
 I6rS8mhWC6Rg+ohkK7HSQfgiZ5xcWtjFuVuOhKr0cPwhaCI0yP5PCB/B3yxuJxwD
 egkL58fgPhgii3DOxInAqLS/aPT5vl49J91qC3+z2/xYS3zRxHsyIRkMwgfoI2Ab
 TofVmzKqDwENA+CnjCPQx2gn6w/QpK8g5xmW1GgbzuKddiCcQFKnezgpeSdnQrsM
 CP/8Tu6r3QSLlHebysMuhsCwyT1hv8Yt0PTDpBqtIXRpV5enTYa9xW3F+UTU65ZY
 p9fKQXJYDLLEhfcy/TJM
 =FoTZ
 -----END PGP SIGNATURE-----

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Included changes:
- A fix for the network coding component which has been added within the last
  pull request (so it is in linux-3.10). The problem has been spotted thanks to
  Fengguang Wu's automated daily checks on our tree.
- Implementation of the RTNL API for virtual interface creation/deletion and slave
  manipulation
- substitution of seq_printf with seq_puts when possible
- minor cleanups

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-28 14:34:23 -04:00
Hong zhi guo
573ce260b3 net-next: replace obsolete NLMSG_* with type safe nlmsg_*
Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-28 14:25:25 -04:00
Simon Horman
e5c5d22e8d net: add ETH_P_802_3_MIN
Add a new constant ETH_P_802_3_MIN, the minimum ethernet type for
an 802.3 frame. Frames with a lower value in the ethernet type field
are Ethernet II.

Also update all the users of this value that David Miller and
I could find to use the new constant.

Also correct a bug in util.c. The comparison with ETH_P_802_3_MIN
should be >= not >.

As suggested by Jesse Gross.

Compile tested only.

Cc: David Miller <davem@davemloft.net>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Bart De Schuymer <bart.de.schuymer@pandora.be>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: linux-bluetooth@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org
Cc: bridge@lists.linux-foundation.org
Cc: linux-wireless@vger.kernel.org
Cc: linux1394-devel@lists.sourceforge.net
Cc: linux-media@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: dev@openvswitch.org
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-28 01:20:42 -04:00
David S. Miller
0fb031f036 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
1) Initialize the satype field in key_notify_policy_flush(),
   this was left uninitialized. From Nicolas Dichtel.

2) The sequence number difference for replay notifications
   was misscalculated on ESN sequence number wrap. We need
   a separate replay notify function for esn.

3) Fix an off by one in the esn replay notify function.
   From Mathias Krause.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 14:07:04 -04:00
Sergey Popovich
ea872d7712 sch: add missing u64 in psched_ratecfg_precompute()
It seems that commit

commit 292f1c7ff6
Author: Jiri Pirko <jiri@resnulli.us>
Date:   Tue Feb 12 00:12:03 2013 +0000

    sch: make htb_rate_cfg and functions around that generic

adds little regression.

Before:

# tc qdisc add dev eth0 root handle 1: htb default ffff
# tc class add dev eth0 classid 1:ffff htb rate 5Gbit
# tc -s class show dev eth0
class htb 1:ffff root prio 0 rate 5000Mbit ceil 5000Mbit burst 625b cburst
625b
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
 lended: 0 borrowed: 0 giants: 0
 tokens: 31 ctokens: 31

After:

# tc qdisc add dev eth0 root handle 1: htb default ffff
# tc class add dev eth0 classid 1:ffff htb rate 5Gbit
# tc -s class show dev eth0
class htb 1:ffff root prio 0 rate 1544Mbit ceil 1544Mbit burst 625b cburst
625b
 Sent 5073 bytes 41 pkt (dropped 0, overlimits 0 requeues 0)
 rate 1976bit 2pps backlog 0b 0p requeues 0
 lended: 41 borrowed: 0 giants: 0
 tokens: 1802 ctokens: 1802

This probably due to lost u64 cast of rate parameter in
psched_ratecfg_precompute() (net/sched/sch_generic.c).

Signed-off-by: Sergey Popovich <popovich_sergei@mail.ru>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 14:06:41 -04:00
Wei Yongjun
fcca143d69 rtnetlink: fix error return code in rtnl_link_fill()
Fix to return a negative error code from the error handling case
instead of 0(possible overwrite to 0 by ops->fill_xstats call),
as returned elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 14:06:40 -04:00
David S. Miller
e2a553dbf1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	include/net/ipip.h

The changes made to ipip.h in 'net' were already included
in 'net-next' before that header was moved to another location.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 13:52:49 -04:00
Wei Yongjun
5389090b59 netfilter: nf_conntrack: fix error return code
Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in function
nf_conntrack_standalone_init().

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-27 18:31:17 +01:00
Jesper Dangaard Brouer
1b5ab0def4 net: use the frag lru_lock to protect netns_frags.nqueues update
Move the protection of netns_frags.nqueues updates under the LRU_lock,
instead of the write lock.  As they are located on the same cacheline,
and this is also needed when transitioning to use per hash bucket locking.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 12:48:33 -04:00
Jesper Dangaard Brouer
68399ac37e net: frag, avoid several CPUs grabbing same frag queue during LRU evictor loop
The LRU list is protected by its own lock, since commit 3ef0eb0db4
(net: frag, move LRU list maintenance outside of rwlock), and
no-longer by a read_lock.

This makes it possible, to remove the inet_frag_queue, which is about
to be "evicted", from the LRU list head.  This avoids the problem, of
several CPUs grabbing the same frag queue.

Note, cannot remove the inet_frag_lru_del() call in fq_unlink()
called by inet_frag_kill(), because inet_frag_kill() is also used in
other situations.  Thus, we use list_del_init() to allow this
double list_del to work.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 12:48:33 -04:00
Andy Shevchenko
5a048e3b59 net: core: let's use native isxdigit instead of custom
In kernel we have fast and pretty implementation of the isxdigit() function.
Let's use it.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 12:48:32 -04:00
Jason Wang
40893fd0fd net: switch to use skb_probe_transport_header()
Switch to use the new help skb_probe_transport_header() to do the l4 header
probing for untrusted sources. For packets with partial csum, the header should
already been set by skb_partial_csum_set().

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 12:48:31 -04:00
Jason Wang
e5d5decaed net: core: let skb_partial_csum_set() set transport header
For untrusted packets with partial checksum, we need to set the transport header
for precise packet length estimation. We can just let skb_pratial_csum_set() to
do this to avoid extra call to skb_flow_dissect() and simplify the caller.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 12:48:31 -04:00
Antonio Quartulli
0c81465357 batman-adv: use seq_puts instead of seq_printf when the format is constant
As reported by checkpatch, seq_puts has to be preferred with
respect to seq_printf when the format is a constant string
(no va_args)

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-03-27 10:29:55 +01:00
Antonio Quartulli
9cb812c54e batman-adv: update Makefile copyright years
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-03-27 10:29:54 +01:00
Simon Wunderlich
9998bb7300 batman-adv: Start new development cycle
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-03-27 10:29:54 +01:00
Antonio Quartulli
cb4b0d4864 batman-adv: free an hard-interface before adding it
When adding a new hard interface (e.g. wlan0) to a soft interface (e.g. bat0)
and the former is already enslaved in another virtual interface (e.g. a software
bridge) batman-adv has to free it first and then continue with the adding
mechanism.

In this way the behaviour becomes consistent with what "ip link set master"
does. At the moment batman-adv enslaves the hard interface without checking for
the master device, possibly causing strange behaviours which are never wanted by
the users.

Reported-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-03-27 10:29:53 +01:00
Sven Eckelmann
3dbd550b8b batman-adv: Allow to modify slaves of soft-interfaces through rntl_link
The sysfs configuration interface of batman-adv to add/remove slaves of an
soft-iface is not deadlock free and doesn't follow the currently common way to
modify slaves of an interface.

An additional configuration interface though rtnl_link is introduced which
provides easy device adding/removing with tools like "ip":
$ ip link set dev eth0 master bat0
$ ip link set dev eth0 nomaster

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-03-27 10:29:52 +01:00
Sven Eckelmann
a4ac28c0d0 batman-adv: Allow to use rntl_link for device creation/deletion
The sysfs configuration interface of batman-adv to add/remove soft-interfaces
is not deadlock free and doesn't follow the currently common way to create new
virtual interfaces.

An additional interface though rtnl_link is introduced which provides easy device
creation/deletion with tools like "ip":

$ ip link add dev bat0 type batadv
$ ip link del dev bat0

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-03-27 10:27:34 +01:00
Marek Lindner
e07932ae6f batman-adv: rename batadv_softif_destroy to reflect sysfs use case
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
CC: Sven Eckelmann <sven@narfation.org>
Acked-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-03-27 10:27:33 +01:00
Sven Eckelmann
a15fd3612d batman-adv: Don't always delete softif when last slave was removed
batman-adv has an unusual way to manage softinterfaces. These will be created
automatically when a user writes to the batman-adv/mesh_iface file in sysfs and
removed when no slave device exists anymore.

This behaviour cannot be changed without breaking compatibility with existing
code. Instead other interfaces should be able to slightly reduce this behaviour
and provide a more common reaction to a removal of a slave interface.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-03-27 10:27:32 +01:00
Sven Eckelmann
b3246020e2 batman-adv: Move deinitialization of soft-interface to destructor
The deinitialization of the soft-interface created in ndo_init/constructor
should be done in the destructor and not directly before calling
unregister_netdevice

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-03-27 10:27:32 +01:00
Sven Eckelmann
37130293fd batman-adv: Move soft-interface initialization to ndo_init
The initialization of an net_device object should be done in the
init/constructor function and not from the outside after the register_netdevice
was done to avoid race conditions.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-03-27 10:27:31 +01:00
Martin Hundebøll
e6a0b495ff batman-adv: Fix endianness errors for network coding
Add a htonl() in network_coding.c when reading the sequence number
from received ogm_packet, to avoid wrong byte ordering when comparing
with a host value. This bug was introduced in
3ed7ada3f0bbcd058567bc0a8f9729a73eba7db6 ("batman-adv: network coding -
detect coding nodes and remove these after timeout").

Change the type of coded_packet->coded_len from uint16 to __be16 to
avoid wrong assumptions about endianness in later uses. Introduced in
c3289f3650d34b60296000a629c99f2488f7c3dd ("batman-adv: network coding -
code and transmit packets if possible").

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-03-27 10:27:31 +01:00
Tony Cheneau
6bdeaba47d 6lowpan: use IEEE802154_ADDR_LEN instead of a magic number
Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 00:52:16 -04:00
Tony Cheneau
ababf38596 6lowpan: fix a small formatting issue
This formatting issue was introduced with commit
d4ac32365d

Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 00:52:16 -04:00
Stephen Röttger
6364e6ee78 ieee802154/dgram: Pass source address in dgram_recvmsg
This patch lets dgram_recvmsg fill in the sockaddr struct in
msg->msg_name with the source address of the packet.
This is used by the userland functions recvmsg and recvfrom to get the
senders address.

[Stefan: Changed from old zigbee legacy tree to mainline]

Signed-off-by: Stephen Röttger <stephen.roettger@zero-entropy.de>
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-27 00:52:06 -04:00
Linus Torvalds
b175293ccc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Always increment IPV4 ID field in encapsulated GSO packets, even
    when DF is set.  Regression fix from Pravin B Shelar.

 2) Fix per-net subsystem initialization in netfilter conntrack,
    otherwise we may access dynamically allocated memory before it is
    actually allocated.  From Gao Feng.

 3) Fix DMA buffer lengths in iwl3945 driver, from Stanislaw Gruszka.

 4) Fix race between submission of sync vs async commands in mwifiex
    driver, from Amitkumar Karwar.

 5) Add missing cancel of command timer in mwifiex driver, from Bing
    Zhao.

 6) Missing SKB free in rtlwifi USB driver, from Jussi Kivilinna.

 7) Thermal layer tries to use a genetlink multicast string that is
    longer than the 16 character limit.  Fix it and add a BUG check to
    prevent this kind of thing from happening in the future.

 From Masatake YAMATO.

 8) Fix many bugs in the handling of the teardown of L2TP connections,
    UDP encapsulation instances, and sockets.  From Tom Parkin.

 9) Missing socket release in IRDA, from Kees Cook.

10) Fix fec driver modular build, from Fabio Estevam.

11) Erroneous use of kfree() instead of free_netdev() in lantiq_etop,
    from Wei Yongjun.

12) Fix bugs in handling of queue numbers and steering rules in mlx4
    driver, from Moshe Lazer, Hadar Hen Zion, and Or Gerlitz.

13) Some FOO_DIAG_MAX constants were defined off by one, fix from Andrey
    Vagin.

14) TCP segmentation deferral is unintentionally done too strongly,
    breaking ACK clocking.  Fix from Eric Dumazet.

15) net_enable_timestamp() can legitimately be invoked from software
    interrupts, and in a way that is safe, so remove the WARN_ON().
    Also from Eric Dumazet.

16) Fix use after free in VLANs, from Cong Wang.

17) Fix TCP slow start retransmit storms after SACK reneging, from
    Yuchung Cheng.

18) Unix socket release should mark a socket dead before NULL'ing out
    sock->sk, otherwise we can race.  Fix from Paul Moore.

19) IPV6 addrconf code can try to free static memory, from Hong Zhiguo.

20) Fix register mis-programming, NULL pointer derefs, and wrong PHC
    clock frequency in IGB driver.  From Lior LevyAlex Williamson, Jiri
    Benc, and Jeff Kirsher.

21) skb->ip_summed logic in pch_gbe driver is reversed, breaking packet
    forwarding.  Fix from Veaceslav Falico.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
  ipv4: Fix ip-header identification for gso packets.
  bonding: remove already created master sysfs link on failure
  af_unix: dont send SCM_CREDENTIAL when dest socket is NULL
  pch_gbe: fix ip_summed checksum reporting on rx
  igb: fix PHC stopping on max freq
  igb: make sensor info static
  igb: SR-IOV init reordering
  igb: Fix null pointer dereference
  igb: fix i350 anti spoofing config
  ixgbevf: don't release the soft entries
  ipv6: fix bad free of addrconf_init_net
  unix: fix a race condition in unix_release()
  tcp: undo spurious timeout after SACK reneging
  bnx2x: fix assignment of signed expression to unsigned variable
  bridge: fix crash when set mac address of br interface
  8021q: fix a potential use-after-free
  net: remove a WARN_ON() in net_enable_timestamp()
  tcp: preserve ACK clocking in TSO
  net: fix *_DIAG_MAX constants
  net/mlx4_core: Disallow releasing VF QPs which have steering rules
  ...
2013-03-26 14:24:29 -07:00
Linus Torvalds
5d538483ea NFS client bugfixes for Linux 3.9
- Fix an NFSv4 idmapper regression
 - Fix an Oops in the pNFS blocks client
 - Fix up various issues with pNFS layoutcommit
 - Ensure correct read ordering of variables in rpc_wake_up_task_queue_locked
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQIcBAABAgAGBQJRUedyAAoJEGcL54qWCgDyar0P/2pTT/yxX8ejTu5DmY7e4PYJ
 jhPG2AEqY/yMLn9GvB375VIs1L8tuY50+3NFhWZFjyNbEU3GV+5Y+kPpBtAgYiSI
 VyIXiJ/xMtXdYJMYuE/nh5jbcqJsHwGjpcIaSd5BuWzQUaoUYvLulxWd4QN8mmaT
 5SuzmgV+7WIqV6RjlaYF82srcOKAjwemcrfRkCNzzJr6aT39gH2YdYFbDaTr7qhU
 fw0x3QlI7887vSNQcfaGbC1+jr6oe8wRCneOR0tceU/8bcj6zlUDk5HxqSOc28mA
 jUQieoVRggcM4s5DFpNcuwW6qCPZOmzv/OFD6oqnhyyonPOrue+7zaoujZmGNmjx
 dT2V/jQehanYD25WpDO8OyFXUeYE4x9bgHKsszhBTwr4x5D8ceEJ1sugcOPiTTxu
 tflbbuWbt+BguvXp4p8QayUj0V2cplM/nOovWyUG+BH46sz3Dtv46NOgJeO2a29g
 T6jayxmKCxvtPKtG0j34BzLngiKabZTSEhFms6Qarp9lwWvHWrR9KWGuDBNvy1Ts
 GMBN8P6Ib40yVi6Pwlj5Jpy6yLKVklHtJQpactr63AZmYrF4bBBSom+MWAh3X1iO
 QtF0x9Z1bBkXY2Q/u+3vWMxQtEPeW+pSiloj8aiceFAt33zKM+1bLofDhEw0s2fI
 wJEHYsGyGtDQINgP0v1e
 =OPbZ
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - Fix an NFSv4 idmapper regression
 - Fix an Oops in the pNFS blocks client
 - Fix up various issues with pNFS layoutcommit
 - Ensure correct read ordering of variables in
   rpc_wake_up_task_queue_locked

* tag 'nfs-for-3.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_locked
  NFSv4.1: Add a helper pnfs_commit_and_return_layout
  NFSv4.1: Always clear the NFS_INO_LAYOUTCOMMIT in layoutreturn
  NFSv4.1: Fix a race in pNFS layoutcommit
  pnfs-block: removing DM device maybe cause oops when call dev_remove
  NFSv4: Fix the string length returned by the idmapper
2013-03-26 14:23:45 -07:00
Pravin B Shelar
330305cc4a ipv4: Fix ip-header identification for gso packets.
ip-header id needs to be incremented even if IP_DF flag is set.
This behaviour was changed in commit 490ab08127
(IP_GRE: Fix IP-Identification).

Following patch fixes it so that identification is always
incremented.

Reported-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 13:50:05 -04:00
Jason Wang
15e5a03071 net_sched: better precise estimation on packet length for untrusted packets
gso_segs were reset to zero when kernel receive packets from untrusted
source. But we use this zero value to estimate precise packet len which is
wrong. So this patch tries to estimate the correct gso_segs value before using
it in qdisc_pkt_len_init().

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:44:44 -04:00
Jason Wang
c1aad275b0 packet: set transport header before doing xmit
Set the transport header for 1) some drivers (e.g ixgbe needs l4 header to do
atr) 2) precise packet length estimation (introduced in 1def9238) needs l4
header to compute header length.

So this patch first tries to get l4 header for packet socket through
skb_flow_dissect(), and pretend no l4 header if skb_flow_dissect() fails.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:44:43 -04:00
Tony Cheneau
24363b6732 6lowpan: modify udp compression/uncompression to match the standard
The previous code would just compress the UDP header and send the compressed
UDP header along with the uncompressed one.

Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:37:58 -04:00
Tony Cheneau
43de7aa6ac 6lowpan: use the PANID provided by the device instead of a static value
Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:37:58 -04:00
Tony Cheneau
c7d0ab28b4 6lowpan: obtain IEEE802.15.4 sequence number from the MAC layer
Sets the sequence number in the frame format. Without this fix, the sequence
number is always set to 0. This makes trafic analysis very hard.

Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:37:58 -04:00
Tony Cheneau
0483546a3d mac802154: add mac802154_dev_get_dsn()
Bring-over mac802154_dev_get_dsn() function that was present in the
Linux ZigBee kernel. This function is called by the 6LoWPAN code in
order to properly set the DSN (Data Sequence Number) value in the IEEE
802.15.4 frame.

Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:37:58 -04:00
Tony Cheneau
d4ac32365d 6lowpan: store fragment tag values per device instead of net stack wide
Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:37:57 -04:00
Tony Cheneau
9da2924c4b 6lowpan: add debug messages for 6LoWPAN fragmentation
Add pr_debug() call in order to debug 6LoWPAN fragmentation and
reassembly.

Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:37:57 -04:00
Tony Cheneau
d991b98f50 6lowpan: fix first fragment (FRAG1) handling
The first fragment, FRAG1, must contain some payload according to the
specs. However, as it is currently written, the first fragment will
remain empty and only contain the 6lowpan headers.

This patch also extracts the transport layer information from the first
fragment. This information is used later on when uncompressing UDP
header.

Thanks to Wolf-Bastian Pöttner for noticing that the offset value was
not properly initialized.

Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:37:57 -04:00
Tony Cheneau
58ef67c318 6lowpan: use short IEEE 802.15.4 addresses for broadcast destination
The IEEE 802.15.4 standard uses the 0xFFFF short address (2 bytes) for message
broadcasting.

Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:37:56 -04:00
Tony Cheneau
cf692061d0 mac802154: turn on ACK when enabled by the upper layers
Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:37:55 -04:00
Tony Cheneau
f333a15a3e 6lowpan: always enable link-layer acknowledgments
This feature is especially important when using fragmentation, because
the reassembly mechanism cannot recover from the loss of a fragment.

Note that some hardware ignore this flag and not will not transmit
acknowledgments even if this is set.

Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:37:55 -04:00
Tony Cheneau
f5c20f58d9 6lowpan: next header is not properly set upon decompression of a UDP header.
This causes a drop of the UDP packet.

Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:37:55 -04:00
Tony Cheneau
8d879a3f98 6lowpan: lowpan_is_iid_16_bit_compressable() does not detect compressible address correctly
The current test is not RFC6282 compliant. The same issue has been found
and fixed in Contiki. This patch is basically a port of their fix.

Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:37:55 -04:00
Hong zhi guo
de179c8c12 netlink: have length check of rtnl msg before deref
When the legacy array rtm_min still exists, the length check within
these functions is covered by rtm_min[RTM_NEWTFILTER],
rtm_min[RTM_NEWQDISC] and rtm_min[RTM_NEWTCLASS].

But after Thomas Graf removed rtm_min several days ago, these checks
are missing. Other doit functions should be OK.

Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:35:27 -04:00
dingtianhong
14134f6584 af_unix: dont send SCM_CREDENTIAL when dest socket is NULL
SCM_SCREDENTIALS should apply to write() syscalls only either source or destination
socket asserted SOCK_PASSCRED. The original implememtation in maybe_add_creds is wrong,
and breaks several LSB testcases ( i.e. /tset/LSB.os/netowkr/recvfrom/T.recvfrom).

Origionally-authored-by: Karel Srot <ksrot@redhat.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:33:55 -04:00
YOSHIFUJI Hideaki / 吉藤英明
cb6bf35502 firewire net, ipv6: IPv6 over Firewire (RFC3146) support.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:32:13 -04:00
YOSHIFUJI Hideaki / 吉藤英明
6752c8db8e firewire net, ipv4 arp: Extend hardware address and remove driver-level packet inspection.
Inspection of upper layer protocol is considered harmful, especially
if it is about ARP or other stateful upper layer protocol; driver
cannot (and should not) have full state of them.

IPv4 over Firewire module used to inspect ARP (both in sending path
and in receiving path), and record peer's GUID, max packet size, max
speed and fifo address.  This patch removes such inspection by extending
our "hardware address" definition to include other information as well:
max packet size, max speed and fifo.  By doing this, The neighbour
module in networking subsystem can cache them.

Note: As we have started ignoring sspd and max_rec in ARP/NDP, those
      information will not be used in the driver when sending.

When a packet is being sent, the IP layer fills our pseudo header with
the extended "hardware address", including GUID and fifo.  The driver
can look-up node-id (the real but rather volatile low-level address)
by GUID, and then the module can send the packet to the wire using
parameters provided in the extendedn hardware address.

This approach is realistic because IP over IEEE1394 (RFC2734) and IPv6
over IEEE1394 (RFC3146) share same "hardware address" format
in their address resolution protocols.

Here, extended "hardware address" is defined as follows:

union fwnet_hwaddr {
	u8 u[16];
	struct {
		__be64 uniq_id;		/* EUI-64			*/
		u8 max_rec;		/* max packet size		*/
		u8 sspd;		/* max speed			*/
		__be16 fifo_hi;		/* hi 16bits of FIFO addr	*/
		__be32 fifo_lo;		/* lo 32bits of FIFO addr	*/
	} __packed uc;
};

Note that Hardware address is declared as union, so that we can map full
IP address into this, when implementing MCAP (Multicast Cannel Allocation
Protocol) for IPv6, but IP and ARP subsystem do not need to know this
format in detail.

One difference between original ARP (RFC826) and 1394 ARP (RFC2734)
is that 1394 ARP Request/Reply do not contain the target hardware address
field (aka ar$tha).  This difference is handled in the ARP subsystem.

CC: Stephan Gatzka <stephan.gatzka@gmail.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:32:13 -04:00
Pravin B Shelar
f61dd388a9 Tunneling: use IP Tunnel stats APIs.
Use common function get calculate rtnl_link_stats64 stats.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:27:19 -04:00
Pravin B Shelar
fd58156e45 IPIP: Use ip-tunneling code.
Reuse common ip-tunneling code which is re-factored from GRE
module.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:27:18 -04:00
Pravin B Shelar
c544193214 GRE: Refactor GRE tunneling code.
Following patch refactors GRE code into ip tunneling code and GRE
specific code. Common tunneling code is moved to ip_tunnel module.
ip_tunnel module is written as generic library which can be used
by different tunneling implementations.

ip_tunnel module contains following components:
 - packet xmit and rcv generic code. xmit flow looks like
   (gre_xmit/ipip_xmit)->ip_tunnel_xmit->ip_local_out.
 - hash table of all devices.
 - lookup for tunnel devices.
 - control plane operations like device create, destroy, ioctl, netlink
   operations code.
 - registration for tunneling modules, like gre, ipip etc.
 - define single pcpu_tstats dev->tstats.
 - struct tnl_ptk_info added to pass parsed tunnel packet parameters.

ipip.h header is renamed to ip_tunnel.h

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-26 12:27:18 -04:00
Samuel Ortiz
39a352a5b5 NFC: llcp: Keep the connected socket parent pointer alive
And avoid decreasing the ack log twice when dequeueing connected LLCP
sockets.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-26 14:35:57 +01:00
John W. Linville
fae172136c Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2013-03-25 14:50:17 -04:00
David S. Miller
eaac5f3d3a net: Print functions in /proc/net/ptype without the offset.
It's always zero.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-25 14:12:55 -04:00
Hong Zhiguo
a79ca223e0 ipv6: fix bad free of addrconf_init_net
Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-25 13:55:38 -04:00
Paul Moore
ded34e0fe8 unix: fix a race condition in unix_release()
As reported by Jan, and others over the past few years, there is a
race condition caused by unix_release setting the sock->sk pointer
to NULL before properly marking the socket as dead/orphaned.  This
can cause a problem with the LSM hook security_unix_may_send() if
there is another socket attempting to write to this partially
released socket in between when sock->sk is set to NULL and it is
marked as dead/orphaned.  This patch fixes this by only setting
sock->sk to NULL after the socket has been marked as dead; I also
take the opportunity to make unix_release_sock() a void function
as it only ever returned 0/success.

Dave, I think this one should go on the -stable pile.

Special thanks to Jan for coming up with a reproducer for this
problem.

Reported-by: Jan Stancek <jan.stancek@gmail.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-25 13:11:48 -04:00
Pravin B Shelar
25c7704d8b ipv4: Fix ip-header identification for gso packets.
ip-header id needs to be incremented even if IP_DF flag is set.
This behaviour was changed in commit 490ab08127
(IP_GRE: Fix IP-Identification).

Following patch fixes it so that identification is always
incremented.

Reported-by: Cong Wang <amwang@redhat.com>
Acked-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2013-03-25 12:30:25 -04:00
Pravin B Shelar
5594c32187 Revert "udp: increase inner ip header ID during segmentation"
This reverts commit d6a8c36dd6.
Next commit makes this commit unnecessary.

Acked-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-25 12:29:54 -04:00
Pravin B Shelar
9cb690d1b4 Revert "ip_gre: increase inner ip header ID during segmentation"
This reverts commit 10c0d7ed32.
Next commit makes this commit unnecessary.

Acked-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-25 12:29:54 -04:00
Florian Fainelli
5f64a7dbf5 dsa: fix freeing of sparse port allocation
If we have defined a sparse port allocation which is non-contiguous and
contains gaps, the code freeing port_names will just stop when it
encouters a first NULL port_names, which is not right, we should iterate
over all possible number of ports (DSA_MAX_PORTS) until we are done.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-25 12:23:41 -04:00
Florian Fainelli
2116824503 dsa: factor freeing of dsa_platform_data
This patch factors the freeing of the struct dsa_platform_data
manipulated by the driver identically in two places to a single
function.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-25 12:23:41 -04:00
David S. Miller
da13482534 Merge branch 'master' of git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter/IPVS updates for
your net-next tree, they are:

* Better performance in nfnetlink_queue by avoiding copy from the
  packet to netlink message, from Eric Dumazet.

* Remove unnecessary locking in the exit path of ebt_ulog, from Gao Feng.

* Use new function ipv6_iface_scope_id in nf_ct_ipv6, from Hannes Frederic Sowa.

* A couple of sparse fixes for IPVS, from Julian Anastasov.

* Use xor hashing in nfnetlink_queue, as suggested by Eric Dumazet, from
  myself.

* Allow to dump expectations per master conntrack via ctnetlink, from myself.

* A couple of cleanups to use PTR_RET in module init path, from Silviu-Mihai
  Popescu.

* Remove nf_conntrack module a bit faster if netns are in use, from
  Vladimir Davydov.

* Use checksum_partial in ip6t_NPT, from YOSHIFUJI Hideaki.

* Sparse fix for nf_conntrack, from Stephen Hemminger.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-25 12:11:44 -04:00
Johannes Berg
382a103b2b mac80211: fix idle handling sequence
Corey Richardson reported that my idle handling cleanup
(commit fd0f979a1b, "mac80211: simplify idle handling")
broke ath9k_htc. The reason appears to be that it wants
to go out of idle before switching channels. To fix it,
reimplement that sequence.

Reported-by: Corey Richardson <corey@octayn.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-25 16:50:18 +01:00
Trond Myklebust
1166fde6a9 SUNRPC: Add barriers to ensure read ordering in rpc_wake_up_task_queue_locked
We need to be careful when testing task->tk_waitqueue in
rpc_wake_up_task_queue_locked, because it can be changed while we
are holding the queue->lock.
By adding appropriate memory barriers, we can ensure that it is safe to
test task->tk_waitqueue for equality if the RPC_TASK_QUEUED bit is set.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
2013-03-25 11:23:40 -04:00
Pablo Neira Ayuso
deadcfc332 netfilter: nfnetlink_acct: return -EINVAL if object name is empty
If user-space tries to create accounting object with an empty
name, then return -EINVAL.

Reported-by: Michael Zintakis <michael.zintakis@googlemail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-25 14:21:30 +01:00
Wei Yongjun
558724a5b2 netfilter: nfnetlink_queue: fix error return code in nfnetlink_queue_init()
Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-25 14:21:27 +01:00
Johannes Berg
3fbd45ca8d mac80211: fix remain-on-channel cancel crash
If a ROC item is canceled just as it expires, the work
struct may be scheduled while it is running (and waiting
for the mutex). This results in it being run after being
freed, which obviously crashes.

To fix this don't free it when aborting is requested but
instead mark it as "to be freed", which makes the work a
no-op and allows freeing it outside.

Cc: stable@vger.kernel.org [3.6+]
Reported-by: Jouni Malinen <j@w1.fi>
Tested-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-25 13:50:33 +01:00
Mathias Krause
799ef90c55 xfrm: Fix esn sequence number diff calculation in xfrm_replay_notify_esn()
Commit 0017c0b "xfrm: Fix replay notification for esn." is off by one
for the sequence number wrapped case as UINT_MAX is 0xffffffff, not
0x100000000. ;)

Just calculate the diff like done everywhere else in the file.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-03-25 07:25:50 +01:00
Yuchung Cheng
7ebe183c6d tcp: undo spurious timeout after SACK reneging
On SACK reneging the sender immediately retransmits and forces a
timeout but disables Eifel (undo). If the (buggy) receiver does not
drop any packet this can trigger a false slow-start retransmit storm
driven by the ACKs of the original packets. This can be detected with
undo and TCP timestamps.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-24 17:27:28 -04:00
Hong zhi guo
9b46922e15 bridge: fix crash when set mac address of br interface
When I tried to set mac address of a bridge interface to a mac
address which already learned on this bridge, I got system hang.

The cause is straight forward: function br_fdb_change_mac_address
calls fdb_insert with NULL source nbp. Then an fdb lookup is
performed. If an fdb entry is found and it's local, it's OK. But
if it's not local, source is dereferenced for printk without NULL
check.

Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-24 17:27:28 -04:00
Cong Wang
4a7df340ed 8021q: fix a potential use-after-free
vlan_vid_del() could possibly free ->vlan_info after a RCU grace
period, however, we may still refer to the freed memory area
by 'grp' pointer. Found by code inspection.

This patch moves vlan_vid_del() as behind as possible.

Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-24 17:27:28 -04:00
Eric Dumazet
9979a55a83 net: remove a WARN_ON() in net_enable_timestamp()
The WARN_ON(in_interrupt()) in net_enable_timestamp() can get false
positive, in socket clone path, run from softirq context :

[ 3641.624425] WARNING: at net/core/dev.c:1532 net_enable_timestamp+0x7b/0x80()
[ 3641.668811] Call Trace:
[ 3641.671254]  <IRQ>  [<ffffffff80286817>] warn_slowpath_common+0x87/0xc0
[ 3641.677871]  [<ffffffff8028686a>] warn_slowpath_null+0x1a/0x20
[ 3641.683683]  [<ffffffff80742f8b>] net_enable_timestamp+0x7b/0x80
[ 3641.689668]  [<ffffffff80732ce5>] sk_clone_lock+0x425/0x450
[ 3641.695222]  [<ffffffff8078db36>] inet_csk_clone_lock+0x16/0x170
[ 3641.701213]  [<ffffffff807ae449>] tcp_create_openreq_child+0x29/0x820
[ 3641.707663]  [<ffffffff807d62e2>] ? ipt_do_table+0x222/0x670
[ 3641.713354]  [<ffffffff807aaf5b>] tcp_v4_syn_recv_sock+0xab/0x3d0
[ 3641.719425]  [<ffffffff807af63a>] tcp_check_req+0x3da/0x530
[ 3641.724979]  [<ffffffff8078b400>] ? inet_hashinfo_init+0x60/0x80
[ 3641.730964]  [<ffffffff807ade6f>] ? tcp_v4_rcv+0x79f/0xbe0
[ 3641.736430]  [<ffffffff807ab9bd>] tcp_v4_do_rcv+0x38d/0x4f0
[ 3641.741985]  [<ffffffff807ae14a>] tcp_v4_rcv+0xa7a/0xbe0

Its safe at this point because the parent socket owns a reference
on the netstamp_needed, so we cant have a 0 -> 1 transition, which
requires to lock a mutex.

Instead of refining the check, lets remove it, as all known callers
are safe. If it ever changes in the future, static_key_slow_inc()
will complain anyway.

Reported-by: Laurent Chavey <chavey@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-24 17:27:27 -04:00
Hong zhi guo
7b99a99390 bridge: avoid br_ifinfo_notify when nothing changed
When neither IFF_BRIDGE nor IFF_BRIDGE_PORT is set,
and afspec == NULL but  protinfo != NULL, we run into
"if (err == 0) br_ifinfo_notify(RTM_NEWLINK, p);" with
random value in ret.

Thanks to Sergei for pointing out the error in commit comments.

Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-24 17:16:30 -04:00
Florian Fainelli
5e95329b70 dsa: add device tree bindings to register DSA switches
This patch adds support for registering DSA switches using Device Tree
bindings. Note that we support programming the switch routing table even
though no in-tree user seems to require it. I tested this on Armada 370
with a Marvell 88E6172 (not supported by mainline yet).

Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-24 17:16:30 -04:00
Hannes Frederic Sowa
eec2e6185f ipv6: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling
Hello!

After patch 1 got accepted to net-next I will also send a patch to
netfilter-devel to make the corresponding changes to the netfilter
reassembly logic.

Thanks,

  Hannes

-- >8 --
[PATCH 2/2] ipv6: implement RFC3168 5.3 (ecn protection) for ipv6 fragmentation handling

This patch also ensures that INET_ECN_CE is propagated if one fragment
had the codepoint set.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-24 17:16:30 -04:00
Hannes Frederic Sowa
be991971d5 inet: generalize ipv4-only RFC3168 5.3 ecn fragmentation handling for future use by ipv6
This patch just moves some code arround to make the ip4_frag_ecn_table
and IPFRAG_ECN_* constants accessible from the other reassembly engines. I
also renamed ip4_frag_ecn_table to ip_frag_ecn_table.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-24 17:16:30 -04:00
Nicolas Dichtel
63998ac24f ipv6: provide addr and netconf dump consistency info
This patch adds a dev_addr_genid for IPv6. The goal is to use it, combined with
dev_base_seq to check if a change occurs during a netlink dump.
If a change is detected, the flag NLM_F_DUMP_INTR is set in the first message
after the dump was interrupted.

Note that only dump of unicast addresses is checked (multicast and anycast are
not checked).

Reported-by: Junwei Zhang <junwei.zhang@6wind.com>
Reported-by: Hongjun Li <hongjun.li@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-24 17:16:29 -04:00
Nicolas Dichtel
0465277f6b ipv4: provide addr and netconf dump consistency info
This patch takes benefit of dev_addr_genid and dev_base_seq to check if a change
occurs during a netlink dump. If a change is detected, the flag NLM_F_DUMP_INTR
is set in the first message after the dump was interrupted.

Note that seq and prev_seq must be reset between each family in rtnl_dump_all()
because they are specific to each family.

Reported-by: Junwei Zhang <junwei.zhang@6wind.com>
Reported-by: Hongjun Li <hongjun.li@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-24 17:16:29 -04:00
Ben Greear
370bd00593 mac80211: Don't restart sta-timer if not associated.
I found another crash when deleting lots of virtual stations
in a congested environment.  I think the problem is that
the ieee80211_mlme_notify_scan_completed could call
ieee80211_restart_sta_timer for a stopped interface
that was about to be deleted.

With the following patch I am unable to reproduce the
crash.

Signed-off-by: Ben Greear <greearb@candelatech.com>
[move check, also make the same change in mesh]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-24 11:15:59 +01:00
Johannes Berg
f9f475292d cfg80211: always check for scan end on P2P device
If a P2P device wdev is removed while it has a scan, then the
scan completion might crash later as it is already freed by
that time. To avoid the crash always check the scan completion
when the P2P device is being removed for some reason. If the
driver already canceled it, don't want and free it, otherwise
warn and leak it to avoid later crashes.

In order to do this, locking needs to be changed away from the
rdev mutex (which can't always be guaranteed). For now, use
the sched_scan_mtx instead, I'll rename it to just scan_mtx in
a later patch.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-24 11:15:58 +01:00
Dan Carpenter
1b7c92b905 l2tp: calling the ref() instead of deref()
This is a cut and paste typo.  We call ->ref() a second time instead
of ->deref().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-22 14:39:24 -04:00
David S. Miller
ea3d1cc285 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull to get the thermal netlink multicast group name fix, otherwise
the assertion added in net-next to netlink to detect that kind of bug
makes systems unbootable for some folks.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-22 12:53:09 -04:00
Thomas Graf
2fa70df935 decnet: Move rtm_dn_policy to dn_route to make it available if !CONFIG_DECNET_ROUTER
Otherwise build fails with CONFIG_DECNET && !CONFIG_DECNET_ROUTER

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-22 12:51:59 -04:00
Eric Dumazet
f4541d60a4 tcp: preserve ACK clocking in TSO
A long standing problem with TSO is the fact that tcp_tso_should_defer()
rearms the deferred timer, while it should not.

Current code leads to following bad bursty behavior :

20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119
20:11:24.484337 IP B > A: . ack 263721 win 1117
20:11:24.485086 IP B > A: . ack 265241 win 1117
20:11:24.485925 IP B > A: . ack 266761 win 1117
20:11:24.486759 IP B > A: . ack 268281 win 1117
20:11:24.487594 IP B > A: . ack 269801 win 1117
20:11:24.488430 IP B > A: . ack 271321 win 1117
20:11:24.489267 IP B > A: . ack 272841 win 1117
20:11:24.490104 IP B > A: . ack 274361 win 1117
20:11:24.490939 IP B > A: . ack 275881 win 1117
20:11:24.491775 IP B > A: . ack 277401 win 1117
20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119
20:11:24.492620 IP B > A: . ack 278921 win 1117
20:11:24.493448 IP B > A: . ack 280441 win 1117
20:11:24.494286 IP B > A: . ack 281961 win 1117
20:11:24.495122 IP B > A: . ack 283481 win 1117
20:11:24.495958 IP B > A: . ack 285001 win 1117
20:11:24.496791 IP B > A: . ack 286521 win 1117
20:11:24.497628 IP B > A: . ack 288041 win 1117
20:11:24.498459 IP B > A: . ack 289561 win 1117
20:11:24.499296 IP B > A: . ack 291081 win 1117
20:11:24.500133 IP B > A: . ack 292601 win 1117
20:11:24.500970 IP B > A: . ack 294121 win 1117
20:11:24.501388 IP B > A: . ack 295641 win 1117
20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119

While the expected behavior is more like :

20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119
20:19:49.260446 IP B > A: . ack 154281 win 1212
20:19:49.261282 IP B > A: . ack 155801 win 1212
20:19:49.262125 IP B > A: . ack 157321 win 1212
20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119
20:19:49.262958 IP B > A: . ack 158841 win 1212
20:19:49.263795 IP B > A: . ack 160361 win 1212
20:19:49.264628 IP B > A: . ack 161881 win 1212
20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119
20:19:49.265465 IP B > A: . ack 163401 win 1212
20:19:49.265886 IP B > A: . ack 164921 win 1212
20:19:49.266722 IP B > A: . ack 166441 win 1212
20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119
20:19:49.267559 IP B > A: . ack 167961 win 1212
20:19:49.268394 IP B > A: . ack 169481 win 1212
20:19:49.269232 IP B > A: . ack 171001 win 1212
20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-22 10:34:03 -04:00
Thomas Graf
661d2967b3 rtnetlink: Remove passing of attributes into rtnl_doit functions
With decnet converted, we can finally get rid of rta_buf and its
computations around it. It also gets rid of the minimal header
length verification since all message handlers do that explicitly
anyway.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-22 10:31:16 -04:00
Thomas Graf
58d7d8f9b2 decnet: Parse netlink attributes on our own
decnet is the only subsystem left that is relying on the global
netlink attribute buffer rta_buf. It's horrible design and we
want to get rid of it.

This converts all of decnet to do implicit attribute parsing. It
also gets rid of the error prone struct dn_kern_rta.

Yes, the fib_magic() stuff is not pretty.

It's compiled tested but I need someone with appropriate hardware
to test the patch since I don't have access to it.

Cc: linux-decnet-user@lists.sourceforge.net
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-22 10:31:16 -04:00
Cong Wang
d6a8c36dd6 udp: increase inner ip header ID during segmentation
Similar to GRE tunnel, UDP tunnel should take care of IP header ID
too.

Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-22 10:23:34 -04:00
Cong Wang
10c0d7ed32 ip_gre: increase inner ip header ID during segmentation
According to the previous discussion [1] on netdev list, DaveM insists
we should increase the IP header ID for each segmented packets.
This patch fixes it.

Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>

1. http://marc.info/?t=136384172700001&r=1&w=2
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-22 10:23:34 -04:00
Andrey Vagin
eaaa313926 netlink: Diag core and basic socket info dumping (v2)
The netlink_diag can be built as a module, just like it's done in
unix sockets.

The core dumping message carries the basic info about netlink sockets:
family, type and protocol, portis, dst_group, dst_portid, state.

Groups can be received as an optional parameter NETLINK_DIAG_GROUPS.

Netlink sockets cab be filtered by protocols.

The socket inode number and cookie is reserved for future per-socket info
retrieving. The per-protocol filtering is also reserved for future by
requiring the sdiag_protocol to be zero.

The file /proc/net/netlink doesn't provide enough information for
dumping netlink sockets. It doesn't provide dst_group, dst_portid,
groups above 32.

v2: fix NETLINK_DIAG_MAX. Now it's equal to the last constant.

Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-21 12:38:03 -04:00
Andrey Vagin
0f29c76864 net: prepare netlink code for netlink diag
Move a few declarations in a header.

Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-21 12:38:02 -04:00
Zefan Li
4021db9a0d net: remove redundant ifdef CONFIG_CGROUPS
The cgroup code has been surrounded by ifdef CONFIG_NET_CLS_CGROUP
and CONFIG_NETPRIO_CGROUP.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-21 11:47:51 -04:00
Yuchung Cheng
e33099f96d tcp: implement RFC5682 F-RTO
This patch implements F-RTO (foward RTO recovery):

When the first retransmission after timeout is acknowledged, F-RTO
sends new data instead of old data. If the next ACK acknowledges
some never-retransmitted data, then the timeout was spurious and the
congestion state is reverted.  Otherwise if the next ACK selectively
acknowledges the new data, then the timeout was genuine and the
loss recovery continues. This idea applies to recurring timeouts
as well. While F-RTO sends different data during timeout recovery,
it does not (and should not) change the congestion control.

The implementaion follows the three steps of SACK enhanced algorithm
(section 3) in RFC5682. Step 1 is in tcp_enter_loss(). Step 2 and
3 are in tcp_process_loss().  The basic version is not supported
because SACK enhanced version also works for non-SACK connections.

The new implementation is functionally in parity with the old F-RTO
implementation except the one case where it increases undo events:
In addition to the RFC algorithm, a spurious timeout may be detected
without sending data in step 2, as long as the SACK confirms not
all the original data are dropped. When this happens, the sender
will undo the cwnd and perhaps enter fast recovery instead. This
additional check increases the F-RTO undo events by 5x compared
to the prior implementation on Google Web servers, since the sender
often does not have new data to send for HTTP.

Note F-RTO may detect spurious timeout before Eifel with timestamps
does so.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-21 11:47:51 -04:00
Yuchung Cheng
ab42d9ee3d tcp: refactor CA_Loss state processing
Consolidate all of TCP CA_Loss state processing in
tcp_fastretrans_alert() into a new function called tcp_process_loss().
This is to prepare the new F-RTO implementation in the next patch.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-21 11:47:51 -04:00
Yuchung Cheng
9b44190dc1 tcp: refactor F-RTO
The patch series refactor the F-RTO feature (RFC4138/5682).

This is to simplify the loss recovery processing. Existing F-RTO
was developed during the experimental stage (RFC4138) and has
many experimental features.  It takes a separate code path from
the traditional timeout processing by overloading CA_Disorder
instead of using CA_Loss state. This complicates CA_Disorder state
handling because it's also used for handling dubious ACKs and undos.
While the algorithm in the RFC does not change the congestion control,
the implementation intercepts congestion control in various places
(e.g., frto_cwnd in tcp_ack()).

The new code implements newer F-RTO RFC5682 using CA_Loss processing
path.  F-RTO becomes a small extension in the timeout processing
and interfaces with congestion control and Eifel undo modules.
It lets congestion control (module) determines how many to send
independently.  F-RTO only chooses what to send in order to detect
spurious retranmission. If timeout is found spurious it invokes
existing Eifel undo algorithms like DSACK or TCP timestamp based
detection.

The first patch removes all F-RTO code except the sysctl_tcp_frto is
left for the new implementation.  Since CA_EVENT_FRTO is removed, TCP
westwood now computes ssthresh on regular timeout CA_EVENT_LOSS event.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-21 11:47:50 -04:00
Johannes Berg
8b305780ed mac80211: fix virtual monitor interface locking
The virtual monitor interface has a locking issue, it calls
into the channel context code with the iflist mutex held
which isn't allowed since it is usually acquired the other
way around. The mutex is still required for the interface
iteration, but need not be held across the channel calls.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-20 22:26:35 +01:00
Johannes Berg
ce1eadda6b cfg80211: fix wdev tracing crash
Arend reported a crash in tracing if the driver returns an
ERR_PTR() value from the add_virtual_intf() callback. This
is due to the tracing then still attempting to dereference
the "pointer", fix this by using IS_ERR_OR_NULL().

Reported-by: Arend van Spriel <arend@broadcom.com>
Tested-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-20 22:21:31 +01:00
John W. Linville
5470b462c3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2013-03-20 15:24:57 -04:00
John W. Linville
b9d5319041 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-03-20 14:26:37 -04:00
Chris Metcalf
8fdc929f57 dynticks: avoid flow_cache_flush() interrupting every core
Previously, if you did an "ifconfig down" or similar on one core, and
the kernel had CONFIG_XFRM enabled, every core would be interrupted to
check its percpu flow list for items that could be garbage collected.

With this change, we generate a mask of cores that actually have any
percpu items, and only interrupt those cores.  When we are trying to
isolate a set of cpus from interrupts, this is important to do.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:28:39 -04:00
Daniel Borkmann
3e5289d5e3 filter: add ANC_PAY_OFFSET instruction for loading payload start offset
It is very useful to do dynamic truncation of packets. In particular,
we're interested to push the necessary header bytes to the user space and
cut off user payload that should probably not be transferred for some reasons
(e.g. privacy, speed, or others). With the ancillary extension PAY_OFFSET,
we can load it into the accumulator, and return it. E.g. in bpfc syntax ...

        ld #poff        ; { 0x20, 0, 0, 0xfffff034 },
        ret a           ; { 0x16, 0, 0, 0x00000000 },

... as a filter will accomplish this without having to do a big hackery in
a BPF filter itself. Follow-up JIT implementations are welcome.

Thanks to Eric Dumazet for suggesting and discussing this during the
Netfilter Workshop in Copenhagen.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:15:45 -04:00
Daniel Borkmann
f77668dc25 net: flow_dissector: add __skb_get_poff to get a start offset to payload
__skb_get_poff() returns the offset to the payload as far as it could
be dissected. The main user is currently BPF, so that we can dynamically
truncate packets without needing to push actual payload to the user
space and instead can analyze headers only.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:15:45 -04:00
David S. Miller
61816596d1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull in the 'net' tree to get Daniel Borkmann's flow dissector
infrastructure change.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:46:26 -04:00
Kees Cook
896ee0eee6 net/irda: add missing error path release_sock call
This makes sure that release_sock is called for all error conditions in
irda_getsockopt.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Brad Spengler <spender@grsecurity.net>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:23:13 -04:00
Martin Fuzzey
283951f95b ipconfig: Fix newline handling in log message.
When using ipconfig the logs currently look like:

Single name server:
[    3.467270] IP-Config: Complete:
[    3.470613]      device=eth0, hwaddr=ac🇩🇪48:00:00:01, ipaddr=172.16.42.2, mask=255.255.255.0, gw=172.16.42.1
[    3.480670]      host=infigo-1, domain=, nis-domain=(none)
[    3.486166]      bootserver=172.16.42.1, rootserver=172.16.42.1, rootpath=
[    3.492910]      nameserver0=172.16.42.1[    3.496853] ALSA device list:

Three name servers:
[    3.496949] IP-Config: Complete:
[    3.500293]      device=eth0, hwaddr=ac🇩🇪48:00:00:01, ipaddr=172.16.42.2, mask=255.255.255.0, gw=172.16.42.1
[    3.510367]      host=infigo-1, domain=, nis-domain=(none)
[    3.515864]      bootserver=172.16.42.1, rootserver=172.16.42.1, rootpath=
[    3.522635]      nameserver0=172.16.42.1, nameserver1=172.16.42.100
[    3.529149] , nameserver2=172.16.42.200

Fix newline handling for these cases

Signed-off-by: Martin Fuzzey <mfuzzey@parkeon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:15:58 -04:00
Daniel Borkmann
8ed781668d flow_keys: include thoff into flow_keys for later usage
In skb_flow_dissect(), we perform a dissection of a skbuff. Since we're
doing the work here anyway, also store thoff for a later usage, e.g. in
the BPF filter.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:14:36 -04:00
Tom Parkin
f6e16b299b l2tp: unhash l2tp sessions on delete, not on free
If we postpone unhashing of l2tp sessions until the structure is freed, we
risk:

 1. further packets arriving and getting queued while the pseudowire is being
    closed down
 2. the recv path hitting "scheduling while atomic" errors in the case that
    recv drops the last reference to a session and calls l2tp_session_free
    while in atomic context

As such, l2tp sessions should be unhashed from l2tp_core data structures early
in the teardown process prior to calling pseudowire close.  For pseudowires
like l2tp_ppp which have multiple shutdown codepaths, provide an unhash hook.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:39 -04:00
Tom Parkin
7b7c0719cd l2tp: avoid deadlock in l2tp stats update
l2tp's u64_stats writers were incorrectly synchronised, making it possible to
deadlock a 64bit machine running a 32bit kernel simply by sending the l2tp
code netlink commands while passing data through l2tp sessions.

Previous discussion on netdev determined that alternative solutions such as
spinlock writer synchronisation or per-cpu data would bring unjustified
overhead, given that most users interested in high volume traffic will likely
be running 64bit kernels on 64bit hardware.

As such, this patch replaces l2tp's use of u64_stats with atomic_long_t,
thereby avoiding the deadlock.

Ref:
http://marc.info/?l=linux-netdev&m=134029167910731&w=2
http://marc.info/?l=linux-netdev&m=134079868111131&w=2

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:39 -04:00
Tom Parkin
cf2f5c886a l2tp: push all ppp pseudowire shutdown through .release handler
If userspace deletes a ppp pseudowire using the netlink API, either by
directly deleting the session or by deleting the tunnel that contains the
session, we need to tear down the corresponding pppox channel.

Rather than trying to manage two pppox unbind codepaths, switch the netlink
and l2tp_core session_close handlers to close via. the l2tp_ppp socket
.release handler.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:39 -04:00
Tom Parkin
4c6e2fd354 l2tp: purge session reorder queue on delete
Add calls to l2tp_session_queue_purge as a part of l2tp_tunnel_closeall
and l2tp_session_delete.  Pseudowire implementations which are deleted only
via. l2tp_core l2tp_session_delete calls can dispense with their own code for
flushing the reorder queue.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:39 -04:00
Tom Parkin
48f72f92b3 l2tp: add session reorder queue purge function to core
If an l2tp session is deleted, it is necessary to delete skbs in-flight
on the session's reorder queue before taking it down.

Rather than having each pseudowire implementation reaching into the
l2tp_session struct to handle this itself, provide a function in l2tp_core to
purge the session queue.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:39 -04:00
Tom Parkin
02d13ed5f9 l2tp: don't BUG_ON sk_socket being NULL
It is valid for an existing struct sock object to have a NULL sk_socket
pointer, so don't BUG_ON in l2tp_tunnel_del_work if that should occur.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:39 -04:00
Tom Parkin
8abbbe8ff5 l2tp: take a reference for kernel sockets in l2tp_tunnel_sock_lookup
When looking up the tunnel socket in struct l2tp_tunnel, hold a reference
whether the socket was created by the kernel or by userspace.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:38 -04:00
Tom Parkin
2b551c6e7d l2tp: close sessions before initiating tunnel delete
When a user deletes a tunnel using netlink, all the sessions in the tunnel
should also be deleted.  Since running sessions will pin the tunnel socket
with the references they hold, have the l2tp_tunnel_delete close all sessions
in a tunnel before finally closing the tunnel socket.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:38 -04:00
Tom Parkin
936063175a l2tp: close sessions in ip socket destroy callback
l2tp_core hooks UDP's .destroy handler to gain advance warning of a tunnel
socket being closed from userspace.  We need to do the same thing for
IP-encapsulation sockets.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:38 -04:00
Tom Parkin
e34f4c7050 l2tp: export l2tp_tunnel_closeall
l2tp_core internally uses l2tp_tunnel_closeall to close all sessions in a
tunnel when a UDP-encapsulation socket is destroyed.  We need to do something
similar for IP-encapsulation sockets.

Export l2tp_tunnel_closeall as a GPL symbol to enable l2tp_ip and l2tp_ip6 to
call it from their .destroy handlers.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:38 -04:00
Tom Parkin
9980d001ce l2tp: add udp encap socket destroy handler
L2TP sessions hold a reference to the tunnel socket to prevent it going away
while sessions are still active.  However, since tunnel destruction is handled
by the sock sk_destruct callback there is a catch-22: a tunnel with sessions
cannot be deleted since each session holds a reference to the tunnel socket.
If userspace closes a managed tunnel socket, or dies, the tunnel will persist
and it will be neccessary to individually delete the sessions using netlink
commands.  This is ugly.

To prevent this occuring, this patch leverages the udp encapsulation socket
destroy callback to gain early notification when the tunnel socket is closed.
This allows us to safely close the sessions running in the tunnel, dropping
the tunnel socket references in the process.  The tunnel socket is then
destroyed as normal, and the tunnel resources deallocated in sk_destruct.

While we're at it, ensure that l2tp_tunnel_closeall correctly drops session
references to allow the sessions to be deleted rather than leaking.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:38 -04:00
Tom Parkin
44046a593e udp: add encap_destroy callback
Users of udp encapsulation currently have an encap_rcv callback which they can
use to hook into the udp receive path.

In situations where a encapsulation user allocates resources associated with a
udp encap socket, it may be convenient to be able to also hook the proto
.destroy operation.  For example, if an encap user holds a reference to the
udp socket, the destroy hook might be used to relinquish this reference.

This patch adds a socket destroy hook into udp, which is set and enabled
in the same way as the existing encap_rcv hook.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:10:38 -04:00
Masatake YAMATO
f1e79e2080 genetlink: trigger BUG_ON if a group name is too long
Trigger BUG_ON if a group name is longer than GENL_NAMSIZ.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 12:05:51 -04:00
Thierry Escande
b315515544 NFC: llcp: Remove possible double call to kfree_skb
kfree_skb was called twice when the socket receive queue is full

Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-20 16:46:40 +01:00
David S. Miller
90b2621fd4 Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso says:

====================
The following patchset contains 7 Netfilter/IPVS fixes for 3.9-rc, they are:

* Restrict IPv6 stateless NPT targets to the mangle table. Many users are
  complaining that this target does not work in the nat table, which is the
  wrong table for it, from Florian Westphal.

* Fix possible use before initialization in the netns init path of several
  conntrack protocol trackers (introduced recently while improving conntrack
  netns support), from Gao Feng.

* Fix incorrect initialization of copy_range in nfnetlink_queue, spotted
  by Eric Dumazet during the NFWS2013, patch from myself.

* Fix wrong calculation of next SCTP chunk in IPVS, from Julian Anastasov.

* Remove rcu_read_lock section in IPVS while calling ipv4_update_pmtu
  not required anymore after change introduced in 3.7, again from Julian.

* Fix SYN looping in IPVS state sync if the backup is used a real server
  in DR/TUN modes, this required a new /proc entry to disable the director
  function when acting as backup, also from Julian.

* Remove leftover IP_NF_QUEUE Kconfig after ip_queue removal, noted by
  Paul Bolle.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 10:23:52 -04:00
Steffen Klassert
0017c0b575 xfrm: Fix replay notification for esn.
We may miscalculate the sequence number difference from the
last time we send a notification if a sequence number wrap
occured in the meantime. We fix this by adding a separate
replay notify function for esn. Here we take the high bits
of the sequence number into account to calculate the
difference.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-03-20 11:57:52 +01:00
Samuel Ortiz
bec964ed3b NFC: llcp: Detach socket from process context only when releasing the socket
Calling sock_orphan when e.g. the NFC adapter is removed can lead to
kernel crashes when e.g. a connection less client is sleeping on the
Rx workqueue, waiting for data to show up.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-20 11:30:37 +01:00
Paul Bolle
3dd6664fac netfilter: remove unused "config IP_NF_QUEUE"
Kconfig symbol IP_NF_QUEUE is unused since commit
d16cf20e2f ("netfilter: remove ip_queue
support"). Let's remove it too.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-20 00:11:43 +01:00
Willem de Bruijn
77f65ebdca packet: packet fanout rollover during socket overload
Changes:
  v3->v2: rebase (no other changes)
          passes selftest
  v2->v1: read f->num_members only once
          fix bug: test rollover mode + flag

Minimize packet drop in a fanout group. If one socket is full,
roll over packets to another from the group. Maintain flow
affinity during normal load using an rxhash fanout policy, while
dispersing unexpected traffic storms that hit a single cpu, such
as spoofed-source DoS flows. Rollover breaks affinity for flows
arriving at saturated sockets during those conditions.

The patch adds a fanout policy ROLLOVER that rotates between sockets,
filling each socket before moving to the next. It also adds a fanout
flag ROLLOVER. If passed along with any other fanout policy, the
primary policy is applied until the chosen socket is full. Then,
rollover selects another socket, to delay packet drop until the
entire system is saturated.

Probing sockets is not free. Selecting the last used socket, as
rollover does, is a greedy approach that maximizes chance of
success, at the cost of extreme load imbalance. In practice, with
sufficiently long queues to absorb bursts, sockets are drained in
parallel and load balance looks uniform in `top`.

To avoid contention, scales counters with number of sockets and
accesses them lockfree. Values are bounds checked to ensure
correctness.

Tested using an application with 9 threads pinned to CPUs, one socket
per thread and sufficient busywork per packet operation to limits each
thread to handling 32 Kpps. When sent 500 Kpps single UDP stream
packets, a FANOUT_CPU setup processes 32 Kpps in total without this
patch, 270 Kpps with the patch. Tested with read() and with a packet
ring (V1).

Also, passes psock_fanout.c unit test added to selftests.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-19 17:15:04 -04:00
Linus Torvalds
7b1b3fd74e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix ARM BPF JIT handling of negative 'k' values, from Chen Gang.

 2) Insufficient space reserved for bridge netlink values, fix from
    Stephen Hemminger.

 3) Some dst_neigh_lookup*() callers don't interpret error pointer
    correctly, fix from Zhouyi Zhou.

 4) Fix transport match in SCTP active_path loops, from Xugeng Zhang.

 5) Fix qeth driver handling of multi-order SKB frags, from Frank
    Blaschka.

 6) fec driver is missing napi_disable() call, resulting in crashes on
    unload, from Georg Hofmann.

 7) Don't try to handle PMTU events on a listening socket, fix from Eric
    Dumazet.

 8) Fix timestamp location calculations in IP option processing, from
    David Ward.

 9) FIB_TABLE_HASHSZ setting is not controlled by the correct kconfig
    tests, from Denis V Lunev.

10) Fix TX descriptor push handling in SFC driver, from Ben Hutchings.

11) Fix isdn/hisax and tulip/de4x5 kconfig dependencies, from Arnd
    Bergmann.

12) bnx2x statistics don't handle 4GB rollover correctly, fix from
    Maciej Żenczykowski.

13) Openvswitch bug fixes for vport del/new error reporting, missing
    genlmsg_end() call in netlink processing, and mis-parsing of
    LLC/SNAP ethernet types.  From Rich Lane.

14) SKB pfmemalloc state should only be propagated from the head page of
    a compound page, fix from Pavel Emelyanov.

15) Fix link handling in tg3 driver for 5715 chips when autonegotation
    is disabled.  From Nithin Sujir.

16) Fix inverted test of cpdma_check_free_tx_desc return value in
    davinci_emac driver, from Mugunthan V N.

17) vlan_depth is incorrectly calculated in skb_network_protocol(), from
    Li RongQing.

18) Fix probing of Gobi 1K devices in qmi_wwan driver, and fix NCM
    device mode backwards compat in cdc_ncm driver.  From Bjørn Mork.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
  inet: limit length of fragment queue hash table bucket lists
  qeth: Fix scatter-gather regression
  qeth: Fix invalid router settings handling
  qeth: delay feature trace
  tcp: dont handle MTU reduction on LISTEN socket
  bnx2x: fix occasional statistics off-by-4GB error
  vhost/net: fix heads usage of ubuf_info
  bridge: Add support for setting BR_ROOT_BLOCK flag.
  bnx2x: add missing napi deletion in error path
  drivers: net: ethernet: ti: davinci_emac: fix usage of cpdma_check_free_tx_desc()
  ethernet/tulip: DE4x5 needs VIRT_TO_BUS
  isdn: hisax: netjet requires VIRT_TO_BUS
  net: cdc_ncm, cdc_mbim: allow user to prefer NCM for backwards compatibility
  rtnetlink: Mask the rta_type when range checking
  Revert "ip_gre: make ipgre_tunnel_xmit() not parse network header as IP unconditionally"
  Fix dst_neigh_lookup/dst_neigh_lookup_skb return value handling bug
  smsc75xx: configuration help incorrectly mentions smsc95xx
  net: fec: fix missing napi_disable call
  net: fec: restart the FEC when PHY speed changes
  skb: Propagate pfmemalloc on skb from head page only
  ...
2013-03-19 13:20:51 -07:00
Vladimir Davydov
dece40e848 netfilter: nf_conntrack: speed up module removal path if netns in use
The patch introduces nf_conntrack_cleanup_net_list(), which cleanups
nf_conntrack for a list of netns and calls synchronize_net() only once
for them all. This should reduce netns destruction time.

I've measured cleanup time for 1k dummy net ns. Here are the results:

 <without the patch>
 # modprobe nf_conntrack
 # time modprobe -r nf_conntrack

 real	0m10.337s
 user	0m0.000s
 sys	0m0.376s

 <with the patch>
 # modprobe nf_conntrack
 # time modprobe -r nf_conntrack

 real    0m5.661s
 user    0m0.000s
 sys     0m0.216s

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-19 17:08:31 +01:00
stephen hemminger
493763684f netfilter: nf_conntrack: add include to fix sparse warning
Include header file to pickup prototype of nf_nat_seq_adjust_hook

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-19 17:05:53 +01:00
Eric Dumazet
ae08ce0021 netfilter: nfnetlink_queue: zero copy support
nfqnl_build_packet_message() actually copy the packet
inside the netlink message, while it can instead use
zero copy.

Make sure the skb 'copy' is the last component of the
cooked netlink message, as we cant add anything after it.

Patch cooked in Copenhagen at Netfilter Workshop ;)

Still to be addressed in separate patches :

-GRO/GSO packets are segmented in nf_queue()
and checksummed in nfqnl_build_packet_message().

Proper support for GSO/GRO packets (no segmentation,
and no checksumming) needs application cooperation, if we
want no regressions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-19 17:02:24 +01:00
Pablo Neira Ayuso
e844a92843 netfilter: ctnetlink: allow to dump expectation per master conntrack
This patch adds the ability to dump all existing expectations
per master conntrack.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-19 17:02:18 +01:00
Baker Zhang
b5fb82c48b xfrm: use xfrm direction when lookup policy
because xfrm policy direction has same value with corresponding
flow direction, so this problem is covered.

In xfrm_lookup and __xfrm_policy_check, flow_cache_lookup is used to
accelerate the lookup.

Flow direction is given to flow_cache_lookup by policy_to_flow_dir.

When the flow cache is mismatched, callback 'resolver' is called.

'resolver' requires xfrm direction,
so convert direction back to xfrm direction.

Signed-off-by: Baker Zhang <baker.zhang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-19 10:35:11 -04:00
Hannes Frederic Sowa
5a3da1fe95 inet: limit length of fragment queue hash table bucket lists
This patch introduces a constant limit of the fragment queue hash
table bucket list lengths. Currently the limit 128 is choosen somewhat
arbitrary and just ensures that we can fill up the fragment cache with
empty packets up to the default ip_frag_high_thresh limits. It should
just protect from list iteration eating considerable amounts of cpu.

If we reach the maximum length in one hash bucket a warning is printed.
This is implemented on the caller side of inet_frag_find to distinguish
between the different users of inet_fragment.c.

I dropped the out of memory warning in the ipv4 fragment lookup path,
because we already get a warning by the slab allocator.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-19 10:28:36 -04:00
Oliver Hartkopp
c9bbb75f1d can: dump stack on protocol bugs
The rework of the kernel hlist implementation "hlist: drop the node parameter
from iterators" (b67bfe0d42) created some
fallout in the form of non matching comments and obsolete code.

Additionally to the cleanup this patch adds a WARN() statement to catch the
caller of the wrong filter removal request.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-19 09:44:34 -04:00
Julian Anastasov
bf93ad72cd ipvs: remove extra rcu lock
In 3.7 we added code that uses ipv4_update_pmtu but after commit
c5ae7d4192 (ipv4: must use rcu protection while calling fib_lookup)
the RCU lock is not needed.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-03-19 21:21:52 +09:00
Julian Anastasov
0c12582fbc ipvs: add backup_only flag to avoid loops
Dmitry Akindinov is reporting for a problem where SYNs are looping
between the master and backup server when the backup server is used as
real server in DR mode and has IPVS rules to function as director.

Even when the backup function is enabled we continue to forward
traffic and schedule new connections when the current master is using
the backup server as real server. While this is not a problem for NAT,
for DR and TUN method the backup server can not determine if a request
comes from client or from director.

To avoid such loops add new sysctl flag backup_only. It can be needed
for DR/TUN setups that do not need backup and director function at the
same time. When the backup function is enabled we stop any forwarding
and pass the traffic to the local stack (real server mode). The flag
disables the director function when the backup function is enabled.

For setups that enable backup function for some virtual services and
director function for other virtual services there should be another
more complex solution to support DR/TUN mode, may be to assign
per-virtual service syncid value, so that we can differentiate the
requests.

Reported-by: Dmitry Akindinov <dimak@stalker.com>
Tested-by: German Myzovsky <lawyer@sipnet.ru>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-03-19 21:21:51 +09:00
Julian Anastasov
b962abdc65 ipvs: fix some sparse warnings
Add missing __percpu annotations and make ip_vs_net_id static.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-03-19 21:18:38 +09:00
Julian Anastasov
e9836f24f2 ipvs: fix hashing in ip_vs_svc_hashkey
net is a pointer in host order, mix it properly
with other keys in network order. Fixes sparse warning.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-03-19 21:18:38 +09:00
Julian Anastasov
cf2e39429c ipvs: fix sctp chunk length order
Fix wrong but non-fatal access to chunk length.
sch->length should be in network order, next chunk should
be aligned to 4 bytes. Problem noticed in sparse output.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2013-03-19 09:37:27 +09:00
John W. Linville
8fa48cbdfb Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2013-03-18 15:17:11 -04:00