Commit Graph

15241 Commits

Author SHA1 Message Date
David S. Miller
d4c4f07df1 bridge: Fix build of ipv6 multicast code.
Based upon a report from Stephen Rothwell:

--------------------
net/bridge/br_multicast.c: In function 'br_ip6_multicast_alloc_query':
net/bridge/br_multicast.c:469: error: implicit declaration of function 'csum_ipv6_magic'

Introduced by commit 08b202b672 ("bridge
br_multicast: IPv6 MLD support") from the net tree.

csum_ipv6_magic is declared in net/ip6_checksum.h ...
--------------------

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 10:16:54 -07:00
YOSHIFUJI Hideaki / 吉藤英明
1fafc7a935 bridge br_multicast: Ensure to initialize BR_INPUT_SKB_CB(skb)->mrouters_only.
Even with commit 32dec5dd02 ("bridge
br_multicast: Don't refer to BR_INPUT_SKB_CB(skb)->mrouters_only
without IGMP snooping."), BR_INPUT_SKB_CB(skb)->mrouters_only is
not appropriately initialized if IGMP/MLD snooping support is
compiled and disabled, so we can see garbage.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-26 11:26:58 -07:00
YOSHIFUJI Hideaki / 吉藤英明
4eb8b9031a bridge br_multicast: Ensure to initialize BR_INPUT_SKB_CB(skb)->mrouters_only.
Even with commit 32dec5dd02 ("bridge
br_multicast: Don't refer to BR_INPUT_SKB_CB(skb)->mrouters_only
without IGMP snooping."), BR_INPUT_SKB_CB(skb)->mrouters_only is
not appropriately initialized if IGMP snooping support is
compiled and disabled, so we can see garbage.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-26 11:25:31 -07:00
Stefan Schmidt
93c0c8b4a5 ieee802154: Fix oops during ieee802154_sock_ioctl
Trying to run izlisten (from lowpan-tools tests) on a device that does not
exists I got the oops below. The problem is that we are using get_dev_by_name
without checking if we really get a device back. We don't in this case and
writing to dev->type generates this oops.

[Oops code removed by Dmitry Eremin-Solenikov]

If possible this patch should be applied to the current -rc fixes branch.

Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-26 11:20:32 -07:00
Eric Dumazet
4a4771a58e net: use sk_sleep()
Commit aa395145 (net: sk_sleep() helper) missed three files in the
conversion.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-26 11:18:45 -07:00
Jiri Pirko
0db3f0f49a phonet: use phonet_pernet instead of directly net_generic
As in for example pppoe introduce phonet_pernet and use it instead of calling
net_generic directly.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-26 11:18:44 -07:00
Juuso Oikarinen
0c86980817 mac80211: Fix sta->last_tx_rate setting with no-op rate control devices
The sta->last_tx_rate is traditionally updated just before transmitting a
frame based on information from the rate control algorithm. However, for
hardware drivers with IEEE80211_HW_HAS_RATE_CONTROL this is not performed,
as the rate control algorithm is not executed, and because the used rate is
not known before the frame has actually been transmitted.

This causes atleast a fixed 1Mb/s to be reported to user space. A few other
instances of code also rely on this information.

Fix this by setting the sta->last_tx_rate in tx_status handling. There, look
for last rates entry set by the driver, and use that as value for
sta->last_tx_rate.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-26 14:11:40 -04:00
Patrick McHardy
cb6a4e461f net: ipmr: add support for dumping routing tables over netlink
The ipmr /proc interface (ip_mr_cache) can't be extended to dump routes
from any tables but the main table in a backwards compatible fashion since
the output format ends in a variable amount of output interfaces.

Introduce a new netlink interface to dump multicast routes from all tables,
similar to the netlink interface for regular routes.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-26 16:22:50 +02:00
Patrick McHardy
25239cee7e net: rtnetlink: decouple rtnetlink address families from real address families
Decouple rtnetlink address families from real address families in socket.h to
be able to add rtnetlink interfaces to code that is not a real address family
without increasing AF_MAX/NPROTO.

This will be used to add support for multicast route dumping from all tables
as the proc interface can't be extended to support anything but the main table
without breaking compatibility.

This partialy undoes the patch to introduce independant families for routing
rules and converts ipmr routing rules to a new rtnetlink family. Similar to
that patch, values up to 127 are reserved for real address families, values
above that may be used arbitrarily.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-26 16:13:54 +02:00
Patrick McHardy
3d0c9c4eb2 net: fib_rules: mark arguments to fib_rules_register const and __net_initdata
fib_rules_register() duplicates the template passed to it without modification,
mark the argument as const. Additionally the templates are only needed when
instantiating a new namespace, so mark them as __net_initdata, which means
they can be discarded when CONFIG_NET_NS=n.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-26 16:02:04 +02:00
Eric Dumazet
6443bb1fc2 ipv6: Fix inet6_csk_bind_conflict()
Commit fda48a0d7a (tcp: bind() fix when many ports are bound)
introduced a bug on IPV6 part.
We should not call ipv6_addr_any(inet6_rcv_saddr(sk2)) but
ipv6_addr_any(inet6_rcv_saddr(sk)) because sk2 can be IPV4, while sk is
IPV6.

Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-25 15:09:42 -07:00
Jiri Pirko
b3c981d2bb netns: rename unregister_pernet_subsys parameter
Stay consistent with other functions and with comment also and name
pernet_operations parameter properly.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-25 00:49:56 -07:00
Changli Gao
8c52d509e8 rps: optimize rps_get_cpu()
optimize rps_get_cpu().

don't initialize ports when we can get the ports. one memory access
for ports than two.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-24 22:50:10 -07:00
David S. Miller
b7d6a43211 Merge branch 'net-next-2.6_20100423a/br/br_multicast_v3' of git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-next 2010-04-23 23:37:24 -07:00
Brian Haley
4b340ae20d IPv6: Complete IPV6_DONTFRAG support
Finally add support to detect a local IPV6_DONTFRAG event
and return the relevant data to the user if they've enabled
IPV6_RECVPATHMTU on the socket.  The next recvmsg() will
return no data, but have an IPV6_PATHMTU as ancillary data.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-23 23:35:29 -07:00
Brian Haley
13b52cd446 IPv6: Add dontfrag argument to relevant functions
Add dontfrag argument to relevant functions for
IPV6_DONTFRAG support, as well as allowing the value
to be passed-in via ancillary cmsg data.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-23 23:35:28 -07:00
Brian Haley
793b147316 IPv6: data structure changes for new socket options
Add underlying data structure changes and basic setsockopt()
and getsockopt() support for IPV6_RECVPATHMTU, IPV6_PATHMTU,
and IPV6_DONTFRAG.  IPV6_PATHMTU is actually fully functional
at this point.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-23 23:35:28 -07:00
Jiri Pirko
3a73702863 l2tp_eth: fix memory allocation
Since .size is set properly in "struct pernet_operations l2tp_eth_net_ops",
allocating space for "struct l2tp_eth_net" by hand is not correct, even causes
memory leakage.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-23 16:37:33 -07:00
Jiri Pirko
e773aaff82 l2tp: fix memory allocation
Since .size is set properly in "struct pernet_operations l2tp_net_ops",
allocating space for "struct l2tp_net" by hand is not correct, even causes
memory leakage.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-23 16:37:32 -07:00
John W. Linville
3b51cc996e Merge branch 'master' into for-davem
Conflicts:
	drivers/net/wireless/ath/ath9k/phy.c
	drivers/net/wireless/iwlwifi/iwl-6000.c
	drivers/net/wireless/iwlwifi/iwl-debugfs.c
2010-04-23 14:43:45 -04:00
YOSHIFUJI Hideaki
08b202b672 bridge br_multicast: IPv6 MLD support.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2010-04-23 13:35:56 +09:00
YOSHIFUJI Hideaki
8ef2a9a598 bridge br_multicast: Make functions less ipv4 dependent.
Introduce struct br_ip{} to store ip address and protocol
and make functions more generic so that we can support
both IPv4 and IPv6 with less pain.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2010-04-23 13:35:55 +09:00
YOSHIFUJI Hideaki
6e7cb83707 ipv6 mcast: Introduce include/net/mld.h for MLD definitions.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2010-04-23 13:35:55 +09:00
Eric Dumazet
fda48a0d7a tcp: bind() fix when many ports are bound
Port autoselection done by kernel only works when number of bound
sockets is under a threshold (typically 30000).

When this threshold is over, we must check if there is a conflict before
exiting first loop in inet_csk_get_port()

Change inet_csk_bind_conflict() to forbid two reuse-enabled sockets to
bind on same (address,port) tuple (with a non ANY address)

Same change for inet6_csk_bind_conflict()

Reported-by: Gaspar Chilingarov <gasparch@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-22 19:06:06 -07:00
Andrew Hendry
5ebfbc06aa X25: Add if_x25.h and x25 to device identifiers
V2 Feedback from John Hughes.
- Add header for userspace implementations such as xot/xoe to use
- Use explicit values for interface stability
- No changes to driver patches

V1
- Use identifiers instead of magic numbers for X25 layer 3 to device interface.
- Also fixed checkpatch notes on updated code.

[ Add new user header to include/linux/Kbuild  -DaveM ]

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-22 16:12:36 -07:00
Paul LeoNerd Evans
40eaf96271 net: Socket filter ancilliary data access for skb->dev->type
Add an SKF_AD_HATYPE field to the packet ancilliary data area, giving
access to skb->dev->type, as reported in the sll_hatype field.

When capturing packets on a PF_PACKET/SOCK_RAW socket bound to all
interfaces, there doesn't appear to be a way for the filter program to
actually find out the underlying hardware type the packet was captured
on. This patch adds such ability.

This patch also handles the case where skb->dev can be NULL, such as on
netlink sockets.

Signed-off-by: Paul Evans <leonerd@leonerd.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-22 16:05:44 -07:00
Tom Herbert
aa2ea0586d tcp: fix outsegs stat for TSO segments
Account for TSO segments of an skb in TCP_MIB_OUTSEGS counter.  Without
doing this, the counter can be off by orders of magnitude from the
actual number of segments sent.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-22 16:00:00 -07:00
Dan Carpenter
24acc68956 rdma: potential ERR_PTR dereference
In the original code, the "goto out" calls "rdma_destroy_id(cm_id);"
That isn't needed here and would cause problems because "cm_id" is an
ERR_PTR.  The new code just returns directly.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-22 15:57:26 -07:00
Dan Carpenter
80032cffb9 rtnetlink: potential ERR_PTR dereference
In the original code, if rtnl_create_link() returned an ERR_PTR then that
would get passed to rtnl_configure_link() which dereferences it.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-22 15:57:26 -07:00
Stephen Hemminger
e802af9cab IPv6: Generic TTL Security Mechanism (final version)
This patch adds IPv6 support for RFC5082 Generalized TTL Security Mechanism.  

Not to users of mapped address; the IPV6 and IPV4 socket options are seperate.
The server does have to deal with both IPv4 and IPv6 socket options
and the client has to handle the different for each family.

On client:
	int ttl = 255;
	getaddrinfo(argv[1], argv[2], &hint, &result);

	for (rp = result; rp != NULL; rp = rp->ai_next) {
		s = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
		if (s < 0) continue;

		if (rp->ai_family == AF_INET) {
			setsockopt(s, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl));
		} else if (rp->ai_family == AF_INET6) {
			setsockopt(s, IPPROTO_IPV6,  IPV6_UNICAST_HOPS, 
					&ttl, sizeof(ttl)))
		}
			
		if (connect(s, rp->ai_addr, rp->ai_addrlen) == 0) {
		   ...

On server:
	int minttl = 255 - maxhops;
   
	getaddrinfo(NULL, port, &hints, &result);
	for (rp = result; rp != NULL; rp = rp->ai_next) {
		s = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
		if (s < 0) continue;

		if (rp->ai_family == AF_INET6)
			setsockopt(s, IPPROTO_IPV6,  IPV6_MINHOPCOUNT,
					&minttl, sizeof(minttl));
		setsockopt(s, IPPROTO_IP, IP_MINTTL, &minttl, sizeof(minttl));
			
		if (bind(s, rp->ai_addr, rp->ai_addrlen) == 0)
			break
...

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-22 15:24:53 -07:00
David S. Miller
9ccb897594 net: Orphan and de-dst skbs earlier in xmit path.
This way GSO packets don't get handled differently.

With help from Eric Dumazet.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
2010-04-22 01:02:07 -07:00
Eric Dumazet
e326bed2f4 rps: immediate send IPI in process_backlog()
If some skb are queued to our backlog, we are delaying IPI sending at
the end of net_rx_action(), increasing latencies. This defeats the
queueing, since we want to quickly dispatch packets to the pool of
worker cpus, then eventually deeply process our packets.

It's better to send IPI before processing our packets in upper layers,
from process_backlog().

Change the _and_disable_irq suffix to _and_enable_irq(), since we enable
local irq in net_rps_action(), sorry for the confusion.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-22 00:22:45 -07:00
Jiri Olsa
f4f914b580 net: ipv6 bind to device issue
The issue raises when having 2 NICs both assigned the same
IPv6 global address.

If a sender binds to a particular NIC (SO_BINDTODEVICE),
the outgoing traffic is being sent via the first found.
The bonded device is thus not taken into an account during the
routing.

From the ip6_route_output function:

If the binding address is multicast, linklocal or loopback,
the RT6_LOOKUP_F_IFACE bit is set, but not for global address.

So binding global address will neglect SO_BINDTODEVICE-binded device,
because the fib6_rule_lookup function path won't check for the
flowi::oif field and take first route that fits.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Scott Otto <scott.otto@alcatel-lucent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-21 22:59:24 -07:00
Johannes Berg
b002a86109 ethernet: print protocol in host byte order
Eric's recent patch added __force, but this
place would seem to require actually doing
a byte order conversion so the printk is
consistent across architectures.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-21 22:57:19 -07:00
Eric Dumazet
9a20e3197e net: Introduce skb_orphan_try()
At this point, skb->destructor is not the original one (stored in
DEV_GSO_CB(skb)->destructor)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-21 22:54:08 -07:00
Shan Wei
f2228f785a ipv6: allow to send packet after receiving ICMPv6 Too Big message with MTU field less than IPV6_MIN_MTU
According to RFC2460, PMTU is set to the IPv6 Minimum Link
MTU (1280) and a fragment header should always be included
after a node receiving Too Big message reporting PMTU is
less than the IPv6 Minimum Link MTU.

After receiving a ICMPv6 Too Big message reporting PMTU is
less than the IPv6 Minimum Link MTU, sctp *can't* send any
data/control chunk that total length including IPv6 head
and IPv6 extend head is less than IPV6_MIN_MTU(1280 bytes).

The failure occured in p6_fragment(), about reason
see following(take SHUTDOWN chunk for example):
sctp_packet_transmit (SHUTDOWN chunk, len=16 byte)
|------sctp_v6_xmit (local_df=0)
   |------ip6_xmit
       |------ip6_output (dst_allfrag is ture)
           |------ip6_fragment

In ip6_fragment(), for local_df=0, drops the the packet
and returns EMSGSIZE.

The patch fixes it with adding check length of skb->len.
In this case, Ipv6 not to fragment upper protocol data,
just only add a fragment header before it.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-21 22:48:26 -07:00
andrew hendry
2cec6b014d X25 fix dead unaccepted sockets
1, An X25 program binds and listens
2, calls arrive waiting to be accepted
3, Program exits without accepting
4, Sockets time out but don't get correctly cleaned up
5, cat /proc/net/x25/socket shows the dead sockets with bad inode fields.

This line borrowed from AX25 sets the dying socket so the timers clean up later.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-21 16:31:50 -07:00
Nicolas Dichtel
bc8e4b954e xfrm6: ensure to use the same dev when building a bundle
When building a bundle, we set dst.dev and rt6.rt6i_idev.
We must ensure to set the same device for both fields.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-21 16:25:30 -07:00
Eric Dumazet
989a297920 fasync: RCU and fine grained locking
kill_fasync() uses a central rwlock, candidate for RCU conversion, to
avoid cache line ping pongs on SMP.

fasync_remove_entry() and fasync_add_entry() can disable IRQS on a short
section instead during whole list scan.

Use a spinlock per fasync_struct to synchronize kill_fasync_rcu() and
fasync_{remove|add}_entry(). This spinlock is IRQ safe, so sock_fasync()
doesnt need its own implementation and can use fasync_helper(), to
reduce code size and complexity.

We can remove __kill_fasync() direct use in net/socket.c, and rename it
to kill_fasync_rcu().

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-21 16:19:29 -07:00
David S. Miller
e5700aff14 tcp: Mark v6 response packets as CHECKSUM_PARTIAL
Otherwise we only get the checksum right for data-less TCP responses.

Noticed by Herbert Xu.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-21 14:59:20 -07:00
David S. Miller
f71b70e115 tcp: Fix ipv6 checksumming on response packets for real.
Commit 6651ffc8e8
("ipv6: Fix tcp_v6_send_response transport header setting.")
fixed one half of why ipv6 tcp response checksums were
invalid, but it's not the whole story.

If we're going to use CHECKSUM_PARTIAL for these things (which we are
since commit 2e8e18ef52 "tcp: Set
CHECKSUM_UNNECESSARY in tcp_init_nondata_skb"), we can't be setting
buff->csum as we always have been here in tcp_v6_send_response.  We
need to leave it at zero.

Kill that line and checksums are good again.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-21 01:57:01 -07:00
David S. Miller
87eb367003 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-6000.c
	net/core/dev.c
2010-04-21 01:14:25 -07:00
David Howells
05d17608a6 net: Fix an RCU warning in dev_pick_tx()
Fix the following RCU warning in dev_pick_tx():

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
net/core/dev.c:1993 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by swapper/0:
 #0:  (&idev->mc_ifc_timer){+.-...}, at: [<ffffffff81039e65>] run_timer_softirq+0x17b/0x278
 #1:  (rcu_read_lock_bh){.+....}, at: [<ffffffff812ea3eb>] dev_queue_xmit+0x14e/0x4dc

stack backtrace:
Pid: 0, comm: swapper Not tainted 2.6.34-rc5-cachefs #4
Call Trace:
 <IRQ>  [<ffffffff810516c4>] lockdep_rcu_dereference+0xaa/0xb2
 [<ffffffff812ea4f6>] dev_queue_xmit+0x259/0x4dc
 [<ffffffff812ea3eb>] ? dev_queue_xmit+0x14e/0x4dc
 [<ffffffff81052324>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff81035362>] ? local_bh_enable_ip+0xbc/0xc1
 [<ffffffff812f0954>] neigh_resolve_output+0x24b/0x27c
 [<ffffffff8134f673>] ip6_output_finish+0x7c/0xb4
 [<ffffffff81350c34>] ip6_output2+0x256/0x261
 [<ffffffff81052324>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff813517fb>] ip6_output+0xbbc/0xbcb
 [<ffffffff8135bc5d>] ? fib6_force_start_gc+0x2b/0x2d
 [<ffffffff81368acb>] mld_sendpack+0x273/0x39d
 [<ffffffff81368858>] ? mld_sendpack+0x0/0x39d
 [<ffffffff81052099>] ? mark_held_locks+0x52/0x70
 [<ffffffff813692fc>] mld_ifc_timer_expire+0x24f/0x288
 [<ffffffff81039ed6>] run_timer_softirq+0x1ec/0x278
 [<ffffffff81039e65>] ? run_timer_softirq+0x17b/0x278
 [<ffffffff813690ad>] ? mld_ifc_timer_expire+0x0/0x288
 [<ffffffff81035531>] ? __do_softirq+0x69/0x140
 [<ffffffff8103556a>] __do_softirq+0xa2/0x140
 [<ffffffff81002e0c>] call_softirq+0x1c/0x28
 [<ffffffff81004b54>] do_softirq+0x38/0x80
 [<ffffffff81034f06>] irq_exit+0x45/0x47
 [<ffffffff810177c3>] smp_apic_timer_interrupt+0x88/0x96
 [<ffffffff810028d3>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff810488dd>] ? __atomic_notifier_call_chain+0x0/0x86
 [<ffffffff810096bf>] ? mwait_idle+0x6e/0x78
 [<ffffffff810096b6>] ? mwait_idle+0x65/0x78
 [<ffffffff810011cb>] cpu_idle+0x4d/0x83
 [<ffffffff81380b05>] rest_init+0xb9/0xc0
 [<ffffffff81380a4c>] ? rest_init+0x0/0xc0
 [<ffffffff8168dcf0>] start_kernel+0x392/0x39d
 [<ffffffff8168d2a3>] x86_64_start_reservations+0xb3/0xb7
 [<ffffffff8168d38b>] x86_64_start_kernel+0xe4/0xeb

An rcu_dereference() should be an rcu_dereference_bh().

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-21 01:09:44 -07:00
David S. Miller
e04997b13a Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2010-04-21 00:50:39 -07:00
Herbert Xu
6651ffc8e8 ipv6: Fix tcp_v6_send_response transport header setting.
My recent patch to remove the open-coded checksum sequence in
tcp_v6_send_response broke it as we did not set the transport
header pointer on the new packet.

Actually, there is code there trying to set the transport
header properly, but it sets it for the wrong skb ('skb'
instead of 'buff').

This bug was introduced by commit
a8fdf2b331 ("ipv6: Fix
tcp_v6_send_response(): it didn't set skb transport header")

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-21 00:47:15 -07:00
Rami Rosen
ccb7c7732e net: Remove two unnecessary exports (skbuff).
There is no need to export skb_under_panic() and skb_over_panic() in
skbuff.c, since these methods are used only in skbuff.c ; this patch
removes these two exports. It also marks these functions as 'static'
and removeS the extern declarations of them from
include/linux/skbuff.h

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20 22:39:53 -07:00
Eric Dumazet
0eae88f31c net: Fix various endianness glitches
Sparse can help us find endianness bugs, but we need to make some
cleanups to be able to more easily spot real bugs.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20 19:06:52 -07:00
Eric Dumazet
8eabf95cb1 bridge: add a missing ntohs()
grec_nsrcs is in network order, we should convert to host horder in
br_multicast_igmp3_report()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20 18:51:57 -07:00
David S. Miller
e46754f8c9 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-04-20 17:57:56 -07:00
Eric Dumazet
aa39514516 net: sk_sleep() helper
Define a new function to return the waitqueue of a "struct sock".

static inline wait_queue_head_t *sk_sleep(struct sock *sk)
{
	return sk->sk_sleep;
}

Change all read occurrences of sk_sleep by a call to this function.

Needed for a future RCU conversion. sk_sleep wont be a field directly
available.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20 16:37:13 -07:00
Juuso Oikarinen
7bdfcaaff5 mac80211: Fix ieee80211_sta_conn_mon_timer with hw connection monitoring
When IEEE80211_HW_CONNECTION_MONITOR is configured by the driver, starting
of ieee80211_sta_conn_mon_timer should be prevented, as it is then not needed.

This is currently partially the case. As it seems, when a probe-response is
received from the AP the timer is still restarted, thus restarting the host
based connection keep-alive mechanism. These probe-responses happen at least
when scanning while associated.

Fix this by preventing starting of the ieee80211_sta_conn_mon_timer in the
ieee80211_rx_mgmt_probe_resp function.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-20 11:52:40 -04:00
Holger Schurig
1289723ef2 mac80211: sample survey implementation for mac80211 & hwsim
This adds the survey function to both mac80211 itself and to mac80211_hwsim.
For the latter driver, we simply invent some noise level.A real driver which
cannot determine the real channel noise MUST NOT report any noise, especially
not a magically conjured one :-)

Signed-off-by: Holger Schurig <holgerschurig@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-20 11:50:52 -04:00
Daniel Yingqiang Ma
03ceedea97 ath9k: Group Key fix for VAPs
When I set up multiple VAPs with ath9k, I encountered an issue that
the traffic may be lost after a while.

The detailed phenomenon is
1. After a while the clients connected to one of these VAPs will get
into a state that no broadcast/multicast packets can be transfered
successfully while the unicast packets can be transfered normally.
2. Minutes latter the unitcast packets transfer will fail as well,
because the ARP entry is expired and it can't be freshed due to the
broadcast trouble.

It's caused by the group key overwritten and someone discussed this
issue in ath9k-devel maillist before, but haven't work out a fix yet.

I referred the method in madwifi, and made a patch for ath9k.
The method is to set the high bit of the sender(AP)'s address, and
associated that mac and the group key. It requires the hardware
supports multicast frame key search. It seems true for AR9160.

Not sure whether it's the correct way to fix this issue. But it seems
to work in my test. The patch is attached, feel free to revise it.

Signed-off-by: Daniel Yingqiang ma <yma.cool@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-20 11:50:51 -04:00
Jiri Pirko
ab9304717f net: emphasize rtnl lock required in call_netdevice_notifiers
Since netdev_chain is guarded by rtnl_lock, ASSERT_RTNL should be
present here to make sure that all callers of call_netdevice_notifiers
does the locking properly.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20 01:45:37 -07:00
Eric Dumazet
b249dcb82d rps: consistent rxhash
In case we compute a software skb->rxhash, we can generate a consistent
hash : Its value will be the same in both flow directions.

This helps some workloads, like conntracking, since the same state needs
to be accessed in both directions.

tbench + RFS + this patch gives better results than tbench with default
kernel configuration (no RPS, no RFS)

Also fixed some sparse warnings.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20 01:18:06 -07:00
Eric Dumazet
e36fa2f7e9 rps: cleanups
struct softnet_data holds many queues, so consistent use "sd" name
instead of "queue" is better.

Adds a rps_ipi_queued() helper to cleanup enqueue_to_backlog()

Adds a _and_irq_disable suffix to net_rps_action() name, as David
suggested.

incr_input_queue_head() becomes input_queue_head_incr()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20 01:18:05 -07:00
Eric Dumazet
f5acb907dc rps: static functions
store_rps_map() & store_rps_dev_flow_table_cnt() are static.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-19 14:40:57 -07:00
Johannes Berg
67e0f39277 mac80211: add missing newline
One HT debugging printk is missing a newline,
add it.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-19 16:45:20 -04:00
Juuso Oikarinen
3393a608c4 mac80211: Prevent running sta_cleanup timer unnecessarily
The sta_cleanup timer is used to periodically expire buffered frames from the
tx buf. The timer is executing periodically, regardless of the need for it.
This is wasting resources.

Fix this simply by not restarting the sta_cleanup timer if the tx buffer was
empty. Restart the timer when there is some more tx-traffic.

Cc: Janne Ylälehto <janne.ylalehto@nokia.com>
Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-19 16:41:42 -04:00
Johannes Berg
2aab4c273a mac80211: fix stopping RX BA session from timer
Kalle reported that his system deadlocks since my
recent work in this area. The reason quickly became
apparent: we try to cancel_timer_sync() a timer
from within itself. Fix that by making the function
aware of the context it is called from.

Reported-by: Kalle Valo <kvalo@adurom.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Tested-by: Kalle Valo <kvalo@adurom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-19 16:41:42 -04:00
Reinette Chatre
fe6f212ce1 mac80211: pass HT changes to driver when off channel
Since "mac80211: make off-channel work generic" drivers have not been
notified of configuration changes after association or authentication. This
caused more dependence on current state to ensure driver will be notified
when configuration changes occur. One such problem arises if off-channel is
in progress when HT information changes. Since HT is only enabled on the
"oper_channel" the driver will never be notified of this change. Usually
the driver is notified soon after of a BSS information change
(BSS_CHANGED_HT) ... but since the driver did not get a notification that
this is a HT channel the new BSS information does not make sense.

Fix this by also changing the off-channel information when HT is enabled
and thus cause driver to be notified correctly.

This fixes a problem in 4965 when associated with 5GHz 40MHz channel.
Without this patch the system can associate but is unable to transfer any
data, not even ping.

See http://bugzilla.intellinuxwireless.org/show_bug.cgi?id=2158

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-19 16:34:11 -04:00
Johannes Berg
b4bb5c3fd9 mac80211: remove bogus TX agg state assignment
When the addba timer expires but has no work to do,
it should not affect the state machine. If it does,
TX will not see the successfully established and we
can also crash trying to re-establish the session.

Cc: stable@kernel.org [2.6.32, 2.6.33]
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-19 16:34:11 -04:00
Eric Dumazet
88751275b8 rps: shortcut net_rps_action()
net_rps_action() is a bit expensive on NR_CPUS=64..4096 kernels, even if
RPS is not active.

Tom Herbert used two bitmasks to hold information needed to send IPI,
but a single LIFO list seems more appropriate.

Move all RPS logic into net_rps_action() to cleanup net_rx_action() code
(remove two ifdefs)

Move rps_remote_softirq_cpus into softnet_data to share its first cache
line, filling an existing hole.

In a future patch, we could call net_rps_action() from process_backlog()
to make sure we send IPI before handling this cpu backlog.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-19 13:20:34 -07:00
Linus Torvalds
375db4810b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  gigaset: include cleanup cleanup
  packet : remove init_net restriction
  WAN: flush tx_queue in hdlc_ppp to prevent panic on rmmod hw_driver.
  ip: Fix ip_dev_loopback_xmit()
  net: dev_pick_tx() fix
  fib: suppress lockdep-RCU false positive in FIB trie.
  tun: orphan an skb on tx
  forcedeth: fix tx limit2 flag check
  iwlwifi: work around bogus active chains detection
2010-04-19 07:27:45 -07:00
Eric Dumazet
fc6055a5ba net: Introduce skb_orphan_try()
Transmitted skb might be attached to a socket and a destructor, for
memory accounting purposes.

Traditionally, this destructor is called at tx completion time, when skb
is freed.

When tx completion is performed by another cpu than the sender, this
forces some cache lines to change ownership. XPS was an attempt to give
tx completion to initial cpu.

David idea is to call destructor right before giving skb to device (call
to ndo_start_xmit()). Because device queues are usually small, orphaning
skb before tx completion is not a big deal. Some drivers already do
this, we could do it in upper level.

There is one known exception to this early orphaning, called tx
timestamping. It needs to keep a reference to socket until device can
give a hardware or software timestamp.

This patch adds a skb_orphan_try() helper, to centralize all exceptions
to early orphaning in one spot, and use it in dev_hard_start_xmit().

"tbench 16" results on a Nehalem machine (2 X5570  @ 2.93GHz)
before: Throughput 4428.9 MB/sec 16 procs
after: Throughput 4448.14 MB/sec 16 procs

UDP should get even better results, its destructor being more complex,
since SOCK_USE_WRITE_QUEUE is not set (four atomic ops instead of one)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-18 02:39:41 -07:00
Eric Dumazet
9958da0501 net: remove time limit in process_backlog()
- There is no point to enforce a time limit in process_backlog(), since
other napi instances dont follow same rule. We can exit after only one
packet processed...
The normal quota of 64 packets per napi instance should be the norm, and
net_rx_action() already has its own time limit.
Note : /proc/net/core/dev_weight can be used to tune this 64 default
value.

- Use DEFINE_PER_CPU_ALIGNED for softnet_data definition.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-18 02:36:13 -07:00
Eric Dumazet
8770acf049 rps: rps_sock_flow_table is mostly read
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-17 00:54:36 -07:00
Tom Herbert
fec5e652e5 rfs: Receive Flow Steering
This patch implements receive flow steering (RFS).  RFS steers
received packets for layer 3 and 4 processing to the CPU where
the application for the corresponding flow is running.  RFS is an
extension of Receive Packet Steering (RPS).

The basic idea of RFS is that when an application calls recvmsg
(or sendmsg) the application's running CPU is stored in a hash
table that is indexed by the connection's rxhash which is stored in
the socket structure.  The rxhash is passed in skb's received on
the connection from netif_receive_skb.  For each received packet,
the associated rxhash is used to look up the CPU in the hash table,
if a valid CPU is set then the packet is steered to that CPU using
the RPS mechanisms.

The convolution of the simple approach is that it would potentially
allow OOO packets.  If threads are thrashing around CPUs or multiple
threads are trying to read from the same sockets, a quickly changing
CPU value in the hash table could cause rampant OOO packets--
we consider this a non-starter.

To avoid OOO packets, this solution implements two types of hash
tables: rps_sock_flow_table and rps_dev_flow_table.

rps_sock_table is a global hash table.  Each entry is just a CPU
number and it is populated in recvmsg and sendmsg as described above.
This table contains the "desired" CPUs for flows.

rps_dev_flow_table is specific to each device queue.  Each entry
contains a CPU and a tail queue counter.  The CPU is the "current"
CPU for a matching flow.  The tail queue counter holds the value
of a tail queue counter for the associated CPU's backlog queue at
the time of last enqueue for a flow matching the entry.

Each backlog queue has a queue head counter which is incremented
on dequeue, and so a queue tail counter is computed as queue head
count + queue length.  When a packet is enqueued on a backlog queue,
the current value of the queue tail counter is saved in the hash
entry of the rps_dev_flow_table.

And now the trick: when selecting the CPU for RPS (get_rps_cpu)
the rps_sock_flow table and the rps_dev_flow table for the RX queue
are consulted.  When the desired CPU for the flow (found in the
rps_sock_flow table) does not match the current CPU (found in the
rps_dev_flow table), the current CPU is changed to the desired CPU
if one of the following is true:

- The current CPU is unset (equal to RPS_NO_CPU)
- Current CPU is offline
- The current CPU's queue head counter >= queue tail counter in the
rps_dev_flow table.  This checks if the queue tail has advanced
beyond the last packet that was enqueued using this table entry.
This guarantees that all packets queued using this entry have been
dequeued, thus preserving in order delivery.

Making each queue have its own rps_dev_flow table has two advantages:
1) the tail queue counters will be written on each receive, so
keeping the table local to interrupting CPU s good for locality.  2)
this allows lockless access to the table-- the CPU number and queue
tail counter need to be accessed together under mutual exclusion
from netif_receive_skb, we assume that this is only called from
device napi_poll which is non-reentrant.

This patch implements RFS for TCP and connected UDP sockets.
It should be usable for other flow oriented protocols.

There are two configuration parameters for RFS.  The
"rps_flow_entries" kernel init parameter sets the number of
entries in the rps_sock_flow_table, the per rxqueue sysfs entry
"rps_flow_cnt" contains the number of entries in the rps_dev_flow
table for the rxqueue.  Both are rounded to power of two.

The obvious benefit of RFS (over just RPS) is that it achieves
CPU locality between the receive processing for a flow and the
applications processing; this can result in increased performance
(higher pps, lower latency).

The benefits of RFS are dependent on cache hierarchy, application
load, and other factors.  On simple benchmarks, we don't necessarily
see improvement and sometimes see degradation.  However, for more
complex benchmarks and for applications where cache pressure is
much higher this technique seems to perform very well.

Below are some benchmark results which show the potential benfit of
this patch.  The netperf test has 500 instances of netperf TCP_RR
test with 1 byte req. and resp.  The RPC test is an request/response
test similar in structure to netperf RR test ith 100 threads on
each host, but does more work in userspace that netperf.

e1000e on 8 core Intel
   No RFS or RPS		104K tps at 30% CPU
   No RFS (best RPS config):    290K tps at 63% CPU
   RFS				303K tps at 61% CPU

RPC test	tps	CPU%	50/90/99% usec latency	Latency StdDev
  No RFS/RPS	103K	48%	757/900/3185		4472.35
  RPS only:	174K	73%	415/993/2468		491.66
  RFS		223K	73%	379/651/1382		315.61

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-16 16:01:27 -07:00
Daniel Lezcano
1c4f019732 packet : remove init_net restriction
The af_packet protocol is used by Perl to do ioctls as reported by
Stephane Riviere:

"Net::RawIP relies on SIOCGIFADDR et SIOCGIFHWADDR to get the IP and MAC
addresses of the network interface."

But in a new network namespace these ioctl fail because it is disabled for
a namespace different from the init_net_ns.

These two lines should not be there as af_inet and af_packet are
namespace aware since a long time now. I suppose we forget to remove these
lines because we sent the af_packet first, before af_inet was supported.

Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
Reported-by: Stephane Riviere <stephane.riviere@regis-dgac.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-16 15:41:04 -07:00
Nishant Sarmukadam
7834704be4 cfg80211: Avoid sending IWEVASSOCREQIE and IWEVASSOCRESPIE events with NULL event body
In a scenario, where a cfg80211 driver (station mode) does not send assoc request
and assoc response IEs in cfg80211_connect_result after a successful association
to an AP, cfg80211 sends IWEVASSOCREQIE and IWEVASSOCRESPIE to the user space
application with NULL data. This can cause an issue at the event recipient.

An example of this is when cfg80211 sends IWEVASSOCREQIE and IWEVASSOCRESPIE
events with NULL event body to wpa_supplicant. The wpa_supplicant overwrites
the assoc request and assoc response IEs for this station with NULL data.
If the association is WPA/WPA2, the wpa_supplicant is not able to generate
EAPOL handshake messages, since the IEs are NULL.

With the patch, req_ie and resp_ie will be NULL by avoiding the
assignment if the driver has not sent the IEs to cfg80211. The event sending
code sends the events only if resp_ie and req_ie are not NULL. This
will ensure that the events are not sent with NULL event body.

Signed-off-by: Nishant Sarmukadam <nishants@marvell.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-16 15:32:00 -04:00
Shan Wei
b5d4399823 ipv6: fix the comment of ip6_xmit()
ip6_xmit() is used by upper transport protocol.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-15 23:36:38 -07:00
Shan Wei
4e15ed4d93 net: replace ipfragok with skb->local_df
As Herbert Xu said: we should be able to simply replace ipfragok
with skb->local_df. commit f88037(sctp: Drop ipfargok in sctp_xmit function)
has droped ipfragok and set local_df value properly.

The patch kills the ipfragok parameter of .queue_xmit().

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-15 23:36:37 -07:00
Shan Wei
0eecb78494 ipv6: cancel to setting local_df in ip6_xmit()
commit f88037(sctp: Drop ipfargok in sctp_xmit function)
has droped ipfragok and set local_df value properly.

So the change of commit 77e2f1(ipv6: Fix ip6_xmit to
send fragments if ipfragok is true) is not needed.
So the patch remove them.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-15 23:36:37 -07:00
Joe Perches
a4fbf8415c net/l2tp/l2tp_debugfs.c: Convert NIPQUAD to %pI4
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-15 15:37:13 -07:00
David S. Miller
3eb14b944f Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-04-15 14:31:06 -07:00
Eric Dumazet
e30b38c298 ip: Fix ip_dev_loopback_xmit()
Eric Paris got following trace with a linux-next kernel

[   14.203970] BUG: using smp_processor_id() in preemptible [00000000]
code: avahi-daemon/2093
[   14.204025] caller is netif_rx+0xfa/0x110
[   14.204035] Call Trace:
[   14.204064]  [<ffffffff81278fe5>] debug_smp_processor_id+0x105/0x110
[   14.204070]  [<ffffffff8142163a>] netif_rx+0xfa/0x110
[   14.204090]  [<ffffffff8145b631>] ip_dev_loopback_xmit+0x71/0xa0
[   14.204095]  [<ffffffff8145b892>] ip_mc_output+0x192/0x2c0
[   14.204099]  [<ffffffff8145d610>] ip_local_out+0x20/0x30
[   14.204105]  [<ffffffff8145d8ad>] ip_push_pending_frames+0x28d/0x3d0
[   14.204119]  [<ffffffff8147f1cc>] udp_push_pending_frames+0x14c/0x400
[   14.204125]  [<ffffffff814803fc>] udp_sendmsg+0x39c/0x790
[   14.204137]  [<ffffffff814891d5>] inet_sendmsg+0x45/0x80
[   14.204149]  [<ffffffff8140af91>] sock_sendmsg+0xf1/0x110
[   14.204189]  [<ffffffff8140dc6c>] sys_sendmsg+0x20c/0x380
[   14.204233]  [<ffffffff8100ad82>] system_call_fastpath+0x16/0x1b

While current linux-2.6 kernel doesnt emit this warning, bug is latent
and might cause unexpected failures.

ip_dev_loopback_xmit() runs in process context, preemption enabled, so
must call netif_rx_ni() instead of netif_rx(), to make sure that we
process pending software interrupt.

Same change for ip6_dev_loopback_xmit()

Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-15 14:25:22 -07:00
David S. Miller
791f58c064 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/ipmr-2.6 2010-04-15 14:14:05 -07:00
John W. Linville
5c01d56693 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
Conflicts:
	Documentation/feature-removal-schedule.txt
	drivers/net/wireless/ath/ath5k/phy.c
	drivers/net/wireless/wl12xx/wl1271_main.c
2010-04-15 16:21:34 -04:00
Patrick McHardy
8de53dfbf9 ipv4: ipmr: fix NULL pointer deref during unres queue destruction
Fix an oversight in ipmr_destroy_unres() - the net pointer is
unconditionally initialized to NULL, resulting in a NULL pointer
dereference later on.

Fix by adding a net pointer to struct mr_table and using it in
ipmr_destroy_unres().

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-15 13:31:29 +02:00
Patrick McHardy
b0ebb739a8 ipv4: ipmr: fix invalid cache resolving when adding a non-matching entry
The patch to convert struct mfc_cache to list_heads (ipv4: ipmr: convert
struct mfc_cache to struct list_head) introduced a bug when adding new
cache entries that don't match any unresolved entries.

The unres queue is searched for a matching entry, which is then resolved.
When no matching entry is present, the iterator points to the head of the
list, but is treated as a matching entry. Use a seperate variable to
indicate that a matching entry was found.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-15 13:31:29 +02:00
Patrick McHardy
66496d4973 ipv4: ipmr: fix IP_MROUTE_MULTIPLE_TABLES Kconfig dependencies
IP_MROUTE_MULTIPLE_TABLES should depend on IP_MROUTE.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-04-15 13:31:29 +02:00
Eric Dumazet
8728c544a9 net: dev_pick_tx() fix
When dev_pick_tx() caches tx queue_index on a socket, we must check
socket dst_entry matches skb one, or risk a crash later, as reported by
Denys Fedorysychenko, if old packets are in flight during a route
change, involving devices with different number of queues.

Bug introduced by commit a4ee3ce3
(net: Use sk_tx_queue_mapping for connected sockets)

Reported-by: Denys Fedorysychenko <nuclearcat@nuclearcat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-15 01:27:11 -07:00
Eric Dumazet
b0e28f1eff net: netif_rx() must disable preemption
Eric Paris reported netif_rx() is calling smp_processor_id() from
preemptible context, in particular when caller is
ip_dev_loopback_xmit().

RPS commit added this smp_processor_id() call, this patch makes sure
preemption is disabled. rps_get_cpus() wants rcu_read_lock() anyway, we
can dot it a bit earlier.

Reported-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-15 00:14:07 -07:00
Eric Dumazet
4eaa0e3c86 fib: suppress lockdep-RCU false positive in FIB trie.
Followup of commit 634a4b20

Allow tnode_get_child_rcu() to be called either under rcu_read_lock()
protection or with RTNL held.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-14 16:13:29 -07:00
David S. Miller
dad1e54b12 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/pcmcia/smc91c92_cs.c
	drivers/net/virtio_net.c
2010-04-14 05:01:33 -07:00
Patrick McHardy
f0ad0860d0 ipv4: ipmr: support multiple tables
This patch adds support for multiple independant multicast routing instances,
named "tables".

Userspace multicast routing daemons can bind to a specific table instance by
issuing a setsockopt call using a new option MRT_TABLE. The table number is
stored in the raw socket data and affects all following ipmr setsockopt(),
getsockopt() and ioctl() calls. By default, a single table (RT_TABLE_DEFAULT)
is created with a default routing rule pointing to it. Newly created pimreg
devices have the table number appended ("pimregX"), with the exception of
devices created in the default table, which are named just "pimreg" for
compatibility reasons.

Packets are directed to a specific table instance using routing rules,
similar to how regular routing rules work. Currently iif, oif and mark
are supported as keys, source and destination addresses could be supported
additionally.

Example usage:

- bind pimd/xorp/... to a specific table:

uint32_t table = 123;
setsockopt(fd, IPPROTO_IP, MRT_TABLE, &table, sizeof(table));

- create routing rules directing packets to the new table:

# ip mrule add iif eth0 lookup 123
# ip mrule add oif eth0 lookup 123

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 14:49:34 -07:00
Patrick McHardy
0c12295a74 ipv4: ipmr: move mroute data into seperate structure
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 14:49:34 -07:00
Patrick McHardy
862465f2e7 ipv4: ipmr: convert struct mfc_cache to struct list_head
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 14:49:33 -07:00
Patrick McHardy
d658f8a0e6 ipv4: ipmr: remove net pointer from struct mfc_cache
Now that cache entries in unres_queue don't need to be distinguished by their
network namespace pointer anymore, we can remove it from struct mfc_cache
add pass the namespace as function argument to the functions that need it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 14:49:33 -07:00
Patrick McHardy
e258beb22f ipv4: ipmr: move unres_queue and timer to per-namespace data
The unres_queue is currently shared between all namespaces. Following patches
will additionally allow to create multiple multicast routing tables in each
namespace. Having a single shared queue for all these users seems to excessive,
move the queue and the cleanup timer to the per-namespace data to unshare it.

As a side-effect, this fixes a bug in the seq file iteration functions: the
first entry returned is always from the current namespace, entries returned
after that may belong to any namespace.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 14:49:32 -07:00
Patrick McHardy
0f87b1dd01 net: fib_rules: decouple address families from real address families
Decouple the address family values used for fib_rules from the real
address families in socket.h. This allows to use fib_rules for
code that is not a real address family without increasing AF_MAX/NPROTO.

Values up to 127 are reserved for real address families and map directly
to the corresponding AF value, values starting from 128 are for other
uses. rtnetlink is changed to invoke the AF_UNSPEC dumpit/doit handlers
for these families.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 14:49:31 -07:00
Patrick McHardy
28bb17268b net: fib_rules: set family in fib_rule_hdr centrally
All fib_rules implementations need to set the family in their ->fill()
functions. Since the value is available to the generic fib_nl_fill_rule()
function, set it there.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 14:49:30 -07:00
Patrick McHardy
d8a566beaa net: fib_rules: consolidate IPv4 and DECnet ->default_pref() functions.
Both functions are equivalent, consolidate them since a following patch
needs a third implementation for multicast routing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 14:49:30 -07:00
Linus Torvalds
465de2ba71 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (25 commits)
  smc91c92_cs: define multicast_table as unsigned char
  can: avoids a false warning
  e1000e: stop cleaning when we reach tx_ring->next_to_use
  igb: restrict WoL for 82576 ET2 Quad Port Server Adapter
  virtio_net: missing sg_init_table
  Revert "tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skb"
  iwlwifi: need check for valid qos packet before free
  tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skb
  udp: fix for unicast RX path optimization
  myri10ge: fix rx_pause in myri10ge_set_pauseparam
  net: corrected documentation for hardware time stamping
  stmmac: use resource_size()
  x.25 attempts to negotiate invalid throughput
  x25: Patch to fix bug 15678 - x25 accesses fields beyond end of packet.
  bridge: Fix IGMP3 report parsing
  cnic: Fix crash during bnx2x MTU change.
  qlcnic: fix set mac addr
  r6040: fix r6040_multicast_list
  vhost-net: fix vq_memory_access_ok error checking
  ath9k: fix double calls to ath_radio_enable
  ...
2010-04-13 11:32:48 -07:00
stephen hemminger
5611551103 dst: don't inline dst_ifdown
The function dst_ifdown is called only two places but in a non-
performance critical code path, there is no reason to inline it.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 03:32:44 -07:00
Eric Dumazet
acbbc07145 net: uninline skb_bond_should_drop()
skb_bond_should_drop() is too big to be inlined.

This patch reduces kernel text size, and its compilation time as well
(shrinking include/linux/netdevice.h)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 03:32:42 -07:00
Eric Dumazet
4ffa87012e can: avoids a false warning
At this point optlen == sizeof(sfilter) but some compilers are dumb.

Reported-by: Németh Márton <nm127@freemail.h
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 03:03:14 -07:00
stephen hemminger
8595805aaf IPv6: only notify protocols if address is compeletely gone
The notifier for address down should only be called if address is completely
gone, not just being marked as tentative on link transistion. The code
in net-next would case bonding/sctp/s390 to see address disappear on link
down, but they would never see it reappear on link up.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 02:29:28 -07:00
stephen hemminger
d1f84c63a4 ipv6: additional ref count for hash list unnecessary
Since an address in hash list has to already have a ref count,
no additional ref count is needed.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 02:29:28 -07:00
stephen hemminger
27bdb2abcc IPv6: keep tentative addresses in hash table
When link goes down, want address to be preserved but in a tentative
state, therefore it has to stay in hash list.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 02:29:27 -07:00
stephen hemminger
93fa159abe IPv6: keep route for tentative address
Recent changes preserve IPv6 address when link goes down (good).
But would cause address to point to dead dst entry (bad).
The simplest fix is to just not delete route if address is
being held for later use.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 02:29:27 -07:00
Eric Dumazet
b6c6712a42 net: sk_dst_cache RCUification
With latest CONFIG_PROVE_RCU stuff, I felt more comfortable to make this
work.

sk->sk_dst_cache is currently protected by a rwlock (sk_dst_lock)

This rwlock is readlocked for a very small amount of time, and dst
entries are already freed after RCU grace period. This calls for RCU
again :)

This patch converts sk_dst_lock to a spinlock, and use RCU for readers.

__sk_dst_get() is supposed to be called with rcu_read_lock() or if
socket locked by user, so use appropriate rcu_dereference_check()
condition (rcu_read_lock_held() || sock_owned_by_user(sk))

This patch avoids two atomic ops per tx packet on UDP connected sockets,
for example, and permits sk_dst_lock to be much less dirtied.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 01:41:33 -07:00
Eric Dumazet
7a161ea924 net: Dont use netdev_warn()
Dont use netdev_warn() in dev_cap_txqueue() and get_rps_cpu() so that we
can catch following warnings without crash.

bond0.2240 received packet on queue 6, but number of RX queues is 1
bond0.2240 received packet on queue 11, but number of RX queues is 1

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 01:41:32 -07:00
Richard Cochran
ed85b565b8 packet: support for TX time stamps on RAW sockets
Enable the SO_TIMESTAMPING socket infrastructure for raw packet sockets.
We introduce PACKET_TX_TIMESTAMP for the control message cmsg_type.

Similar support for UDP and CAN sockets was added in commit
51f31cabe3

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-13 01:30:48 -07:00
Linus Torvalds
fedfb947b2 Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.34' of git://linux-nfs.org/~bfields/linux:
  svcrdma: RDMA support not yet compatible with RPC6
2010-04-12 18:34:56 -07:00
David S. Miller
2e8e18ef52 tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skb
Back in commit 04a0551c87
("loopback: Drop obsolete ip_summed setting") we stopped
setting CHECKSUM_UNNECESSARY in the loopback xmit.

This is because such a setting was a lie since it implies that the
checksum field of the packet is properly filled in.

Instead what happens normally is that CHECKSUM_PARTIAL is set and
skb->csum is calculated as needed.

But this was only happening for TCP data packets (via the
skb->ip_summed assignment done in tcp_sendmsg()).  It doesn't
happen for non-data packets like ACKs etc.

Fix this by setting skb->ip_summed in the common non-data packet
constructor.  It already is setting skb->csum to zero.

But this reminds us that we still have things like ip_output.c's
ip_dev_loopback_xmit() which sets skb->ip_summed to the value
CHECKSUM_UNNECESSARY, which Herbert's patch teaches us is not
valid.  So we'll have to address that at some point too.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-11 15:29:13 -07:00
Herbert Xu
bb29624614 inet: Remove unused send_check length argument
inet: Remove unused send_check length argument

This patch removes the unused length argument from the send_check
function in struct inet_connection_sock_af_ops.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Yinghai <yinghai.lu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-11 15:29:09 -07:00
Herbert Xu
8ad50d96db tcp: Handle CHECKSUM_PARTIAL for SYNACK packets for IPv6
tcp: Handle CHECKSUM_PARTIAL for SYNACK packets for IPv6

This patch moves the common code between tcp_v6_send_check and
tcp_v6_gso_send_check into a new function __tcp_v6_send_check.

It then uses the new function in tcp_v6_send_synack as well as
tcp_v6_send_response so that they handle CHECKSUM_PARTIAL properly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Yinghai <yinghai.lu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-11 15:29:08 -07:00
Herbert Xu
419f9f8960 tcp: Handle CHECKSUM_PARTIAL for SYNACK packets for IPv4
tcp: Handle CHECKSUM_PARTIAL for SYNACK packets for IPv4

This patch moves the common code between tcp_v4_send_check and
tcp_v4_gso_send_check into a new function __tcp_v4_send_check.

It then uses the new function in tcp_v4_send_synack so that it
handles CHECKSUM_PARTIAL properly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Yinghai <yinghai.lu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-11 15:29:08 -07:00
David S. Miller
871039f02f Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/stmmac/stmmac_main.c
	drivers/net/wireless/wl12xx/wl1271_cmd.c
	drivers/net/wireless/wl12xx/wl1271_main.c
	drivers/net/wireless/wl12xx/wl1271_spi.c
	net/core/ethtool.c
	net/mac80211/scan.c
2010-04-11 14:53:53 -07:00
David S. Miller
4a1032faac Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2010-04-11 02:44:30 -07:00
David S. Miller
ae4e8d63b5 Revert "tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skb"
This reverts commit 2626419ad5.

It causes regressions for people with IGB cards.  Connection
requests don't complete etc.  The true cause of the issue is
still not known, but we should sort this out in net-next-2.6
not net-2.6

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-11 02:40:49 -07:00
Teemu Paasikivi
68dd5b7a45 mac80211: check whether scan is in progress before queueing scan_work
As scan_work is queued from work_work it needs to be checked if scan
has been started during execution of work_work. Otherwise, when hw
scan is used, the stack gets error about hw being busy with ongoing
scan. This causes the stack to abort scan without notifying the driver
about it. This leads to a situation where the hw is scanning and the stack
thinks it's not. Then when the scan finishes, the stack will complain by
warnings.

Signed-off-by: Teemu Paasikivi <ext-teemu.3.paasikivi@nokia.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-09 13:43:11 -04:00
Luis R. Rodriguez
c15cf5fcf9 mac80211: fix typo for LDPC capability
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-09 13:43:09 -04:00
Zhu Yi
39184b151c mac80211: delay skb linearising in rx decryption
We delay the skb linearising in ieee80211_rx_h_decrypt so that
frames do not require software decryption are not linearized. We
are safe to do this because ieee80211_get_mmie_keyidx() only
requires to touch nonlinear data for management frames, which are
already linearized before getting here.

Cc: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-09 13:43:09 -04:00
Johannes Berg
b5878a2dc5 mac80211: enhance tracing
Enhance tracing by adding tracing for a variety of
callbacks that the drivers call, and also for
internal calls (currently limited to queue status).
This can aid debugging what is going on in mac80211
in interaction with drivers, since we can now see
what drivers call and not just what mac80211 calls
in the driver.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-08 15:24:13 -04:00
Javier Cardona
97ad9139fd mac80211: Moved mesh action codes to a more visible location
Grouped mesh action codes together with the other action codes in
ieee80211.h.

Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-08 15:24:07 -04:00
David S. Miller
2626419ad5 tcp: Set CHECKSUM_UNNECESSARY in tcp_init_nondata_skb
Back in commit 04a0551c87
("loopback: Drop obsolete ip_summed setting") we stopped
setting CHECKSUM_UNNECESSARY in the loopback xmit.

This is because such a setting was a lie since it implies that the
checksum field of the packet is properly filled in.

Instead what happens normally is that CHECKSUM_PARTIAL is set and
skb->csum is calculated as needed.

But this was only happening for TCP data packets (via the
skb->ip_summed assignment done in tcp_sendmsg()).  It doesn't
happen for non-data packets like ACKs etc.

Fix this by setting skb->ip_summed in the common non-data packet
constructor.  It already is setting skb->csum to zero.

But this reminds us that we still have things like ip_output.c's
ip_dev_loopback_xmit() which sets skb->ip_summed to the value
CHECKSUM_UNNECESSARY, which Herbert's patch teaches us is not
valid.  So we'll have to address that at some point too.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-08 11:32:30 -07:00
Jorge Boncompte [DTI2]
1223c67c09 udp: fix for unicast RX path optimization
Commits 5051ebd275 and
5051ebd275 ("ipv[46]: udp: optimize unicast RX
path") broke some programs.

	After upgrading a L2TP server to 2.6.33 it started to fail, tunnels going up an
down, after the 10th tunnel came up. My modified rp-l2tp uses a global
unconnected socket bound to (INADDR_ANY, 1701) and one connected socket per
tunnel after parameter negotiation.

	After ten sockets were open and due to mixed parameters to
udp[46]_lib_lookup2() kernel started to drop packets.

Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-08 11:29:13 -07:00
John W. Linville
0f2df9eac7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into merge
Conflicts:
	Documentation/feature-removal-schedule.txt
	drivers/net/wireless/ath/ath5k/phy.c
	drivers/net/wireless/iwlwifi/iwl-4965.c
	drivers/net/wireless/iwlwifi/iwl-agn.c
	drivers/net/wireless/iwlwifi/iwl-core.c
	drivers/net/wireless/iwlwifi/iwl-core.h
	drivers/net/wireless/iwlwifi/iwl-tx.c
2010-04-08 13:34:54 -04:00
chavey
97f8aefbbf net: fix ethtool coding style errors and warnings
Fix coding style errors and warnings output while running checkpatch.pl
on the files net/core/ethtool.c and include/linux/ethtool.h

Signed-off-by: chavey <chavey@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 21:54:42 -07:00
John Hughes
ddd0451fc8 x.25 attempts to negotiate invalid throughput
The current X.25 code has some bugs in throughput negotiation:

   1. It does negotiation in all cases, usually there is no need
   2. It incorrectly attempts to negotiate the throughput class in one
      direction only.  There are separate throughput classes for input
      and output and if either is negotiated both mist be negotiates.

This is bug https://bugzilla.kernel.org/show_bug.cgi?id=15681

This bug was first reported by Daniel Ferenci to the linux-x25 mailing
list on 6/8/2004, but is still present.

The current (2.6.34) x.25 code doesn't seem to know that the X.25
throughput facility includes two values, one for the required
throughput outbound, one for inbound.

This causes it to attempt to negotiate throughput 0x0A, which is
throughput 9600 inbound and the illegal value "0" for inbound
throughput.

Because of this some X.25 devices (e.g. Cisco 1600) refuse to connect
to Linux X.25.

The following patch fixes this behaviour.  Unless the user specifies a
required throughput it does not attempt to negotiate.  If the user
does not specify a throughput it accepts the suggestion of the remote
X.25 system.  If the user requests a throughput then it validates both
the input and output throughputs and correctly negotiates them with
the remote end.

Signed-off-by: John Hughes <john@calva.com>
Tested-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 21:33:02 -07:00
John Hughes
f5eb917b86 x25: Patch to fix bug 15678 - x25 accesses fields beyond end of packet.
Here is a patch to stop X.25 examining fields beyond the end of the packet.

For example, when a simple CALL ACCEPTED was received:

	10 10 0f

x25_parse_facilities was attempting to decode the FACILITIES field, but this
packet contains no facilities field.

Signed-off-by: John Hughes <john@calva.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 21:29:25 -07:00
Herbert Xu
fd218cf955 bridge: Fix IGMP3 report parsing
The IGMP3 report parsing is looking at the wrong address for
group records.  This patch fixes it.

Reported-by: Banyeer <banyeer@yahoo.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 21:20:47 -07:00
Eric Dumazet
298b9e44be net: include linux/proc_fs.h in dev_addr_lists.c
As pointed by Randy Dunlap, we must include linux/proc_fs.h in
net/core/dev_addr_lists.c, regardless of CONFIG_PROC_FS

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>, 
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 16:46:36 -07:00
David S. Miller
005c93b5d8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-04-07 16:41:03 -07:00
Johannes Berg
8c11e4ab09 mac80211: fix paged RX crypto
WEP crypto was broken, but upon finding the problem
it is evident that other things were broken by the
paged RX patch as well.

To fix it, for now move the linearising in front.
This means that we linearise all frames, which is
not at all what we want, but at least it fixes the
problem for now.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-07 16:26:25 -04:00
Johannes Berg
54297e4d60 mac80211: fix some RX aggregation locking
A few places in mac80211 do not currently acquire
the sta lock for RX aggregation, but they should.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-07 14:38:06 -04:00
Johannes Berg
098a607091 mac80211: clean up/fix aggregation code
The aggregation code has a number of quirks, like
inventing an unneeded WLAN_BACK_TIMER value and
leaking memory under certain circumstances during
station destruction. Fix these issues by using
the regular aggregation session teardown code and
blocking new aggregation sessions, all before the
station is really destructed.

As a side effect, this gets rid of the long code
block to destroy aggregation safely.

Additionally, rename tid_state_rx which can only
have the values IDLE and OPERATIONAL to
tid_active_rx to make it easier to understand
that there is no bitwise stuff going on on the
RX side -- the TX side remains because it needs
to keep track of the driver and peer states.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-07 14:38:05 -04:00
Johannes Berg
618f356b95 mac80211: rename WLAN_STA_SUSPEND to WLAN_STA_BLOCK_BA
I want to use it during station destruction as well
so rename it to WLAN_STA_BLOCK_BA which is also the
only use of it now.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-07 14:38:04 -04:00
Johannes Berg
66b0470aee mac80211: remove ieee80211_sta_stop_rx_ba_session
All callers of ieee80211_sta_stop_rx_ba_session can
just call __ieee80211_stop_rx_ba_session instead
because they already have the station struct, so do
that and remove ieee80211_sta_stop_rx_ba_session.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-07 14:38:03 -04:00
Johannes Berg
2b43ae6daf mac80211: remove irq disabling for sta lock
All other places except one in the TX path, which
has BHs disabled, and it also cannot be locked from
interrupts so disabling IRQs is not necessary.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-07 14:38:01 -04:00
Johannes Berg
e64b379574 mac80211: fix station destruction problem
When a station w/o a key is destroyed, or when
a driver submits work for a station and thereby
references it again, it seems like potentially
we could reference the station structure while
it is being destroyed.

Wait for an RCU grace period to elapse before
finishing destroying the station after we have
removed the station from the driver and from
the hash table etc., even in the case where no
key is associated with the station.

Also, there's no point in deleting the plink
timer here since it'll be properly deleted just
a bit later.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-07 14:38:00 -04:00
Jouni Malinen
d5cdfacb35 cfg80211: Add local-state-change-only auth/deauth/disassoc
cfg80211 is quite strict on allowing authentication and association
commands only in certain states. In order to meet these requirements,
user space applications may need to clear authentication or
association state in some cases. Currently, this can be done with
deauth/disassoc command, but that ends up sending out Deauthentication
or Disassociation frame unnecessarily. Add a new nl80211 attribute to
allow this sending of the frame be skipped, but with all other
deauth/disassoc operations being completed.

Similar state change is also needed for IEEE 802.11r FT protocol in
the FT-over-DS case which does not use Authentication frame exchange
in a transition to another BSS. For this to work with cfg80211, an
authentication entry needs to be created for the target BSS without
sending out an Authentication frame. The nl80211 authentication
command can be used for this purpose, too, with the new attribute to
indicate that the command is only for changing local state. This
enables wpa_supplicant to complete FT-over-DS transition successfully.

Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-07 14:37:56 -04:00
Timo Teräs
8e4795605d flow: delayed deletion of flow cache entries
Speed up lookups by freeing flow cache entries later. After
virtualizing flow cache entry operations, the flow cache may now
end up calling policy or bundle destructor which can be slowish.

As gc_list is more effective with double linked list, the flow cache
is converted to use common hlist and list macroes where appropriate.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 03:43:20 -07:00
Timo Teräs
285ead175c xfrm: remove policy garbage collection
Policies are now properly reference counted and destroyed from
all code paths. The delayed gc is just an overhead now and can
be removed.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 03:43:19 -07:00
Timo Teräs
80c802f307 xfrm: cache bundles instead of policies for outgoing flows
__xfrm_lookup() is called for each packet transmitted out of
system. The xfrm_find_bundle() does a linear search which can
kill system performance depending on how many bundles are
required per policy.

This modifies __xfrm_lookup() to store bundles directly in
the flow cache. If we did not get a hit, we just create a new
bundle instead of doing slow search. This means that we can now
get multiple xfrm_dst's for same flow (on per-cpu basis).

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 03:43:19 -07:00
Timo Teräs
fe1a5f031e flow: virtualize flow cache entry methods
This allows to validate the cached object before returning it.
It also allows to destruct object properly, if the last reference
was held in flow cache. This is also a prepartion for caching
bundles in the flow cache.

In return for virtualizing the methods, we save on:
- not having to regenerate the whole flow cache on policy removal:
  each flow matching a killed policy gets refreshed as the getter
  function notices it smartly.
- we do not have to call flow_cache_flush from policy gc, since the
  flow cache now properly deletes the object if it had any references

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-07 03:43:18 -07:00
David S. Miller
4a35ecf8bf Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bonding/bond_main.c
	drivers/net/via-velocity.c
	drivers/net/wireless/iwlwifi/iwl-agn.c
2010-04-06 23:53:30 -07:00
Hagen Paul Pfeifer
842509b859 socket: remove duplicate declaration of struct timespec
struct timespec ts was alreay defined. Reuse the previously
defined one and reduce the memory footprint on the stack by
16 bytes.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-06 19:50:20 -07:00
Jon Paul Maloy
c6537d6742 TIPC: Updated topology subscription protocol according to latest spec
This patch makes it explicit in the API that all fields in subscriptions and events exchanged with the Topology Server must be in
network byte order.
It also ensures that all fields of a subscription are compared when cancelling a subscription, in order to avoid inadvertent
cancelling of the wrong subscription.
Finally, the tipc module version is updated to 2.0.0, to reflect the API change.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-06 19:50:19 -07:00
Jouni Malinen
d211e90e28 mac80211: Fix robust management frame handling (MFP)
Commit e34e09401ee9888dd662b2fca5d607794a56daf2 incorrectly removed
use of ieee80211_has_protected() from the management frame case and in
practice, made this validation drop all Action frames when MFP is
enabled. This should have only been done for frames with Protected
field set to zero.

Signed-off-by: Jouni Malinen <j@w1.fi>
Cc: stable@kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-06 16:49:33 -04:00
Johannes Berg
0379185b6c mac80211: annotate station rcu dereferences
The new RCU lockdep support warns about these
in some contexts -- make it aware of the locks
used to protect all this. Different locks are
used in different contexts which unfortunately
means we can't get perfect checking.

Also remove rcu_dereference() from two places
that don't actually dereference the pointers.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-06 15:53:30 -04:00
Javier Cardona
1cb561f837 mac80211: Handle mesh action frames in ieee80211_rx_h_action
This fixes the problem introduced in commit
8404080568 which broke mesh peer link establishment.

changes:
v2 	Added missing break (Johannes)
v3 	Broke original patch into two (Johannes)

Signed-off-by: Javier Cardona <javier@cozybit.com>
Cc: stable@kernel.org
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-06 15:53:28 -04:00
Linus Torvalds
cb4361c1dc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (37 commits)
  smc91c92_cs: fix the problem of "Unable to find hardware address"
  r8169: clean up my printk uglyness
  net: Hook up cxgb4 to Kconfig and Makefile
  cxgb4: Add main driver file and driver Makefile
  cxgb4: Add remaining driver headers and L2T management
  cxgb4: Add packet queues and packet DMA code
  cxgb4: Add HW and FW support code
  cxgb4: Add register, message, and FW definitions
  netlabel: Fix several rcu_dereference() calls used without RCU read locks
  bonding: fix potential deadlock in bond_uninit()
  net: check the length of the socket address passed to connect(2)
  stmmac: add documentation for the driver.
  stmmac: fix kconfig for crc32 build error
  be2net: fix bug in vlan rx path for big endian architecture
  be2net: fix flashing on big endian architectures
  be2net: fix a bug in flashing the redboot section
  bonding: bond_xmit_roundrobin() fix
  drivers/net: Add missing unlock
  net: gianfar - align BD ring size console messages
  net: gianfar - initialize per-queue statistics
  ...
2010-04-06 08:34:06 -07:00
YOSHIFUJI Hideaki / 吉藤英明
2f787b0b76 mac80211: Ensure initializing private mc_list in prepare_multicast().
Fix kernel panic by NULL pointer dereference in the context of
ieee80211_ops->prepare_multicast().

This bug was introduced by commit 22bedad3c.. ("net: convert
multicast list to list_head").

Call __hw_addr_init() in ieee80211_alloc_hw() to initialize
list_head of private device multicast list, like we do in
bond_init().

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-06 00:12:30 -07:00
Eric Dumazet
e4008276fd net: Add a missing local_irq_enable()
As noticed by Changli Gao, we must call local_irq_enable() after
rps_unlock()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-05 15:42:39 -07:00
Tom Herbert
5a6d234e73 rps: fixed missed rps_unlock
Fix spin_unlock_irq which needs to be rps_unlock.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-05 14:37:55 -07:00
Linus Torvalds
749d229761 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: saving negative to unsigned char
  9p: return on mutex_lock_interruptible()
  9p: Creating files with names too long should fail with ENAMETOOLONG.
  9p: Make sure we are able to clunk the cached fid on umount
  9p: drop nlink remove
  fs/9p: Clunk the fid resulting from partial walk of the name
  9p: documentation update
  9p: Fix setting of protocol flags in v9fs_session_info structure.
2010-04-05 13:42:54 -07:00
Dan Carpenter
3dc9fef67f 9p: saving negative to unsigned char
Saving -EINVAL as unsigned char truncates the high bits and changes it
into 234 instead of -22.  This breaks the test for "if (ret == -EINVAL)"
in parse_opts().

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-04-05 14:37:28 -05:00