Commit Graph

26077 Commits

Author SHA1 Message Date
Vitaly E. Lavrov
32263dd1b4 netfilter: xt_hashlimit: fix namespace destroy path
recent_net_exit() is called before recent_mt_destroy() in the
destroy path of network namespaces. Make sure there are no entries
in the parent proc entry xt_recent before removing it.

Signed-off-by: Vitaly E. Lavrov <lve@guap.ru>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-26 18:14:48 +01:00
Vitaly E. Lavrov
665e205c16 netfilter: xt_recent: fix namespace destroy path
recent_net_exit() is called before recent_mt_destroy() in the
destroy path of network namespaces. Make sure there are no entries
in the parent proc entry xt_recent before removing it.

Signed-off-by: Vitaly E. Lavrov <lve@guap.ru>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-26 18:14:48 +01:00
Pablo Neira Ayuso
09181842b0 netfilter: xt_hashlimit: fix race that results in duplicated entries
Two packets may race to create the same entry in the hashtable,
double check if this packet lost race. This double checking only
happens in the path of the packet that creates the hashtable for
first time.

Note that, with this patch, no packet drops occur if the race happens.

Reported-by: Feng Gao <gfree.wind@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-26 18:14:44 +01:00
Pablo Neira Ayuso
10db9069eb netfilter: xt_CT: recover NOTRACK target support
Florian Westphal reported that the removal of the NOTRACK target
(9655050 netfilter: remove xt_NOTRACK) is breaking some existing
setups.

That removal was scheduled for removal since long time ago as
described in Documentation/feature-removal-schedule.txt

What:  xt_NOTRACK
Files: net/netfilter/xt_NOTRACK.c
When:  April 2011
Why:   Superseded by xt_CT

Still, people may have not notice / may have decided to stick to an
old iptables version. I agree with him in that some more conservative
approach by spotting some printk to warn users for some time is less
agressive.

Current iptables 1.4.16.3 already contains the aliasing support
that makes it point to the CT target, so upgrading would fix it.
Still, the policy so far has been to avoid pushing our users to
upgrade.

As a solution, this patch recovers the NOTRACK target inside the CT
target and it now spots a warning.

Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-24 12:55:09 +01:00
Pablo Neira Ayuso
e035edd16e netfilter: nfnetlink_log: fix possible compilation issue due to missing include
In (0c36b48 netfilter: nfnetlink_log: fix mac address for 6in4 tunnels)
the include file that defines ARPD_SIT was missing. This passed unnoticed
during my tests (I did not hit this problem here).

net/netfilter/nfnetlink_log.c: In function '__build_packet_message':
net/netfilter/nfnetlink_log.c:494:25: error: 'ARPHRD_SIT' undeclared (first use in this function)
net/netfilter/nfnetlink_log.c:494:25: note: each undeclared identifier is reported only once for
+each function it appears in

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-17 01:16:17 +01:00
Pablo Neira Ayuso
252b3e8c1b netfilter: xt_CT: fix crash while destroy ct templates
In (d871bef netfilter: ctnetlink: dump entries from the dying and
unconfirmed lists), we assume that all conntrack objects are
inserted in any of the existing lists. However, template conntrack
objects were not. This results in hitting BUG_ON in the
destroy_conntrack path while removing a rule that uses the CT target.

This patch fixes the situation by adding the template lists, which
is where template conntrack objects reside now.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-16 23:44:12 +01:00
Bob Hockney
0c36b48b36 netfilter: nfnetlink_log: fix mac address for 6in4 tunnels
For tunnelled ipv6in4 packets, the LOG target (xt_LOG.c) adjusts
the start of the mac field to start at the ethernet header instead
of the ipv4 header for the tunnel. This patch conforms what is
passed by the NFLOG target through nfnetlink to what the LOG target
does. Code borrowed from xt_LOG.c.

Signed-off-by: Bob Hockney <bhockney@ix.netcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-16 23:41:31 +01:00
Haibo Xi
97cf00e93c netfilter: nf_ct_reasm: fix conntrack reassembly expire code
Commit b836c99fd6 (ipv6: unify conntrack reassembly expire
code with standard one) use the standard IPv6 reassembly
code(ip6_expire_frag_queue) to handle conntrack reassembly expire.

In ip6_expire_frag_queue, it invoke dev_get_by_index_rcu to get
which device received this expired packet.so we must save ifindex
when NF_conntrack get this packet.

With this patch applied, I can see ICMP Time Exceeded sent
from the receiver when the sender sent out 1/2 fragmented
IPv6 packet.

Signed-off-by: Haibo Xi <haibbo@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-16 23:41:25 +01:00
Florent Fourcot
d7a769ff0e netfilter: nf_conntrack_ipv6: fix comment for packets without data
Remove ambiguity of double negation.

Signed-off-by: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Acked-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-16 23:28:31 +01:00
Andrew Collins
c65ef8dc7b netfilter: nf_nat: Also handle non-ESTABLISHED routing changes in MASQUERADE
Since (a0ecb85 netfilter: nf_nat: Handle routing changes in MASQUERADE
target), the MASQUERADE target handles routing changes which affect
the output interface of a connection, but only for ESTABLISHED
connections.  It is also possible for NEW connections which
already have a conntrack entry to be affected by routing changes.

This adds a check to drop entries in the NEW+conntrack state
when the oif has changed.

Signed-off-by: Andrew Collins <bsderandrew@gmail.com>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-16 23:28:30 +01:00
Mukund Jampala
c6f408996c netfilter: ip[6]t_REJECT: fix wrong transport header pointer in TCP reset
The problem occurs when iptables constructs the tcp reset packet.
It doesn't initialize the pointer to the tcp header within the skb.
When the skb is passed to the ixgbe driver for transmit, the ixgbe
driver attempts to access the tcp header and crashes.
Currently, other drivers (such as our 1G e1000e or igb drivers) don't
access the tcp header on transmit unless the TSO option is turned on.

<1>BUG: unable to handle kernel NULL pointer dereference at 0000000d
<1>IP: [<d081621c>] ixgbe_xmit_frame_ring+0x8cc/0x2260 [ixgbe]
<4>*pdpt = 0000000085e5d001 *pde = 0000000000000000
<0>Oops: 0000 [#1] SMP
[...]
<4>Pid: 0, comm: swapper Tainted: P            2.6.35.12 #1 Greencity/Thurley
<4>EIP: 0060:[<d081621c>] EFLAGS: 00010246 CPU: 16
<4>EIP is at ixgbe_xmit_frame_ring+0x8cc/0x2260 [ixgbe]
<4>EAX: c7628820 EBX: 00000007 ECX: 00000000 EDX: 00000000
<4>ESI: 00000008 EDI: c6882180 EBP: dfc6b000 ESP: ced95c48
<4> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
<0>Process swapper (pid: 0, ti=ced94000 task=ced73bd0 task.ti=ced94000)
<0>Stack:
<4> cbec7418 c779e0d8 c77cc888 c77cc8a8 0903010a 00000000 c77c0008 00000002
<4><0> cd4997c0 00000010 dfc6b000 00000000 d0d176c9 c77cc8d8 c6882180 cbec7318
<4><0> 00000004 00000004 cbec7230 cbec7110 00000000 cbec70c0 c779e000 00000002
<0>Call Trace:
<4> [<d0d176c9>] ? 0xd0d176c9
<4> [<d0d18a4d>] ? 0xd0d18a4d
<4> [<411e243e>] ? dev_hard_start_xmit+0x218/0x2d7
<4> [<411f03d7>] ? sch_direct_xmit+0x4b/0x114
<4> [<411f056a>] ? __qdisc_run+0xca/0xe0
<4> [<411e28b0>] ? dev_queue_xmit+0x2d1/0x3d0
<4> [<411e8120>] ? neigh_resolve_output+0x1c5/0x20f
<4> [<411e94a1>] ? neigh_update+0x29c/0x330
<4> [<4121cf29>] ? arp_process+0x49c/0x4cd
<4> [<411f80c9>] ? nf_hook_slow+0x3f/0xac
<4> [<4121ca8d>] ? arp_process+0x0/0x4cd
<4> [<4121ca8d>] ? arp_process+0x0/0x4cd
<4> [<4121c6d5>] ? T.901+0x38/0x3b
<4> [<4121c918>] ? arp_rcv+0xa3/0xb4
<4> [<4121ca8d>] ? arp_process+0x0/0x4cd
<4> [<411e1173>] ? __netif_receive_skb+0x32b/0x346
<4> [<411e19e1>] ? netif_receive_skb+0x5a/0x5f
<4> [<411e1ea9>] ? napi_skb_finish+0x1b/0x30
<4> [<d0816eb4>] ? ixgbe_xmit_frame_ring+0x1564/0x2260 [ixgbe]
<4> [<41013468>] ? lapic_next_event+0x13/0x16
<4> [<410429b2>] ? clockevents_program_event+0xd2/0xe4
<4> [<411e1b03>] ? net_rx_action+0x55/0x127
<4> [<4102da1a>] ? __do_softirq+0x77/0xeb
<4> [<4102dab1>] ? do_softirq+0x23/0x27
<4> [<41003a67>] ? do_IRQ+0x7d/0x8e
<4> [<41002a69>] ? common_interrupt+0x29/0x30
<4> [<41007bcf>] ? mwait_idle+0x48/0x4d
<4> [<4100193b>] ? cpu_idle+0x37/0x4c
<0>Code: df 09 d7 0f 94 c2 0f b6 d2 e9 e7 fb ff ff 31 db 31 c0 e9 38
ff ff ff 80 78 06 06 0f 85 3e fb ff ff 8b 7c 24 38 8b 8f b8 00 00 00
<0f> b6 51 0d f6 c2 01 0f 85 27 fb ff ff 80 e2 02 75 0d 8b 6c 24
<0>EIP: [<d081621c>] ixgbe_xmit_frame_ring+0x8cc/0x2260 [ixgbe] SS:ESP

Signed-off-by: Mukund Jampala <jbmukund@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-16 23:27:35 +01:00
Simon Arlott
df48419107 ipv6: Fix Makefile offload objects
The following commit breaks IPv6 TCP transmission for me:
	Commit 75fe83c322
	Author: Vlad Yasevich <vyasevic@redhat.com>
	Date:   Fri Nov 16 09:41:21 2012 +0000
	ipv6: Preserve ipv6 functionality needed by NET

This patch fixes the typo "ipv6_offload" which should be
"ipv6-offload".

I don't know why not including the offload modules should
break TCP. Disabling all offload options on the NIC didn't
help. Outgoing pulseaudio traffic kept stalling.

Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-16 09:15:53 -08:00
Daniel Borkmann
4cb9d6eaf8 sctp: jsctp_sf_eat_sack: fix jprobes function signature mismatch
Commit 24cb81a6a (sctp: Push struct net down into all of the
state machine functions) introduced the net structure into all
state machine functions, but jsctp_sf_eat_sack was not updated,
hence when SCTP association probing is enabled in the kernel,
any simple SCTP client/server program from userspace will panic
the kernel.

Cc: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-15 17:14:39 -08:00
Amerigo Wang
ccb1c31a7a bridge: add flags to distinguish permanent mdb entires
This patch adds a flag to each mdb entry, so that we can distinguish
permanent entries with temporary entries.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-15 17:14:39 -08:00
Neil Horman
0d0863b020 sctp: Change defaults on cookie hmac selection
Recently I posted commit 3c68198e75 which made selection of the cookie hmac
algorithm selectable.  This is all well and good, but Linus noted that it
changes the default config:
http://marc.info/?l=linux-netdev&m=135536629004808&w=2

I've modified the sctp Kconfig file to reflect the recommended way of making
this choice, using the thermal driver example specified, and brought the
defaults back into line with the way they were prior to my origional patch

Also, on Linus' suggestion, re-adding ability to select default 'none' hmac
algorithm, so we don't needlessly bloat the kernel by forcing a non-none
default.  This also led me to note that we won't honor the default none
condition properly because of how sctp_net_init is encoded.  Fix that up as
well.

Tested by myself (allbeit fairly quickly).  All configuration combinations seems
to work soundly.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: David Miller <davem@davemloft.net>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: linux-sctp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-15 17:14:38 -08:00
Christoph Paasch
e337e24d66 inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and dccp_v4/6_request_recv_sock
If in either of the above functions inet_csk_route_child_sock() or
__inet_inherit_port() fails, the newsk will not be freed:

unreferenced object 0xffff88022e8a92c0 (size 1592):
  comm "softirq", pid 0, jiffies 4294946244 (age 726.160s)
  hex dump (first 32 bytes):
    0a 01 01 01 0a 01 01 02 00 00 00 00 a7 cc 16 00  ................
    02 00 03 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8153d190>] kmemleak_alloc+0x21/0x3e
    [<ffffffff810ab3e7>] kmem_cache_alloc+0xb5/0xc5
    [<ffffffff8149b65b>] sk_prot_alloc.isra.53+0x2b/0xcd
    [<ffffffff8149b784>] sk_clone_lock+0x16/0x21e
    [<ffffffff814d711a>] inet_csk_clone_lock+0x10/0x7b
    [<ffffffff814ebbc3>] tcp_create_openreq_child+0x21/0x481
    [<ffffffff814e8fa5>] tcp_v4_syn_recv_sock+0x3a/0x23b
    [<ffffffff814ec5ba>] tcp_check_req+0x29f/0x416
    [<ffffffff814e8e10>] tcp_v4_do_rcv+0x161/0x2bc
    [<ffffffff814eb917>] tcp_v4_rcv+0x6c9/0x701
    [<ffffffff814cea9f>] ip_local_deliver_finish+0x70/0xc4
    [<ffffffff814cec20>] ip_local_deliver+0x4e/0x7f
    [<ffffffff814ce9f8>] ip_rcv_finish+0x1fc/0x233
    [<ffffffff814cee68>] ip_rcv+0x217/0x267
    [<ffffffff814a7bbe>] __netif_receive_skb+0x49e/0x553
    [<ffffffff814a7cc3>] netif_receive_skb+0x50/0x82

This happens, because sk_clone_lock initializes sk_refcnt to 2, and thus
a single sock_put() is not enough to free the memory. Additionally, things
like xfrm, memcg, cookie_values,... may have been initialized.
We have to free them properly.

This is fixed by forcing a call to tcp_done(), ending up in
inet_csk_destroy_sock, doing the final sock_put(). tcp_done() is necessary,
because it ends up doing all the cleanup on xfrm, memcg, cookie_values,
xfrm,...

Before calling tcp_done, we have to set the socket to SOCK_DEAD, to
force it entering inet_csk_destroy_sock. To avoid the warning in
inet_csk_destroy_sock, inet_num has to be set to 0.
As inet_csk_destroy_sock does a dec on orphan_count, we first have to
increase it.

Calling tcp_done() allows us to remove the calls to
tcp_clear_xmit_timer() and tcp_cleanup_congestion_control().

A similar approach is taken for dccp by calling dccp_done().

This is in the kernel since 093d282321 (tproxy: fix hash locking issue
when using port redirection in __inet_inherit_port()), thus since
version >= 2.6.37.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-14 13:14:07 -05:00
Duan Jiong
093d04d42f ipv6: Change skb->data before using icmpv6_notify() to propagate redirect
In function ndisc_redirect_rcv(), the skb->data points to the transport
header, but function icmpv6_notify() need the skb->data points to the
inner IP packet. So before using icmpv6_notify() to propagate redirect,
change skb->data to point the inner IP packet that triggered the sending
of the Redirect, and introduce struct rd_msg to make it easy.

Signed-off-by: Duan Jiong <djduanjiong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-14 13:14:07 -05:00
Konstantin Khlebnikov
1e9f954516 mac802154: fix destructon ordering for ieee802154 devices
mutex_destroy() must be called before wpan_phy_free(), because it puts the last
reference and frees memory. Catched as overwritten poison in kmalloc-2048.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: linux-zigbee-devel@lists.sourceforge.net
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-14 13:14:07 -05:00
Ang Way Chuang
8fa45a70ba bridge: remove temporary variable for MLDv2 maximum response code computation
As suggested by Stephen Hemminger, this remove the temporary variable
introduced in commit eca2a43bb0
("bridge: fix icmpv6 endian bug and other sparse warnings")

Signed-off-by: Ang Way Chuang <wcang@sfc.wide.ad.jp>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-14 13:14:06 -05:00
Linus Torvalds
8d9ea7172e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "A pile of fixes in response to yesterday's big merge.  The SCTP HMAC
  thing hasn't been addressed yet, I'll take care of that myself if Neil
  and Vlad don't show signs of life by tomorrow.

   1) Use after free of SKB in tuntap code.  Fix by Eric Dumazet,
      reported by Dave Jones.

   2) NFC LLCP code emits annoying kernel log message, triggerable by
      the user.  From Dave Jones.

   3) Fix several endianness bugs noticed by sparse in the bridging
      code, from Stephen Hemminger.

   4) Ipv6 NDISC code doesn't take padding into account properly, fix
      from YOSHIFUJI Hideaki.

   5) Add missing docs to ethtool_flow_ext struct, from Yan Burman."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  bridge: fix icmpv6 endian bug and other sparse warnings
  net: ethool: Document struct ethtool_flow_ext
  ndisc: Fix padding error in link-layer address option.
  tuntap: dont use skb after netif_rx_ni(skb)
  nfc: remove noisy message from llcp_sock_sendmsg
2012-12-13 13:20:02 -08:00
Linus Torvalds
fd62c54503 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID subsystem updates from Jiri Kosina:

 1) Support for HID over I2C bus has been added by Benjamin Tissoires.
    ACPI device discovery is still in the works.

 2) Support for Win8 Multitiouch protocol is being added, most work done
    by Benjamin Tissoires as well

 3) EIO/ERESTARTSYS is fixed in hiddev/hidraw, fixes by Andrew Duggan
    and Jiri Kosina

 4) ION iCade driver added by Bastien Nocera

 5) Support for a couple new Roccat devices has been added by Stefan
    Achatz

 6) HID sensor hubs are now auto-detected instead of having to list all
    the VID/PID combinations in the blacklist array

 7) other random fixes and support for new device IDs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (65 commits)
  HID: i2c-hid: add mutex protecting open/close race
  Revert "HID: sensors: add to special driver list"
  HID: sensors: autodetect USB HID sensor hubs
  HID: hidp: fallback to input session properly if hid is blacklisted
  HID: i2c-hid: fix ret_count check
  HID: i2c-hid: fix i2c_hid_get_raw_report count mismatches
  HID: i2c-hid: remove extra .irq field in struct i2c_hid
  HID: i2c-hid: reorder allocation/free of buffers
  HID: i2c-hid: fix memory corruption due to missing hid declaration
  HID: i2c-hid: remove superfluous include
  HID: i2c-hid: remove unneeded test in i2c_hid_remove
  HID: i2c-hid: i2c_hid_get_report may fail
  HID: i2c-hid: also call i2c_hid_free_buffers in i2c_hid_remove
  HID: i2c-hid: fix error messages
  HID: i2c-hid: fix return paths
  HID: i2c-hid: remove unused static declarations
  HID: i2c-hid: fix i2c_hid_dbg macro
  HID: i2c-hid: fix checkpatch.pl warning
  HID: i2c-hid: enhance Kconfig
  HID: i2c-hid: change I2C name
  ...
2012-12-13 12:00:48 -08:00
Linus Torvalds
a2013a13e6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial branch from Jiri Kosina:
 "Usual stuff -- comment/printk typo fixes, documentation updates, dead
  code elimination."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  HOWTO: fix double words typo
  x86 mtrr: fix comment typo in mtrr_bp_init
  propagate name change to comments in kernel source
  doc: Update the name of profiling based on sysfs
  treewide: Fix typos in various drivers
  treewide: Fix typos in various Kconfig
  wireless: mwifiex: Fix typo in wireless/mwifiex driver
  messages: i2o: Fix typo in messages/i2o
  scripts/kernel-doc: check that non-void fcts describe their return value
  Kernel-doc: Convention: Use a "Return" section to describe return values
  radeon: Fix typo and copy/paste error in comments
  doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
  various: Fix spelling of "asynchronous" in comments.
  Fix misspellings of "whether" in comments.
  eisa: Fix spelling of "asynchronous".
  various: Fix spelling of "registered" in comments.
  doc: fix quite a few typos within Documentation
  target: iscsi: fix comment typos in target/iscsi drivers
  treewide: fix typo of "suport" in various comments and Kconfig
  treewide: fix typo of "suppport" in various comments
  ...
2012-12-13 12:00:02 -08:00
stephen hemminger
eca2a43bb0 bridge: fix icmpv6 endian bug and other sparse warnings
Fix the warnings reported by sparse on recent bridge multicast
changes. Mostly just rcu annotation issues but in this case
sparse found a real bug! The ICMPv6 mld2 query mrc
values is in network byte order.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-13 12:58:11 -05:00
YOSHIFUJI Hideaki / 吉藤英明
7bdc1b4aba ndisc: Fix padding error in link-layer address option.
If a natural number n exists where 2 + data_len <= 8n < 2 + data_len + pad,
post padding is not initialized correctly.

(Un)fortunately, the only type that requires pad is Infiniband,
whose pad is 2 and data_len is 20, and this logical error has not
become obvious, but it is better to fix.

Note that ndisc_opt_addr_space() handles the situation described
above correctly.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-13 12:58:11 -05:00
Dave Jones
026e43def7 nfc: remove noisy message from llcp_sock_sendmsg
This is easily triggerable when fuzz-testing as an unprivileged user.
We could rate-limit it, but given we don't print similar messages
for other protocols, I just removed it.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-13 12:58:10 -05:00
Linus Torvalds
6be35c700f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David Miller:

1) Allow to dump, monitor, and change the bridge multicast database
   using netlink.  From Cong Wang.

2) RFC 5961 TCP blind data injection attack mitigation, from Eric
   Dumazet.

3) Networking user namespace support from Eric W. Biederman.

4) tuntap/virtio-net multiqueue support by Jason Wang.

5) Support for checksum offload of encapsulated packets (basically,
   tunneled traffic can still be checksummed by HW).  From Joseph
   Gasparakis.

6) Allow BPF filter access to VLAN tags, from Eric Dumazet and
   Daniel Borkmann.

7) Bridge port parameters over netlink and BPDU blocking support
   from Stephen Hemminger.

8) Improve data access patterns during inet socket demux by rearranging
   socket layout, from Eric Dumazet.

9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and
   Jon Maloy.

10) Update TCP socket hash sizing to be more in line with current day
    realities.  The existing heurstics were choosen a decade ago.
    From Eric Dumazet.

11) Fix races, queue bloat, and excessive wakeups in ATM and
    associated drivers, from Krzysztof Mazur and David Woodhouse.

12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions
    in VXLAN driver, from David Stevens.

13) Add "oops_only" mode to netconsole, from Amerigo Wang.

14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also
    allow DCB netlink to work on namespaces other than the initial
    namespace.  From John Fastabend.

15) Support PTP in the Tigon3 driver, from Matt Carlson.

16) tun/vhost zero copy fixes and improvements, plus turn it on
    by default, from Michael S. Tsirkin.

17) Support per-association statistics in SCTP, from Michele
    Baldessari.

And many, many, driver updates, cleanups, and improvements.  Too
numerous to mention individually.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
  net/mlx4_en: Add support for destination MAC in steering rules
  net/mlx4_en: Use generic etherdevice.h functions.
  net: ethtool: Add destination MAC address to flow steering API
  bridge: add support of adding and deleting mdb entries
  bridge: notify mdb changes via netlink
  ndisc: Unexport ndisc_{build,send}_skb().
  uapi: add missing netconf.h to export list
  pkt_sched: avoid requeues if possible
  solos-pci: fix double-free of TX skb in DMA mode
  bnx2: Fix accidental reversions.
  bna: Driver Version Updated to 3.1.2.1
  bna: Firmware update
  bna: Add RX State
  bna: Rx Page Based Allocation
  bna: TX Intr Coalescing Fix
  bna: Tx and Rx Optimizations
  bna: Code Cleanup and Enhancements
  ath9k: check pdata variable before dereferencing it
  ath5k: RX timestamp is reported at end of frame
  ath9k_htc: RX timestamp is reported at end of frame
  ...
2012-12-12 18:07:07 -08:00
Jiri Kosina
818b930bc1 Merge branches 'for-3.7/upstream-fixes', 'for-3.8/hidraw', 'for-3.8/i2c-hid', 'for-3.8/multitouch', 'for-3.8/roccat', 'for-3.8/sensors' and 'for-3.8/upstream' into for-linus
Conflicts:
	drivers/hid/hid-core.c
2012-12-12 21:41:55 +01:00
Cong Wang
cfd5675435 bridge: add support of adding and deleting mdb entries
This patch implents adding/deleting mdb entries via netlink.
Currently all entries are temp, we probably need a flag to distinguish
permanent entries too.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12 13:02:30 -05:00
Cong Wang
37a393bc49 bridge: notify mdb changes via netlink
As Stephen mentioned, we need to monitor the mdb
changes in user-space, so add notifications via netlink too.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12 13:02:30 -05:00
YOSHIFUJI Hideaki
fd0ea7dbfa ndisc: Unexport ndisc_{build,send}_skb().
These symbols were exported for bonding device by commit 305d552a
("bonding: send IPv6 neighbor advertisement on failover").

It bacame obsolete by commit 7c899432 ("bonding, ipv4, ipv6, vlan: Handle
NETDEV_BONDING_FAILOVER like NETDEV_NOTIFY_PEERS") and removed by
commit 4f5762ec ("bonding: Remove obsolete source file 'bond_ipv6.c'").

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12 12:42:29 -05:00
Linus Torvalds
d206e09036 Merge branch 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup changes from Tejun Heo:
 "A lot of activities on cgroup side.  The big changes are focused on
  making cgroup hierarchy handling saner.

   - cgroup_rmdir() had peculiar semantics - it allowed cgroup
     destruction to be vetoed by individual controllers and tried to
     drain refcnt synchronously.  The vetoing never worked properly and
     caused good deal of contortions in cgroup.  memcg was the last
     reamining user.  Michal Hocko removed the usage and cgroup_rmdir()
     path has been simplified significantly.  This was done in a
     separate branch so that the memcg people can base further memcg
     changes on top.

   - The above allowed cleaning up cgroup lifecycle management and
     implementation of generic cgroup iterators which are used to
     improve hierarchy support.

   - cgroup_freezer updated to allow migration in and out of a frozen
     cgroup and handle hierarchy.  If a cgroup is frozen, all descendant
     cgroups are frozen.

   - netcls_cgroup and netprio_cgroup updated to handle hierarchy
     properly.

   - Various fixes and cleanups.

   - Two merge commits.  One to pull in memcg and rmdir cleanups (needed
     to build iterators).  The other pulled in cgroup/for-3.7-fixes for
     device_cgroup fixes so that further device_cgroup patches can be
     stacked on top."

Fixed up a trivial conflict in mm/memcontrol.c as per Tejun (due to
commit bea8c150a7 ("memcg: fix hotplugged memory zone oops") in master
touching code close to commit 2ef37d3fe4 ("memcg: Simplify
mem_cgroup_force_empty_list error handling") in for-3.8)

* 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (65 commits)
  cgroup: update Documentation/cgroups/00-INDEX
  cgroup_rm_file: don't delete the uncreated files
  cgroup: remove subsystem files when remounting cgroup
  cgroup: use cgroup_addrm_files() in cgroup_clear_directory()
  cgroup: warn about broken hierarchies only after css_online
  cgroup: list_del_init() on removed events
  cgroup: fix lockdep warning for event_control
  cgroup: move list add after list head initilization
  netprio_cgroup: allow nesting and inherit config on cgroup creation
  netprio_cgroup: implement netprio[_set]_prio() helpers
  netprio_cgroup: use cgroup->id instead of cgroup_netprio_state->prioidx
  netprio_cgroup: reimplement priomap expansion
  netprio_cgroup: shorten variable names in extend_netdev_table()
  netprio_cgroup: simplify write_priomap()
  netcls_cgroup: move config inheritance to ->css_online() and remove .broken_hierarchy marking
  cgroup: remove obsolete guarantee from cgroup_task_migrate.
  cgroup: add cgroup->id
  cgroup, cpuset: remove cgroup_subsys->post_clone()
  cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/
  cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free()
  ...
2012-12-12 08:18:24 -08:00
Eric Dumazet
1abbe1394a pkt_sched: avoid requeues if possible
With BQL being deployed, we can more likely have following behavior :

We dequeue a packet from qdisc in dequeue_skb(), then we realize target
tx queue is in XOFF state in sch_direct_xmit(), and we have to hold the
skb into gso_skb for later.

This shows in stats (tc -s qdisc dev eth0) as requeues.

Problem of these requeues is that high priority packets can not be
dequeued as long as this (possibly low prio and big TSO packet) is not
removed from gso_skb.

At 1Gbps speed, a full size TSO packet is 500 us of extra latency.

In some cases, we know that all packets dequeued from a qdisc are
for a particular and known txq :

- If device is non multi queue
- For all MQ/MQPRIO slave qdiscs

This patch introduces a new qdisc flag, TCQ_F_ONETXQUEUE to mark
this capability, so that dequeue_skb() is allowed to dequeue a packet
only if the associated txq is not stopped.

This indeed reduce latencies for high prio packets (or improve fairness
with sfq/fq_codel), and almost remove qdisc 'requeues'.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-12 00:16:47 -05:00
Linus Torvalds
c6bd5bcc49 TTY/Serial merge for 3.8-rc1
Here's the big tty/serial tree set of changes for 3.8-rc1.
 
 Contained in here is a bunch more reworks of the tty port layer from Jiri and
 bugfixes from Alan, along with a number of other tty and serial driver updates
 by the various driver authors.
 
 Also, Jiri has been coerced^Wconvinced to be the co-maintainer of the TTY
 layer, which is much appreciated by me.
 
 All of these have been in the linux-next tree for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.19 (GNU/Linux)
 
 iEYEABECAAYFAlDHhgwACgkQMUfUDdst+ynI6wCcC+YeBwncnoWHvwLAJOwAZpUL
 bysAn28o780/lOsTzp3P1Qcjvo69nldo
 =hN/g
 -----END PGP SIGNATURE-----

Merge tag 'tty-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull TTY/Serial merge from Greg Kroah-Hartman:
 "Here's the big tty/serial tree set of changes for 3.8-rc1.

  Contained in here is a bunch more reworks of the tty port layer from
  Jiri and bugfixes from Alan, along with a number of other tty and
  serial driver updates by the various driver authors.

  Also, Jiri has been coerced^Wconvinced to be the co-maintainer of the
  TTY layer, which is much appreciated by me.

  All of these have been in the linux-next tree for a while.

  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

Fixed up some trivial conflicts in the staging tree, due to the fwserial
driver having come in both ways (but fixed up a bit in the serial tree),
and the ioctl handling in the dgrp driver having been done slightly
differently (staging tree got that one right, and removed both
TIOCGSOFTCAR and TIOCSSOFTCAR).

* tag 'tty-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (146 commits)
  staging: sb105x: fix potential NULL pointer dereference in mp_chars_in_buffer()
  staging/fwserial: Remove superfluous free
  staging/fwserial: Use WARN_ONCE when port table is corrupted
  staging/fwserial: Destruct embedded tty_port on teardown
  staging/fwserial: Fix build breakage when !CONFIG_BUG
  staging: fwserial: Add TTY-over-Firewire serial driver
  drivers/tty/serial/serial_core.c: clean up HIGH_BITS_OFFSET usage
  staging: dgrp: dgrp_tty.c: Audit the return values of get/put_user()
  staging: dgrp: dgrp_tty.c: Remove the TIOCSSOFTCAR ioctl handler from dgrp driver
  serial: ifx6x60: Add modem power off function in the platform reboot process
  serial: mxs-auart: unmap the scatter list before we copy the data
  serial: mxs-auart: disable the Receive Timeout Interrupt when DMA is enabled
  serial: max310x: Setup missing "can_sleep" field for GPIO
  tty/serial: fix ifx6x60.c declaration warning
  serial: samsung: add devicetree properties for non-Exynos SoCs
  serial: samsung: fix potential soft lockup during uart write
  tty: vt: Remove redundant null check before kfree.
  tty/8250 Add check for pci_ioremap_bar failure
  tty/8250 Add support for Commtech's Fastcom Async-335 and Fastcom Async-PCIe cards
  tty/8250 Add XR17D15x devices to the exar_handle_irq override
  ...
2012-12-11 14:08:47 -08:00
John W. Linville
f9c4d420c1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-12-11 16:24:55 -05:00
John W. Linville
c66cfd5325 Merge branch 'for-john' of git://git.sipsolutions.net/mac80211-next 2012-12-11 16:04:03 -05:00
Eric Dumazet
75be437230 net: gro: avoid double copy in skb_gro_receive()
__copy_skb_header(nskb, p) already copied p->cb[], no need to copy
it again.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-11 13:44:09 -05:00
Cong Wang
2ce297fc24 bridge: fix seq check in br_mdb_dump()
In case of rehashing, introduce a global variable 'br_mdb_rehash_seq'
which gets increased every time when rehashing, and assign
net->dev_base_seq + br_mdb_rehash_seq to cb->seq.

In theory cb->seq could be wrapped to zero, but this is not
easy to fix, as net->dev_base_seq is not visible inside
br_mdb_rehash(). In practice, this is rare.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-11 13:44:09 -05:00
Abhijit Pawar
a71258d79e net: remove obsolete simple_strto<foo>
This patch removes the redundant occurences of simple_strto<foo>

Signed-off-by: Abhijit Pawar <abhi.c.pawar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-11 12:49:53 -05:00
Eric Dumazet
89c5fa3369 net: gro: dev_gro_receive() cleanup
__napi_gro_receive() is inlined from two call sites for no good reason.

Lets move the prep stuff in a function of its own, called only if/when
needed. This saves 300 bytes on x86 :

# size net/core/dev.o.after net/core/dev.o.before
   text	   data	    bss	    dec	    hex	filename
  51968	   1238	   1040	  54246	   d3e6	net/core/dev.o.before
  51664	   1238	   1040	  53942	   d2b6	net/core/dev.o.after

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-11 12:49:53 -05:00
Johannes Berg
8acbcddb5f minstrel: update stats after processing status
Instead of updating stats before sending a packet,
update them after processing the packet's status.
This makes minstrel in line with minstrel_ht.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-12-10 22:51:50 +01:00
Johannes Berg
8e3c1b7743 mac80211: a few whitespace fixes
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-12-10 21:24:02 +01:00
John Fastabend
7c77ab24e3 net: Allow DCBnl to use other namespaces besides init_net
Allow DCB and net namespace to work together. This is useful if you
have containers that are bound to 'phys' interfaces that want to
also manage their DCB attributes.

The net namespace is taken from sock_net(skb->sk) of the netlink skb.

CC: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-10 14:09:01 -05:00
Abhijit Pawar
4b5511ebc7 net: remove obsolete simple_strto<foo>
This patch replace the obsolete simple_strto<foo> with kstrto<foo>

Signed-off-by: Abhijit Pawar <abhi.c.pawar@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-10 14:09:00 -05:00
Johannes Berg
1bf3751ec9 ipv4: ip_check_defrag must not modify skb before unsharing
ip_check_defrag() might be called from af_packet within the
RX path where shared SKBs are used, so it must not modify
the input SKB before it has unshared it for defragmentation.
Use skb_copy_bits() to get the IP header and only pull in
everything later.

The same is true for the other caller in macvlan as it is
called from dev->rx_handler which can also get a shared SKB.

Reported-by: Eric Leblond <eric@regit.org>
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-10 13:51:44 -05:00
Dan Carpenter
2062cc20d0 bridge: make buffer larger in br_setlink()
We pass IFLA_BRPORT_MAX to nla_parse_nested() so we need
IFLA_BRPORT_MAX + 1 elements.  Also Smatch complains that we read past
the end of the array when in br_set_port_flag() when it's called with
IFLA_BRPORT_FAST_LEAVE.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-10 13:31:30 -05:00
Neal Cardwell
5e1f54201c inet_diag: validate port comparison byte code to prevent unsafe reads
Add logic to verify that a port comparison byte code operation
actually has the second inet_diag_bc_op from which we read the port
for such operations.

Previously the code blindly referenced op[1] without first checking
whether a second inet_diag_bc_op struct could fit there. So a
malicious user could make the kernel read 4 bytes beyond the end of
the bytecode array by claiming to have a whole port comparison byte
code (2 inet_diag_bc_op structs) when in fact the bytecode was not
long enough to hold both.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09 19:00:48 -05:00
Neal Cardwell
f67caec906 inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run()
Add logic to check the address family of the user-supplied conditional
and the address family of the connection entry. We now do not do
prefix matching of addresses from different address families (AF_INET
vs AF_INET6), except for the previously existing support for having an
IPv4 prefix match an IPv4-mapped IPv6 address (which this commit
maintains as-is).

This change is needed for two reasons:

(1) The addresses are different lengths, so comparing a 128-bit IPv6
prefix match condition to a 32-bit IPv4 connection address can cause
us to unwittingly walk off the end of the IPv4 address and read
garbage or oops.

(2) The IPv4 and IPv6 address spaces are semantically distinct, so a
simple bit-wise comparison of the prefixes is not meaningful, and
would lead to bogus results (except for the IPv4-mapped IPv6 case,
which this commit maintains).

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09 18:59:37 -05:00
Neal Cardwell
405c005949 inet_diag: validate byte code to prevent oops in inet_diag_bc_run()
Add logic to validate INET_DIAG_BC_S_COND and INET_DIAG_BC_D_COND
operations.

Previously we did not validate the inet_diag_hostcond, address family,
address length, and prefix length. So a malicious user could make the
kernel read beyond the end of the bytecode array by claiming to have a
whole inet_diag_hostcond when the bytecode was not long enough to
contain a whole inet_diag_hostcond of the given address family. Or
they could make the kernel read up to about 27 bytes beyond the end of
a connection address by passing a prefix length that exceeded the
length of addresses of the given family.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09 18:59:37 -05:00
Neal Cardwell
1c95df85ca inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state
Fix inet_diag to be aware of the fact that AF_INET6 TCP connections
instantiated for IPv4 traffic and in the SYN-RECV state were actually
created with inet_reqsk_alloc(), instead of inet6_reqsk_alloc(). This
means that for such connections inet6_rsk(req) returns a pointer to a
random spot in memory up to roughly 64KB beyond the end of the
request_sock.

With this bug, for a server using AF_INET6 TCP sockets and serving
IPv4 traffic, an inet_diag user like `ss state SYN-RECV` would lead to
inet_diag_fill_req() causing an oops or the export to user space of 16
bytes of kernel memory as a garbage IPv6 address, depending on where
the garbage inet6_rsk(req) pointed.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09 18:59:37 -05:00
Ben Hutchings
65d2897c0f caif_usb: Make the driver name check more efficient
Use the device model to get just the name, rather than using the
ethtool API to get all driver information.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-09 00:34:02 -05:00