Commit Graph

24699 Commits

Author SHA1 Message Date
Def
40a3eb33e3 batman-adv: Fix change mac address of soft iface.
Into function interface_set_mac_addr, the function tt_local_add was
invoked before updating dev->dev_addr. The new MAC address was not
tagged as NoPurge.

Signed-off-by: Def <def@laposte.net>
2012-09-23 23:12:48 +02:00
Eric Dumazet
ab43ed8b74 ipv4: raw: fix icmp_filter()
icmp_filter() should not modify its input, or else its caller
would need to recompute ip_hdr() if skb->head is reallocated.

Use skb_header_pointer() instead of pskb_may_pull() and
change the prototype to make clear both sk and skb are const.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-22 15:35:05 -04:00
John W. Linville
1199992df2 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-09-22 12:19:22 -04:00
Alex Elder
5ce765a540 libceph: only kunmap kmapped pages
In write_partial_msg_pages(), pages need to be kmapped in order to
perform a CRC-32c calculation on them.  As an artifact of the way
this code used to be structured, the kunmap() call was separated
from the kmap() call and both were done conditionally.  But the
conditions under which the kmap() and kunmap() calls were made
differed, so there was a chance a kunmap() call would be done on a
page that had not been mapped.

The symptom of this was tripping a BUG() in kunmap_high() when
pkmap_count[nr] became 0.

Reported-by: Bryan K. Wright <bryan@virginia.edu>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
2012-09-21 20:49:26 -07:00
Zhao Hongjiang
bf5b30b8a4 net: change return values from -EACCES to -EPERM
Change return value from -EACCES to -EPERM when the permission check fails.

Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-21 13:58:08 -04:00
Wei Yongjun
f950c0ecc7 ipv6: fix return value check in fib6_add()
In case of error, the function fib6_add_1() returns ERR_PTR()
or NULL pointer. The ERR_PTR() case check is missing in fib6_add().

dpatch engine is used to generated this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-21 13:43:52 -04:00
Ed Cashin
c0d680e577 net: do not disable sg for packets requiring no checksum
A change in a series of VLAN-related changes appears to have
inadvertently disabled the use of the scatter gather feature of
network cards for transmission of non-IP ethernet protocols like ATA
over Ethernet (AoE).  Below is a reference to the commit that
introduces a "harmonize_features" function that turns off scatter
gather when the NIC does not support hardware checksumming for the
ethernet protocol of an sk buff.

  commit f01a5236bd
  Author: Jesse Gross <jesse@nicira.com>
  Date:   Sun Jan 9 06:23:31 2011 +0000

      net offloading: Generalize netif_get_vlan_features().

The can_checksum_protocol function is not equipped to consider a
protocol that does not require checksumming.  Calling it for a
protocol that requires no checksum is inappropriate.

The patch below has harmonize_features call can_checksum_protocol when
the protocol needs a checksum, so that the network layer is not forced
to perform unnecessary skb linearization on the transmission of AoE
packets.  Unnecessary linearization results in decreased performance
and increased memory pressure, as reported here:

  http://www.spinics.net/lists/linux-mm/msg15184.html

The problem has probably not been widely experienced yet, because
only recently has the kernel.org-distributed aoe driver acquired the
ability to use payloads of over a page in size, with the patchset
recently included in the mm tree:

  https://lkml.org/lkml/2012/8/28/140

The coraid.com-distributed aoe driver already could use payloads of
greater than a page in size, but its users generally do not use the
newest kernels.

Signed-off-by: Ed Cashin <ecashin@coraid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20 22:23:40 -04:00
Mathias Krause
e3ac104d41 xfrm_user: don't copy esn replay window twice for new states
The ESN replay window was already fully initialized in
xfrm_alloc_replay_state_esn(). No need to copy it again.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20 18:08:40 -04:00
Mathias Krause
ecd7918745 xfrm_user: ensure user supplied esn replay window is valid
The current code fails to ensure that the netlink message actually
contains as many bytes as the header indicates. If a user creates a new
state or updates an existing one but does not supply the bytes for the
whole ESN replay window, the kernel copies random heap bytes into the
replay bitmap, the ones happen to follow the XFRMA_REPLAY_ESN_VAL
netlink attribute. This leads to following issues:

1. The replay window has random bits set confusing the replay handling
   code later on.

2. A malicious user could use this flaw to leak up to ~3.5kB of heap
   memory when she has access to the XFRM netlink interface (requires
   CAP_NET_ADMIN).

Known users of the ESN replay window are strongSwan and Steffen's
iproute2 patch (<http://patchwork.ozlabs.org/patch/85962/>). The latter
uses the interface with a bitmap supplied while the former does not.
strongSwan is therefore prone to run into issue 1.

To fix both issues without breaking existing userland allow using the
XFRMA_REPLAY_ESN_VAL netlink attribute with either an empty bitmap or a
fully specified one. For the former case we initialize the in-kernel
bitmap with zero, for the latter we copy the user supplied bitmap. For
state updates the full bitmap must be supplied.

To prevent overflows in the bitmap length calculation the maximum size
of bmp_len is limited to 128 by this patch -- resulting in a maximum
replay window of 4096 packets. This should be sufficient for all real
life scenarios (RFC 4303 recommends a default replay window size of 64).

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Martin Willi <martin@revosec.ch>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20 18:08:40 -04:00
Mathias Krause
1f86840f89 xfrm_user: fix info leak in copy_to_user_tmpl()
The memory used for the template copy is a local stack variable. As
struct xfrm_user_tmpl contains multiple holes added by the compiler for
alignment, not initializing the memory will lead to leaking stack bytes
to userland. Add an explicit memset(0) to avoid the info leak.

Initial version of the patch by Brad Spengler.

Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20 18:08:40 -04:00
Mathias Krause
7b789836f4 xfrm_user: fix info leak in copy_to_user_policy()
The memory reserved to dump the xfrm policy includes multiple padding
bytes added by the compiler for alignment (padding bytes in struct
xfrm_selector and struct xfrm_userpolicy_info). Add an explicit
memset(0) before filling the buffer to avoid the heap info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20 18:08:39 -04:00
Mathias Krause
f778a63671 xfrm_user: fix info leak in copy_to_user_state()
The memory reserved to dump the xfrm state includes the padding bytes of
struct xfrm_usersa_info added by the compiler for alignment (7 for
amd64, 3 for i386). Add an explicit memset(0) before filling the buffer
to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20 18:08:39 -04:00
Mathias Krause
4c87308bde xfrm_user: fix info leak in copy_to_user_auth()
copy_to_user_auth() fails to initialize the remainder of alg_name and
therefore discloses up to 54 bytes of heap memory via netlink to
userland.

Use strncpy() instead of strcpy() to fill the trailing bytes of alg_name
with null bytes.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20 18:08:39 -04:00
Andrey Vagin
bc26ccd8fc tcp: restore rcv_wscale in a repair mode (v2)
rcv_wscale is a symetric parameter with snd_wscale.

Both this parameters are set on a connection handshake.

Without this value a remote window size can not be interpreted correctly,
because a value from a packet should be shifted on rcv_wscale.

And one more thing is that wscale_ok should be set too.

This patch doesn't break a backward compatibility.
If someone uses it in a old scheme, a rcv window
will be restored with the same bug (rcv_wscale = 0).

v2: Save backward compatibility on big-endian system. Before
    the first two bytes were snd_wscale and the second two bytes were
    rcv_wscale. Now snd_wscale is opt_val & 0xFFFF and rcv_wscale >> 16.
    This approach is independent on byte ordering.

Cc: David S. Miller <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
CC: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20 17:49:58 -04:00
Li RongQing
8ea853fd0b net/core: fix comment in skb_try_coalesce
It should be the skb which is not cloned

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19 17:29:13 -04:00
Paolo Valente
7126195697 pkt_sched: fix virtual-start-time update in QFQ
If the old timestamps of a class, say cl, are stale when the class
becomes active, then QFQ may assign to cl a much higher start time
than the maximum value allowed. This may happen when QFQ assigns to
the start time of cl the finish time of a group whose classes are
characterized by a higher value of the ratio
max_class_pkt/weight_of_the_class with respect to that of
cl. Inserting a class with a too high start time into the bucket list
corrupts the data structure and may eventually lead to crashes.
This patch limits the maximum start time assigned to a class.

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19 16:23:53 -04:00
Michal Kubeček
15c041759b tcp: flush DMA queue before sk_wait_data if rcv_wnd is zero
If recv() syscall is called for a TCP socket so that
  - IOAT DMA is used
  - MSG_WAITALL flag is used
  - requested length is bigger than sk_rcvbuf
  - enough data has already arrived to bring rcv_wnd to zero
then when tcp_recvmsg() gets to calling sk_wait_data(), receive
window can be still zero while sk_async_wait_queue exhausts
enough space to keep it zero. As this queue isn't cleaned until
the tcp_service_net_dma() call, sk_wait_data() cannot receive
any data and blocks forever.

If zero receive window and non-empty sk_async_wait_queue is
detected before calling sk_wait_data(), process the queue first.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19 16:07:58 -04:00
Linus Lüssing
dbd6b11e15 batman-adv: make batadv_test_bit() return 0 or 1 only
On some architectures test_bit() can return other values than 0 or 1:

With a generic x86 OpenWrt image in a kvm setup (batadv_)test_bit()
frequently returns -1 for me, leading to batadv_iv_ogm_update_seqnos()
wrongly signaling a protected seqno window.

This patch tries to fix this issue by making batadv_test_bit() return 0
or 1 only.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Acked-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-19 15:49:53 -04:00
Luis R. Rodriguez
a85d0d7f34 cfg80211: fix possible circular lock on reg_regdb_search()
When call_crda() is called we kick off a witch hunt search
for the same regulatory domain on our internal regulatory
database and that work gets kicked off on a workqueue, this
is done while the cfg80211_mutex is held. If that workqueue
kicks off it will first lock reg_regdb_search_mutex and
later cfg80211_mutex but to ensure two CPUs will not contend
against cfg80211_mutex the right thing to do is to have the
reg_regdb_search() wait until the cfg80211_mutex is let go.

The lockdep report is pasted below.

cfg80211: Calling CRDA to update world regulatory domain

======================================================
[ INFO: possible circular locking dependency detected ]
3.3.8 #3 Tainted: G           O
-------------------------------------------------------
kworker/0:1/235 is trying to acquire lock:
 (cfg80211_mutex){+.+...}, at: [<816468a4>] set_regdom+0x78c/0x808 [cfg80211]

but task is already holding lock:
 (reg_regdb_search_mutex){+.+...}, at: [<81646828>] set_regdom+0x710/0x808 [cfg80211]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (reg_regdb_search_mutex){+.+...}:
       [<800a8384>] lock_acquire+0x60/0x88
       [<802950a8>] mutex_lock_nested+0x54/0x31c
       [<81645778>] is_world_regdom+0x9f8/0xc74 [cfg80211]

-> #1 (reg_mutex#2){+.+...}:
       [<800a8384>] lock_acquire+0x60/0x88
       [<802950a8>] mutex_lock_nested+0x54/0x31c
       [<8164539c>] is_world_regdom+0x61c/0xc74 [cfg80211]

-> #0 (cfg80211_mutex){+.+...}:
       [<800a77b8>] __lock_acquire+0x10d4/0x17bc
       [<800a8384>] lock_acquire+0x60/0x88
       [<802950a8>] mutex_lock_nested+0x54/0x31c
       [<816468a4>] set_regdom+0x78c/0x808 [cfg80211]

other info that might help us debug this:

Chain exists of:
  cfg80211_mutex --> reg_mutex#2 --> reg_regdb_search_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(reg_regdb_search_mutex);
                               lock(reg_mutex#2);
                               lock(reg_regdb_search_mutex);
  lock(cfg80211_mutex);

 *** DEADLOCK ***

3 locks held by kworker/0:1/235:
 #0:  (events){.+.+..}, at: [<80089a00>] process_one_work+0x230/0x460
 #1:  (reg_regdb_work){+.+...}, at: [<80089a00>] process_one_work+0x230/0x460
 #2:  (reg_regdb_search_mutex){+.+...}, at: [<81646828>] set_regdom+0x710/0x808 [cfg80211]

stack backtrace:
Call Trace:
[<80290fd4>] dump_stack+0x8/0x34
[<80291bc4>] print_circular_bug+0x2ac/0x2d8
[<800a77b8>] __lock_acquire+0x10d4/0x17bc
[<800a8384>] lock_acquire+0x60/0x88
[<802950a8>] mutex_lock_nested+0x54/0x31c
[<816468a4>] set_regdom+0x78c/0x808 [cfg80211]

Reported-by: Felix Fietkau <nbd@openwrt.org>
Tested-by: Felix Fietkau <nbd@openwrt.org>
Cc: stable@vger.kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-09-18 20:43:23 -04:00
Vinicius Costa Gomes
78c04c0bf5 Bluetooth: Fix not removing power_off delayed work
For example, when a usb reset is received (I could reproduce it
running something very similar to this[1] in a loop) it could be
that the device is unregistered while the power_off delayed work
is still scheduled to run.

Backtrace:

WARNING: at lib/debugobjects.c:261 debug_print_object+0x7c/0x8d()
Hardware name: To Be Filled By O.E.M.
ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x26
Modules linked in: nouveau mxm_wmi btusb wmi bluetooth ttm coretemp drm_kms_helper
Pid: 2114, comm: usb-reset Not tainted 3.5.0bt-next #2
Call Trace:
 [<ffffffff8124cc00>] ? free_obj_work+0x57/0x91
 [<ffffffff81058f88>] warn_slowpath_common+0x7e/0x97
 [<ffffffff81059035>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff8124ccb6>] debug_print_object+0x7c/0x8d
 [<ffffffff8106e3ec>] ? __queue_work+0x259/0x259
 [<ffffffff8124d63e>] ? debug_check_no_obj_freed+0x6f/0x1b5
 [<ffffffff8124d667>] debug_check_no_obj_freed+0x98/0x1b5
 [<ffffffffa00aa031>] ? bt_host_release+0x10/0x1e [bluetooth]
 [<ffffffff810fc035>] kfree+0x90/0xe6
 [<ffffffffa00aa031>] bt_host_release+0x10/0x1e [bluetooth]
 [<ffffffff812ec2f9>] device_release+0x4a/0x7e
 [<ffffffff8123ef57>] kobject_release+0x11d/0x154
 [<ffffffff8123ed98>] kobject_put+0x4a/0x4f
 [<ffffffff812ec0d9>] put_device+0x12/0x14
 [<ffffffffa009472b>] hci_free_dev+0x22/0x26 [bluetooth]
 [<ffffffffa0280dd0>] btusb_disconnect+0x96/0x9f [btusb]
 [<ffffffff813581b4>] usb_unbind_interface+0x57/0x106
 [<ffffffff812ef988>] __device_release_driver+0x83/0xd6
 [<ffffffff812ef9fb>] device_release_driver+0x20/0x2d
 [<ffffffff813582a7>] usb_driver_release_interface+0x44/0x7b
 [<ffffffff81358795>] usb_forced_unbind_intf+0x45/0x4e
 [<ffffffff8134f959>] usb_reset_device+0xa6/0x12e
 [<ffffffff8135df86>] usbdev_do_ioctl+0x319/0xe20
 [<ffffffff81203244>] ? avc_has_perm_flags+0xc9/0x12e
 [<ffffffff812031a0>] ? avc_has_perm_flags+0x25/0x12e
 [<ffffffff81050101>] ? do_page_fault+0x31e/0x3a1
 [<ffffffff8135eaa6>] usbdev_ioctl+0x9/0xd
 [<ffffffff811126b1>] vfs_ioctl+0x21/0x34
 [<ffffffff81112f7b>] do_vfs_ioctl+0x408/0x44b
 [<ffffffff81208d45>] ? file_has_perm+0x76/0x81
 [<ffffffff8111300f>] sys_ioctl+0x51/0x76
 [<ffffffff8158db22>] system_call_fastpath+0x16/0x1b

[1] http://cpansearch.perl.org/src/DPAVLIN/Biblio-RFID-0.03/examples/usbreset.c

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-18 20:13:02 -03:00
Andrei Emeltchenko
aad3d0e343 Bluetooth: Fix freeing uninitialized delayed works
When releasing L2CAP socket which is in BT_CONFIG state l2cap_chan_close
invokes l2cap_send_disconn_req which cancel delayed works which are only
set in BT_CONNECTED state with l2cap_ertm_init. Add state check before
cancelling those works.

...
[ 9668.574372] [21085] l2cap_sock_release: sock cd065200, sk f073e800
[ 9668.574399] [21085] l2cap_sock_shutdown: sock cd065200, sk f073e800
[ 9668.574411] [21085] l2cap_chan_close: chan f073ec00 state BT_CONFIG sk f073e800
[ 9668.574421] [21085] l2cap_send_disconn_req: chan f073ec00 conn ecc16600
[ 9668.574441] INFO: trying to register non-static key.
[ 9668.574443] the code is fine but needs lockdep annotation.
[ 9668.574446] turning off the locking correctness validator.
[ 9668.574450] Pid: 21085, comm: obex-client Tainted: G           O 3.5.0+ #57
[ 9668.574452] Call Trace:
[ 9668.574463]  [<c10a64b3>] __lock_acquire+0x12e3/0x1700
[ 9668.574468]  [<c10a44fb>] ? trace_hardirqs_on+0xb/0x10
[ 9668.574476]  [<c15e4f60>] ? printk+0x4d/0x4f
[ 9668.574479]  [<c10a6e38>] lock_acquire+0x88/0x130
[ 9668.574487]  [<c1059740>] ? try_to_del_timer_sync+0x60/0x60
[ 9668.574491]  [<c1059790>] del_timer_sync+0x50/0xc0
[ 9668.574495]  [<c1059740>] ? try_to_del_timer_sync+0x60/0x60
[ 9668.574515]  [<f8aa1c23>] l2cap_send_disconn_req+0xe3/0x160 [bluetooth]
...

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-18 20:07:04 -03:00
Andrzej Kaczmarek
562fcc246e Bluetooth: mgmt: Fix enabling LE while powered off
When new BT USB adapter is plugged in it's configured while still being powered
off (HCI_AUTO_OFF flag is set), thus Set LE will only set dev_flags but won't
write changes to controller. As a result it's not possible to start device
discovery session on LE controller as it uses interleaved discovery which
requires LE Supported Host flag in extended features.

This patch ensures HCI Write LE Host Supported is sent when Set Powered is
called to power on controller and clear HCI_AUTO_OFF flag.

Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Cc: stable@vger.kernel.org
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-18 20:07:03 -03:00
Andrzej Kaczmarek
3d1cbdd6ae Bluetooth: mgmt: Fix enabling SSP while powered off
When new BT USB adapter is plugged in it's configured while still being powered
off (HCI_AUTO_OFF flag is set), thus Set SSP will only set dev_flags but won't
write changes to controller. As a result remote devices won't use Secure Simple
Pairing with our device due to SSP Host Support flag disabled in extended
features and may also reject SSP attempt from our side (with possible fallback
to legacy pairing).

This patch ensures HCI Write Simple Pairing Mode is sent when Set Powered is
called to power on controller and clear HCI_AUTO_OFF flag.

Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Cc: stable@vger.kernel.org
Acked-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-09-18 20:07:03 -03:00
Li RongQing
433a195480 xfrm: fix a read lock imbalance in make_blackhole
if xfrm_policy_get_afinfo returns 0, it has already released the read
lock, xfrm_policy_put_afinfo should not be called again.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18 16:30:15 -04:00
Eric Dumazet
1d57f19539 tcp: fix regression in urgent data handling
Stephan Springl found that commit 1402d36601 "tcp: introduce
tcp_try_coalesce" introduced a regression for rlogin

It turns out problem comes from TCP urgent data handling and
a change in behavior in input path.

rlogin sends two one-byte packets with URG ptr set, and when next data
frame is coalesced, we lack sk_data_ready() calls to wakeup consumer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Stephan Springl <springl-k@lar.bfw.de>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18 16:26:27 -04:00
Michael S. Tsirkin
0e698bf662 net: fix memory leak on oom with zerocopy
If orphan flags fails, we don't free the skb
on receive, which leaks the skb memory.

Return value was also wrong: netif_receive_skb
is supposed to return NET_RX_DROP, not ENOMEM.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18 16:24:00 -04:00
Mathias Krause
c254637225 xfrm_user: return error pointer instead of NULL #2
When dump_one_policy() returns an error, e.g. because of a too small
buffer to dump the whole xfrm policy, xfrm_policy_netlink() returns
NULL instead of an error pointer. But its caller expects an error
pointer and therefore continues to operate on a NULL skbuff.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18 16:13:46 -04:00
Mathias Krause
864745d291 xfrm_user: return error pointer instead of NULL
When dump_one_state() returns an error, e.g. because of a too small
buffer to dump the whole xfrm state, xfrm_state_netlink() returns NULL
instead of an error pointer. But its callers expect an error pointer
and therefore continue to operate on a NULL skbuff.

This could lead to a privilege escalation (execution of user code in
kernel context) if the attacker has CAP_NET_ADMIN and is able to map
address 0.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18 16:13:45 -04:00
Nicolas Dichtel
2c20cbd7e3 ipv6: use DST_* macro to set obselete field
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18 15:57:04 -04:00
Nicolas Dichtel
6f3118b571 ipv6: use net->rt_genid to check dst validity
IPv6 dst should take care of rt_genid too. When a xfrm policy is inserted or
deleted, all dst should be invalidated.
To force the validation, dst entries should be created with ->obsolete set to
DST_OBSOLETE_FORCE_CHK. This was already the case for all functions calling
ip6_dst_alloc(), except for ip6_rt_copy().

As a consequence, we can remove the specific code in inet6_connection_sock.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18 15:57:03 -04:00
Nicolas Dichtel
ee8372dd19 xfrm: invalidate dst on policy insertion/deletion
When a policy is inserted or deleted, all dst should be recalculated.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18 15:57:03 -04:00
Nicolas Dichtel
b42664f898 netns: move net->ipv4.rt_genid to net->rt_genid
This commit prepares the use of rt_genid by both IPv4 and IPv6.
Initialization is left in IPv4 part.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18 15:57:03 -04:00
Eric Dumazet
2885da7296 net: rt_cache_flush() cleanup
We dont use jhash anymore since route cache removal,
so we can get rid of get_random_bytes() calls for rt_genid
changes.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18 15:54:19 -04:00
Nicolas Dichtel
bafa6d9d89 ipv4/route: arg delay is useless in rt_cache_flush()
Since route cache deletion (89aef8921b), delay is no
more used. Remove it.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-18 15:44:34 -04:00
Linus Torvalds
a1362d504e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Use after free and new device IDs in bluetooth from Andre Guedes,
    Yevgeniy Melnichuk, Gustavo Padovan, and Henrik Rydberg.

 2) Fix crashes with short packet lengths and VLAN in pktgen, from
    Nishank Trivedi.

 3) mISDN calls flush_work_sync() with locks held, fix from Karsten
    Keil.

 4) Packet scheduler gred parameters are reported to userspace
    improperly scaled, and WRED idling is not performed correctly.  All
    from David Ward.

 5) Fix TCP socket refcount problem in ipv6, from Julian Anastasov.

 6) ibmveth device has RX queue alignment requirements which are not
    being explicitly met resulting in sporadic failures, fix from
    Santiago Leon.

 7) Netfilter needs to take care when interpreting sockets attached to
    socket buffers, they could be time-wait minisockets.  Fix from Eric
    Dumazet.

 8) sock_edemux() has the same issue as netfilter did in #7 above, fix
    from Eric Dumazet.

 9) Avoid infinite loops in CBQ scheduler with some configurations, from
    Eric Dumazet.

10) Deal with "Reflection scan: an Off-Path Attack on TCP", from Jozsef
    Kadlecsik.

11) SCTP overcharges socket for TX packets, fix from Thomas Graf.

12) CODEL packet scheduler should not reset it's state every time it
    builds a new flow, fix from Eric Dumazet.

13) Fix memory leak in nl80211, from Wei Yongjun.

14) NETROM doesn't check skb_copy_datagram_iovec() return values, from
    Alan Cox.

15) l2tp ethernet was using sizeof(ETH_HLEN) instead of plain ETH_HLEN,
    oops.  From Eric Dumazet.

16) Fix selection of ath9k chips on which PA linearization and AM2PM
    predistoration are used, from Felix Fietkau.

17) Flow steering settings in mlx4 driver need to be validated properly,
    from Hadar Hen Zion.

18) bnx2x doesn't show the correct link duplex setting, from Yaniv
    Rosner.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (75 commits)
  pktgen: fix crash with vlan and packet size less than 46
  bnx2x: Add missing afex code
  bnx2x: fix registers dumped
  bnx2x: correct advertisement of pause capabilities
  bnx2x: display the correct duplex value
  bnx2x: prevent timeouts when using PFC
  bnx2x: fix stats copying logic
  bnx2x: Avoid sending multiple statistics queries
  net: qmi_wwan: call subdriver with control intf only
  net_sched: gred: actually perform idling in WRED mode
  net_sched: gred: fix qave reporting via netlink
  net_sched: gred: eliminate redundant DP prio comparisons
  net_sched: gred: correct comment about qavg calculation in RIO mode
  mISDN: Fix wrong usage of flush_work_sync while holding locks
  netfilter: log: Fix log-level processing
  net-sched: sch_cbq: avoid infinite loop
  net: qmi_wwan: fix Gobi device probing for un2430
  net: fix net/core/sock.c build error
  ixp4xx_hss: fix build failure due to missing linux/module.h inclusion
  caif: move the dereference below the NULL test
  ...
2012-09-14 15:34:07 -07:00
Nishank Trivedi
6af773e786 pktgen: fix crash with vlan and packet size less than 46
If vlan option is being specified in the pktgen and packet size
being requested is less than 46 bytes, despite being illogical
request, pktgen should not crash the kernel.

BUG: unable to handle kernel paging request at ffff88021fb82000
Process kpktgend_0 (pid: 1184, threadinfo ffff880215f1a000, task ffff880218544530)
Call Trace:
[<ffffffffa0637cd2>] ? pktgen_finalize_skb+0x222/0x300 [pktgen]
[<ffffffff814f0084>] ? build_skb+0x34/0x1c0
[<ffffffffa0639b11>] pktgen_thread_worker+0x5d1/0x1790 [pktgen]
[<ffffffffa03ffb10>] ? igb_xmit_frame_ring+0xa30/0xa30 [igb]
[<ffffffff8107ba20>] ? wake_up_bit+0x40/0x40
[<ffffffff8107ba20>] ? wake_up_bit+0x40/0x40
[<ffffffffa0639540>] ? spin+0x240/0x240 [pktgen]
[<ffffffff8107b4e3>] kthread+0x93/0xa0
[<ffffffff81615de4>] kernel_thread_helper+0x4/0x10
[<ffffffff8107b450>] ? flush_kthread_worker+0x80/0x80
[<ffffffff81615de0>] ? gs_change+0x13/0x13

The root cause of why pktgen is not able to handle this case is due
to comparison of signed (datalen) and unsigned data (sizeof), which
eventually passes a huge number to skb_put().

Signed-off-by: Nishank Trivedi <nistrive@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-13 17:10:00 -04:00
David Ward
ba1bf474ea net_sched: gred: actually perform idling in WRED mode
gred_dequeue() and gred_drop() do not seem to get called when the
queue is empty, meaning that we never start idling while in WRED
mode. And since qidlestart is not stored by gred_store_wred_set(),
we would never stop idling while in WRED mode if we ever started.
This messes up the average queue size calculation that influences
packet marking/dropping behavior.

Now, we start WRED mode idling as we are removing the last packet
from the queue. Also we now actually stop WRED mode idling when we
are enqueuing a packet.

Cc: Bruce Osler <brosler@cisco.com>
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-13 16:10:13 -04:00
David Ward
1fe37b106b net_sched: gred: fix qave reporting via netlink
q->vars.qavg is a Wlog scaled value, but q->backlog is not. In order
to pass q->vars.qavg as the backlog value, we need to un-scale it.
Additionally, the qave value returned via netlink should not be Wlog
scaled, so we need to un-scale the result of red_calc_qavg().

This caused artificially high values for "Average Queue" to be shown
by 'tc -s -d qdisc', but did not affect the actual operation of GRED.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-13 16:10:13 -04:00
David Ward
c22e464022 net_sched: gred: eliminate redundant DP prio comparisons
Each pair of DPs only needs to be compared once when searching for
a non-unique prio value.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-13 16:10:13 -04:00
David Ward
e29fe837bf net_sched: gred: correct comment about qavg calculation in RIO mode
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-13 16:10:13 -04:00
David S. Miller
930521695c Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso say:

====================
The following patchset contains four updates for your net tree, they are:

* Fix crash on timewait sockets, since the TCP early demux was added,
  in nfnetlink_log, from Eric Dumazet.

* Fix broken syslog log-level for xt_LOG and ebt_log since printk format was
  converted from <.> to a 2 bytes pattern using ASCII SOH, from Joe Perches.

* Two security fixes for the TCP connection tracking targeting off-path attacks,
  from Jozsef Kadlecsik. The problem was discovered by Jan Wrobel and it is
  documented in: http://mixedbit.org/reflection_scan/reflection_scan.pdf.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-13 13:53:06 -04:00
Linus Torvalds
22b4e63ebe NFS client bugfixes for Linux 3.6
- Final (hopefully) fix for the range checking code in NFSv4 getacl. This
   should fix the Oopses being seen when the acl size is close to PAGE_SIZE.
 - Fix a regression with the legacy binary mount code
 - Fix a regression in the readdir cookieverf initialisation
 - Fix an RPC over UDP regression
 - Ensure that we report all errors in the NFSv4 open code
 - Ensure that fsync() reports all relevant synchronisation errors.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABAgAGBQJQUN+KAAoJEGcL54qWCgDyHGcQAKj7MYVDIjhdmsVGGNWXUCnf
 X0LVg/ajh+vjusK+hmquzcJikZqgce5IU5DW4vcFr1X8BgP+R51UVvU0KksByD5H
 ourV2JVCztAQzQ4WWOsZAGqN0tooJUjyEjl4lEiDsQCF4Nk1HWbuCHeYuX74OToZ
 jrgedj0EZ6zb7TOizvbgU/7lI+FKu3Hlw6+u27M9phtSuefJdYSHZHYVMOX81qPh
 k0zgZ4tuLIaDuBB84iCrPwNt9icnevq6cIc+AGluI6xhDw+foPvUaUR+OUI420IZ
 tunNzP2So+nNoyjEiyMVENaCdEyA75XAmmGHTUUdBiVOsMV4HF/TqvTtSsjk2mN1
 FbZVvtjD6srjsQaKdVmqMIZBdhY9LSMLIQVqb4H2rYP6Mwq06WTuyCxf5YhzFfoy
 2tai7JuqBkTAWfKB8ESWywV6Qk/MkUWRAOBO6ksS66gAwpcFDj6nfeAdwaEmoYKc
 uzLUIRZaclPMZf661cs1fWeFV5XOnCL7je4owgTRGs7MHooWHPcC3273fEJqnhFz
 5MkC7nfmUiGcdO1v0mfYTEtMj9Pp9icBoZcVTGn4eZIHzvhhZOx//8LhyBfS+jll
 bKjaLZ1rErvIqwnSGcB7PK2yBYY9P6ZaxWjOrAAncZmiOxfhN0hvCo54jNOr/VZ+
 atsDEAuqSTeK7ouBqyO4
 =e5yE
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.6-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:

 - Final (hopefully) fix for the range checking code in NFSv4 getacl.
   This should fix the Oopses being seen when the acl size is close to
   PAGE_SIZE.
 - Fix a regression with the legacy binary mount code
 - Fix a regression in the readdir cookieverf initialisation
 - Fix an RPC over UDP regression
 - Ensure that we report all errors in the NFSv4 open code
 - Ensure that fsync() reports all relevant synchronisation errors.

* tag 'nfs-for-3.6-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: fsync() must exit with an error if page writeback failed
  SUNRPC: Fix a UDP transport regression
  NFS: return error from decode_getfh in decode open
  NFSv4: Fix buffer overflow checking in __nfs4_get_acl_uncached
  NFSv4: Fix range checking in __nfs4_get_acl_uncached and __nfs4_proc_set_acl
  NFS: Fix a problem with the legacy binary mount code
  NFS: Fix the initialisation of the readdir 'cookieverf' array
2012-09-13 09:04:13 +08:00
Joe Perches
16af511a66 netfilter: log: Fix log-level processing
auto75914331@hushmail.com reports that iptables does not correctly
output the KERN_<level>.

$IPTABLES -A RULE_0_in  -j LOG  --log-level notice --log-prefix "DENY  in: "

result with linux 3.6-rc5
Sep 12 06:37:29 xxxxx kernel: <5>DENY  in: IN=eth0 OUT= MAC=.......

result with linux 3.5.3 and older:
Sep  9 10:43:01 xxxxx kernel: DENY  in: IN=eth0 OUT= MAC......

commit 04d2c8c83d
("printk: convert the format for KERN_<LEVEL> to a 2 byte pattern")
updated the syslog header style but did not update netfilter uses.

Do so.

Use KERN_SOH and string concatenation instead of "%c" KERN_SOH_ASCII
as suggested by Eric Dumazet.

Signed-off-by: Joe Perches <joe@perches.com>
cc: auto75914331@hushmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-12 17:17:35 +02:00
Eric Dumazet
bdfc87f7d1 net-sched: sch_cbq: avoid infinite loop
Its possible to setup a bad cbq configuration leading to
an infinite loop in cbq_classify()

DEV_OUT=eth0
ICMP="match ip protocol 1 0xff"
U32="protocol ip u32"
DST="match ip dst"
tc qdisc add dev $DEV_OUT root handle 1: cbq avpkt 1000 \
	bandwidth 100mbit
tc class add dev $DEV_OUT parent 1: classid 1:1 cbq \
	rate 512kbit allot 1500 prio 5 bounded isolated
tc filter add dev $DEV_OUT parent 1: prio 3 $U32 \
	$ICMP $DST 192.168.3.234 flowid 1:

Reported-by: Denys Fedoryschenko <denys@visp.net.lb>
Tested-by: Denys Fedoryschenko <denys@visp.net.lb>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-11 22:20:43 -04:00
Randy Dunlap
1c463e57b3 net: fix net/core/sock.c build error
Fix net/core/sock.c build error when CONFIG_INET is not enabled:

net/built-in.o: In function `sock_edemux':
(.text+0xd396): undefined reference to `inet_twsk_put'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-10 16:44:45 -04:00
Wei Yongjun
566f26aa70 caif: move the dereference below the NULL test
The dereference should be moved below the NULL test.

spatch with a semantic match is used to found this.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-10 16:13:31 -04:00
Jozsef Kadlecsik
4a70bbfaef netfilter: Validate the sequence number of dataless ACK packets as well
We spare nothing by not validating the sequence number of dataless
ACK packets and enabling it makes harder off-path attacks.

See: "Reflection scan: an Off-Path Attack on TCP" by Jan Wrobel,
http://arxiv.org/abs/1201.2074

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-09 22:13:49 +02:00
Jozsef Kadlecsik
64f509ce71 netfilter: Mark SYN/ACK packets as invalid from original direction
Clients should not send such packets. By accepting them, we open
up a hole by wich ephemeral ports can be discovered in an off-path
attack.

See: "Reflection scan: an Off-Path Attack on TCP" by Jan Wrobel,
http://arxiv.org/abs/1201.2074

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-09 22:13:30 +02:00
Chema Gonzalez
6862234238 net: small bug on rxhash calculation
In the current rxhash calculation function, while the
sorting of the ports/addrs is coherent (you get the
same rxhash for packets sharing the same 4-tuple, in
both directions), ports and addrs are sorted
independently. This implies packets from a connection
between the same addresses but crossed ports hash to
the same rxhash.

For example, traffic between A=S:l and B=L:s is hashed
(in both directions) from {L, S, {s, l}}. The same
rxhash is obtained for packets between C=S:s and D=L:l.

This patch ensures that you either swap both addrs and ports,
or you swap none. Traffic between A and B, and traffic
between C and D, get their rxhash from different sources
({L, S, {l, s}} for A<->B, and {L, S, {s, l}} for C<->D)

The patch is co-written with Eric Dumazet <edumazet@google.com>

Signed-off-by: Chema Gonzalez <chema@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-08 18:41:48 -04:00
John W. Linville
777bf135b7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
John W. Linville says:

====================
Please pull these fixes intended for 3.6.  There are more commits
here than I would like -- I got a bit behind while I was stalking
Steven Rostedt in San Diego last week...  I'll slow it down after this!

There are a couple of pulls here.  One is from Johannes:

"Please pull (according to the below information) to get a few fixes.

 * a fix to properly disconnect in the driver when authentication or
   association fails
 * a fix to prevent invalid information about mesh paths being reported
   to userspace
 * a memory leak fix in an nl80211 error path"

The other comes via Gustavo:

"A few updates for the 3.6 kernel. There are two btusb patches to add
more supported devices through the new USB_VENDOR_AND_INTEFACE_INFO()
macro and another one that add a new device id for a Sony Vaio laptop,
one fix for a user-after-free and, finally, two patches from Vinicius
to fix a issue in SMP pairing."

Along with those...

Arend van Spriel provides a fix for a use-after-free bug in brcmfmac.

Daniel Drake avoids a hang by not trying to touch the libertas hardware
duing suspend if it is already powered-down.

Felix Fietkau provides a batch of ath9k fixes that adress some
potential problems with power settings, as well as a fix to avoid a
potential interrupt storm.

Gertjan van Wingerde provides a register-width fix for rt2x00, and
a rt2x00 fix to prevent incorrectly detecting the rfkill status.
He also provides a device ID patch.

Hante Meuleman gives us three brcmfmac fixes, one that properly
initializes a command structure, one that fixes a race condition that
could lose usb requests, and one that removes some log spam.

Marc Kleine-Budde offers an rt2x00 fix for a voltage setting on some
specific devices.

Mohammed Shafi Shajakhan sent an ath9k fix to avoid a crash related to
using timers that aren't allocated when 2 wire bluetooth coexistence
hardware is in use.

Sergei Poselenov changes rt2800usb to do some validity checking for
received packets, avoiding crashes on an ARM Soc.

Stone Piao gives us an mwifiex fix for an incorrectly set skb length
value for a command buffer.

All of these are localized to their specific drivers, and relatively
small.  The power-related patches from Felix are bigger than I would
like, but I merged them in consideration of their isolation to ath9k
and the sensitive nature of power settings in wireless devices.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07 14:38:50 -04:00
Eric Dumazet
979402b16c udp: increment UDP_MIB_INERRORS if copy failed
In UDP recvmsg(), we miss an increase of UDP_MIB_INERRORS if the copy
of skb to userspace failed for whatever reason.

Reported-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07 12:56:00 -04:00
Trond Myklebust
f39c1bfb5a SUNRPC: Fix a UDP transport regression
Commit 43cedbf0e8 (SUNRPC: Ensure that
we grab the XPRT_LOCK before calling xprt_alloc_slot) is causing
hangs in the case of NFS over UDP mounts.

Since neither the UDP or the RDMA transport mechanism use dynamic slot
allocation, we can skip grabbing the socket lock for those transports.
Add a new rpc_xprt_op to allow switching between the TCP and UDP/RDMA
case.

Note that the NFSv4.1 back channel assigns the slot directly
through rpc_run_bc_task, so we can ignore that case.

Reported-by: Dick Streefland <dick.streefland@altium.nl>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.1]
2012-09-07 11:43:49 -04:00
Eric Dumazet
0626af3139 netfilter: take care of timewait sockets
Sami Farin reported crashes in xt_LOG because it assumes skb->sk is a
full blown socket.

Since (41063e9 ipv4: Early TCP socket demux), we can have skb->sk
pointing to a timewait socket.

Same fix is needed in nfnetlink_log.

Diagnosed-by: Florian Westphal <fw@strlen.de>
Reported-by: Sami Farin <hvtaifwkbgefbaei@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-09-06 14:28:18 +02:00
Mikulas Patocka
ed6fe9d614 Fix order of arguments to compat_put_time[spec|val]
Commit 644595f896 ("compat: Handle COMPAT_USE_64BIT_TIME in
net/socket.c") introduced a bug where the helper functions to take
either a 64-bit or compat time[spec|val] got the arguments in the wrong
order, passing the kernel stack pointer off as a user pointer (and vice
versa).

Because of the user address range check, that in turn then causes an
EFAULT due to the user pointer range checking failing for the kernel
address.  Incorrectly resuling in a failed system call for 32-bit
processes with a 64-bit kernel.

On odder architectures like HP-PA (with separate user/kernel address
spaces), it can be used read kernel memory.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-09-05 18:34:13 -07:00
Julian Anastasov
d013ef2aba tcp: fix possible socket refcount problem for ipv6
commit 144d56e910
("tcp: fix possible socket refcount problem") is missing
the IPv6 part. As tcp_release_cb is shared by both protocols
we should hold sock reference for the TCP_MTU_REDUCED_DEFERRED
bit.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-05 17:16:25 -04:00
John W. Linville
785a7de9ee Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2012-09-05 14:48:15 -04:00
John W. Linville
518eefe1b6 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2012-09-05 14:46:30 -04:00
LEO Airwarosu Yoichi Shinoda
7ce8c7a343 mac80211: Various small fixes for cfg.c: mpath_set_pinfo()
Various small fixes for net/mac80211/cfg.c:mpath_set_pinfo():
Initialize *pinfo before filling members in, handle MESH_PATH_RESOLVED
correctly, and remove bogus assignment; result in correct display
of FLAGS values and meaningful EXPTIME for expired paths in iw utility.

Signed-off-by: Yoichi Shinoda <shinoda@jaist.ac.jp>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-05 16:06:05 +02:00
Eric Dumazet
c0cc88a762 l2tp: fix a typo in l2tp_eth_dev_recv()
While investigating l2tp bug, I hit a bug in eth_type_trans(),
because not enough bytes were pulled in skb head.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-04 15:54:55 -04:00
David S. Miller
9d148e39d1 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch 2012-09-04 15:17:52 -04:00
Steffen Klassert
3b59df46a4 xfrm: Workaround incompatibility of ESN and async crypto
ESN for esp is defined in RFC 4303. This RFC assumes that the
sequence number counters are always up to date. However,
this is not true if an async crypto algorithm is employed.

If the sequence number counters are not up to date on sequence
number check, we may incorrectly update the upper 32 bit of
the sequence number. This leads to a DOS.

We workaround this by comparing the upper sequence number,
(used for authentication) with the upper sequence number
computed after the async processing. We drop the packet
if these numbers are different.

To do this, we introduce a recheck function that does this
check in the ESN case.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-04 14:09:45 -04:00
Eric Dumazet
37159ef2c1 l2tp: fix a lockdep splat
Fixes following lockdep splat :

[ 1614.734896] =============================================
[ 1614.734898] [ INFO: possible recursive locking detected ]
[ 1614.734901] 3.6.0-rc3+ #782 Not tainted
[ 1614.734903] ---------------------------------------------
[ 1614.734905] swapper/11/0 is trying to acquire lock:
[ 1614.734907]  (slock-AF_INET){+.-...}, at: [<ffffffffa0209d72>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
[ 1614.734920]
[ 1614.734920] but task is already holding lock:
[ 1614.734922]  (slock-AF_INET){+.-...}, at: [<ffffffff815fce23>] tcp_v4_err+0x163/0x6b0
[ 1614.734932]
[ 1614.734932] other info that might help us debug this:
[ 1614.734935]  Possible unsafe locking scenario:
[ 1614.734935]
[ 1614.734937]        CPU0
[ 1614.734938]        ----
[ 1614.734940]   lock(slock-AF_INET);
[ 1614.734943]   lock(slock-AF_INET);
[ 1614.734946]
[ 1614.734946]  *** DEADLOCK ***
[ 1614.734946]
[ 1614.734949]  May be due to missing lock nesting notation
[ 1614.734949]
[ 1614.734952] 7 locks held by swapper/11/0:
[ 1614.734954]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff81592801>] __netif_receive_skb+0x251/0xd00
[ 1614.734964]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff815d319c>] ip_local_deliver_finish+0x4c/0x4e0
[ 1614.734972]  #2:  (rcu_read_lock){.+.+..}, at: [<ffffffff8160d116>] icmp_socket_deliver+0x46/0x230
[ 1614.734982]  #3:  (slock-AF_INET){+.-...}, at: [<ffffffff815fce23>] tcp_v4_err+0x163/0x6b0
[ 1614.734989]  #4:  (rcu_read_lock){.+.+..}, at: [<ffffffff815da240>] ip_queue_xmit+0x0/0x680
[ 1614.734997]  #5:  (rcu_read_lock_bh){.+....}, at: [<ffffffff815d9925>] ip_finish_output+0x135/0x890
[ 1614.735004]  #6:  (rcu_read_lock_bh){.+....}, at: [<ffffffff81595680>] dev_queue_xmit+0x0/0xe00
[ 1614.735012]
[ 1614.735012] stack backtrace:
[ 1614.735016] Pid: 0, comm: swapper/11 Not tainted 3.6.0-rc3+ #782
[ 1614.735018] Call Trace:
[ 1614.735020]  <IRQ>  [<ffffffff810a50ac>] __lock_acquire+0x144c/0x1b10
[ 1614.735033]  [<ffffffff810a334b>] ? check_usage+0x9b/0x4d0
[ 1614.735037]  [<ffffffff810a6762>] ? mark_held_locks+0x82/0x130
[ 1614.735042]  [<ffffffff810a5df0>] lock_acquire+0x90/0x200
[ 1614.735047]  [<ffffffffa0209d72>] ? l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
[ 1614.735051]  [<ffffffff810a69ad>] ? trace_hardirqs_on+0xd/0x10
[ 1614.735060]  [<ffffffff81749b31>] _raw_spin_lock+0x41/0x50
[ 1614.735065]  [<ffffffffa0209d72>] ? l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
[ 1614.735069]  [<ffffffffa0209d72>] l2tp_xmit_skb+0x172/0xa50 [l2tp_core]
[ 1614.735075]  [<ffffffffa014f7f2>] l2tp_eth_dev_xmit+0x32/0x60 [l2tp_eth]
[ 1614.735079]  [<ffffffff81595112>] dev_hard_start_xmit+0x502/0xa70
[ 1614.735083]  [<ffffffff81594c6e>] ? dev_hard_start_xmit+0x5e/0xa70
[ 1614.735087]  [<ffffffff815957c1>] ? dev_queue_xmit+0x141/0xe00
[ 1614.735093]  [<ffffffff815b622e>] sch_direct_xmit+0xfe/0x290
[ 1614.735098]  [<ffffffff81595865>] dev_queue_xmit+0x1e5/0xe00
[ 1614.735102]  [<ffffffff81595680>] ? dev_hard_start_xmit+0xa70/0xa70
[ 1614.735106]  [<ffffffff815b4daa>] ? eth_header+0x3a/0xf0
[ 1614.735111]  [<ffffffff8161d33e>] ? fib_get_table+0x2e/0x280
[ 1614.735117]  [<ffffffff8160a7e2>] arp_xmit+0x22/0x60
[ 1614.735121]  [<ffffffff8160a863>] arp_send+0x43/0x50
[ 1614.735125]  [<ffffffff8160b82f>] arp_solicit+0x18f/0x450
[ 1614.735132]  [<ffffffff8159d9da>] neigh_probe+0x4a/0x70
[ 1614.735137]  [<ffffffff815a191a>] __neigh_event_send+0xea/0x300
[ 1614.735141]  [<ffffffff815a1c93>] neigh_resolve_output+0x163/0x260
[ 1614.735146]  [<ffffffff815d9cf5>] ip_finish_output+0x505/0x890
[ 1614.735150]  [<ffffffff815d9925>] ? ip_finish_output+0x135/0x890
[ 1614.735154]  [<ffffffff815dae79>] ip_output+0x59/0xf0
[ 1614.735158]  [<ffffffff815da1cd>] ip_local_out+0x2d/0xa0
[ 1614.735162]  [<ffffffff815da403>] ip_queue_xmit+0x1c3/0x680
[ 1614.735165]  [<ffffffff815da240>] ? ip_local_out+0xa0/0xa0
[ 1614.735172]  [<ffffffff815f4402>] tcp_transmit_skb+0x402/0xa60
[ 1614.735177]  [<ffffffff815f5a11>] tcp_retransmit_skb+0x1a1/0x620
[ 1614.735181]  [<ffffffff815f7e93>] tcp_retransmit_timer+0x393/0x960
[ 1614.735185]  [<ffffffff815fce23>] ? tcp_v4_err+0x163/0x6b0
[ 1614.735189]  [<ffffffff815fd317>] tcp_v4_err+0x657/0x6b0
[ 1614.735194]  [<ffffffff8160d116>] ? icmp_socket_deliver+0x46/0x230
[ 1614.735199]  [<ffffffff8160d19e>] icmp_socket_deliver+0xce/0x230
[ 1614.735203]  [<ffffffff8160d116>] ? icmp_socket_deliver+0x46/0x230
[ 1614.735208]  [<ffffffff8160d464>] icmp_unreach+0xe4/0x2c0
[ 1614.735213]  [<ffffffff8160e520>] icmp_rcv+0x350/0x4a0
[ 1614.735217]  [<ffffffff815d3285>] ip_local_deliver_finish+0x135/0x4e0
[ 1614.735221]  [<ffffffff815d319c>] ? ip_local_deliver_finish+0x4c/0x4e0
[ 1614.735225]  [<ffffffff815d3ffa>] ip_local_deliver+0x4a/0x90
[ 1614.735229]  [<ffffffff815d37b7>] ip_rcv_finish+0x187/0x730
[ 1614.735233]  [<ffffffff815d425d>] ip_rcv+0x21d/0x300
[ 1614.735237]  [<ffffffff81592a1b>] __netif_receive_skb+0x46b/0xd00
[ 1614.735241]  [<ffffffff81592801>] ? __netif_receive_skb+0x251/0xd00
[ 1614.735245]  [<ffffffff81593368>] process_backlog+0xb8/0x180
[ 1614.735249]  [<ffffffff81593cf9>] net_rx_action+0x159/0x330
[ 1614.735257]  [<ffffffff810491f0>] __do_softirq+0xd0/0x3e0
[ 1614.735264]  [<ffffffff8109ed24>] ? tick_program_event+0x24/0x30
[ 1614.735270]  [<ffffffff8175419c>] call_softirq+0x1c/0x30
[ 1614.735278]  [<ffffffff8100425d>] do_softirq+0x8d/0xc0
[ 1614.735282]  [<ffffffff8104983e>] irq_exit+0xae/0xe0
[ 1614.735287]  [<ffffffff8175494e>] smp_apic_timer_interrupt+0x6e/0x99
[ 1614.735291]  [<ffffffff81753a1c>] apic_timer_interrupt+0x6c/0x80
[ 1614.735293]  <EOI>  [<ffffffff810a14ad>] ? trace_hardirqs_off+0xd/0x10
[ 1614.735306]  [<ffffffff81336f85>] ? intel_idle+0xf5/0x150
[ 1614.735310]  [<ffffffff81336f7e>] ? intel_idle+0xee/0x150
[ 1614.735317]  [<ffffffff814e6ea9>] cpuidle_enter+0x19/0x20
[ 1614.735321]  [<ffffffff814e7538>] cpuidle_idle_call+0xa8/0x630
[ 1614.735327]  [<ffffffff8100c1ba>] cpu_idle+0x8a/0xe0
[ 1614.735333]  [<ffffffff8173762e>] start_secondary+0x220/0x222

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-04 14:07:50 -04:00
Alan Cox
6cf5c95117 netrom: copy_datagram_iovec can fail
Check for an error from this and if so bail properly.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-04 12:57:35 -04:00
Wei Yongjun
b4e4f47e94 nl80211: fix possible memory leak nl80211_connect()
connkeys is malloced in nl80211_parse_connkeys() and should
be freed in the error handling case, otherwise it will cause
memory leak.

spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-04 18:06:00 +02:00
Eliad Peller
3d2abdfdf1 mac80211: clear bssid on auth/assoc failure
ifmgd->bssid wasn't cleared properly in some
auth/assoc failure cases, causing mac80211 and
the low-level driver to go out of sync.

Clear ifmgd->bssid on failure, and notify the driver.

Cc: stable@kernel.org # 3.4+
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-04 17:14:19 +02:00
Jesse Gross
c303aa94cd openvswitch: Fix FLOW_BUFSIZE definition.
The vlan encapsulation fields in the maximum flow defintion were
never updated when the representation changed before upstreaming.
In theory this could cause a kernel panic when a maximum length
flow is used.  In practice this has never happened (to my knowledge)
because skb allocations are padded out to a cache line so you would
need the right combination of flow and packet being sent to userspace.

Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-09-03 19:06:27 -07:00
Eric Dumazet
b379135c40 fq_codel: dont reinit flow state
When fq_codel builds a new flow, it should not reset codel state.

Codel algo needs to get previous values (lastcount, drop_next) to get
proper behavior.

Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Dave Taht <dave.taht@bufferbloat.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-03 14:36:50 -04:00
Thomas Graf
4c3a5bdae2 sctp: Don't charge for data in sndbuf again when transmitting packet
SCTP charges wmem_alloc via sctp_set_owner_w() in sctp_sendmsg() and via
skb_set_owner_w() in sctp_packet_transmit(). If a sender runs out of
sndbuf it will sleep in sctp_wait_for_sndbuf() and expects to be waken up
by __sctp_write_space().

Buffer space charged via sctp_set_owner_w() is released in sctp_wfree()
which calls __sctp_write_space() directly.

Buffer space charged via skb_set_owner_w() is released via sock_wfree()
which calls sk->sk_write_space() _if_ SOCK_USE_WRITE_QUEUE is not set.
sctp_endpoint_init() sets SOCK_USE_WRITE_QUEUE on all sockets.

Therefore if sctp_packet_transmit() manages to queue up more than sndbuf
bytes, sctp_wait_for_sndbuf() will never be woken up again unless it is
interrupted by a signal.

This could be fixed by clearing the SOCK_USE_WRITE_QUEUE flag but ...

Charging for the data twice does not make sense in the first place, it
leads to overcharging sndbuf by a factor 2. Therefore this patch only
charges a single byte in wmem_alloc when transmitting an SCTP packet to
ensure that the socket stays alive until the packet has been released.

This means that control chunks are no longer accounted for in wmem_alloc
which I believe is not a problem as skb->truesize will typically lead
to overcharging anyway and thus compensates for any control overhead.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
CC: Vlad Yasevich <vyasevic@redhat.com>
CC: Neil Horman <nhorman@tuxdriver.com>
CC: David Miller <davem@davemloft.net>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-03 13:24:13 -04:00
Eric Dumazet
e812347ccf net: sock_edemux() should take care of timewait sockets
sock_edemux() can handle either a regular socket or a timewait socket

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-03 13:22:43 -04:00
Joe Stringer
39855b5ba9 openvswitch: Fix typo
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-09-02 12:18:25 -07:00
Linus Torvalds
0b1a34c992 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) NLA_PUT* --> nla_put_* conversion got one case wrong in
    nfnetlink_log, fix from Patrick McHardy.

 2) Missed error return check in ipw2100 driver, from Julia Lawall.

 3) PMTU updates in ipv4 were setting the expiry time incorrectly, fix
    from Eric Dumazet.

 4) SFC driver erroneously reversed src and dst when reporting filters
    via ethtool.

 5) Memory leak in CAN protocol and wrong setting of IRQF_SHARED in
    sja1000 can platform driver, from Alexey Khoroshilov and Sven
    Schmitt.

 6) Fix multicast traffic scaling regression in ipv4_dst_destroy, only
    take the lock when we really need to.  From Eric Dumazet.

 7) Fix non-root process spoofing in netlink, from Pablo Neira Ayuso.

 8) CWND reduction in TCP is done incorrectly during non-SACK recovery,
    fix from Yuchung Cheng.

 9) Revert netpoll change, and fix what was actually a driver specific
    problem.  From Amerigo Wang.  This should cure bootup hangs with
    netconsole some people reported.

10) Fix xen-netfront invoking __skb_fill_page_desc() with a NULL page
    pointer.  From Ian Campbell.

11) SIP NAT fix for expectiontation creation, from Pablo Neira Ayuso.

12) __ip_rt_update_pmtu() needs RCU locking, from Eric Dumazet.

13) Fix usbnet deadlock on resume, can't use GFP_KERNEL in this
    situation.  From Oliver Neukum.

14) The davinci ethernet driver triggers an OOPS on removal because it
    frees an MDIO object before unregistering it.  Fix from Bin Liu.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
  net: qmi_wwan: add several new Gobi devices
  fddi: 64 bit bug in smt_add_para()
  net: ethernet: fix kernel OOPS when remove davinci_mdio module
  net/xfrm/xfrm_state.c: fix error return code
  net: ipv6: fix error return code
  net: qmi_wwan: new device: Foxconn/Novatel E396
  usbnet: fix deadlock in resume
  cs89x0 : packet reception not working
  netfilter: nf_conntrack: fix racy timer handling with reliable events
  bnx2x: Correct the ndo_poll_controller call
  bnx2x: Move netif_napi_add to the open call
  ipv4: must use rcu protection while calling fib_lookup
  bnx2x: fix 57840_MF pci id
  net: ipv4: ipmr_expire_timer causes crash when removing net namespace
  e1000e: DoS while TSO enabled caused by link partner with small MSS
  l2tp: avoid to use synchronize_rcu in tunnel free function
  gianfar: fix default tx vlan offload feature flag
  netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation
  xen-netfront: use __pskb_pull_tail to ensure linear area is big enough on RX
  netfilter: nfnetlink_log: fix error return code in init path
  ...
2012-09-02 11:28:00 -07:00
Julia Lawall
599901c3e4 net/xfrm/xfrm_state.c: fix error return code
Initialize return variable before exiting on an error path.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 16:27:48 -04:00
Julia Lawall
48f125ce1c net: ipv6: fix error return code
Initialize return variable before exiting on an error path.

The initial initialization of the return variable is also dropped, because
that value is never used.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-31 16:27:48 -04:00
David S. Miller
0dcd5052c8 Merge branch 'master' of git://1984.lsi.us.es/nf 2012-08-31 13:06:37 -04:00
Pablo Neira Ayuso
5b423f6a40 netfilter: nf_conntrack: fix racy timer handling with reliable events
Existing code assumes that del_timer returns true for alive conntrack
entries. However, this is not true if reliable events are enabled.
In that case, del_timer may return true for entries that were
just inserted in the dying list. Note that packets / ctnetlink may
hold references to conntrack entries that were just inserted to such
list.

This patch fixes the issue by adding an independent timer for
event delivery. This increases the size of the ecache extension.
Still we can revisit this later and use variable size extensions
to allocate this area on demand.

Tested-by: Oliver Smith <olipro@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-31 15:50:28 +02:00
Eric Dumazet
c5ae7d4192 ipv4: must use rcu protection while calling fib_lookup
Following lockdep splat was reported by Pavel Roskin :

[ 1570.586223] ===============================
[ 1570.586225] [ INFO: suspicious RCU usage. ]
[ 1570.586228] 3.6.0-rc3-wl-main #98 Not tainted
[ 1570.586229] -------------------------------
[ 1570.586231] /home/proski/src/linux/net/ipv4/route.c:645 suspicious rcu_dereference_check() usage!
[ 1570.586233]
[ 1570.586233] other info that might help us debug this:
[ 1570.586233]
[ 1570.586236]
[ 1570.586236] rcu_scheduler_active = 1, debug_locks = 0
[ 1570.586238] 2 locks held by Chrome_IOThread/4467:
[ 1570.586240]  #0:  (slock-AF_INET){+.-...}, at: [<ffffffff814f2c0c>] release_sock+0x2c/0xa0
[ 1570.586253]  #1:  (fnhe_lock){+.-...}, at: [<ffffffff815302fc>] update_or_create_fnhe+0x2c/0x270
[ 1570.586260]
[ 1570.586260] stack backtrace:
[ 1570.586263] Pid: 4467, comm: Chrome_IOThread Not tainted 3.6.0-rc3-wl-main #98
[ 1570.586265] Call Trace:
[ 1570.586271]  [<ffffffff810976ed>] lockdep_rcu_suspicious+0xfd/0x130
[ 1570.586275]  [<ffffffff8153042c>] update_or_create_fnhe+0x15c/0x270
[ 1570.586278]  [<ffffffff815305b3>] __ip_rt_update_pmtu+0x73/0xb0
[ 1570.586282]  [<ffffffff81530619>] ip_rt_update_pmtu+0x29/0x90
[ 1570.586285]  [<ffffffff815411dc>] inet_csk_update_pmtu+0x2c/0x80
[ 1570.586290]  [<ffffffff81558d1e>] tcp_v4_mtu_reduced+0x2e/0xc0
[ 1570.586293]  [<ffffffff81553bc4>] tcp_release_cb+0xa4/0xb0
[ 1570.586296]  [<ffffffff814f2c35>] release_sock+0x55/0xa0
[ 1570.586300]  [<ffffffff815442ef>] tcp_sendmsg+0x4af/0xf50
[ 1570.586305]  [<ffffffff8156fc60>] inet_sendmsg+0x120/0x230
[ 1570.586308]  [<ffffffff8156fb40>] ? inet_sk_rebuild_header+0x40/0x40
[ 1570.586312]  [<ffffffff814f4bdd>] ? sock_update_classid+0xbd/0x3b0
[ 1570.586315]  [<ffffffff814f4c50>] ? sock_update_classid+0x130/0x3b0
[ 1570.586320]  [<ffffffff814ec435>] do_sock_write+0xc5/0xe0
[ 1570.586323]  [<ffffffff814ec4a3>] sock_aio_write+0x53/0x80
[ 1570.586328]  [<ffffffff8114bc83>] do_sync_write+0xa3/0xe0
[ 1570.586332]  [<ffffffff8114c5a5>] vfs_write+0x165/0x180
[ 1570.586335]  [<ffffffff8114c805>] sys_write+0x45/0x90
[ 1570.586340]  [<ffffffff815d2722>] system_call_fastpath+0x16/0x1b

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-30 13:33:08 -04:00
Francesco Ruggeri
acbb219d5f net: ipv4: ipmr_expire_timer causes crash when removing net namespace
When tearing down a net namespace, ipv4 mr_table structures are freed
without first deactivating their timers. This can result in a crash in
run_timer_softirq.
This patch mimics the corresponding behaviour in ipv6.
Locking and synchronization seem to be adequate.
We are about to kfree mrt, so existing code should already make sure that
no other references to mrt are pending or can be created by incoming traffic.
The functions invoked here do not cause new references to mrt or other
race conditions to be created.
Invoking del_timer_sync guarantees that ipmr_expire_timer is inactive.
Both ipmr_expire_process (whose completion we may have to wait in
del_timer_sync) and mroute_clean_tables internally use mfc_unres_lock
or other synchronizations when needed, and they both only modify mrt.

Tested in Linux 3.4.8.

Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-30 12:51:32 -04:00
xeb@mail.ru
99469c32f7 l2tp: avoid to use synchronize_rcu in tunnel free function
Avoid to use synchronize_rcu in l2tp_tunnel_free because context may be
atomic.

Signed-off-by: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-30 12:31:03 -04:00
Pablo Neira Ayuso
3f509c689a netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation
We're hitting bug while trying to reinsert an already existing
expectation:

kernel BUG at kernel/timer.c:895!
invalid opcode: 0000 [#1] SMP
[...]
Call Trace:
 <IRQ>
 [<ffffffffa0069563>] nf_ct_expect_related_report+0x4a0/0x57a [nf_conntrack]
 [<ffffffff812d423a>] ? in4_pton+0x72/0x131
 [<ffffffffa00ca69e>] ip_nat_sdp_media+0xeb/0x185 [nf_nat_sip]
 [<ffffffffa00b5b9b>] set_expected_rtp_rtcp+0x32d/0x39b [nf_conntrack_sip]
 [<ffffffffa00b5f15>] process_sdp+0x30c/0x3ec [nf_conntrack_sip]
 [<ffffffff8103f1eb>] ? irq_exit+0x9a/0x9c
 [<ffffffffa00ca738>] ? ip_nat_sdp_media+0x185/0x185 [nf_nat_sip]

We have to remove the RTP expectation if the RTCP expectation hits EBUSY
since we keep trying with other ports until we succeed.

Reported-by: Rafal Fitt <rafalf@aplusc.com.pl>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-30 18:27:14 +02:00
Julia Lawall
6fc09f10f1 netfilter: nfnetlink_log: fix error return code in init path
Initialize return variable before exiting on an error path.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-30 03:29:58 +02:00
Julia Lawall
ef6acf68c2 netfilter: ctnetlink: fix error return code in init path
Initialize return variable before exiting on an error path.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-30 03:28:22 +02:00
Julia Lawall
0a54e939d8 ipvs: fix error return code
Initialize return variable before exiting on an error path.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
 { ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
 return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-30 03:27:19 +02:00
Amerigo Wang
072a9c4860 netpoll: revert 6bdb7fe310 and fix be_poll() instead
Against -net.

In the patch "netpoll: re-enable irq in poll_napi()", I tried to
fix the following warning:

[100718.051041] ------------[ cut here ]------------
[100718.051048] WARNING: at kernel/softirq.c:159 local_bh_enable_ip+0x7d/0xb0()
(Not tainted)
[100718.051049] Hardware name: ProLiant BL460c G7
...
[100718.051068] Call Trace:
[100718.051073]  [<ffffffff8106b747>] ? warn_slowpath_common+0x87/0xc0
[100718.051075]  [<ffffffff8106b79a>] ? warn_slowpath_null+0x1a/0x20
[100718.051077]  [<ffffffff810747ed>] ? local_bh_enable_ip+0x7d/0xb0
[100718.051080]  [<ffffffff8150041b>] ? _spin_unlock_bh+0x1b/0x20
[100718.051085]  [<ffffffffa00ee974>] ? be_process_mcc+0x74/0x230 [be2net]
[100718.051088]  [<ffffffffa00ea68c>] ? be_poll_tx_mcc+0x16c/0x290 [be2net]
[100718.051090]  [<ffffffff8144fe76>] ? netpoll_poll_dev+0xd6/0x490
[100718.051095]  [<ffffffffa01d24a5>] ? bond_poll_controller+0x75/0x80 [bonding]
[100718.051097]  [<ffffffff8144fde5>] ? netpoll_poll_dev+0x45/0x490
[100718.051100]  [<ffffffff81161b19>] ? ksize+0x19/0x80
[100718.051102]  [<ffffffff81450437>] ? netpoll_send_skb_on_dev+0x157/0x240

by reenabling IRQ before calling ->poll, but it seems more
problems are introduced after that patch:

http://ozlabs.org/~akpm/stuff/IMG_20120824_122054.jpg
http://marc.info/?l=linux-netdev&m=134563282530588&w=2

So it is safe to fix be2net driver code directly.

This patch reverts the offending commit and fixes be_poll() by
avoid disabling BH there, this is okay because be_poll()
can be called either by poll_napi() which already disables
IRQ, or by net_rx_action() which already disables BH.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Sylvain Munaut <s.munaut@whatever-company.com>
Cc: Sylvain Munaut <s.munaut@whatever-company.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Miller <davem@davemloft.net>
Cc: Sathya Perla <sathya.perla@emulex.com>
Cc: Subbu Seetharaman <subbu.seetharaman@emulex.com>
Cc: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Tested-by: Sylvain Munaut <s.munaut@whatever-company.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-29 15:03:23 -04:00
Vinicius Costa Gomes
d8343f1257 Bluetooth: Fix sending a HCI Authorization Request over LE links
In the case that the link is already in the connected state and a
Pairing request arrives from the mgmt interface, hci_conn_security()
would be called but it was not considering LE links.

Reported-by: João Paulo Rechi Vita <jprvita@openbossa.org>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-27 08:11:51 -07:00
Vinicius Costa Gomes
cc110922da Bluetooth: Change signature of smp_conn_security()
To make it clear that it may be called from contexts that may not have
any knowledge of L2CAP, we change the connection parameter, to receive
a hci_conn.

This also makes it clear that it is checking the security of the link.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-27 08:07:18 -07:00
Linus Torvalds
8497ae61d0 Merge branch 'for-3.6' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from J. Bruce Fields:
 "Particular thanks to Michael Tokarev, Malahal Naineni, and Jamie
  Heilman for their testing and debugging help."

* 'for-3.6' of git://linux-nfs.org/~bfields/linux:
  svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping
  svcrpc: sends on closed socket should stop immediately
  svcrpc: fix BUG() in svc_tcp_clear_pages
  nfsd4: fix security flavor of NFSv4.0 callback
2012-08-25 11:43:41 -07:00
David S. Miller
d05cebb915 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
This batch of fixes is intended for 3.6...

Johannes Berg gives us a pair of iwlwifi fixes.  One corrects some
improperly defined ifdefs that lead to crashes and BUG_ONs.  The other
prevents attempts to read SRAM for devices that aren't actually started.

Julia Lawall provides an ipw2100 fix to properly set the return code
from a function call before testing it! :-)

Thomas Huehn corrects the improper use of a constant related to a power
setting in ath5k.

Thomas Pedersen offers a mac80211 fix to properly handle destination
addresses of unicast frames passing though a mesh gate.

Vladimir Zapolskiy provides a brcmsmac fix to properly mark the
interface state when the device goes down.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 15:15:10 -04:00
Yuchung Cheng
7c4a56fec3 tcp: fix cwnd reduction for non-sack recovery
The cwnd reduction in fast recovery is based on the number of packets
newly delivered per ACK. For non-sack connections every DUPACK
signifies a packet has been delivered, but the sender mistakenly
skips counting them for cwnd reduction.

The fix is to compute newly_acked_sacked after DUPACKs are accounted
in sacked_out for non-sack connections.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 13:48:58 -04:00
Pablo Neira Ayuso
20e1db19db netlink: fix possible spoofing from non-root processes
Non-root user-space processes can send Netlink messages to other
processes that are well-known for being subscribed to Netlink
asynchronous notifications. This allows ilegitimate non-root
process to send forged messages to Netlink subscribers.

The userspace process usually verifies the legitimate origin in
two ways:

a) Socket credentials. If UID != 0, then the message comes from
   some ilegitimate process and the message needs to be dropped.

b) Netlink portID. In general, portID == 0 means that the origin
   of the messages comes from the kernel. Thus, discarding any
   message not coming from the kernel.

However, ctnetlink sets the portID in event messages that has
been triggered by some user-space process, eg. conntrack utility.
So other processes subscribed to ctnetlink events, eg. conntrackd,
know that the event was triggered by some user-space action.

Neither of the two ways to discard ilegitimate messages coming
from non-root processes can help for ctnetlink.

This patch adds capability validation in case that dst_pid is set
in netlink_sendmsg(). This approach is aggressive since existing
applications using any Netlink bus to deliver messages between
two user-space processes will break. Note that the exception is
NETLINK_USERSOCK, since it is reserved for netlink-to-netlink
userspace communication.

Still, if anyone wants that his Netlink bus allows netlink-to-netlink
userspace, then they can set NL_NONROOT_SEND. However, by default,
I don't think it makes sense to allow to use NETLINK_ROUTE to
communicate two processes that are sending no matter what information
that is not related to link/neighbouring/routing. They should be using
NETLINK_USERSOCK instead for that.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 13:36:09 -04:00
Eric Dumazet
78df76a065 ipv4: take rt_uncached_lock only if needed
Multicast traffic allocates dst with DST_NOCACHE, but dst is
not inserted into rt_uncached_list.

This slowdown multicast workloads on SMP because rt_uncached_lock is
contended.

Change the test before taking the lock to actually check the dst
was inserted into rt_uncached_list.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 11:47:48 -04:00
John W. Linville
e72615f6ab Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-08-24 11:16:58 -04:00
Fengguang Wu
a0dfb2634e af_packet: match_fanout_group() can be static
cc: Eric Leblond <eric@regit.org>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-23 09:27:12 -07:00
John W. Linville
c6b6eedc29 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2012-08-23 09:51:15 -04:00
Eric Dumazet
9b04f35005 ipv4: properly update pmtu
Sylvain Munault reported following info :

 - TCP connection get "stuck" with data in send queue when doing
   "large" transfers ( like typing 'ps ax' on a ssh connection )
 - Only happens on path where the PMTU is lower than the MTU of
   the interface
 - Is not present right after boot, it only appears 10-20min after
   boot or so. (and that's inside the _same_ TCP connection, it works
   fine at first and then in the same ssh session, it'll get stuck)
 - Definitely seems related to fragments somehow since I see a router
   sending ICMP message saying fragmentation is needed.
 - Exact same setup works fine with kernel 3.5.1

Problem happens when the 10 minutes (ip_rt_mtu_expires) expiration
period is over.

ip_rt_update_pmtu() calls dst_set_expires() to rearm a new expiration,
but dst_set_expires() does nothing because dst.expires is already set.

It seems we want to set the expires field to a new value, regardless
of prior one.

With help from Julian Anastasov.

Reported-by: Sylvain Munaut <s.munaut@whatever-company.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
CC: Julian Anastasov <ja@ssi.bg>
Tested-by: Sylvain Munaut <s.munaut@whatever-company.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 19:14:30 -07:00
Linus Torvalds
f753c4ec15 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph fixes from Sage Weil:
 "Jim's fix closes a narrow race introduced with the msgr changes.  One
  fix resolves problems with debugfs initialization that Yan found when
  multiple client instances are created (e.g., two clusters mounted, or
  rbd + cephfs), another one fixes problems with mounting a nonexistent
  server subdirectory, and the last one fixes a divide by zero error
  from unsanitized ioctl input that Dan Carpenter found."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: avoid divide by zero in __validate_layout()
  libceph: avoid truncation due to racing banners
  ceph: tolerate (and warn on) extraneous dentry from mds
  libceph: delay debugfs initialization until we learn global_id
2012-08-22 09:58:05 -07:00
Thomas Pedersen
27f011243a mac80211: fix DS to MBSS address translation
The destination address of unicast frames forwarded through a mesh gate
was being replaced with the broadcast address. Instead leave the
original destination address as the mesh DA. If the nexthop address is
not in the mpath table it will be resolved. If that fails, the frame
will be forwarded to known mesh gates.

Reported-by: Cedric Voncken <cedric.voncken@acksys.fr>
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-22 09:45:05 +02:00
Jim Schutt
6d4221b537 libceph: avoid truncation due to racing banners
Because the Ceph client messenger uses a non-blocking connect, it is
possible for the sending of the client banner to race with the
arrival of the banner sent by the peer.

When ceph_sock_state_change() notices the connect has completed, it
schedules work to process the socket via con_work().  During this
time the peer is writing its banner, and arrival of the peer banner
races with con_work().

If con_work() calls try_read() before the peer banner arrives, there
is nothing for it to do, after which con_work() calls try_write() to
send the client's banner.  In this case Ceph's protocol negotiation
can complete succesfully.

The server-side messenger immediately sends its banner and addresses
after accepting a connect request, *before* actually attempting to
read or verify the banner from the client.  As a result, it is
possible for the banner from the server to arrive before con_work()
calls try_read().  If that happens, try_read() will read the banner
and prepare protocol negotiation info via prepare_write_connect().
prepare_write_connect() calls con_out_kvec_reset(), which discards
the as-yet-unsent client banner.  Next, con_work() calls
try_write(), which sends the protocol negotiation info rather than
the banner that the peer is expecting.

The result is that the peer sees an invalid banner, and the client
reports "negotiation failed".

Fix this by moving con_out_kvec_reset() out of
prepare_write_connect() to its callers at all locations except the
one where the banner might still need to be sent.

[elder@inktak.com: added note about server-side behavior]

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-08-21 15:55:27 -07:00
Eric Dumazet
e0e3cea46d af_netlink: force credentials passing [CVE-2012-3520]
Pablo Neira Ayuso discovered that avahi and
potentially NetworkManager accept spoofed Netlink messages because of a
kernel bug.  The kernel passes all-zero SCM_CREDENTIALS ancillary data
to the receiver if the sender did not provide such data, instead of not
including any such data at all or including the correct data from the
peer (as it is the case with AF_UNIX).

This bug was introduced in commit 16e5726269
(af_unix: dont send SCM_CREDENTIALS by default)

This patch forces passing credentials for netlink, as
before the regression.

Another fix would be to not add SCM_CREDENTIALS in
netlink messages if not provided by the sender, but it
might break some programs.

With help from Florian Weimer & Petr Matousek

This issue is designated as CVE-2012-3520

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-21 14:53:01 -07:00
Eric Dumazet
a9915a1b52 ipv4: fix ip header ident selection in __ip_make_skb()
Christian Casteyde reported a kmemcheck 32-bit read from uninitialized
memory in __ip_select_ident().

It turns out that __ip_make_skb() called ip_select_ident() before
properly initializing iph->daddr.

This is a bug uncovered by commit 1d861aa4b3 (inet: Minimize use of
cached route inetpeer.)

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=46131

Reported-by: Christian Casteyde <casteyde.christian@free.fr>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-21 14:51:06 -07:00
Christoph Paasch
1a7b27c97c ipv4: Use newinet->inet_opt in inet_csk_route_child_sock()
Since 0e73441992 ("ipv4: Use inet_csk_route_child_sock() in DCCP and
TCP."), inet_csk_route_child_sock() is called instead of
inet_csk_route_req().

However, after creating the child-sock in tcp/dccp_v4_syn_recv_sock(),
ireq->opt is set to NULL, before calling inet_csk_route_child_sock().
Thus, inside inet_csk_route_child_sock() opt is always NULL and the
SRR-options are not respected anymore.
Packets sent by the server won't have the correct destination-IP.

This patch fixes it by accessing newinet->inet_opt instead of ireq->opt
inside inet_csk_route_child_sock().

Reported-by: Luca Boccassi <luca.boccassi@gmail.com>
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-21 14:49:11 -07:00