Commit Graph

9854 Commits

Author SHA1 Message Date
Tom Talpey
08ca0dce1e RPC/RDMA: correct the reconnect timer backoff
The RPC/RDMA code had a constant 5-second reconnect backoff, and
always performed it, even when re-establishing a connection to a
server after the RPC layer closed it due to being idle. Make it
an geometric backoff (up to 30 seconds), and don't delay idle
reconnect.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:15:02 -04:00
Tom Talpey
b3cd8d45a7 RPC/RDMA: optionally emit useful transport info upon connect/disconnect.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:13:59 -04:00
Tom Talpey
5f37d561e0 RPC/RDMA: reformat a debug printk to keep lines together.
The send marshaling code split a particular dprintk across two
lines, which makes it hard to extract from logfiles.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:13:42 -04:00
Tom Talpey
5675add36e RPC/RDMA: harden connection logic against missing/late rdma_cm upcalls.
Add defensive timeouts to wait_for_completion() calls in RDMA
address resolution, and make them interruptible. Fix the timeout
units to milliseconds (formerly jiffies) and move to private header.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:13:31 -04:00
Tom Talpey
1a954051b0 RPC/RDMA: fix connect/reconnect resource leak.
The RPC/RDMA code can leak RDMA connection manager endpoints in
certain error cases on connect. Don't signal unwanted events,
and be certain to destroy any allocated qp.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:13:02 -04:00
Tom Talpey
926449ba66 RPC/RDMA: return a consistent error, when connect fails.
The xprt_connect call path does not expect such errors as ECONNREFUSED
to be returned from failed transport connection attempts, otherwise it
translates them to EIO and signals fatal errors. For example, mount.nfs
prints simply "internal error". Translate all such errors to ENOTCONN
from RPC/RDMA to match sockets behavior.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:12:44 -04:00
Tom Talpey
9191ca3b38 RPC/RDMA: adhere to protocol for unpadded client trailing write chunks.
The RPC/RDMA protocol allows clients and servers to avoid RDMA
operations for data which is purely the result of XDR padding.
On the client, automatically insert the necessary padding for
such server replies, and optionally don't marshal such chunks.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:12:33 -04:00
Tom Talpey
fee08caf94 RPC/RDMA: avoid an oops due to disconnect racing with async upcalls.
RDMA disconnects yield an upcall from the RDMA connection manager,
which can race with rpc transport close, e.g. on ^C of a mount.
Ensure any rdma cm_id and qp are fully destroyed before continuing.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:11:40 -04:00
Tom Talpey
ad0e9e01da RPC/RDMA: maintain the RPC task bytes-sent statistic.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:10:40 -04:00
Tom Talpey
575448bd36 RPC/RDMA: suppress retransmit on RPC/RDMA clients.
An RPC/RDMA client cannot retransmit on an unbroken connection,
doing so violates its flow control with the server.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:10:36 -04:00
Tom Tucker
b334eaabf4 RPC/RDMA: fix connection IRD/ORD setting
This logic sets the connection parameter that configures the local device
and informs the remote peer how many concurrent incoming RDMA_READ
requests are supported. The original logic didn't really do what was
intended for two reasons:

- The max number supported by the device is typically smaller than
any one factor in the calculation used, and

- The field in the connection parameter structure where the value is
stored is a u8 and always overflows for the default settings.

So what really happens is the value requested for responder resources
is the left over 8 bits from the "desired value". If the desired value
happened to be a multiple of 256, the result was zero and it wouldn't
connect at all.

Given the above and the fact that max_requests is almost always larger
than the max responder resources supported by the adapter, this patch
simplifies this logic and simply requests the max supported by the device,
subject to a reasonable limit.

This bug was found by Jim Schutt at Sandia.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:09:56 -04:00
Tom Talpey
3197d309f5 RPC/RDMA: support FRMR client memory registration.
Configure, detect and use "fastreg" support from IB/iWARP verbs
layer to perform RPC/RDMA memory registration.

Make FRMR the default memreg mode (will fall back if not supported
by the selected RDMA adapter).

This allows full and optimal operation over the cxgb3 adapter, and others.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Acked-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:09:34 -04:00
Tom Talpey
bd7ed1d133 RPC/RDMA: check selected memory registration mode at runtime.
At transport creation, check for, and use, any local dma lkey.
Then, check that the selected memory registration mode is in fact
supported by the RDMA adapter selected for the mount. Fall back
to best alternative if not.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Acked-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:09:26 -04:00
Tom Talpey
fe9053b30b RPC/RDMA: add data types and new FRMR memory registration enum.
Internal RPC/RDMA structure updates in preparation for FRMR support.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Acked-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:09:24 -04:00
Tom Talpey
8d4ba0347c RPC/RDMA: refactor the inline memory registration code.
Refactor the memory registration and deregistration routines.
This saves stack space, makes the code more readable and prepares
to add the new FRMR registration methods.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:09:16 -04:00
Cedric Le Goater
63ffc23d30 sunrpc: fix oops in rpc_create when the mount namespace is unshared
On a system with nfs mounts, if a task unshares its mount namespace,
a oops can occur when the system is rebooted if the task is the last
to unreference the nfs mount. It will try to create a rpc request
using utsname() which has been invalidated by free_nsproxy().

The patch fixes the issue by using the global init_utsname() which is
always valid. the capability of identifying rpc clients per uts namespace
stills needs some extra work so this should not be a problem.

BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c024c9ab>] rpc_create+0x332/0x42f
Oops: 0000 [#1] DEBUG_PAGEALLOC

Pid: 1857, comm: uts-oops Not tainted (2.6.27-rc5-00319-g7686ad5 #4)
EIP: 0060:[<c024c9ab>] EFLAGS: 00210287 CPU: 0
EIP is at rpc_create+0x332/0x42f
EAX: 00000000 EBX: df26adf0 ECX: c0251887 EDX: 00000001
ESI: df26ae58 EDI: c02f293c EBP: dda0fc9c ESP: dda0fc2c
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process uts-oops (pid: 1857, ti=dda0e000 task=dd9a0778 task.ti=dda0e000)
Stack: c0104532 dda0fffc dda0fcac dda0e000 dda0e000 dd93b7f0 00000009 c02f2880
       df26aefc dda0fc68 c01096b7 00000000 c0266ee0 c039a070 c039a070 dda0fc74
       c012ca67 c039a064 dda0fc8c c012cb20 c03daf74 00000011 00000000 c0275c90
Call Trace:
 [<c0104532>] ? dump_trace+0xc2/0xe2
 [<c01096b7>] ? save_stack_trace+0x1c/0x3a
 [<c012ca67>] ? save_trace+0x37/0x8c
 [<c012cb20>] ? add_lock_to_list+0x64/0x96
 [<c0256fc4>] ? rpcb_register_call+0x62/0xbb
 [<c02570c8>] ? rpcb_register+0xab/0xb3
 [<c0252f4d>] ? svc_register+0xb4/0x128
 [<c0253114>] ? svc_destroy+0xec/0x103
 [<c02531b2>] ? svc_exit_thread+0x87/0x8d
 [<c01a75cd>] ? lockd_down+0x61/0x81
 [<c01a577b>] ? nlmclnt_done+0xd/0xf
 [<c01941fe>] ? nfs_destroy_server+0x14/0x16
 [<c0194328>] ? nfs_free_server+0x4c/0xaa
 [<c019a066>] ? nfs_kill_super+0x23/0x27
 [<c0158585>] ? deactivate_super+0x3f/0x51
 [<c01695d1>] ? mntput_no_expire+0x95/0xb4
 [<c016965b>] ? release_mounts+0x6b/0x7a
 [<c01696cc>] ? __put_mnt_ns+0x62/0x70
 [<c0127501>] ? free_nsproxy+0x25/0x80
 [<c012759a>] ? switch_task_namespaces+0x3e/0x43
 [<c01275a9>] ? exit_task_namespaces+0xa/0xc
 [<c0117fed>] ? do_exit+0x4fd/0x666
 [<c01181b3>] ? do_group_exit+0x5d/0x83
 [<c011fa8c>] ? get_signal_to_deliver+0x2c8/0x2e0
 [<c0102630>] ? do_notify_resume+0x69/0x700
 [<c011d85a>] ? do_sigaction+0x134/0x145
 [<c0127205>] ? hrtimer_nanosleep+0x8f/0xce
 [<c0126d1a>] ? hrtimer_wakeup+0x0/0x1c
 [<c0103488>] ? work_notifysig+0x13/0x1b
 =======================
Code: 70 20 68 cb c1 2c c0 e8 75 4e 01 00 8b 83 ac 00 00 00 59 3d 00 f0 ff ff 5f 77 63 eb 57 a1 00 80 2d c0 8b 80 a8 02 00 00 8d 73 68 <8b> 40 04 83 c0 45 e8 41 46 f7 ff ba 20 00 00 00 83 f8 21 0f 4c
EIP: [<c024c9ab>] rpc_create+0x332/0x42f SS:ESP 0068:dda0fc2c

Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 18:19:10 -04:00
Trond Myklebust
96165e2b7c SUNRPC: Fix a memory leak in rpcb_getport_async
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 18:18:57 -04:00
Trond Myklebust
9a4bd29fe8 SUNRPC: Fix autobind on cloned rpc clients
Despite the fact that cloned rpc clients won't have the cl_autobind flag
set, they may still find themselves calling rpcb_getport_async(). For this
to happen, it suffices for a _parent_ rpc_clnt to use autobinding, in which
case any clone may find itself triggering the !xprt_bound() case in
call_bind().

The correct fix for this is to walk back up the tree of cloned rpc clients,
in order to find the parent that 'owns' the transport, either because it
has clnt->cl_autobind set, or because it originally created the
transport...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 18:18:53 -04:00
Denis V. Lunev
c9f6cde6e2 sunrpc: do not pin sunrpc module in the memory
Basically, try_module_get here are pretty useless. Any other module using
this API will pin sunrpc in memory due using exported symbols.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 18:14:54 -04:00
Timo Teras
0523820482 af_key: Free dumping state on socket close
Fix a xfrm_{state,policy}_walk leak if pfkey socket is closed while
dumping is on-going.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01 05:17:54 -07:00
Arnaud Ebalard
5dc121e9a7 XFRM,IPv6: initialize ip6_dst_blackhole_ops.kmem_cachep
ip6_dst_blackhole_ops.kmem_cachep is not expected to be NULL (i.e. to
be initialized) when dst_alloc() is called from ip6_dst_blackhole().
Otherwise, it results in the following (xfrm_larval_drop is now set to
1 by default):

[   78.697642] Unable to handle kernel paging request for data at address 0x0000004c
[   78.703449] Faulting instruction address: 0xc0097f54
[   78.786896] Oops: Kernel access of bad area, sig: 11 [#1]
[   78.792791] PowerMac
[   78.798383] Modules linked in: btusb usbhid bluetooth b43 mac80211 cfg80211 ehci_hcd ohci_hcd sungem sungem_phy usbcore ssb
[   78.804263] NIP: c0097f54 LR: c0334a28 CTR: c002d430
[   78.809997] REGS: eef19ad0 TRAP: 0300   Not tainted  (2.6.27-rc5)
[   78.815743] MSR: 00001032 <ME,IR,DR>  CR: 22242482  XER: 20000000
[   78.821550] DAR: 0000004c, DSISR: 40000000
[   78.827278] TASK = eef0df40[3035] 'mip6d' THREAD: eef18000
[   78.827408] GPR00: 00001032 eef19b80 eef0df40 00000000 00008020 eef19c30 00000001 00000000
[   78.833249] GPR08: eee5101c c05a5c10 ef9ad500 00000000 24242422 1005787c 00000000 1004f960
[   78.839151] GPR16: 00000000 10024e90 10050040 48030018 0fe44150 00000000 00000000 eef19c30
[   78.845046] GPR24: eef19e44 00000000 eef19bf8 efb37c14 eef19bf8 00008020 00009032 c0596064
[   78.856671] NIP [c0097f54] kmem_cache_alloc+0x20/0x94
[   78.862581] LR [c0334a28] dst_alloc+0x40/0xc4
[   78.868451] Call Trace:
[   78.874252] [eef19b80] [c03c1810] ip6_dst_lookup_tail+0x1c8/0x1dc (unreliable)
[   78.880222] [eef19ba0] [c0334a28] dst_alloc+0x40/0xc4
[   78.886164] [eef19bb0] [c03cd698] ip6_dst_blackhole+0x28/0x1cc
[   78.892090] [eef19be0] [c03d9be8] rawv6_sendmsg+0x75c/0xc88
[   78.897999] [eef19cb0] [c038bca4] inet_sendmsg+0x4c/0x78
[   78.903907] [eef19cd0] [c03207c8] sock_sendmsg+0xac/0xe4
[   78.909734] [eef19db0] [c03209e4] sys_sendmsg+0x1e4/0x2a0
[   78.915540] [eef19f00] [c03220a8] sys_socketcall+0xfc/0x210
[   78.921406] [eef19f40] [c0014b3c] ret_from_syscall+0x0/0x38
[   78.927295] --- Exception: c01 at 0xfe2d730
[   78.927297]     LR = 0xfe2d71c
[   78.939019] Instruction dump:
[   78.944835] 91640018 9144001c 900a0000 4bffff44 9421ffe0 7c0802a6 bf810010 7c9d2378
[   78.950694] 90010024 7fc000a6 57c0045e 7c000124 <83e3004c> 8383005c 2f9f0000 419e0050
[   78.956464] ---[ end trace 05fa1ed7972487a1 ]---

As commented by Benjamin Thery, the bug was introduced by
f2fc6a5458, while adding network
namespaces support to ipv6 routes.

Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01 02:37:56 -07:00
Denis V. Lunev
2a5b82751f ipv6: NULL pointer dereferrence in tcp_v6_send_ack
The following actions are possible:
tcp_v6_rcv
  skb->dev = NULL;
  tcp_v6_do_rcv
    tcp_v6_hnd_req
      tcp_check_req
        req->rsk_ops->send_ack == tcp_v6_send_ack

So, skb->dev can be NULL in tcp_v6_send_ack. We must obtain namespace
from dst entry.

Thanks to Vitaliy Gusev <vgusev@openvz.org> for initial problem finding
in IPv4 code.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01 02:13:16 -07:00
Vitaliy Gusev
4dd7972d12 tcp: Fix NULL dereference in tcp_4_send_ack()
Fix NULL dereference in tcp_4_send_ack().

As skb->dev is reset to NULL in tcp_v4_rcv() thus OOPS occurs:

BUG: unable to handle kernel NULL pointer dereference at 00000000000004d0
IP: [<ffffffff80498503>] tcp_v4_send_ack+0x203/0x250

Stack:  ffff810005dbb000 ffff810015c8acc0 e77b2c6e5f861600 a01610802e90cb6d
 0a08010100000000 88afffff88afffff 0000000080762be8 0000000115c872e8
 0004122000000000 0000000000000001 ffffffff80762b88 0000000000000020
Call Trace:
 <IRQ>  [<ffffffff80499c33>] tcp_v4_reqsk_send_ack+0x20/0x22
 [<ffffffff8049bce5>] tcp_check_req+0x108/0x14c
 [<ffffffff8047aaf7>] ? rt_intern_hash+0x322/0x33c
 [<ffffffff80499846>] tcp_v4_do_rcv+0x399/0x4ec
 [<ffffffff8045ce4b>] ? skb_checksum+0x4f/0x272
 [<ffffffff80485b74>] ? __inet_lookup_listener+0x14a/0x15c
 [<ffffffff8049babc>] tcp_v4_rcv+0x6a1/0x701
 [<ffffffff8047e739>] ip_local_deliver_finish+0x157/0x24a
 [<ffffffff8047ec9a>] ip_local_deliver+0x72/0x7c
 [<ffffffff8047e5bd>] ip_rcv_finish+0x38d/0x3b2
 [<ffffffff803d3548>] ? scsi_io_completion+0x19d/0x39e
 [<ffffffff8047ebe5>] ip_rcv+0x2a2/0x2e5
 [<ffffffff80462faa>] netif_receive_skb+0x293/0x303
 [<ffffffff80465a9b>] process_backlog+0x80/0xd0
 [<ffffffff802630b4>] ? __rcu_process_callbacks+0x125/0x1b4
 [<ffffffff8046560e>] net_rx_action+0xb9/0x17f
 [<ffffffff80234cc5>] __do_softirq+0xa3/0x164
 [<ffffffff8020c52c>] call_softirq+0x1c/0x28
 <EOI>  [<ffffffff8020de1c>] do_softirq+0x34/0x72
 [<ffffffff80234b8e>] local_bh_enable_ip+0x3f/0x50
 [<ffffffff804d43ca>] _spin_unlock_bh+0x12/0x14
 [<ffffffff804599cd>] release_sock+0xb8/0xc1
 [<ffffffff804a6f9a>] inet_stream_connect+0x146/0x25c
 [<ffffffff80243078>] ? autoremove_wake_function+0x0/0x38
 [<ffffffff8045751f>] sys_connect+0x68/0x8e
 [<ffffffff80291818>] ? fd_install+0x5f/0x68
 [<ffffffff80457784>] ? sock_map_fd+0x55/0x62
 [<ffffffff8020b39b>] system_call_after_swapgs+0x7b/0x80

Code: 41 10 11 d0 83 d0 00 4d 85 ed 89 45 c0 c7 45 c4 08 00 00 00 74 07 41 8b 45 04 89 45 c8 48 8b 43 20 8b 4d b8 48 8d 55 b0 48 89 de <48> 8b 80 d0 04 00 00 48 8b b8 60 01 00 00 e8 20 ae fe ff 65 48
RIP  [<ffffffff80498503>] tcp_v4_send_ack+0x203/0x250
 RSP <ffffffff80762b78>
CR2: 00000000000004d0

Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-01 01:51:39 -07:00
Wei Yongjun
ba0166708e sctp: Fix kernel panic while process protocol violation parameter
Since call to function sctp_sf_abort_violation() need paramter 'arg' with
'struct sctp_chunk' type, it will read the chunk type and chunk length from
the chunk_hdr member of chunk. But call to sctp_sf_violation_paramlen()
always with 'struct sctp_paramhdr' type's parameter, it will be passed to
sctp_sf_abort_violation(). This may cause kernel panic.

   sctp_sf_violation_paramlen()
     |-- sctp_sf_abort_violation()
        |-- sctp_make_abort_violation()

This patch fixed this problem. This patch also fix two place which called
sctp_sf_violation_paramlen() with wrong paramter type.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-30 05:32:24 -07:00
Heiko Carstens
8b122efd13 iucv: Fix mismerge again.
fb65a7c091 ("iucv: Fix bad merging.") fixed
a merge error, but in a wrong way. We now end up with the bug below.
This patch corrects the mismerge like it was intended.

BUG: scheduling while atomic: swapper/1/0x00000000
Modules linked in:
CPU: 1 Not tainted 2.6.27-rc7-00094-gc0f4d6d #9
Process swapper (pid: 1, task: 000000003fe7d988, ksp: 000000003fe838c0)
0000000000000000 000000003fe839b8 0000000000000002 0000000000000000
       000000003fe83a58 000000003fe839d0 000000003fe839d0 0000000000390de6
       000000000058acd8 00000000000000d0 000000003fe7dcd8 0000000000000000
       000000000000000c 000000000000000d 0000000000000000 000000003fe83a28
       000000000039c5b8 0000000000015e5e 000000003fe839b8 000000003fe83a00
Call Trace:
([<0000000000015d6a>] show_trace+0xe6/0x134)
 [<0000000000039656>] __schedule_bug+0xa2/0xa8
 [<0000000000391744>] schedule+0x49c/0x910
 [<0000000000391f64>] schedule_timeout+0xc4/0x114
 [<00000000003910d4>] wait_for_common+0xe8/0x1b4
 [<00000000000549ae>] call_usermodehelper_exec+0xa6/0xec
 [<00000000001af7b8>] kobject_uevent_env+0x418/0x438
 [<00000000001d08fc>] bus_add_driver+0x1e4/0x298
 [<00000000001d1ee4>] driver_register+0x90/0x18c
 [<0000000000566848>] netiucv_init+0x168/0x2c8
 [<00000000000120be>] do_one_initcall+0x3e/0x17c
 [<000000000054a31a>] kernel_init+0x1ce/0x248
 [<000000000001a97a>] kernel_thread_starter+0x6/0xc
 [<000000000001a974>] kernel_thread_starter+0x0/0xc
 iucv: NETIUCV driver initialized
initcall netiucv_init+0x0/0x2c8 returned with preemption imbalance

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-30 03:03:35 -07:00
Herbert Xu
d01dbeb6af ipsec: Fix pskb_expand_head corruption in xfrm_state_check_space
We're never supposed to shrink the headroom or tailroom.  In fact,
shrinking the headroom is a fatal action.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-30 02:03:19 -07:00
Linus Torvalds
efba91bd90 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  netfilter: ip6t_{hbh,dst}: Rejects not-strict mode on rule insertion
  ath9k: disable MIB interrupts to fix interrupt storm
  [Bluetooth] Fix USB disconnect handling of btusb driver
  [Bluetooth] Fix wrong URB handling of btusb driver
  [Bluetooth] Fix I/O errors on MacBooks with Broadcom chips
2008-09-24 16:45:07 -07:00
Yasuyuki Kozakai
8ca31ce52a netfilter: ip6t_{hbh,dst}: Rejects not-strict mode on rule insertion
The current code ignores rules for internal options in HBH/DST options
header in packet processing if 'Not strict' mode is specified (which is not
implemented). Clearly it is not expected by user.

Kernel should reject HBH/DST rule insertion with 'Not strict' mode
in the first place.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-24 15:53:39 -07:00
Linus Torvalds
7a528159b9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: fix put_data error handling
  9p: use an IS_ERR test rather than a NULL test
  9p: introduce missing kfree
  9p-trans_fd: fix and clean up module init/exit paths
  9p-trans_fd: don't do fs segment mangling in p9_fd_poll()
  9p-trans_fd: clean up p9_conn_create()
  9p-trans_fd: fix trans_fd::p9_conn_destroy()
  9p: implement proper trans module refcounting and unregistration
2008-09-24 15:33:50 -07:00
Eric Van Hensbergen
16ec470012 9p: fix put_data error handling
Abhishek Kulkarni pointed out an inconsistency in the way
errors are returned from p9_put_data.  On deeper exploration it
seems the error handling for this path was completely wrong.
This patch adds checks for allocation problems and propagates
errors correctly.

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-09-24 16:22:22 -05:00
Julia Lawall
620678244b 9p: introduce missing kfree
Error handling code following a kmalloc should free the allocated data.

The semantic match that finds the problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@r exists@
local idexpression x;
statement S;
expression E;
identifier f,l;
position p1,p2;
expression *ptr != NULL;
@@

(
if ((x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...)) == NULL) S
|
x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...);
...
if (x == NULL) S
)
<... when != x
     when != if (...) { <+...x...+> }
x->f = E
...>
(
 return \(0\|<+...x...+>\|ptr\);
|
 return@p2 ...;
)

@script:python@
p1 << r.p1;
p2 << r.p2;
@@

print "* file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2008-09-24 16:22:22 -05:00
Tejun Heo
206ca50de7 9p-trans_fd: fix and clean up module init/exit paths
trans_fd leaked p9_mux_wq on module unload.  Fix it.  While at it,
collapse p9_mux_global_init() into p9_trans_fd_init().  It's easier to
follow this way and the global poll_tasks array is about to removed
anyway.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-09-24 16:22:23 -05:00
Tejun Heo
ec3c68f232 9p-trans_fd: don't do fs segment mangling in p9_fd_poll()
p9_fd_poll() is never called with user pointers and f_op->poll()
doesn't expect its arguments to be from userland.  There's no need to
set kernel ds before calling f_op->poll() from p9_fd_poll().  Remove
it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-09-24 16:22:23 -05:00
Tejun Heo
571ffeafff 9p-trans_fd: clean up p9_conn_create()
* Use kzalloc() to allocate p9_conn and remove 0/NULL initializations.

* Clean up error return paths.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-09-24 16:22:23 -05:00
Tejun Heo
7dc5d24be0 9p-trans_fd: fix trans_fd::p9_conn_destroy()
p9_conn_destroy() first kills all current requests by calling
p9_conn_cancel(), then waits for the request list to be cleared by
waiting on p9_conn->equeue.  After that, polling is stopped and the
trans is destroyed.  This sequence has a few problems.

* Read and write works were never cancelled and the p9_conn can be
  destroyed while the works are running as r/w works remove requests
  from the list and dereference the p9_conn from them.

* The list emptiness wait using p9_conn->equeue wouldn't trigger
  because p9_conn_cancel() always clears all the lists and the only
  way the wait can be triggered is to have another task to issue a
  request between the slim window between p9_conn_cancel() and the
  wait, which isn't safe under the current implementation with or
  without the wait.

This patch fixes the problem by first stopping poll, which can
schedule r/w works, first and cancle r/w works which guarantees that
r/w works are not and will not run from that point and then calling
p9_conn_cancel() and do the rest of destruction.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-09-24 16:22:23 -05:00
Tejun Heo
72029fe85d 9p: implement proper trans module refcounting and unregistration
9p trans modules aren't refcounted nor were they unregistered
properly.  Fix it.

* Add 9p_trans_module->owner and reference the module on each trans
  instance creation and put it on destruction.

* Protect v9fs_trans_list with a spinlock.  This isn't strictly
  necessary as the list is manipulated only during module loading /
  unloading but it's a good idea to make the API safe.

* Unregister trans modules when the corresponding module is being
  unloaded.

* While at it, kill unnecessary EXPORT_SYMBOL on p9_trans_fd_init().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2008-09-24 16:22:23 -05:00
Michael Kerrisk
2d4c826677 sys_paccept: disable paccept() until API design is resolved
The reasons for disabling paccept() are as follows:

* The API is more complex than needed.  There is AFAICS no demonstrated
  use case that the sigset argument of this syscall serves that couldn't
  equally be served by the use of pselect/ppoll/epoll_pwait + traditional
  accept().  Roland seems to concur with this opinion
  (http://thread.gmane.org/gmane.linux.kernel/723953/focus=732255).  I
  have (more than once) asked Ulrich to explain otherwise
  (http://thread.gmane.org/gmane.linux.kernel/723952/focus=731018), but he
  does not respond, so one is left to assume that he doesn't know of such
  a case.

* The use of a sigset argument is not consistent with other I/O APIs
  that can block on a single file descriptor (e.g., read(), recv(),
  connect()).

* The behavior of paccept() when interrupted by a signal is IMO strange:
  the kernel restarts the system call if SA_RESTART was set for the
  handler.  I think that it should not do this -- that it should behave
  consistently with paccept()/ppoll()/epoll_pwait(), which never restart,
  regardless of SA_RESTART.  The reasoning here is that the very purpose
  of paccept() is to wait for a connection or a signal, and that
  restarting in the latter case is probably never useful.  (Note: Roland
  disagrees on this point, believing that rather paccept() should be
  consistent with accept() in its behavior wrt EINTR
  (http://thread.gmane.org/gmane.linux.kernel/723953/focus=732255).)

I believe that instead, a simpler API, consistent with Ulrich's other
recent additions, is preferable:

accept4(int fd, struct sockaddr *sa, socklen_t *salen, ind flags);

(This simpler API was originally proposed by Ulrich:
http://thread.gmane.org/gmane.linux.network/92072)

If this simpler API is added, then if we later decide that the sigset
argument really is required, then a suitable bit in 'flags' could be added
to indicate the presence of the sigset argument.

At this point, I am hoping we either will get a counter-argument from
Ulrich about why we really do need paccept()'s sigset argument, or that he
will resubmit the original accept4() patch.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Alan Cox <alan@redhat.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-23 08:09:14 -07:00
Linus Torvalds
18f22fbb8b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  netdev: simple_tx_hash shouldn't hash inside fragments
2008-09-22 07:45:06 -07:00
Alexander Duyck
ad55dcaff0 netdev: simple_tx_hash shouldn't hash inside fragments
Currently simple_tx_hash is hashing inside of udp fragments.  As a result
packets are getting getting sent to all queues when they shouldn't be.
This causes a serious performance regression which can be seen by sending
UDP frames larger than mtu on multiqueue devices.  This change will make
it so that fragments are hashed only as IP datagrams w/o any protocol
information.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-20 22:05:50 -07:00
Linus Torvalds
764527a1b3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  e100: Use pci_pme_active to clear PME_Status and disable PME#
  e1000: prevent corruption of EEPROM/NVM
  forcedeth: call restore mac addr in nv_shutdown path
  bnx2: Promote vector field in bnx2_irq structure from u16 to unsigned int
  sctp: Fix oops when INIT-ACK indicates that peer doesn't support AUTH
  sctp: do not enable peer features if we can't do them.
  sctp: set the skb->ip_summed correctly when sending over loopback.
  udp: Fix rcv socket locking
2008-09-19 16:01:37 -07:00
Vlad Yasevich
add52379dd sctp: Fix oops when INIT-ACK indicates that peer doesn't support AUTH
If INIT-ACK is received with SupportedExtensions parameter which
indicates that the peer does not support AUTH, the packet will be
silently ignore, and sctp_process_init() do cleanup all of the
transports in the association.
When T1-Init timer is expires, OOPS happen while we try to choose
a different init transport.

The solution is to only clean up the non-active transports, i.e
the ones that the peer added.  However, that introduces a problem
with sctp_connectx(), because we don't mark the proper state for
the transports provided by the user.  So, we'll simply mark
user-provided transports as ACTIVE.  That will allow INIT
retransmissions to work properly in the sctp_connectx() context
and prevent the crash.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-18 16:28:27 -07:00
Vlad Yasevich
0ef46e285c sctp: do not enable peer features if we can't do them.
Do not enable peer features like addip and auth, if they
are administratively disabled localy.  If the peer resports
that he supports something that we don't, neither end can
use it so enabling it is pointless.  This solves a problem
when talking to a peer that has auth and addip enabled while
we do not.  Found by Andrei Pelinescu-Onciul <andrei@iptel.org>.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-18 16:27:38 -07:00
Vlad Yasevich
a3028b8ed1 sctp: set the skb->ip_summed correctly when sending over loopback.
Loopback used to clobber the ip_summed filed which sctp then used
to figure out if it needed to do checksumming or not.  Now that
loopback doesn't do that any more, sctp needs to set the ip_summed
field correctly.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-18 02:48:25 -07:00
Arjan van de Ven
45e9c0de2e warn: Turn the netdev timeout WARN_ON() into a WARN()
this patch turns the netdev timeout WARN_ON_ONCE() into a WARN_ONCE(),
so that the device and driver names are inside the warning message.
This helps automated tools like kerneloops.org to collect the data
and do statistics, as well as making it more likely that humans
cut-n-paste the important message as part of a bugreport.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-16 19:39:33 -07:00
Herbert Xu
93821778de udp: Fix rcv socket locking
The previous patch in response to the recursive locking on IPsec
reception is broken as it tries to drop the BH socket lock while in
user context.

This patch fixes it by shrinking the section protected by the
socket lock to sock_queue_rcv_skb only.  The only reason we added
the lock is for the accounting which happens in that function.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-15 11:48:46 -07:00
Marcel Holtmann
7c6a329e44 [Bluetooth] Fix regression from using default link policy
To speed up the Simple Pairing connection setup, the support for the
default link policy has been enabled. This is in contrast to settings
the link policy on every connection setup. Using the default link policy
is the preferred way since there is no need to dynamically change it for
every connection.

For backward compatibility reason and to support old userspace the
HCISETLINKPOL ioctl has been switched over to using hci_request() to
issue the HCI command for setting the default link policy instead of
just storing it in the HCI device structure.

However the hci_request() can only be issued when the device is
brought up. If used on a device that is registered, but still down
it will timeout and fail. This is problematic since the command is
put on the TX queue and the Bluetooth core tries to submit it to
hardware that is not ready yet. The timeout for these requests is
10 seconds and this causes a significant regression when setting up
a new device.

The userspace can perfectly handle a failure of the HCISETLINKPOL
ioctl and will re-submit it later, but the 10 seconds delay causes
a problem. So in case hci_request() is called on a device that is
still down, just fail it with ENETDOWN to indicate what happens.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-09-12 03:11:54 +02:00
Neil Horman
e550dfb0c2 ipv6: Fix OOPS in ip6_dst_lookup_tail().
This fixes kernel bugzilla 11469: "TUN with 1024 neighbours:
ip6_dst_lookup_tail NULL crash"

dst->neighbour is not necessarily hooked up at this point
in the processing path, so blindly dereferencing it is
the wrong thing to do.  This NULL check exists in other
similar paths and this case was just an oversight.

Also fix the completely wrong and confusing indentation
here while we're at it.

Based upon a patch by Evgeniy Polyakov.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-09 13:51:35 -07:00
Herbert Xu
225f40055f ipsec: Restore larval states and socket policies in dump
The commit commit 4c563f7669 ("[XFRM]:
Speed up xfrm_policy and xfrm_state walking") inadvertently removed
larval states and socket policies from netlink dumps.  This patch
restores them.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-09 05:23:37 -07:00
David S. Miller
fd9ec7d31f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6 2008-09-09 02:11:11 -07:00
Marcel Holtmann
e7c29cb16c [Bluetooth] Reject L2CAP connections on an insecure ACL link
The Security Mode 4 of the Bluetooth 2.1 specification has strict
authentication and encryption requirements. It is the initiators job
to create a secure ACL link. However in case of malicious devices, the
acceptor has to make sure that the ACL is encrypted before allowing
any kind of L2CAP connection. The only exception here is the PSM 1 for
the service discovery protocol, because that is allowed to run on an
insecure ACL link.

Previously it was enough to reject a L2CAP connection during the
connection setup phase, but with Bluetooth 2.1 it is forbidden to
do any L2CAP protocol exchange on an insecure link (except SDP).

The new hci_conn_check_link_mode() function can be used to check the
integrity of an ACL link. This functions also takes care of the cases
where Security Mode 4 is disabled or one of the devices is based on
an older specification.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2008-09-09 07:19:20 +02:00