Commit Graph

195624 Commits

Author SHA1 Message Date
Scott Feldman
8ca9418350 netlink: bug fix: don't overrun skbs on vf_port dump
Noticed by Patrick McHardy: was continuing to fill skb after a
nla_put_failure, ignoring the size calculated by upper layer.  Now,
return -EMSGSIZE on any overruns, but also allow netdev to
fail ndo_get_vf_port with error other than -EMSGSIZE, thus unwinding
nest.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-28 03:42:18 -07:00
Eric Dumazet
50636af715 xt_tee: use skb_dst_drop()
After commit 7fee226a (net: add a noref bit on skb dst), its wrong to
use : dst_release(skb_dst(skb)), since we could decrement a refcount
while skb dst was not refcounted.

We should use skb_dst_drop(skb) instead.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-28 03:41:17 -07:00
Bryan Wu
418bd0d4df netdev/fec: fix ifconfig eth0 down hang issue
BugLink: http://bugs.launchpad.net/bugs/559065

In fec open/close function, we need to use phy_connect and phy_disconnect
operation before we start/stop phy. Otherwise it will cause system hang.

Only call fec_enet_mii_probe() in open function, because the first open
action will cause NULL pointer error.

Signed-off-by: Bryan Wu <bryan.wu@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-28 03:40:39 -07:00
Michael Chan
b58ffb41fc cnic: Fix context memory init. on 5709.
We need to zero context memory on 5709 in the function cnic_init_context().
Without this, iscsid restart on 5709 will not work because of stale data.
TX context blocks should not be initialized by cnic_init_context() because
of the special remapping on 5709.

Update version to 2.1.2.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-28 01:57:19 -07:00
Julia Lawall
17d9564003 drivers/net: Eliminate a NULL pointer dereference
At the point of the print, dev is NULL.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r exists@
expression E,E1;
identifier f;
statement S1,S2,S3;
@@

if ((E == NULL && ...) || ...)
{
  ... when != if (...) S1 else S2
      when != E = E1
* E->f
  ... when any
  return ...;
}
else S3
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-28 01:57:18 -07:00
Julia Lawall
89dc0be68f drivers/net/hamradio: Eliminate a NULL pointer dereference
At the point of the print, dev is NULL.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r exists@
expression E,E1;
identifier f;
statement S1,S2,S3;
@@

if ((E == NULL && ...) || ...)
{
  ... when != if (...) S1 else S2
      when != E = E1
* E->f
  ... when any
  return ...;
}
else S3
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-28 01:57:18 -07:00
Sarveshwar Bandi
84e5b9f75b be2net: Patch removes redundant while statement in loop.
Signed-off-by: Sarveshwar Bandi <sarveshwarb@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-28 01:57:18 -07:00
Herbert Xu
0aa6827151 ipv6: Add GSO support on forwarding path
Currently we disallow GSO packets on the IPv6 forward path.
This patch fixes this.

Note that I discovered that our existing GSO MTU checks (e.g.,
IPv4 forwarding) are buggy in that they skip the check altogether,
when they really should be checking gso_size + header instead.

I have also been lazy here in that I haven't bothered to segment
the GSO packet by hand before generating an ICMP message.  Someone
should add that to be 100% correct.

Reported-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-28 01:57:17 -07:00
Eric Dumazet
a47311380e net: fix __neigh_event_send()
commit 7fee226ad2 (net: add a noref bit on skb dst) missed one spot
where an skb is enqueued, with a possibly not refcounted dst entry.

__neigh_event_send() inserts skb into arp_queue, so we must make sure
dst entry is refcounted, or dst entry can be freed by garbage collector
after caller exits from rcu protected section.

Reported-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-28 01:57:16 -07:00
Takuya Yoshikawa
a02c37891a vhost: fix the memory leak which will happen when memory_access_ok fails
We need to free newmem when vhost_set_memory() fails to complete.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2010-05-27 13:55:17 +03:00
Takuya Yoshikawa
d3553a5249 vhost-net: fix to check the return value of copy_to/from_user() correctly
copy_to/from_user() returns the number of bytes that could not be copied.

So we need to check if it is not zero, and in that case, we should return
the error number -EFAULT rather than directly return the return value from
copy_to/from_user().

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2010-05-27 13:55:13 +03:00
Takuya Yoshikawa
7ad9c9d270 vhost: fix to check the return value of copy_to/from_user() correctly
copy_to/from_user() returns the number of bytes that could not be copied.

So we need to check if it is not zero, and in that case, we should return
the error number -EFAULT rather than directly return the return value from
copy_to/from_user().

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2010-05-27 13:54:59 +03:00
Krishna Kumar
0f3d9a1746 vhost: Fix host panic if ioctl called with wrong index
Missed a boundary value check in vhost_set_vring. The host panics if
idx == nvqs is used in ioctl commands in vhost_virtqueue_init.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2010-05-27 12:19:02 +03:00
Eric Dumazet
8a74ad60a5 net: fix lock_sock_bh/unlock_sock_bh
This new sock lock primitive was introduced to speedup some user context
socket manipulation. But it is unsafe to protect two threads, one using
regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh

This patch changes lock_sock_bh to be careful against 'owned' state.
If owned is found to be set, we must take the slow path.
lock_sock_bh() now returns a boolean to say if the slow path was taken,
and this boolean is used at unlock_sock_bh time to call the appropriate
unlock function.

After this change, BH are either disabled or enabled during the
lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
so we rename these functions to lock_sock_fast()/unlock_sock_fast().

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-27 00:30:53 -07:00
Julia Lawall
a56635a56f net/iucv: Add missing spin_unlock
Add a spin_unlock missing on the error path.  There seems like no reason
why the lock should continue to be held if the kzalloc fail.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression E1;
@@

* spin_lock(E1,...);
  <+... when != E1
  if (...) {
    ... when != E1
*   return ...;
  }
  ...+>
* spin_unlock(E1,...);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-26 21:09:51 -07:00
Brian Hill
23ecc4bde2 net: ll_temac: fix checksum offload logic
The current checksum offload code does not work and this corrects
that functionality. It also updates the interrupt coallescing
initialization so than there are fewer interrupts and performance
is increased.

Signed-off-by: Brian Hill <brian.hill@xilinx.com>
Signed-off-by: John Linn <john.linn@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-26 20:44:30 -07:00
Brian Hill
755fae0ac4 net: ll_temac: fix interrupt bug when interrupt 0 is used
The code is not checking the interrupt for DMA correctly so that an
interrupt number of 0 will cause a false error.

Signed-off-by: Brian Hill <brian.hill@xilinx.com>
Signed-off-by: John Linn <john.linn@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-26 20:42:18 -07:00
Dan Carpenter
ff937938e7 sctp: dubious bitfields in sctp_transport
Sparse complains because these one-bit bitfields are signed.
  include/net/sctp/structs.h:879:24: error: dubious one-bit signed bitfield
  include/net/sctp/structs.h:889:31: error: dubious one-bit signed bitfield
  include/net/sctp/structs.h:895:26: error: dubious one-bit signed bitfield
  include/net/sctp/structs.h:898:31: error: dubious one-bit signed bitfield
  include/net/sctp/structs.h:901:27: error: dubious one-bit signed bitfield

It doesn't cause a problem in the current code, but it would be better
to clean it up.  This was introduced by c0058a35aa: "sctp: Save some
room in the sctp_transport by using bitfields".

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-26 00:40:11 -07:00
Dan Carpenter
ed0f160ad6 ipmr: off by one in __ipmr_fill_mroute()
This fixes a smatch warning:
	net/ipv4/ipmr.c +1917 __ipmr_fill_mroute(12) error: buffer overflow
	'(mrt)->vif_table' 32 <= 32

The ipv6 version had the same issue.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-26 00:38:56 -07:00
Sathya Perla
d938a702e5 be2net: increase POST timeout for EEH recovery
Sometimes BE requires longer time for POST completion after an EEH
reset.  Increasing the timeout value accordingly.

Signed-off-by: Sathya Perla <sathyap@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-26 00:33:43 -07:00
Herbert Xu
ea16f912a6 cls_cgroup: Initialise classid when module is absent
When the cls_cgroup module is not loaded, task_cls_classid will
return an uninitialised classid instead of zero.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-25 18:53:57 -07:00
David S. Miller
f925b1303e drivers/net/usb/asix.c: Fix pointer cast.
Stephen Rothwell reports the following new warning:

drivers/net/usb/asix.c: In function 'asix_rx_fixup':
drivers/net/usb/asix.c:325: warning: cast from pointer to integer of different size
drivers/net/usb/asix.c:354: warning: cast from pointer to integer of different size

The code just cares about the low alignment bits, so use
an "unsigned long" cast instead of one to "u32".

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-25 16:24:03 -07:00
Sarveshwar Bandi
dd131e76e5 be2net: Bug fix to avoid disabling bottom half during firmware upgrade.
Certain firmware commands/operations to upgrade firmware could take several
seconds to complete. The code presently disables bottom half during these
operations which could lead to unpredictable behaviour in certain cases. This
patch now does all firmware upgrade operations asynchronously using a
completion variable.

Signed-off-by: Sarveshwar Bandi <sarveshwarb@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-25 16:16:32 -07:00
J. R. Okajima
563b046710 proc_dointvec: write a single value
The commit 00b7c3395a
"sysctl: refactor integer handling proc code"
modified the behaviour of writing to /proc.
Before the commit, write("1\n") to /proc/sys/kernel/printk succeeded. But
now it returns EINVAL.

This commit supports writing a single value to a multi-valued entry.

Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Reviewed-and-tested-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-25 16:10:14 -07:00
Filip Aben
dd7496f217 hso: add support for new products
This patch adds a few new product id's for the hso driver.

Signed-off-by: Filip Aben <f.aben@option.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-25 16:09:23 -07:00
Rémi Denis-Courmont
e513480e28 Phonet: fix potential use-after-free in pep_sock_close()
sk_common_release() might destroy our last reference to the socket.
So an extra temporary reference is needed during cleanup.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-25 16:08:39 -07:00
David S. Miller
7466a38478 Merge branch 'wimax-2.6.35.y' of git://git.kernel.org/pub/scm/linux/kernel/git/inaky/wimax 2010-05-25 14:05:24 -07:00
David S. Miller
a261af927d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-05-25 13:15:11 -07:00
Felix Fietkau
a65e4cb402 ath9k: remove VEOL support for ad-hoc
With VEOL, Beacon transmission in ad-hoc does not currently work.
I believe for larger ad-hoc networks, VEOL is too unreliable, as
it can get beacon transmissions stuck during synchronization.
Use SWBA based beacon trasmission similar to AP mode instead.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Acked-by: Benoit Papillault <benoit.papillault@free.fr>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-25 11:12:54 -04:00
Felix Fietkau
774610e4f2 ath9k: change beacon allocation to prefer the first beacon slot
This fixes IBSS beacon transmissions without VEOL enabled

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-25 11:12:54 -04:00
Randy Dunlap
acfbe96a30 sock.h: fix kernel-doc warning
Fix sock.h kernel-doc warning:
Warning(include/net/sock.h:1438): No description found for parameter 'wq'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-24 23:54:18 -07:00
Herbert Xu
937eada45f cls_cgroup: Fix build error when built-in
There is a typo in cgroup_cls_state when cls_cgroup is built-in.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-24 23:53:37 -07:00
Jiri Pirko
f16d3d5748 macvlan: do proper cleanup in macvlan_common_newlink() V2
Fixes possible memory leak.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-24 18:42:12 -07:00
Sarveshwar Bandi
556ae19110 be2net: Bug fix in init code in probe
PCI function reset needs to invoked after fw init ioctl is issued.

Signed-off-by: Sarveshwar Bandi <sarveshwarb@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-24 18:38:25 -07:00
Yoichi Yuasa
d9b52dc6fd net/dccp: expansion of error code size
Because MIPS's EDQUOT value is 1133(0x46d).
It's larger than u8.

Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-24 18:37:02 -07:00
Vasanthakumar Thiagarajan
ededf1f82a ath9k: Fix rx of mcast/bcast frames in PS mode with auto sleep
The functionality to keep the device awake until it is done with
the rx of any mcast/bcast frames which are pending on AP should
also be added to the hardwares which support auto sleep feature.
This patch fixes frequent failures in ARP resolution when it is
initiated by the other end. Currently auto sleep is enabled only
for ar9003 in ath9k.

Signed-off-by: Vasanthakumar Thiagarajan <vasanth@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-24 15:07:43 -04:00
Randy Dunlap
a0c9101c05 wireless: fix sta_info.h kernel-doc warnings
Fix sta_info.h kernel-doc warnings:

Warning(net/mac80211/sta_info.h:164): No description found for parameter 'tid_active_rx[STA_TID_NUM]'
Warning(net/mac80211/sta_info.h:164): Excess struct/union/enum/typedef member 'tid_state_rx' description in 'sta_ampdu_mlme'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-24 15:07:43 -04:00
Randy Dunlap
4e8998f09b wireless: fix mac80211.h kernel-doc warnings
Fix kernel-doc warnings in mac80211.h:

Warning(include/net/mac80211.h:838): No description found for parameter 'ap_addr'
Warning(include/net/mac80211.h:1726): No description found for parameter 'get_survey'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-24 15:07:42 -04:00
Dan Carpenter
96900c751d iwlwifi: testing the wrong variable in iwl_add_bssid_station()
The intent here is to test that "sta_id_r" is a valid pointer.  We do
this same test later on in the function.

Btw iwl_add_bssid_station() is called from two places and "sta_id_r" is
a valid pointer from both callers.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-24 15:07:42 -04:00
Dan Carpenter
7606688afc ath9k_htc: rare leak in ath9k_hif_usb_alloc_tx_urbs()
This is obviously a small picky thing.  The original error handling code
doesn't free the most recent allocations which haven't been added to the
hif_dev->tx.tx_buf list yet.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-24 15:07:42 -04:00
Dan Carpenter
690e781c5a ath9k_htc: dereferencing before check in hif_usb_tx_cb()
After c11d8f89d3: "ath9k_htc: Simplify TX URB management" we no longer
assume that tx_buf is a non-null pointer.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-24 15:07:42 -04:00
Gertjan van Wingerde
663cb47cc2 rt2x00: Fix rt2800usb TX descriptor writing.
The recent changes to skb handling introduced a bug in the rt2800usb
TX descriptor writing whereby the length of the USB packet wasn't
calculated correctly.
Found via code inspection, as the devices themselves didn't seem to mind.

Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-24 15:07:42 -04:00
Gertjan van Wingerde
9655a6ec19 rt2x00: Fix failed SLEEP->AWAKE and AWAKE->SLEEP transitions.
(Based on a patch created by Ondrej Zary)

In some circumstances the Ralink devices do not properly go to sleep
or wake up, with timeouts occurring.
Fix this by retrying telling the device that it has to wake up or
sleep.

Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-24 15:07:41 -04:00
John W. Linville
3dc3fc52ea Revert "ath9k: Group Key fix for VAPs"
This reverts commit 03ceedea97.

This patch was reported to cause a regression in which connectivity is
lost and cannot be reestablished after a suspend/resume cycle.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-24 14:59:27 -04:00
Tejun Heo
617f3d0d71 wireless: update gfp/slab.h includes
Implicit slab.h inclusion via percpu.h is about to go away.  Make sure
gfp.h or slab.h is included as necessary.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-24 14:59:26 -04:00
Helmut Schaa
52a9bd2a8f rt2x00: don't use to_pci_dev in rt2x00pci_uninitialize
Don't use to_pci_dev in rt2x00pci_uninitialize to get the allocated irq
as it won't work for platform devices (SoC). Instead, use the irq field
that's already used everywhere else.

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-24 14:59:25 -04:00
Bruno Randolf
b5eae9ff5b ath5k: consistently use rx_bufsize for RX DMA
We should use the same buffer size we set up for DMA also in the hardware
descriptor. Previously we used common->rx_bufsize for setting up the DMA
mapping, but used skb_tailroom(skb) for the size we tell to the hardware in the
descriptor itself. The problem is that skb_tailroom(skb) can give us a larger
value than the size we set up for DMA before. This allows the hardware to write
into memory locations not set up for DMA. In practice this should rarely happen
because all packets should be smaller than the maximum 802.11 packet size.

On the tested platform rx_bufsize is 2528, and we allocated an skb of 2559
bytes length (including padding for cache alignment) but sbk_tailroom() was
2592. Just consistently use rx_bufsize for all RX DMA memory sizes.

Also use the return value of the descriptor setup function.

Cc: stable@kernel.org
Signed-off-by: Bruno Randolf <br1@einfach.org>
Reviewed-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-24 14:59:23 -04:00
Baruch Siach
5eb32bd059 fec: add support for PHY interface platform data
The i.MX25 PDK uses RMII to communicate with its PHY. This patch adds
the ability to configure RMII, based on platform data.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Acked-by:  Greg Ungerer <gerg@uclinux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-24 00:36:13 -07:00
Herbert Xu
8286274284 tun: Update classid on packet injection
This patch makes tun update its socket classid every time we
inject a packet into the network stack.  This is so that any
updates made by the admin to the process writing packets to
tun is effected.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-24 00:14:10 -07:00
Herbert Xu
f845172531 cls_cgroup: Store classid in struct sock
Up until now cls_cgroup has relied on fetching the classid out of
the current executing thread.  This runs into trouble when a packet
processing is delayed in which case it may execute out of another
thread's context.

Furthermore, even when a packet is not delayed we may fail to
classify it if soft IRQs have been disabled, because this scenario
is indistinguishable from one where a packet unrelated to the
current thread is processed by a real soft IRQ.

In fact, the current semantics is inherently broken, as a single
skb may be constructed out of the writes of two different tasks.
A different manifestation of this problem is when the TCP stack
transmits in response of an incoming ACK.  This is currently
unclassified.

As we already have a concept of packet ownership for accounting
purposes in the skb->sk pointer, this is a natural place to store
the classid in a persistent manner.

This patch adds the cls_cgroup classid in struct sock, filling up
an existing hole on 64-bit :)

The value is set at socket creation time.  So all sockets created
via socket(2) automatically gains the ID of the thread creating it.
Whenever another process touches the socket by either reading or
writing to it, we will change the socket classid to that of the
process if it has a valid (non-zero) classid.

For sockets created on inbound connections through accept(2), we
inherit the classid of the original listening socket through
sk_clone, possibly preceding the actual accept(2) call.

In order to minimise risks, I have not made this the authoritative
classid.  For now it is only used as a backup when we execute
with soft IRQs disabled.  Once we're completely happy with its
semantics we can use it as the sole classid.

Footnote: I have rearranged the error path on cls_group module
creation.  If we didn't do this, then there is a window where
someone could create a tc rule using cls_group before the cgroup
subsystem has been registered.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-24 00:12:34 -07:00