We can reduce pressure on dst entry refcount that slowdown UDP transmit
path on SMP machines. This pressure is visible on RTP servers when
delivering content to mediagateways, especially big ones, handling
thousand of streams. Several cpus send UDP frames to the same
destination, hence use the same dst entry.
This patch makes ip_append_data() eventually steal the refcount its
callers had to take on the dst entry.
This doesnt avoid all refcounting, but still gives speedups on SMP,
on UDP/RAW transmit path
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the last step to be able to perform full RCU lookups
in __inet_lookup() : After established/timewait tables, we
add RCU lookups to listening hash table.
The only trick here is that a socket of a given type (TCP ipv4,
TCP ipv6, ...) can now flight between two different tables
(established and listening) during a RCU grace period, so we
must use different 'nulls' end-of-chain values for two tables.
We define a large value :
#define LISTENING_NULLS_BASE (1U << 29)
So that slots in listening table are guaranteed to have different
end-of-chain values than slots in established table. A reader can
still detect it finished its lookup in the right chain.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the slub allocator is used, kmem_cache_create() may merge two or more
kmem_cache's into one but the cache name pointer is not updated and
kmem_cache_name() is no longer guaranteed to return the pointer passed
to the former function. This patch stores the kmalloc'ed pointers in the
corresponding request_sock_ops and timewait_sock_ops structures.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can avoid some useless instructions if !CONFIG_NET_NS
Because of RCU, we use INET_MATCH or INET_TW_MATCH twice for the found
socket, so thats six instructions less per incoming TCP packet.
Yet another tbench speedup :)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds #include <linux/timer.h> in lib80211.h to avoid
these compilation erros.
> In file included from /work/src/wireless-testing/net/wireless/lib80211.c:24:
> /work/src/wireless-testing/include/net/lib80211.h:113: error: field
> 'crypt_deinit_timer' has incomplete type
> /work/src/wireless-testing/net/wireless/lib80211.c: In function
> 'lib80211_crypt_info_init':
> /work/src/wireless-testing/net/wireless/lib80211.c:83: error: implicit
> declaration of function 'setup_timer'
> /work/src/wireless-testing/net/wireless/lib80211.c: In function
> 'lib80211_crypt_info_free':
> /work/src/wireless-testing/net/wireless/lib80211.c:95: error: implicit
> declaration of function 'del_timer_sync'
> /work/src/wireless-testing/net/wireless/lib80211.c: In function
> 'lib80211_crypt_deinit_handler':
> /work/src/wireless-testing/net/wireless/lib80211.c:157: error:
> implicit declaration of function 'add_timer'
> /work/src/wireless-testing/net/wireless/lib80211.c: In function
> 'lib80211_crypt_delayed_deinit':
> /work/src/wireless-testing/net/wireless/lib80211.c:182: error:
> implicit declaration of function 'timer_pending'
> make[3]: *** [net/wireless/lib80211.o] Error 1
> make[2]: *** [net/wireless] Error 2
> make[1]: *** [net] Error 2
> make: *** [sub-make] Error 2
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Otherwise, the BUILD_BUG_ON calls in ieee80211_tx_info_clear_status can
fail on some architectures.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
These bits are shared already between ipw2x00 and hostap, and could
probably be shared both more cleanly and with other drivers. This
commit simply relocates the code to lib80211 and adjusts the drivers
appropriately.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Delete kernel-doc struct descriptions for fields that don't exist:
Warning(include/net/mac80211.h:1263): Excess struct/union/enum/typedef member 'conf_ht' description in 'ieee80211_ops'
Warning(net/mac80211/sta_info.h:309): Excess struct/union/enum/typedef member 'addr' description in 'sta_info'
Warning(net/mac80211/sta_info.h:309): Excess struct/union/enum/typedef member 'aid' description in 'sta_info'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
cc: Johannes Berg <johannes@sipsolutions.net>
cc: John W. Linville <linville@tuxdriver.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
On ARM alignment is done slightly different from other architectures.
struct ieee80211_tx_rate is aligned to word size, even though it only has 3
single-byte members, which triggers the BUILD_BUG_ON in
ieee80211_tx_info_clear_status
This patch marks the struct ieee80211_tx_rate as packed, so that ARM
behaves like the other architectures.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Adds an interface to configure the Backward Congestion Notification
(BCN) feature. In a BCN capabale network, congestion notifications
from congested points out in the network can cause the end station
limit the rate of a given traffic flow.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds a netlink interface for Data Center Bridging (DCB) to get and set
the enable state of the Priority Flow Control (PFC) feature.
Primarily, this is a way to turn off PFC in the driver while DCB
remains enabled.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds interface for Data Center Bridging (DCB) to query (and set if
supported) the number of traffic classes currently supported by the
device for the two (DCB) features: priority groups (PG) and priority
flow control (PFC).
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds to the netlink interface for Data Center Bridging (DCB), allowing
the DCB capabilities supported by a device to be queried.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for Data Center Bridging (DCB) features in the ixgbe
driver and adds an rtnetlink interface for configuring DCB to the
kernel. The DCB feature support included are Priority Grouping (PG) -
which allows bandwidth guarantees to be allocated to groups to traffic
based on the 802.1q priority, and Priority Based Flow Control (PFC) -
which introduces a new MAC control PAUSE frame which works at
granularity of the 802.1p priority instead of the link (IEEE 802.3x).
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now TCP & DCCP use RCU lookups, we can convert ehash rwlocks to spinlocks.
/proc/net/tcp and other seq_file 'readers' can safely be converted to 'writers'.
This should speedup writers, since spin_lock()/spin_unlock()
only use one atomic operation instead of two for write_lock()/write_unlock()
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch prepares RCU migration of listening_hash table for
TCP/DCCP protocols.
listening_hash table being small (32 slots per protocol), we add
a spinlock for each slot, instead of a single rwlock for whole table.
This should reduce hold time of readers, and writers concurrency.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The first argument to csum_partial is const void *
casts to char/u8 * are not necessary
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before ieee80211_notify_mac() was added, it was presented with the
use case of using it to tell mac80211 that the association may
have been lost because the firmware crashed/reset.
Since then, it has also been used by iwlwifi to (slightly) speed
up re-association after resume, a workaround around the fact that
mac80211 has no suspend/resume handling yet. It is also not used
by any other drivers, so clearly it cannot be necessary for "good
enough" suspend/resume.
Unfortunately, the callback suffers from a severe problem: It only
works for station mode. If suspend/resume happens while in IBSS or
any other mode (but station), then the callback is pointless.
Recently, it has created a number of locking issues, first because
it required rtnl locking rather than RCU due to calling sleeping
functions within the critical section, and now because it's called
by iwlwifi from the mac80211 workqueue that may not use the rtnl
because it is flushed under rtnl.
(cf. http://bugzilla.kernel.org/show_bug.cgi?id=12046)
I think, therefore, that we should take a step back, remove it
entirely for now and add the small feature it provided properly.
For suspend and resume we will need to introduce new hooks, and for
the case where the firmware was reset the driver will probably
simply just pretend it has done a suspend/resume cycle to get
mac80211 to reprogram the hardware completely, not just try to
connect to the current AP again in station mode. When doing so, we
will need to take into account locking issues and possibly defer
to schedule_work from within mac80211 for the resume operation,
while the suspend operation must be done directly.
Proper suspend/resume should also not necessarily try to reconnect
to the current AP, the time spent in suspend may have been short
enough to not be disconnected from the AP, mac80211 will detect
that the AP went out of range quickly if it did, and if the
association is lost then the AP will disassoc as soon as a data
frame is sent. We might also take into account WWOL then, and
have mac80211 program the hardware into such a mode where it is
available and requested.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
net/netfilter/nfnetlink_log.c:537:1: warning: symbol 'nfulnl_log_packet' was not declared. Should it be static?
Including the proper header also revealed an incorrect prototype.
Signed-off-by: Patrick McHardy <kaber@trash.net>
As for now, the creation and update of conntracks via ctnetlink do not
propagate an event to userspace. This can result in inconsistent situations
if several userspace processes modify the connection tracking table by means
of ctnetlink at the same time. Specifically, using the conntrack command
line tool and conntrackd at the same time can trigger unconsistencies.
This patch also modifies the event cache infrastructure to pass the
process PID and the ECHO flag to nfnetlink_send() to report back
to userspace if the process that triggered the change needs so.
Based on a suggestion from Patrick McHardy.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch adds module loading for helpers via ctnetlink.
* Creation path: We support explicit and implicit helper assignation. For
the explicit case, we try to load the module. If the module is correctly
loaded and the helper is present, we return EAGAIN to re-start the
creation. Otherwise, we return EOPNOTSUPP.
* Update path: release the spin lock, load the module and check. If it is
present, then return EAGAIN to re-start the update.
This patch provides a refactorized function to lookup-and-set the
connection tracking helper. The function removes the exported symbol
__nf_ct_helper_find as it has not clients anymore.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch adds the macro MODULE_ALIAS_NFCT_HELPER that defines a
way to provide generic and persistent aliases for the connection
tracking helpers.
This next patch requires this patch.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Simply delete ops from list and let list debugging do the job.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As found in the past (commit f1dd9c379c
[NET]: Fix tbench regression in 2.6.25-rc1), it is really
important that struct dst_entry refcount is aligned on a cache line.
We cannot use __atribute((aligned)), so manually pad the structure
for 32 and 64 bit arches.
for 32bit : offsetof(truct dst_entry, __refcnt) is 0x80
for 64bit : offsetof(truct dst_entry, __refcnt) is 0xc0
As it is not possible to guess at compile time cache line size,
we use a generic value of 64 bytes, that satisfies many current arches.
(Using 128 bytes alignment on 64bit arches would waste 64 bytes)
Add a BUILD_BUG_ON to catch future updates to "struct dst_entry" dont
break this alignment.
"tbench 8" is 4.4 % faster on a dual quad core (HP BL460c G1), Intel E5450 @3.00GHz
(2350 MB/s instead of 2250 MB/s)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RCU was added to UDP lookups, using a fast infrastructure :
- sockets kmem_cache use SLAB_DESTROY_BY_RCU and dont pay the
price of call_rcu() at freeing time.
- hlist_nulls permits to use few memory barriers.
This patch uses same infrastructure for TCP/DCCP established
and timewait sockets.
Thanks to SLAB_DESTROY_BY_RCU, no slowdown for applications
using short lived TCP connections. A followup patch, converting
rwlocks to spinlocks will even speedup this case.
__inet_lookup_established() is pretty fast now we dont have to
dirty a contended cache line (read_lock/read_unlock)
Only established and timewait hashtable are converted to RCU
(bind table and listen table are still using traditional locking)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a straightforward patch, using hlist_nulls infrastructure.
RCUification already done on UDP two weeks ago.
Using hlist_nulls permits us to avoid some memory barriers, both
at lookup time and delete time.
Patch is large because it adds new macros to include/net/sock.h.
These macros will be used by TCP & DCCP in next patch.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
fix this warning:
net/bluetooth/af_bluetooth.c:60: warning: ‘bt_key_strings’ defined but not used
net/bluetooth/af_bluetooth.c:71: warning: ‘bt_slock_key_strings’ defined but not used
this is a lockdep macro problem in the !LOCKDEP case.
We cannot convert it to an inline because the macro works on multiple types,
but we can mark the parameter used.
[ also clean up a misaligned tab in sock_lock_init_class_and_name() ]
[ also remove #ifdefs from around af_family_clock_key strings - which
were certainly added to get rid of the ugly build warnings. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
After implementing qdisc->ops->peek() and changing sch_netem into
classless qdisc there are no more qdisc->ops->requeue() users. This
patch removes this method with its wrappers (qdisc_requeue()), and
also unused qdisc->requeue structure. There are a few minor fixes of
warnings (htb_enqueue()) and comments btw.
The idea to kill ->requeue() and a similar patch were first developed
by David S. Miller.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The urg_ptr field is not used anywhere and is merely confusing.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
security/keys/internal.h
security/keys/process_keys.c
security/keys/request_key.c
Fixed conflicts above by using the non 'tsk' versions.
Signed-off-by: James Morris <jmorris@namei.org>
Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.
Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().
Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: netdev@vger.kernel.org
Signed-off-by: James Morris <jmorris@namei.org>
Every user is under CONFIG_NET_DMA already, so ifdef field as well.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using read_pnet() and write_pnet() in neighbour code ease the reading
of code.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can shrink size of "struct inet_bind_bucket" by 50%, using
read_pnet() and write_pnet()
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces two helpers that deal with reading and writing
struct net pointers in various network structures.
Their implementation depends on CONFIG_NET_NS
For symmetry, both functions work with "struct net **pnet".
Their usage should reduce the number of #ifdef CONFIG_NET_NS,
without adding many helpers for each network structure
that hold a "struct net *pointer"
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
->pde isn't actually needed, since name is stashed in ->id.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ensure regulatory converstion macros safely accept
multiple arguments and make REG_RULE() use them.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add a new attribute, NL80211_ATTR_WIPHY_TXQ_PARAMS, that can be used with
NL80211_CMD_SET_WIPHY for userspace (e.g., hostapd) to set TX queue
parameters (txop, cwmin, cwmax, aifs).
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add a new attribute, NL80211_ATTR_BSS_BASIC_RATES, that can be used with
NL80211_CMD_SET_BSS for userspace (e.g., hostapd) to set which rates are
in the basic rate set.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This adds a helper function that, given a bitmap of basic
rates and a bitrate returns the response rate for this rate.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Send a notification to the driver on succesful
reception of an ADDBA response, add IEEE80211_AMPDU_TX_RESUME
for this purpose.
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Remove the SSID from the driver API since now there is no
driver that requires knowing the SSID and I think it's
unlikely that any hardware design that does require the
SSID will play well with mac80211.
This also removes support for setting the SSID in master
mode which will require a patch to hostapd to not try.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Previously I assumed that the receive queues of candidates don't
change during the GC. This is only half true, nothing can be received
from the queues (see comment in unix_gc()), but buffers could be added
through the other half of the socket pair, which may still have file
descriptors referring to it.
This can result in inc_inflight_move_tail() erronously increasing the
"inflight" counter for a unix socket for which dec_inflight() wasn't
previously called. This in turn can trigger the "BUG_ON(total_refs <
inflight_refs)" in a later garbage collection run.
Fix this by only manipulating the "inflight" counter for sockets which
are candidates themselves. Duplicating the file references in
unix_attach_fds() is also needed to prevent a socket becoming a
candidate for GC while the skb that contains it is not yet queued.
Reported-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__scm_destroy() walks the list of file descriptors in the scm_fp_list
pointed to by the scm_cookie argument.
Those, in turn, can close sockets and invoke __scm_destroy() again.
There is nothing which limits how deeply this can occur.
The idea for how to fix this is from Linus. Basically, we do all of
the fput()s at the top level by collecting all of the scm_fp_list
objects hit by an fput(). Inside of the initial __scm_destroy() we
keep running the list until it is empty.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds better IPv6 failover support for bonding devices,
especially when in active-backup mode and there are only IPv6 addresses
configured, as reported by Alex Sidorenko.
- Creates a new file, net/drivers/bonding/bond_ipv6.c, for the
IPv6-specific routines. Both regular bonds and VLANs over bonds
are supported.
- Adds a new tunable, num_unsol_na, to limit the number of unsolicited
IPv6 Neighbor Advertisements that are sent on a failover event.
Default is 1.
- Creates two new IPv6 neighbor discovery functions:
ndisc_build_skb()
ndisc_send_skb()
These were required to support VLANs since we have to be able to
add the VLAN id to the skb since ndisc_send_na() and friends
shouldn't be asked to do this. These two routines are basically
__ndisc_send() split into two pieces, in a slightly different order.
- Updates Documentation/networking/bonding.txt and bumps the rev of bond
support to 3.4.0.
On failover, this new code will generate one packet:
- An unsolicited IPv6 Neighbor Advertisement, which helps the switch
learn that the address has moved to the new slave.
Testing has shown that sending just the NA results in pretty good
behavior when in active-back mode, I saw no lost ping packets for example.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
A packet dequeued and stored as gso_skb in qdisc_peek_dequeued() should
be seen as part of the queue for sch->q.qlen queries until it's really
dequeued with qdisc_dequeue_peeked(), so qlen needs additional updating
in these functions. (Updating qstats.backlog shouldn't matter here.)
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch modifies xt_NFLOG to suppress the call to nf_log_packet()
function. The call of this wrapper in xt_NFLOG was causing NFLOG to
use the first initialized module. Thus, if ipt_ULOG is loaded before
nfnetlink_log all NFLOG rules are treated as plain LOG rules.
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Remove the 'supports_ipv6' scheduler flag since all schedulers now
support IPv6.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One parameter wasn't described and one I forgot to update when
renaming it; also update TBDs in sta_info.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The code needs to be split out and cleaned up, so as a
first step remove the capability, to add it back in a
subsequent patch as a separate function. Also remove the
publically facing return value of the function and the
wiphy argument. A number of internal functions go from
being generic helpers to just being used for alpha2
setting.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The regdom struct is given to the core, so it might as well
free it in error conditions.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Wireless HW without any dedicated queues for aggregation
do not need the ampdu_queues mechanism present right now
in mac80211. Since mac80211 is still incomplete wrt TX MQ
changes, do not allow aggregation sessions for drivers that
set ampdu_queues.
This is only an interim hack until Intel fixes the requeue issue.
Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: Luis Rodriguez <Luis.Rodriguez@Atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
It is unnecessary and of questionable value. Also remove
is_empty_ssid, as it is also unnecessary.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This function requires an internal lock to be held, so it cannot
be published to other modules in the kernel.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Correct a handful of errors found while reading the mac80211 book.
Signed-off-by: Bob Copeland <me@bobcopeland.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The two new commands are NL80211_CMD_GET_MESH_PARAMS and
NL80211_CMD_SET_MESH_PARAMS. There is a new attribute enum,
NL80211_ATTR_MESH_PARAMS, which enumerates the various mesh configuration
parameters.
Moved struct mesh_config from mac80211/ieee80211_i.h to net/cfg80211.h.
nl80211_get_mesh_params and nl80211_set_mesh_params unpack the netlink messages
and ask the driver to get or set the configuration. This is done via two new
function stubs, get_mesh_params and set_mesh_params, in struct cfg80211_ops.
Signed-off-by: Colin McCabe <colin@cozybit.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
"Clearing" the rate control algorithm is pointless, none of
the algorithms actually uses this operation and it's not even
invoked properly for all channel switching. Also, there's no
need to since rate control algorithms work per station.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
So after the previous changes we were still unhappy with how
convoluted the API is and decided to make things simpler for
everybody. This completely changes the rate control API, now
taking into account 802.11n with MCS rates and more control,
most drivers don't support that though.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The HT handling has the following deficiencies, which I've
(partially) fixed:
* it always uses the AP info even if there is no AP,
hence has no chance of working as an AP
* it pretends to be HW config, but really is per-BSS
* channel sanity checking is left to the drivers
* it generally lets the driver control too much
HT enabling is still wrong with this patch if you have more than
one virtual STA mode interface, but that never happens currently.
Once WDS, IBSS or AP/VLAN gets HT capabilities, it will also be
wrong, see the comment in ieee80211_enable_ht().
Additionally, this fixes a number of bugs:
* mac80211: ieee80211_set_disassoc doesn't notify the driver any
more since the refactoring
* iwl-agn-rs: always uses the HT capabilities from the wrong stuff
mac80211 gives it rather than the actual peer STA
* ath9k: a number of bugs resulting from the broken HT API
I'm not entirely happy with putting the HT capabilities into
struct ieee80211_sta as restricted to our own HT TX capabilities,
but I see no cleaner solution for now.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Move bss_conf into the vif struct so that drivers can
access it during ->tx without having to store it in
the private data or similar. No driver updates because
this is only for when they want to start using it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Instead of having a separate callback, use the HW config callback
with a new flag to change retry limits.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This makes mac80211 notify the driver which configuration
actually changed, e.g. channel etc.
No driver changes, this is just plumbing, driver authors are
expected to act on this if they want to.
Also remove the HW CONFIG debug printk, it's incorrect, often
we configure something else.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch cleans up a number of things:
* the unusable definition of the HT capabilities/HT information
information elements
* variable names that are hard to understand
* mac80211: move ieee80211_handle_ht to ht.c and remove the unused
enable_ht parameter
* mac80211: fix bug with MCS rate 32 in ieee80211_handle_ht
* mac80211: fix bug with casting the result of ieee80211_bss_get_ie
to an information element _contents_ rather than the
whole element, add size checking (another out-of-bounds
access bug fixed!)
* mac80211: remove some unused return values in favour of BUG_ON
checking
* a few minor other things
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch makes mac80211 handle short slot requests from the AP
properly. Also warn about uses of IEEE80211_CONF_SHORT_SLOT_TIME
and optimise out the code since it cannot ever be hit anyway.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The antenna gain isn't exactly configurable, despite the belief of
some unnamed individual who thinks that the EEPROM might influence
it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This isn't used by anyone, if we ever need it we can add
it back, until then it's useless.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds qdisc_peek_dequeued() wrapper to emulate peek method
with qdisc->dequeue() and storing "peeked" skb in qdisc->gso_skb until
dequeuing. This is mainly for compatibility reasons not to break some
strange configs because peeking is expected for non-work-conserving
parent qdiscs to query work-conserving child qdiscs.
This implementation requires using qdisc_dequeue_peeked() wrapper
instead of directly calling qdisc->dequeue() for all qdiscs ever
querried with qdisc->ops->peek() or qdisc_peek_dequeued().
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Patrick McHardy <kaber@trash.net>
Just as a demonstration how easy adding a peek operation to the
work-conserving qdiscs actually is. It doesn't need to keep or change
any internal state in many cases thanks to the guarantee that the
packet will either be dequeued or, if another packet arrives, the
upper qdisc will immediately ->peek again to reevaluate the state.
(This is only slightly modified Patrick's patch.)
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add Qdisc_ops peek() method in order to replace requeuing.
Based on ideas and patches of Herbert Xu, Patrick McHardy and
David S. Miller.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netns ops which are registered with register_pernet_gen_device() are
shutdown strictly before those which are registered with
register_pernet_subsys(). Sometimes this leads to opposite (read: buggy)
shutdown ordering between two modules.
Add register_pernet_gen_subsys()/unregister_pernet_gen_subsys() for modules
which aren't elite enough for entry in struct net, and which can't use
register_pernet_gen_device(). PPTP conntracking module is such one.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove excess kernel-doc function parameters from networking header
& driver files:
Warning(include/net/sock.h:946): Excess function parameter or struct member 'sk' description in 'sk_filter_release'
Warning(include/linux/netdevice.h:1545): Excess function parameter or struct member 'cpu' description in 'netif_tx_lock'
Warning(drivers/net/wan/z85230.c:712): Excess function parameter or struct member 'regs' description in 'z8530_interrupt'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Corey Minyard found a race added in commit 271b72c7fa
(udp: RCU handling for Unicast packets.)
"If the socket is moved from one list to another list in-between the
time the hash is calculated and the next field is accessed, and the
socket has moved to the end of the new list, the traversal will not
complete properly on the list it should have, since the socket will
be on the end of the new list and there's not a way to tell it's on a
new list and restart the list traversal. I think that this can be
solved by pre-fetching the "next" field (with proper barriers) before
checking the hash."
This patch corrects this problem, introducing a new
sk_for_each_rcu_safenext() macro.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Goals are :
1) Optimizing handling of incoming Unicast UDP frames, so that no memory
writes should happen in the fast path.
Note: Multicasts and broadcasts still will need to take a lock,
because doing a full lockless lookup in this case is difficult.
2) No expensive operations in the socket bind/unhash phases :
- No expensive synchronize_rcu() calls.
- No added rcu_head in socket structure, increasing memory needs,
but more important, forcing us to use call_rcu() calls,
that have the bad property of making sockets structure cold.
(rcu grace period between socket freeing and its potential reuse
make this socket being cold in CPU cache).
David did a previous patch using call_rcu() and noticed a 20%
impact on TCP connection rates.
Quoting Cristopher Lameter :
"Right. That results in cacheline cooldown. You'd want to recycle
the object as they are cache hot on a per cpu basis. That is screwed
up by the delayed regular rcu processing. We have seen multiple
regressions due to cacheline cooldown.
The only choice in cacheline hot sensitive areas is to deal with the
complexity that comes with SLAB_DESTROY_BY_RCU or give up on RCU."
- Because udp sockets are allocated from dedicated kmem_cache,
use of SLAB_DESTROY_BY_RCU can help here.
Theory of operation :
---------------------
As the lookup is lockfree (using rcu_read_lock()/rcu_read_unlock()),
special attention must be taken by readers and writers.
Use of SLAB_DESTROY_BY_RCU is tricky too, because a socket can be freed,
reused, inserted in a different chain or in worst case in the same chain
while readers could do lookups in the same time.
In order to avoid loops, a reader must check each socket found in a chain
really belongs to the chain the reader was traversing. If it finds a
mismatch, lookup must start again at the begining. This *restart* loop
is the reason we had to use rdlock for the multicast case, because
we dont want to send same message several times to the same socket.
We use RCU only for fast path.
Thus, /proc/net/udp still takes spinlocks.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP sockets are hashed in a 128 slots hash table.
This hash table is protected by *one* rwlock.
This rwlock is readlocked each time an incoming UDP message is handled.
This rwlock is writelocked each time a socket must be inserted in
hash table (bind time), or deleted from this table (close time)
This is not scalable on SMP machines :
1) Even in read mode, lock() and unlock() are atomic operations and
must dirty a contended cache line, shared by all cpus.
2) A writer might be starved if many readers are 'in flight'. This can
happen on a machine with some NIC receiving many UDP messages. User
process can be delayed a long time at socket creation/dismantle time.
This patch prepares RCU migration, by introducing 'struct udp_table
and struct udp_hslot', and using one spinlock per chain, to reduce
contention on central rwlock.
Introducing one spinlock per chain reduces latencies, for port
randomization on heavily loaded UDP servers. This also speedup
bindings to specific ports.
udp_lib_unhash() was uninlined, becoming to big.
Some cleanups were done to ease review of following patch
(RCUification of UDP Unicast lookups)
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a patch to provide on demand route cache rebuilding. Currently, our
route cache is rebulid periodically regardless of need. This introduced
unneeded periodic latency. This patch offers a better approach. Using code
provided by Eric Dumazet, we compute the standard deviation of the average hash
bucket chain length while running rt_check_expire. Should any given chain
length grow to larger that average plus 4 standard deviations, we trigger an
emergency hash table rebuild for that net namespace. This allows for the common
case in which chains are well behaved and do not grow unevenly to not incur any
latency at all, while those systems (which may be being maliciously attacked),
only rebuild when the attack is detected. This patch take 2 other factors into
account:
1) chains with multiple entries that differ by attributes that do not affect the
hash value are only counted once, so as not to unduly bias system to rebuilding
if features like QOS are heavily used
2) if rebuilding crosses a certain threshold (which is adjustable via the added
sysctl in this patch), route caching is disabled entirely for that net
namespace, since constant rebuilding is less efficient that no caching at all
Tested successfully by me.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix mac80211.h kernel-doc: it had some extra parameters that were
no longer valid and incorrect format for a return value in 2 places.
Warning(lin2628-rc2//include/net/mac80211.h:1487): Excess function parameter or struct member 'control' description in 'ieee80211_beacon_get'
Warning(lin2628-rc2//include/net/mac80211.h:1596): Excess function parameter or struct member 'control' description in 'ieee80211_get_buffered_bc'
Warning(lin2628-rc2//include/net/mac80211.h:1632): Excess function parameter or struct member 'rc4key' description in 'ieee80211_get_tkip_key'
Warning(lin2628-rc2//include/net/mac80211.h:1735): Excess function parameter or struct member 'return' description in 'ieee80211_start_tx_ba_session'
Warning(lin2628-rc2//include/net/mac80211.h:1775): Excess function parameter or struct member 'return' description in 'ieee80211_stop_tx_ba_session'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This fixes an OOPS in hard_header if a Phonet address is assigned to a
non-Phonet network interface.
Signed-off-by: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (29 commits)
tcp: Restore ordering of TCP options for the sake of inter-operability
net: Fix disjunct computation of netdev features
sctp: Fix to handle SHUTDOWN in SHUTDOWN_RECEIVED state
sctp: Fix to handle SHUTDOWN in SHUTDOWN-PENDING state
sctp: Add check for the TSN field of the SHUTDOWN chunk
sctp: Drop ICMP packet too big message with MTU larger than current PMTU
p54: enable 2.4/5GHz spectrum by eeprom bits.
orinoco: reduce stack usage in firmware download path
ath5k: fix suspend-related oops on rmmod
[netdrvr] fec_mpc52xx: Implement polling, to make netconsole work.
qlge: Fix MSI/legacy single interrupt bug.
smc911x: Make the driver safer on SMP
smc911x: Add IRQ polarity configuration
smc911x: Allow Kconfig dependency on ARM
sis190: add identifier for Atheros AR8021 PHY
8139x: reduce message severity on driver overlap
igb: add IGB_DCA instead of selecting INTEL_IOATDMA
igb: fix tx data corruption with transition to L0s on 82575
ehea: Fix memory hotplug support
netdev: DM9000: remove BLACKFIN hacking in DM9000 netdev driver
...
Once an endpoint has reached the SHUTDOWN-RECEIVED state,
it MUST NOT send a SHUTDOWN in response to a ULP request.
The Cumulative TSN Ack of the received SHUTDOWN chunk
MUST be processed.
This patch fix to process Cumulative TSN Ack of the received
SHUTDOWN chunk in SHUTDOWN_RECEIVED state.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several sparse warnings were introduced by patches accepted during the merge
window which weren't caught. This patch fixes those warnings.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This patch implements the RDMA transport provider for 9P. It allows
mounts to be performed over iWARP and IB capable network interfaces.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Latchesar Ionkov <lionkov@lanl.gov>
Fixes build problem with 9p when building with debug disabled.
Also contains some fixes for warnings which pop up when
CONFIG_NET_9P_DEBUG is disabled.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: (26 commits)
9p: add more conservative locking
9p: fix oops in protocol stat parsing error path.
9p: fix device file handling
9p: Improve debug support
9p: eliminate depricated conv functions
9p: rework client code to use new protocol support functions
9p: remove unnecessary tag field from p9_req_t structure
9p: remove 9p fcall debug prints
9p: add new protocol support code
9p: encapsulate version function
9p: move dirread to fs layer
9p: adjust 9p vfs write operation
9p: move readn meta-function from client to fs layer
9p: consolidate read/write functions
9p: drop broken unused error path from p9_conn_create()
9p: make rpc code common and rework flush code
9p: use the rcall structure passed in the request in trans_fd read_work
9p: apply common request code to trans_fd
9p: apply common tagpool handling to trans_fd
9p: move request management to client code
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
netfilter: replace old NF_ARP calls with NFPROTO_ARP
netfilter: fix compilation error with NAT=n
netfilter: xt_recent: use proc_create_data()
netfilter: snmp nat leaks memory in case of failure
netfilter: xt_iprange: fix range inversion match
netfilter: netns: use NFPROTO_NUMPROTO instead of NUMPROTO for tables array
netfilter: ctnetlink: remove obsolete NAT dependency from Kconfig
pkt_sched: sch_generic: Fix oops in sch_teql
dccp: Port redirection support for DCCP
tcp: Fix IPv6 fallout from 'Port redirection support for TCP'
netdev: change name dropping error codes
ipvs: Update CONFIG_IP_VS_IPV6 description and help text
The netfilter families have been decoupled from regular protocol families.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new debug support lacks some of the information that the previous fcprint
code provided -- this patch focuses on better presentation of debug data along
with more helpful debug along error paths.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Remove depricated conv functions which have been replaced with new
protocol routines.
This patch also reworks the one instance of the file-system code which
directly calls conversion routines (to accomplish unpacking dirreads).
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Now that the new protocol functions are in place, this patch switches
the client code to using the new support code.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
One of the current debug options allows users to get a verbose dump of fcalls.
This isn't really necessary as correctly parsed protocol frames can be printed
as part of the code in the client functions. The consolidated printfcalls
structure would require new entries to be added for every extension. This
patch removes the debug print methods and their use.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This adds a new protocol processing support code based on Anthony Liguori's
9p library code. This code performs protocol marshalling/unmarshalling using
printf like strings to represent protocol elements. It is my intent to use
them to replace the current functions in conv.c as well as the
p9_create_* functions.
This should make the client implementation much more clear, and also make it
much easier to add new protocol extensions by limiting the number of places
in which changes need to be made.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Currently reading a directory is implemented in the client code.
This function is not actually a wire operation, but a meta operation
which calls read operations and processes the results.
This patch moves this functionality to the fs layer and calls component
wire operations instead of constructing their packets. This provides a
cleaner separation and will help when we reorganize the client functions
and protocol processing methods.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
There are a couple of methods in the client code which aren't actually
wire operations. To keep things organized cleaner, these operations are
being moved to the fs layer.
This patch moves the readn meta-function (which executes multiple wire
reads until a buffer is full) to the fs layer.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Currently there are two separate versions of read and write. One for
dealing with user buffers and the other for dealing with kernel buffers.
There is a tremendous amount of code duplication in the otherwise
identical versions of these functions. This patch adds an additional
user buffer parameter to read and write and conditionalizes handling of
the buffer on whether the kernel buffer or the user buffer is populated.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This code moves the rpc function to the common client base,
reorganizes the flush code to be more simple and stable, and
makes the necessary adjustments to the underlying transports
to adapt to the new structure.
This reduces the overall amount of code duplication between the
transports and should make adding new transports more straightforward.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
The virtio transport uses a simplified request management system
that I want to use for all transports. This patch adapts and moves the
exisiting code for managing requests to the client common code.
Later patches will apply these mechanisms to the other transports.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Right now there is a transport module structure which provides per-transport
type functions and data and a transport structure which contains per-instance
public data as well as function pointers to instance specific functions.
This patch moves public transport visible instance data to the client
structure (which in some cases had duplicate data) and consolidates the
functions into the transport module structure.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
name and nlen parameters passed to ->strategy hook are unused, remove
them. In general ->strategy hook should know what it's doing, and don't
do something tricky for which, say, pointer to original userspace array
may be needed (name).
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net> [ networking bits ]
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix kernel-doc warnings in mac80211.h.
Fields need real explanations added to them.
Warning(lin2627-g3-kdocfixes//include/net/mac80211.h:659): No description found for parameter 'icv_len'
Warning(lin2627-g3-kdocfixes//include/net/mac80211.h:659): No description found for parameter 'iv_len'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch removes the module dependency between ctnetlink and
nf_nat by means of an indirect call that is initialized when
nf_nat is loaded. Now, nf_conntrack_netlink only requires
nf_conntrack and nfnetlink.
This patch puts nfnetlink_parse_nat_setup_hook into the
nf_conntrack_core to avoid dependencies between ctnetlink,
nf_conntrack_ipv4 and nf_conntrack_ipv6.
This patch also introduces the function ctnetlink_change_nat
that is only invoked from the creation path. Actually, the
nat handling cannot be invoked from the update path since
this is not allowed. By introducing this function, we remove
the useless nat handling in the update path and we avoid
deadlock-prone code.
This patch also adds the required EAGAIN logic for nfnetlink.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dummy version of 'nf_conntrack_event_cache()' (used when the
NF_CONNTRACK_EVENTS config option is not enabled) had not been updated
when the calling convention changed.
This was introduced by commit a71996fccc
("netfilter: netns nf_conntrack: pass conntrack to
nf_conntrack_event_cache() not skb")
Tssk.
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add the necessary NetLabel support for the new CIPSO mapping,
CIPSO_V4_MAP_LOCAL, which allows full LSM label/context support.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Reviewed-by: James Morris <jmorris@namei.org>
This patch accomplishes three minor tasks: add a new tag type for local
labeling, rename the CIPSO_V4_MAP_STD define to CIPSO_V4_MAP_TRANS and
replace some of the CIPSO "magic numbers" with constants from the header
file. The first change allows CIPSO to support full LSM labels/contexts,
not just MLS attributes. The second change brings the mapping names inline
with what userspace is using, compatibility is preserved since we don't
actually change the value. The last change is to aid readability and help
prevent mistakes.
Signed-off-by: Paul Moore <paul.moore@hp.com>
This patch provides support for including the LSM's secid in addition to
the LSM's MLS information in the NetLabel security attributes structure.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Previous work enabled the use of address based NetLabel selectors, which while
highly useful, brought the potential for additional per-packet overhead when
used. This patch attempts to solve that by applying NetLabel socket labels
when sockets are connect()'d. This should alleviate the per-packet NetLabel
labeling for all connected sockets (yes, it even works for connected DGRAM
sockets).
Signed-off-by: Paul Moore <paul.moore@hp.com>
Reviewed-by: James Morris <jmorris@namei.org>
This patch builds upon the new NetLabel address selector functionality by
providing the NetLabel KAPI and CIPSO engine support needed to enable the
new packet-based labeling. The only new addition to the NetLabel KAPI at
this point is shown below:
* int netlbl_skbuff_setattr(skb, family, secattr)
... and is designed to be called from a Netfilter hook after the packet's
IP header has been populated such as in the FORWARD or LOCAL_OUT hooks.
This patch also provides the necessary SELinux hooks to support this new
functionality. Smack support is not currently included due to uncertainty
regarding the permissions needed to expand the Smack network access controls.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Reviewed-by: James Morris <jmorris@namei.org>
This patch extends the NetLabel traffic labeling capabilities to individual
packets based not only on the LSM domain but the by the destination address
as well. The changes here only affect the core NetLabel infrastructre,
changes to the NetLabel KAPI and individial protocol engines are also
required but are split out into a different patch to ease review.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Reviewed-by: James Morris <jmorris@namei.org>
NetLabel has always had a list of backpointers in the CIPSO DOI definition
structure which pointed to the NetLabel LSM domain mapping structures which
referenced the CIPSO DOI struct. The rationale for this was that when an
administrator removed a CIPSO DOI from the system all of the associated
NetLabel LSM domain mappings should be removed as well; a list of
backpointers made this a simple operation.
Unfortunately, while the backpointers did make the removal easier they were
a bit of a mess from an implementation point of view which was making
further development difficult. Since the removal of a CIPSO DOI is a
realtively rare event it seems to make sense to remove this backpointer
list as the optimization was hurting us more then it was helping. However,
we still need to be able to track when a CIPSO DOI definition is being used
so replace the backpointer list with a reference count. In order to
preserve the current functionality of removing the associated LSM domain
mappings when a CIPSO DOI is removed we walk the LSM domain mapping table,
removing the relevant entries.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Reviewed-by: James Morris <jmorris@namei.org>
At some point I think I messed up and dropped the calls to netlbl_skbuff_err()
which are necessary for CIPSO to send error notifications to remote systems.
This patch re-introduces the error handling calls into the SELinux code.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
After some discussions with the Smack folks, well just Casey, I now have a
better idea of what Smack wants out of NetLabel in the future so I think it
is now safe to do some API "pruning". If another LSM comes along that
needs this functionality we can always add it back in, but I don't see any
LSMs on the horizon which might make use of these functions.
Thanks to Rami Rosen who suggested removing netlbl_cfg_cipsov4_del() back
in February 2008.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Reviewed-by: James Morris <jmorris@namei.org>
This patch add missing braces of today's net-next-2.6:
include/net/netfilter/nf_conntrack_ecache.h
Signed-off-by: Guo-Fu Tseng <cooldavid@cooldavid.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the RX/TX byte counters for IPIP, GRE and SIT more
consistent. Previously we included the external IP headers on the
way out but not when the packet is inbound.
The new scheme is to count payload only in both directions. For
IPIP and SIT this simply means the exclusion of the external IP
header. For GRE this means that we exclude the GRE header as
well.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for the Trailer switch tagging format. This is
another tagging that doesn't explicitly mark tagged packets with a
distinct ethertype, so that we need to add a similar hack in the
receive path as for the Original DSA tagging format.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Tested-by: Byron Bradley <byron.bbradley@gmail.com>
Tested-by: Tim Ellis <tim.ellis@mac.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most of the DSA switches currently in the field do not support the
Ethertype DSA tagging format that one of the previous patches added
support for, but only the original DSA tagging format.
The original DSA tagging format carries the same information as the
Ethertype DSA tagging format, but with the difference that it does not
have an ethertype field. In other words, when receiving a packet that
is tagged with an original DSA tag, there is no way of telling in
eth_type_trans() that this packet is in fact a DSA-tagged packet.
This patch adds a hook into eth_type_trans() which is only compiled in
if support for a switch chip that doesn't support Ethertype DSA is
selected, and which checks whether there is a DSA switch driver
instance attached to this network device which uses the old tag format.
If so, it sets the protocol field to ETH_P_DSA without looking at the
packet, so that the packet ends up in the right place.
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Tested-by: Nicolas Pitre <nico@marvell.com>
Tested-by: Peter van Valderen <linux@ddcrew.com>
Tested-by: Dirk Teurlings <dirk@upexia.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Distributed Switch Architecture is a protocol for managing hardware
switch chips. It consists of a set of MII management registers and
commands to configure the switch, and an ethernet header format to
signal which of the ports of the switch a packet was received from
or is intended to be sent to.
The switches that this driver supports are typically embedded in
access points and routers, and a typical setup with a DSA switch
looks something like this:
+-----------+ +-----------+
| | RGMII | |
| +-------+ +------ 1000baseT MDI ("WAN")
| | | 6-port +------ 1000baseT MDI ("LAN1")
| CPU | | ethernet +------ 1000baseT MDI ("LAN2")
| |MIImgmt| switch +------ 1000baseT MDI ("LAN3")
| +-------+ w/5 PHYs +------ 1000baseT MDI ("LAN4")
| | | |
+-----------+ +-----------+
The switch driver presents each port on the switch as a separate
network interface to Linux, polls the switch to maintain software
link state of those ports, forwards MII management interface
accesses to those network interfaces (e.g. as done by ethtool) to
the switch, and exposes the switch's hardware statistics counters
via the appropriate Linux kernel interfaces.
This initial patch supports the MII management interface register
layout of the Marvell 88E6123, 88E6161 and 88E6165 switch chips, and
supports the "Ethertype DSA" packet tagging format.
(There is no officially registered ethertype for the Ethertype DSA
packet format, so we just grab a random one. The ethertype to use
is programmed into the switch, and the switch driver uses the value
of ETH_P_EDSA for this, so this define can be changed at any time in
the future if the one we chose is allocated to another protocol or
if Ethertype DSA gets its own officially registered ethertype, and
everything will continue to work.)
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Tested-by: Nicolas Pitre <nico@marvell.com>
Tested-by: Byron Bradley <byron.bbradley@gmail.com>
Tested-by: Tim Ellis <tim.ellis@mac.com>
Tested-by: Peter van Valderen <linux@ddcrew.com>
Tested-by: Dirk Teurlings <dirk@upexia.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
The gabs array in the sctp_tsnmap structure is only used
in one place, sctp_make_sack(). As such, carrying the
array around in the sctp_tsnmap and thus directly in
the sctp_association is rather pointless since most
of the time it's just taking up space. Now, let
sctp_make_sack create and populate it and then throw
it away when it's done.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tsn map currently use is 4K large and is stuck inside
the sctp_association structure making memory references REALLY
expensive. What we really need is at most 4K worth of bits
so the biggest map we would have is 512 bytes. Also, the
map is only really usefull when we have gaps to store and
report. As such, starting with minimal map of say 32 TSNs (bits)
should be enough for normal low-loss operations. We can grow
the map by some multiple of 32 along with some extra room any
time we receive the TSN which would put us outside of the map
boundry. As we close gaps, we can shift the map to rebase
it on the latest TSN we've seen. This saves 4088 bytes per
association just in the map alone along savings from the now
unnecessary structure members.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I noticed sysctl_local_port_range[] and its associated seqlock
sysctl_local_port_range_lock were on separate cache lines.
Moreover, sysctl_local_port_range[] was close to unrelated
variables, highly modified, leading to cache misses.
Moving these two variables in a structure can help data
locality and moving this structure to read_mostly section
helps sharing of this data among cpus.
Cleanup of extern declarations (moved in include file where
they belong), and use of inet_get_local_port_range()
accessor instead of direct access to ports values.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The iptables tproxy core is a module that contains the common routines used by
various tproxy related modules (TPROXY target and socket match)
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Netfilter connection tracking requires all IPv4 packets to be defragmented.
Both the socket match and the TPROXY target depend on this functionality, so
this patch separates the Netfilter IPv4 defrag hooks into a separate module.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Same story as with iptable_filter, iptables_raw tables.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Note, sysctl table is always duplicated, this is simpler and less
special-cased.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Heh, last minute proof-reading of this patch made me think,
that this is actually unneeded, simply because "ct" pointers will be
different for different conntracks in different netns, just like they
are different in one netns.
Not so sure anymore.
[Patrick: pointers will be different, flushing can only be done while
inactive though and thus it needs to be per netns]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This is cleaner, we already know conntrack to which event is relevant.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Again, it's deducible from skb, but we're going to use it for
nf_conntrack_checksum and statistics, so just pass it from upper layer.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
It's deducible from skb->dev or skb->dst->dev, but we know netns at
the moment of call, so pass it down and use for finding and creating
conntracks.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
What is confirmed connection in one netns can very well be unconfirmed
in another one.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Make per-netns a) expectation hash and b) expectations count.
Expectations always belongs to netns to which it's master conntrack belong.
This is natural and doesn't bloat expectation.
Proc files and leaf users are stubbed to init_net, this is temporary.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
* make per-netns conntrack hash
Other solution is to add ->ct_net pointer to tuplehashes and still has one
hash, I tried that it's ugly and requires more code deep down in protocol
modules et al.
* propagate netns pointer to where needed, e. g. to conntrack iterators.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Sysctls and proc files are stubbed to init_net's one. This is temporary.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Conntrack (struct nf_conn) gets pointer to netns: ->ct_net -- netns in which
it was created. It comes from netdevice.
->ct_net is write-once field.
Every conntrack in system has ->ct_net initialized, no exceptions.
->ct_net doesn't pin netns: conntracks are recycled after timeouts and
pinning background traffic will prevent netns from even starting shutdown
sequence.
Right now every conntrack is created in init_net.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
One comment: #ifdefs around #include is necessary to overcome amazing compile
breakages in NOTRACK-in-netns patch (see below).
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
and (try to) consistently use u_int8_t for the L3 family.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Wrap calling sk->sk_backlog_rcv() in a function. This will allow extending the
generic sk_backlog_rcv behaviour.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the socket cached in the skb if it's present.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To be able to use the cached socket reference in the skb during input
processing we add a new set of lookup functions that receive the skb on
their argument list.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adjusts the rate control API to allow multi-rate retry
if supported by the driver. The ieee80211_hw struct specifies how
many alternate rate selections the driver supports.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Free up 2 bytes in skb->cb to be used for multi-rate retry later.
Move iv_len and icv_len initialization into key alloc.
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Jay Cliburn noticed and diagnosed a bug triggered in
dev_gso_skb_destructor() after last change from qdisc->gso_skb
to qdisc->requeue list. Since gso_segmented skbs can't be queued
to another list this patch brings back qdisc->gso_skb for them.
Reported-by: Jay Cliburn <jcliburn@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provides implementation of the enhancements of XFRM/PF_KEY MIGRATE mechanism
specified in draft-ebalard-mext-pfkey-enhanced-migrate-00. Defines associated
PF_KEY SADB_X_EXT_KMADDRESS extension and XFRM/netlink XFRMA_KMADDRESS
attribute.
Signed-off-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This protocol provides some connection handling and negotiated
congestion control. Nokia cellular modems use it for bulk transfers.
It provides packet boundaries (hence SOCK_SEQPACKET). Congestion
control is per packet rather per byte, so we do not re-use the
generic socket memory accounting.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_is_any() function that is used to check for wildcard addresses
only looks at the address itself to determine the address family.
This function is used in the API to check the address passed in from
the user. If the user simply zerroes out the sockaddr_storage and
pass that in, we'll end up failing. So, let's try harder to determine
the address family by also checking the socket if it's possible.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
sctp_chunks should be put on a diet. This is some of the low hanging
fruit that we can strip out. Changes all the __s8/__u8 flags to
bitfields. Saves 12 bytes per chunk.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
The iptables tproxy code has to be able to do UDP socket hash lookups,
so we have to provide an exported lookup function for this purpose.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current TCP code relies on the local port of the listening socket
being the same as the destination address of the incoming
connection. Port redirection used by many transparent proxying
techniques obviously breaks this, so we have to store the original
destination port address.
This patch extends struct inet_request_sock and stores the incoming
destination port value there. It also modifies the handshake code to
use that value as the source port when sending reply packets.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netfilter's ip_route_me_harder() tries to re-route packets either
generated or re-routed by Netfilter. This patch changes
ip_route_me_harder() to handle packets from non-locally-bound sockets
with IP_TRANSPARENT set as local and to set the appropriate flowi
flags when re-doing the routing lookup.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TCP stack sends out SYN+ACK/ACK/RST reply packets in response to
incoming packets. The non-local source address check on output bites
us again, as replies for transparently redirected traffic won't have a
chance to leave the node.
This patch selectively sets the FLOWI_FLAG_ANYSRC flag when doing the
route lookup for those replies. Transparent replies are enabled if the
listening socket has the transparent socket flag set.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set FLOWI_FLAG_ANYSRC in flowi->flags if the socket has the
transparent socket option set. This way we selectively enable certain
connections with non-local source addresses to be routed.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
inet_iif() in inet_sock.h requires route.h. Since users of inet_iif()
usually require other route.h functionality anyway this patch moves
inet_iif() to route.h.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces the IP_TRANSPARENT socket option: enabling that
will make the IPv4 routing omit the non-local source address check on
output. Setting IP_TRANSPARENT requires NET_ADMIN capability.
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_route_output() contains a check to make sure that no flows with
non-local source IP addresses are routed. This obviously makes using
such addresses impossible.
This patch introduces a flowi flag which makes omitting this check
possible. The new flag provides a way of handling transparent and
non-transparent connections differently.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu came up with the idea and the original patch to make
xfrm_state dump list contain also dumpers:
As it is we go to extraordinary lengths to ensure that states
don't go away while dumpers go to sleep. It's much easier if
we just put the dumpers themselves on the list since they can't
go away while they're going.
I've also changed the order of addition on new states to prevent
a never-ending dump.
Timo Teräs improved the patch to apply cleanly to latest tree,
modified iteration code to be more readable by using a common
struct for entries in the list, implemented the same idea for
xfrm_policy dumping and moved the af_key specific "last" entry
caching to af_key.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since call to function sctp_sf_abort_violation() need paramter 'arg' with
'struct sctp_chunk' type, it will read the chunk type and chunk length from
the chunk_hdr member of chunk. But call to sctp_sf_violation_paramlen()
always with 'struct sctp_paramhdr' type's parameter, it will be passed to
sctp_sf_abort_violation(). This may cause kernel panic.
sctp_sf_violation_paramlen()
|-- sctp_sf_abort_violation()
|-- sctp_make_abort_violation()
This patch fixed this problem. This patch also fix two place which called
sctp_sf_violation_paramlen() with wrong paramter type.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
9p trans modules aren't refcounted nor were they unregistered
properly. Fix it.
* Add 9p_trans_module->owner and reference the module on each trans
instance creation and put it on destruction.
* Protect v9fs_trans_list with a spinlock. This isn't strictly
necessary as the list is manipulated only during module loading /
unloading but it's a good idea to make the API safe.
* Unregister trans modules when the corresponding module is being
unloaded.
* While at it, kill unnecessary EXPORT_SYMBOL on p9_trans_fd_init().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Long awaited, hard work. This patch totally cleans up the rate control
API to remove the requirement to include internal headers outside of
net/mac80211/.
There's one internal use in the PID algorithm left for mesh networking,
we'll have to figure out a way to clean that one up and decide how to
do the peer link evaluation, possibly independent of the rate control
algorithm or via new API.
Additionally, ath9k is left using the cross-inclusion hack for now, we
will add new API where necessary to make this work properly, but right
now I'm not expert enough to do it. It's still off better than before.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch makes cfg80211 show the interface in the nl80211
information about a specific interface. API users are required
to keep the type updated (everything else is fairly complicated)
but you will get a warning if you fail to keep it updated.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Luis added the regulatory hint stuff to this file without
observing that __ieee80211_get_channel and ieee80211_get_channel
really belong together.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The current check wrongly uses the state of one (currently the first)
tx queue for all tx queues in case of non-default qdiscs. This check
mainly prevented requeuing loop with __netif_schedule(), but now it's
controlled inside __qdisc_run(), while dequeuing. The wrongness of
this check was first noticed by Herbert Xu.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When there is no listener socket for a received packet, send an error
back to the sender.
Signed-off-by: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Phonet endpoints are bound to individual ports.
This provides a /proc/sys/net/phonet (or sysctl) interface for
selecting the range of automatically allocated ports (much like the
ip_local_port_range with IPv4).
Signed-off-by: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This provides the basic SOCK_DGRAM transport protocol for Phonet.
Signed-off-by: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This provides the socket API for the Phonet protocols family.
Signed-off-by: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This provides support for configuring Phonet addresses, notifying
Phonet configuration changes, and dumping the configuration.
Signed-off-by: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This provides support for adding Phonet addresses to and removing
Phonet addresses from network devices.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the basis for the Phonet protocol families, and introduces
the ETH_P_PHONET packet type and the PF_PHONET socket family.
Signed-off-by: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As discovered by Timo Teräs, the currently xfrm_state_walk scheme
is racy because if a second dump finishes before the first, we
may free xfrm states that the first dump would walk over later.
This patch fixes this by storing the dumps in a list in order
to calculate the correct completion counter which cures this
problem.
I've expanded netlink_cb in order to accomodate the extra state
related to this. It shouldn't be a big deal since netlink_cb
is kmalloced for each dump and we're just increasing it by 4 or
8 bytes.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This minor cleanup simplifies later changes which will convert
struct sk_buff and friends over to using struct list_head.
Signed-off-by: David S. Miller <davem@davemloft.net>
Most importantly avoid doing it with cumulative ACK. Not clearing
means that we no longer need n^2 processing in resolution of each
fast recovery.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both loops are quite similar, so they can be combined
with little effort. As a result, forward_skb_hint becomes
obsolete as well.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Main benefit in this is that we can then freely point
the retransmit_skb_hint to anywhere we want to because
there's no longer need to know what would be the count
changes involve, and since this is really used only as a
terminator, unnecessary work is one time walk at most,
and if some retransmissions are necessary after that
point later on, the walk is not full waste of time
anyway.
Since retransmit_high must be kept valid, all lost
markers must ensure that.
Now I also have learned how those "holes" in the
rexmittable skbs can appear, mtu probe does them. So
I removed the misleading comment as well.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ie., the difference between partial and all clearing doesn't
exists anymore since the SACK optimizations got dropped by
an sacktag rewrite.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, virtual interface pointers passed to drivers might be
from monitor interfaces and as such completely uninitialised
because we do not tell the driver about monitor interfaces when
those are created. Instead of passing them, we should therefore
indicate to the driver that there is no information; do that by
passing a NULL value and adjust drivers to cope with it.
As a result, some mac80211 API functions also need to cope with
a NULL vif pointer so drivers can still call them unconditionally.
Also, when injecting frames we really don't want to pass NULL all
the time, if we know we are the source address of a frame and have
a local interface for that address, we can to use that interface.
This also helps with processing the frame correctly for that
interface which will help the 802.11w implementation. It's not
entirely correct for VLANs or WDS interfaces because there the MAC
address isn't unique, but it's already a lot better than what we
do now.
Finally, when injecting without a matching local interface, don't
assign sequence numbers at all.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Rate control algorithms may need access to a station's
HT capabilities, so share the ht_info struct in the
public station API.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
As more preparation for a saner rate control algorithm API,
share the supported rates bitmap in the public API.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch changes mac80211 to share some more data about
stations with drivers. Should help iwlwifi and ath9k when
they get around to updating, and might also help with
implementing rate control algorithms without internals.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Sujith Manoharan <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There's really no reason for mac80211 to be using its
own interface type defines. Use the nl80211 types and
simplify the configuration code a bit: there's no need
to translate them any more now.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Drivers need to know the basic rateset to be able to configure
the ACK/CTS programming in hardware correctly.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When we remove an interface, we can currently end up having
a pointer to it left in local->scan_sdata after it has been
set down, and then with a hardware scan the scan completion
can try to access it which is a bug. Alternatively, a scan
that started as a hardware scan may terminate as though it
was a software scan, if the timing is just right.
On SMP systems, software scan also has a similar problem,
just canceling the delayed work and setting a flag isn't
enough since it may be running concurrently; in this case
we would also never restore state of other interfaces.
This patch hopefully fixes the problems by always invoking
ieee80211_scan_completed or requiring it to be invoked by
the driver, I suspect the drivers that have ->hw_scan() are
buggy. The bug will not manifest itself unless you remove
the interface while hw-scanning which will also turn off
the hw, and then add a new interface which will be unusable
until you scan once.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This adds the new wireless regulatory infrastructure. The
main motiviation behind this was to centralize regulatory
code as each driver was implementing their own regulatory solution,
and to replace the initial centralized code we have where:
* only 3 regulatory domains are supported: US, JP and EU
* regulatory domains can only be changed through module parameter
* all rules were built statically in the kernel
We now have support for regulatory domains for many countries
and regulatory domains are now queried through a userspace agent
through udev allowing distributions to update regulatory rules
without updating the kernel.
Each driver can regulatory_hint() a regulatory domain
based on either their EEPROM mapped regulatory domain value to a
respective ISO/IEC 3166-1 country code or pass an internally built
regulatory domain. We also add support to let the user set the
regulatory domain through userspace in case of faulty EEPROMs to
further help compliance.
Support for world roaming will be added soon for cards capable of
this.
For more information see:
http://wireless.kernel.org/en/developers/Regulatory/CRDA
For now we leave an option to enable the old module parameter,
ieee80211_regdom, and to build the 3 old regdomains statically
(US, JP and EU). This option is CONFIG_WIRELESS_OLD_REGULATORY.
These old static definitions and the module parameter is being
scheduled for removal for 2.6.29. Note that if you use this
you won't make use of a world regulatory domain as its pointless.
If you leave this option enabled and if CRDA is present and you
use US or JP we will try to ask CRDA to update us a regulatory
domain for us.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This new action will have the ability to change the priority and/or
queue_mapping fields on an sk_buff.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kmemcheck reported this:
kmemcheck: Caught 16-bit read from uninitialized memory (f6c1ba30)
0500110001508abf050010000500000002017300140000006f72672e66726565
i i i i i i i i i i i i i u u u u u u u u u u u u u u u u u u u
^
Pid: 3462, comm: wpa_supplicant Not tainted (2.6.27-rc3-00054-g6397ab9-dirty #13)
EIP: 0060:[<c05de64a>] EFLAGS: 00010296 CPU: 0
EIP is at nla_parse+0x5a/0xf0
EAX: 00000008 EBX: fffffffd ECX: c06f16c0 EDX: 00000005
ESI: 00000010 EDI: f6c1ba30 EBP: f6367c6c ESP: c0a11e88
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: f781cc84 CR3: 3632f000 CR4: 000006d0
DR0: c0ead9bc DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff4ff0 DR7: 00000400
[<c05d4b23>] rtnl_setlink+0x63/0x130
[<c05d5f75>] rtnetlink_rcv_msg+0x165/0x200
[<c05ddf66>] netlink_rcv_skb+0x76/0xa0
[<c05d5dfe>] rtnetlink_rcv+0x1e/0x30
[<c05dda21>] netlink_unicast+0x281/0x290
[<c05ddbe9>] netlink_sendmsg+0x1b9/0x2b0
[<c05beef2>] sock_sendmsg+0xd2/0x100
[<c05bf945>] sys_sendto+0xa5/0xd0
[<c05bf9a6>] sys_send+0x36/0x40
[<c05c03d6>] sys_socketcall+0x1e6/0x2c0
[<c020353b>] sysenter_do_call+0x12/0x3f
[<ffffffff>] 0xffffffff
This is the line in nla_ok():
/**
* nla_ok - check if the netlink attribute fits into the remaining bytes
* @nla: netlink attribute
* @remaining: number of bytes remaining in attribute stream
*/
static inline int nla_ok(const struct nlattr *nla, int remaining)
{
return remaining >= sizeof(*nla) &&
nla->nla_len >= sizeof(*nla) &&
nla->nla_len <= remaining;
}
It turns out that remaining can become negative due to alignment in
nla_next(). But GCC promotes "remaining" to unsigned in the test
against sizeof(*nla) above. Therefore the test succeeds, and the
nla_for_each_attr() may access memory outside the received buffer.
A short example illustrating this point is here:
#include <stdio.h>
main(void)
{
printf("%d\n", -1 >= sizeof(int));
}
...which prints "1".
This patch adds a cast in front of the sizeof so that GCC will make
a signed comparison and fix the illegal memory dereference. With the
patch applied, there is no kmemcheck report.
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The conf_tx callback currently needs to be atomic, this requirement
is just because it can be called from scanning. This rearranges it
slightly to only update while not scanning (which is fine, we'll be
getting beacons when associated) and thus removes the atomic
requirement.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Now that we save states within a walk we need synchronisation
so that the list the saved state is on doesn't disappear from
under us.
As it stands this is done by keeping the state on the list which
is bad because it gets in the way of the management of the state
life-cycle.
An alternative is to make our own pseudo-RCU system where we use
counters to indicate which state can't be freed immediately as
it may be referenced by an ongoing walk when that resumes.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Security Mode 4 of the Bluetooth 2.1 specification has strict
authentication and encryption requirements. It is the initiators job
to create a secure ACL link. However in case of malicious devices, the
acceptor has to make sure that the ACL is encrypted before allowing
any kind of L2CAP connection. The only exception here is the PSM 1 for
the service discovery protocol, because that is allowed to run on an
insecure ACL link.
Previously it was enough to reject a L2CAP connection during the
connection setup phase, but with Bluetooth 2.1 it is forbidden to
do any L2CAP protocol exchange on an insecure link (except SDP).
The new hci_conn_check_link_mode() function can be used to check the
integrity of an ACL link. This functions also takes care of the cases
where Security Mode 4 is disabled or one of the devices is based on
an older specification.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
With the introduction of Security Mode 4 and Simple Pairing from the
Bluetooth 2.1 specification it became mandatory that the initiator
requires authentication and encryption before any L2CAP channel can
be established. The only exception here is PSM 1 for the service
discovery protocol (SDP). It is meant to be used without any encryption
since it contains only public information. This is how Bluetooth 2.0
and before handle connections on PSM 1.
For Bluetooth 2.1 devices the pairing procedure differentiates between
no bonding, general bonding and dedicated bonding. The L2CAP layer
wrongly uses always general bonding when creating new connections, but it
should not do this for SDP connections. In this case the authentication
requirement should be no bonding and the just-works model should be used,
but in case of non-SDP connection it is required to use general bonding.
If the new connection requires man-in-the-middle (MITM) protection, it
also first wrongly creates an unauthenticated link key and then later on
requests an upgrade to an authenticated link key to provide full MITM
protection. With Simple Pairing the link key generation is an expensive
operation (compared to Bluetooth 2.0 and before) and doing this twice
during a connection setup causes a noticeable delay when establishing
a new connection. This should be avoided to not regress from the expected
Bluetooth 2.0 connection times. The authentication requirements are known
up-front and so enforce them.
To fulfill these requirements the hci_connect() function has been extended
with an authentication requirement parameter that will be stored inside
the connection information and can be retrieved by userspace at any
time. This allows the correct IO capabilities exchange and results in
the expected behavior.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Instead of duplicating the fields, integrate a user stats structure into
the kernel stats structure. This is more robust when the members are
changed, because they are now automatically kept in sync.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Reviewed-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Instead of checking the value in include/net/ip_vs.h, we can just
restrict the range in our Kconfig file. This will prevent values outside
of the range early.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Reviewed-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
How to reproduce ?
- create a network namespace
- use tcp protocol and get timewait socket
- exit the network namespace
- after a moment (when the timewait socket is destroyed), the kernel
panics.
# BUG: unable to handle kernel NULL pointer dereference at
0000000000000007
IP: [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8
PGD 119985067 PUD 11c5c0067 PMD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: ipv6 button battery ac loop dm_mod tg3 libphy ext3 jbd
edd fan thermal processor thermal_sys sg sata_svw libata dock serverworks
sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table]
Pid: 0, comm: swapper Not tainted 2.6.27-rc2 #3
RIP: 0010:[<ffffffff821e394d>] [<ffffffff821e394d>]
inet_twdr_do_twkill_work+0x6e/0xb8
RSP: 0018:ffff88011ff7fed0 EFLAGS: 00010246
RAX: ffffffffffffffff RBX: ffffffff82339420 RCX: ffff88011ff7ff30
RDX: 0000000000000001 RSI: ffff88011a4d03c0 RDI: ffff88011ac2fc00
RBP: ffffffff823392e0 R08: 0000000000000000 R09: ffff88002802a200
R10: ffff8800a5c4b000 R11: ffffffff823e4080 R12: ffff88011ac2fc00
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
FS: 0000000041cbd940(0000) GS:ffff8800bff839c0(0000)
knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000007 CR3: 00000000bd87c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff8800bff9e000, task
ffff88011ff76690)
Stack: ffffffff823392e0 0000000000000100 ffffffff821e3a3a
0000000000000008
0000000000000000 ffffffff821e3a61 ffff8800bff7c000 ffffffff8203c7e7
ffff88011ff7ff10 ffff88011ff7ff10 0000000000000021 ffffffff82351108
Call Trace:
<IRQ> [<ffffffff821e3a3a>] ? inet_twdr_hangman+0x0/0x9e
[<ffffffff821e3a61>] ? inet_twdr_hangman+0x27/0x9e
[<ffffffff8203c7e7>] ? run_timer_softirq+0x12c/0x193
[<ffffffff820390d1>] ? __do_softirq+0x5e/0xcd
[<ffffffff8200d08c>] ? call_softirq+0x1c/0x28
[<ffffffff8200e611>] ? do_softirq+0x2c/0x68
[<ffffffff8201a055>] ? smp_apic_timer_interrupt+0x8e/0xa9
[<ffffffff8200cad6>] ? apic_timer_interrupt+0x66/0x70
<EOI> [<ffffffff82011f4c>] ? default_idle+0x27/0x3b
[<ffffffff8200abbd>] ? cpu_idle+0x5f/0x7d
Code: e8 01 00 00 4c 89 e7 41 ff c5 e8 8d fd ff ff 49 8b 44 24 38 4c 89 e7
65 8b 14 25 24 00 00 00 89 d2 48 8b 80 e8 00 00 00 48 f7 d0 <48> 8b 04 d0
48 ff 40 58 e8 fc fc ff ff 48 89 df e8 c0 5f 04 00
RIP [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8
RSP <ffff88011ff7fed0>
CR2: 0000000000000007
This patch provides a function to purge all timewait sockets related
to a network namespace. The timewait sockets life cycle is not tied with
the network namespace, that means the timewait sockets stay alive while
the network namespace dies. The timewait sockets are for avoiding to
receive a duplicate packet from the network, if the network namespace is
freed, the network stack is removed, so no chance to receive any packets
from the outside world. Furthermore, having a pending destruction timer
on these sockets with a network namespace freed is not safe and will lead
to an oops if the timer callback which try to access data belonging to
the namespace like for example in:
inet_twdr_do_twkill_work
-> NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_TIMEWAITED);
Purging the timewait sockets at the network namespace destruction will:
1) speed up memory freeing for the namespace
2) fix kernel panic on asynchronous timewait destruction
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is obviously good for userspace to know up front which
interface modes a given piece of hardware might support (even
if adding such an interface might fail later because of
concurrency issues), so let's make cfg80211 aware of that.
For good measure, disallow adding interfaces in all other
modes so drivers don't forget to announce support for one mode
when they add it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Stephen Blackheath <tramp.enshrine.stephen@blacksapphire.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Adjust various debug outputs to use the new *_BUF macro variants for
correct output of v4/v6 addresses.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Convert functions for looking up destinations (real servers) to support
IPv6 services/dests.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Add xmit functions for IPv6. Also add the already needed __ip_vs_get_out_rt_v6()
to ip_vs_core.c. Bind the new xmit functions to v6 connections.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Extend functions for getting/creating connections and connection
templates for IPv6 support and fix the callers.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Extend protocol DNAT/SNAT and state handlers to work with IPv6. Also
change/introduce new checksumming helper functions for this.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Add 'af' arguments to conn_schedule(), conn_in_get(), conn_out_get() and
csum_check() function pointers in struct ip_vs_protocol. Extend the
respective functions for TCP, UDP, AH and ESP and adjust the callers.
The changes in the callers need to be somewhat extensive, since they now
need to pass a filled out struct ip_vs_iphdr * to the modified functions
instead of a struct iphdr *.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Add 'supports_ipv6' flag to struct ip_vs_scheduler to indicate whether a
scheduler supports IPv6. Set the flag to 1 in schedulers that work with
IPv6, 0 otherwise. This flag is checked in a later patch while trying to
add a service with a specific scheduler. Adjust debug in v6-supporting
schedulers to work with both address families.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Add support for selecting services based on their address family to
ip_vs_service_get() and adjust the callers.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Add extended internal versions of struct ip_vs_service_user and struct
ip_vs_dest_user (the originals can't be modified as they are part
of the old sockopt interface). Adjust ip_vs_ctl.c to work with the new
data structures and add some minor AF-awareness.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Add some debugging macros that allow conditional output of either v4 or v6
addresses, depending on an 'af' parameter. This is done by creating a
temporary string buffer in an outer debug macro and writing addresses'
string representations into it from another macro which can only be used
when inside the outer one.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Add a struct ip_vs_iphdr for easier handling of common v4 and v6 header
fields in the same code path. ip_vs_fill_iphdr() helps to fill this struct
from an IPv4 or IPv6 header. Add further helper functions for copying and
comparing addresses.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Introduce new 'af' fields into IPVS data structures for specifying an
entry's address family. Convert IP addresses to be of type union
nf_inet_addr.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
This patch consolidates the code common to TCP and CCID-2:
* TCP uses RFC 3390 in a packet-oriented manner (tcp_input.c) and
* CCID-2 uses RFC 3390 in packet-oriented manner (RFC 4341).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Removes all _nested_compat() functions from the API. The prio qdisc
no longer requires them and netem has its own format anyway. Their
existance is only confusing.
Resend: Also remove the wrapper macro.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow userspace (e.g., hostapd) to set HT capabilities for associated
STAs. This is based on a patch from Zhu Yi <yi.zhu@intel.com> (only
the NL80211_ATTR_HT_CAPABILITY for NEW_STA part is included here).
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This change adds a new cfg80211 command, NL80211_CMD_SET_BSS, to allow
AP mode BSS parameters to be changed from user space (e.g., hostapd).
The drivers using mac80211 are expected to be modified with separate
changes to use the new BSS info parameter for short slot time in the
bss_info_changed() handler.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
All users of struct proto::compat_[gs]etsockopt and
struct inet_connection_sock_af_ops::compat_[gs]etsockopt are under
#ifdef already, so use it in structure definition too.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use new qdisc_root_sleeping_lock() instead of qdisc_root_lock() as
sch_tree_lock() because this lock could be used while dev is
deactivated, but we never need to use this with noop_qdisc as a root.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While passing a qdisc root lock to gen_new_estimator() and
gen_replace_estimator() dev could be deactivated or even before
grafting proper root qdisc as qdisc_sleeping (e.g. qdisc_create), so
using qdisc_root_lock() is not enough. This patch adds
qdisc_root_sleeping_lock() for this, plus additional checks, where
necessary.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All users have been moved over to the version taking a le16 frame control
rather than a cpu-endian value.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds ieee80211_queue_stopped that let drivers to query
queue status
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Since some qdiscs call qdisc_tree_decrease_qlen() (so qdisc_lookup())
without rtnl_lock(), adding and deleting from a qdisc list needs
additional locking. This patch adds global spinlock qdisc_list_lock
and wrapper functions for modifying the list. It is considered as a
temporary solution until hfsc_dequeue(), netem_dequeue() and
tbf_dequeue() (or qdisc_tree_decrease_qlen()) are redone.
With feedback from Herbert Xu and David S. Miller.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_deactivate() can skip rescheduling of a qdisc by qdisc_watchdog()
or other timer calling netif_schedule() after dev_queue_deactivate().
We prevent this checking aliveness before scheduling the timer. Since
during deactivation the root qdisc is available only as qdisc_sleeping
additional accessor qdisc_root_sleeping() is created.
With feedback from Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IEEE80211_HW_HOST_GEN_BEACON_TEMPLATE was made unnecessary in
the recent revamp on beacon configuration.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We can now kill them synchronously with all of the
previous dev_deactivate() cures.
This makes netdev destruction and shutdown saner as
the qdiscs hold references to the device.
Signed-off-by: David S. Miller <davem@davemloft.net>
This new state lets dev_deactivate() mark a qdisc as having been
deactivated.
dev_queue_xmit() and ing_filter() check for this bit and do not
try to process the qdisc if the bit is set.
dev_deactivate() polls the qdisc after setting the bit, waiting
for both __QDISC_STATE_RUNNING and __QDISC_STATE_SCHED to clear.
This isn't perfect yet, but subsequent changesets will make it so.
This part is just one piece of the puzzle.
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 8ab19ea36c ("ipvs: Fix possible deadlock
in estimator code") fixed a deadlock condition, but that condition can only
happen during unload of IPVS, because during normal operation there is at least
our global stats structure in the estimator list. The mod_timer() and
del_timer_sync() calls are actually initialization and cleanup code in
disguise. Let's make it explicit and move them to their own init and cleanup
function.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Simon Horman <horms@verge.net.au>
ipv6_dev_get_saddr() blindly de-references dst_dev to get the network
namespace, but some callers might pass NULL. Change callers to pass a
namespace pointer instead.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes ip6_prohibit_entry and ip6_blk_hole_entry
declarations from include/net/ip6_route.h as they are unused.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes rt6_lock declaration from include/net/ip6_route.h
as it is unused.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based upon a bug report by Andrew Gallatin on netdev
with subject "CPU utilization increased in 2.6.27rc"
In commit 37437bb2e1
("pkt_sched: Schedule qdiscs instead of netdev_queue.")
the test of the queue being stopped was erroneously
removed from qdisc_run().
When the TX queue of the device fills up, this omission
causes lots of extraneous useless work to be queued up
to softirq context, where we'll just return immediately
because the device is still stuffed up.
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no reason for dynamically allocating an estimator object for every
stats object. Directly embed an estimator object into every stats object and
switch to using the kernel-provided list implementation. This makes the code
much simpler and faster, as we do not need to traverse the list of all
estimators to find the one belonging to a stats object. There's no need to use
an rwlock, as we only have one reader. Also reorder the members of the
estimator structure slightly to avoid padding overhead. This can't be done
with the stats object as the members are currently copied to our user space
object via memcpy() and changing it would break ABI.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>