* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (56 commits)
sky2: Fix oops in sky2_xmit_frame() after TX timeout
Documentation/3c509: document ethtool support
af_packet: Don't use skb after dev_queue_xmit()
vxge: use pci_dma_mapping_error to test return value
netfilter: ebtables: enforce CAP_NET_ADMIN
e1000e: fix and commonize code for setting the receive address registers
e1000e: e1000e_enable_tx_pkt_filtering() returns wrong value
e1000e: perform 10/100 adaptive IFS only on parts that support it
e1000e: don't accumulate PHY statistics on PHY read failure
e1000e: call pci_save_state() after pci_restore_state()
netxen: update version to 4.0.72
netxen: fix set mac addr
netxen: fix smatch warning
netxen: fix tx ring memory leak
tcp: update the netstamp_needed counter when cloning sockets
TI DaVinci EMAC: Handle emac module clock correctly.
dmfe/tulip: Let dmfe handle DM910x except for SPARC on-board chips
ixgbe: Fix compiler warning about variable being used uninitialized
netfilter: nf_ct_ftp: fix out of bounds read in update_nl_seq()
mv643xx_eth: don't include cache padding in rx desc buffer size
...
Fix trivial conflict in drivers/scsi/cxgb3i/cxgb3i_offload.c
Add Unscheduled Automatic Power-Save Delivery (U-APSD) client support. The
idea is that data frames from the client trigger the AP to send the buffered
frames with ACs which have U-APSD enabled. This decreases latency and makes it
possible to save even more power.
A driver needs to set IEEE80211_HW_UAPSD to enable the feature. The current
implementation assumes that firmware takes care of the wakeup, and
hardware needing IEEE80211_HW_PS_NULLFUNC_STACK is not yet supported.
Tested with wl1251 on a Nokia N900 and Cisco Aironet 1231G AP and running
various test traffic with ping.
Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Store information elements from Beacon and Probe Response frames in
separate buffers to allow both sets to be made available through
nl80211. This allows user space applications to get access to IEs from
Beacon frames even if we have received Probe Response frames from the
BSS. Previously, the IEs from Probe Response frames would have
overridden the IEs from Beacon frames.
This feature is of somewhat limited use since most protocols include
the same (or extended) information in Probe Response frames. However,
there are a couple of exceptions where the IEs from Beacon frames could
be of some use: the TIM IE is only included in Beacon frames (and it would
be needed to figure out the DTIM period used in the BSS), and at least
some implementations of Wireless Provisioning Services seem to include
the full IE only in Beacon frames.
The new BSS attribute for scan results is added to allow both the IE
sets to be delivered. This is done in a way that maintains the
previously used behavior for applications that are not aware of the
new NL80211_BSS_BEACON_IES attribute.
Signed-off-by: Jouni Malinen <j@w1.fi>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Certain types of hardware, for example wl1251 and wl1271, need a template
for the Probe Request. Create a function ieee80211_probereq_get() which
creates the template; drivers then send it to the hardware.
Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Some hardware, for example wl1251 and wl1271, handle the transmission
of power save related frames in hardware, but the driver is responsible
for creating the templates. It's better to create the templates in mac80211,
that way all drivers can benefit from this.
Add two new functions, ieee80211_pspoll_get() and ieee80211_nullfunc_get()
which drivers need to call to get the frame. Drivers are also responsible
for updating the templates after each association.
Also, a new struct ieee80211_hdr_3addr is added to ieee80211.h to make it
easy to calculate the length of the Nullfunc frame.
Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
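The length calculation mentioned above can be illustrated with a small
userspace sketch. The struct below is only a mirror of a three-address
802.11 header written for this example (the name hdr_3addr and the helper
nullfunc_len() are hypothetical, not the kernel definitions); it assumes
that a Nullfunc frame carries no body, so its length equals the header
length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative userspace mirror of a three-address 802.11 header. */
struct hdr_3addr {
	uint16_t frame_control;
	uint16_t duration_id;
	uint8_t addr1[6];
	uint8_t addr2[6];
	uint8_t addr3[6];
	uint16_t seq_ctrl;
} __attribute__((packed));

/* The layout must not contain padding, otherwise sizeof() is useless here. */
static_assert(sizeof(struct hdr_3addr) == 24, "unexpected header padding");

/* A Nullfunc frame has no body, so the template length is the header size. */
size_t nullfunc_len(void)
{
	return sizeof(struct hdr_3addr);
}
```

A driver-side caller would use such a length to size the template buffer it
hands to the hardware.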
Add a new NL80211_CMD_SET_TX_BITRATE_MASK command and related
attributes to provide support for setting TX rate mask for rate
control. This uses the existing cfg80211 set_bitrate_mask operation
that was previously used only with WEXT compat code (SIOCSIWRATE). The
nl80211 command allows more generic configuration of allowed rates as
a mask instead of fixed/max rate.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Extend struct cfg80211_bitrate_mask to actually use a bitfield mask
instead of just a single fixed or maximum rate index. This change
itself does not modify the behavior (except for debugfs files), but it
prepares cfg80211 and mac80211 for a new nl80211 command for setting
which rates can be used in TX rate control.
Since frames are now going through the rate control algorithm
unconditionally, the internal IEEE80211_TX_INTFL_RCALGO flag can now
be removed. The RC implementations can use the rate_idx_mask value to
optimize their behavior if only a single rate is enabled.
The old max_rate_idx in struct ieee80211_tx_rate_control is maintained
(but commented as deprecated) for backwards compatibility with existing
RC implementations. Once these implementations have been updated to
use the more generic rate_idx_mask, the max_rate_idx value can be
removed.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
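To make the difference between a maximum rate index and a rate mask
concrete, here is a minimal userspace sketch. The helper names are
hypothetical and only assume that bit i of the mask stands for rate index i;
they are not the cfg80211 or mac80211 functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask equivalent to the old "maximum rate index" limit: bits 0..max_idx. */
uint32_t mask_from_max_idx(unsigned int max_idx)
{
	assert(max_idx < 32);
	if (max_idx == 31)
		return UINT32_MAX;
	return (UINT32_C(1) << (max_idx + 1)) - 1;
}

/* A rate may be used only if its bit is set in the mask. */
bool rate_allowed(uint32_t mask, unsigned int idx)
{
	return idx < 32 && (mask & (UINT32_C(1) << idx)) != 0;
}

/* True when exactly one rate is enabled, the case an RC algorithm can
 * shortcut because there is nothing left to choose. */
bool single_rate_only(uint32_t mask)
{
	return mask != 0 && (mask & (mask - 1)) == 0;
}
```

A mask such as 0x15 (indexes 0, 2 and 4) cannot be expressed with a single
fixed or maximum index, which is the extra flexibility the bitfield gives.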
If the basic rate set is configured to not include the lowest rate
(e.g., basic rate set = 6, 12, 24 Mbps in IEEE 802.11g mode), the AP
should not send out broadcast frames at 1 Mbps. This type of
configuration can be used to optimize channel usage in cases where
there is no need for backwards compatibility with IEEE 802.11b-only
devices.
In AP mode, mac80211 was unconditionally using the lowest rate for
Beacon frames and similarly, with all rate control algorithms that use
rate_control_send_low(), the lowest rate ended up being used for all
broadcast frames (and all unicast frames that are sent before
association). Change this to take into account the basic rate
configuration in AP mode, i.e., use the lowest rate in the basic rate
set instead of the lowest supported rate when selecting the rate.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
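The selection rule described above can be sketched as follows. This is a
simplified userspace model, not the mac80211 code: the function name is
hypothetical, both arguments are assumed to be bitmaps indexed by position
in the rate table, and the table is assumed to be sorted by ascending
bitrate.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick the rate index for broadcast frames: the lowest supported rate that
 * is also in the basic rate set. If no basic rate is known, fall back to
 * the lowest supported rate. Returns -1 if nothing is supported.
 */
int lowest_tx_rate_idx(uint32_t supported, uint32_t basic)
{
	uint32_t candidates = supported & basic;
	int idx = 0;

	if (candidates == 0)
		candidates = supported;	/* basic rates unknown or unusable */
	if (candidates == 0)
		return -1;

	while (!(candidates & 1)) {	/* find the lowest set bit */
		candidates >>= 1;
		idx++;
	}
	assert(idx < 32);
	return idx;
}
```

With twelve supported rates (mask 0xFFF) and a basic rate set covering only
indexes 4, 6 and 8, the function returns 4 instead of 0.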
Mac80211 callback to driver set_coverage_class() sets slot time and ACK
timeout for given IEEE 802.11 coverage class. The callback is optional,
but it's essential for long distance links.
Signed-off-by: Lukas Turek <8an@praha12.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The new attribute NL80211_ATTR_WIPHY_COVERAGE_CLASS sets IEEE 802.11
Coverage Class, which depends on maximum distance of nodes in a
wireless network. It's required for long distance links (more than a few
hundred meters).
The attribute is now ignored by two non-mac80211 drivers, rndis and
iwmc3200wifi, together with WIPHY_PARAM_RETRY_SHORT and
WIPHY_PARAM_RETRY_LONG. If it turns out to be a problem, we could split
set_wiphy_params callback or add new capability bits.
Signed-off-by: Lukas Turek <8an@praha12.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
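As a rough illustration of how a coverage class relates to link distance,
the sketch below assumes the IEEE 802.11 convention that each coverage class
step adds 3 us of air propagation time and that the class ranges from 0 to
31. Taking 3 us as a round trip at roughly 300 m/us gives about 450 m of
distance per step. The helper names and the 450 m constant are assumptions
made for this example.

```c
#include <assert.h>

#define COVERAGE_CLASS_MAX	31
#define METERS_PER_CLASS	450	/* assumed: 3 us round trip at ~300 m/us */

/* Smallest coverage class whose range covers the given distance. */
unsigned int coverage_class_for_distance(unsigned int meters)
{
	if (meters >= COVERAGE_CLASS_MAX * METERS_PER_CLASS)
		return COVERAGE_CLASS_MAX;
	return (meters + METERS_PER_CLASS - 1) / METERS_PER_CLASS;
}

/* Extra air propagation time, in microseconds, for a coverage class. */
unsigned int extra_propagation_us(unsigned int coverage_class)
{
	assert(coverage_class <= COVERAGE_CLASS_MAX);
	return 3 * coverage_class;
}
```

A driver implementing set_coverage_class() would add this extra time to its
slot time and ACK timeout; the exact register programming is hardware
specific and is not shown here.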
This patch adds the kernel portions needed to implement
RFC 5082 Generalized TTL Security Mechanism (GTSM).
It is a lightweight security measure against forged
packets causing DoS attacks (for BGP).
This is already implemented the same way in BSD kernels.
For the necessary Quagga patch
http://www.gossamer-threads.com/lists/quagga/dev/17389
Description from Cisco
http://www.cisco.com/en/US/docs/ios/12_3t/12_3t7/feature/guide/gt_btsh.html
It does add one byte to each socket structure. I did a little
rearrangement to reuse a hole on 64 bit, but it does grow the
structure on 32 bit.
This should be documented on ip(4) man page and the Glibc in.h
file also needs update. IPV6_MINHOPLIMIT should also be added
(although BSD doesn't support that).
Only TCP is supported, but could also be added to UDP, DCCP, SCTP
if desired.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
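From userspace, the mechanism above would be used roughly as in this
sketch. It assumes the IPv4 socket option is IP_MINTTL (value 21 in
linux/in.h, defined here as a fallback in case the libc headers lack it).
The helper names are hypothetical, and gtsm_accepts() only models the
receive-side rule as a pure function.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

#ifndef IP_MINTTL
#define IP_MINTTL 21	/* fallback for older libc headers (assumption) */
#endif

/* Ask the kernel to drop incoming packets whose TTL is below min_ttl.
 * Returns 0 on success, -1 with errno set on failure. */
int set_min_ttl(int fd, int min_ttl)
{
	assert(fd >= 0);
	return setsockopt(fd, IPPROTO_IP, IP_MINTTL, &min_ttl, sizeof(min_ttl));
}

/* Model of the receive-side check: accept only packets that arrive with
 * at least the configured minimum TTL. */
bool gtsm_accepts(unsigned int packet_ttl, unsigned int min_ttl)
{
	return packet_ttl >= min_ttl;
}
```

A BGP daemon peering with a directly connected neighbor would typically send
with TTL 255 and call set_min_ttl(fd, 255), so that any packet that crossed
a router on the way is rejected.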
Send aligned pipe payload if requested to do so. Then the socket buffer
need not be fragmented anymore.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Newer Nokia cellular modems can use aligned payload for their GPRS pipe.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we have L3 tunnels with different inner/outer families
(i.e. IPv4/IPv6) which use a multicast address as the outer tunnel
destination address, multicast packets will be looped back to the
sending socket even if IP*_MULTICAST_LOOP is set to disabled.
The mc_loop flag is present in the family specific part of the socket
(e.g. the IPv4 or IPv6 specific part). setsockopt sets the inner
family mc_loop flag. When the packet is pushed through the L3 tunnel
it will eventually be processed by the outer family which, if different,
will check the flag in a different part of the socket than it was set.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
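The mismatch described above can be modelled in a few lines of userspace C.
The struct and function names are hypothetical. The sketch only assumes that
a socket carries one mc_loop flag per family-specific part, and it contrasts
looking up the flag by the family that processes the packet with looking it
up by the family of the socket itself, which is one way to avoid the
problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum family { FAMILY_IPV4, FAMILY_IPV6 };

/* Minimal model of a socket with per-family multicast loop flags. */
struct sock_model {
	enum family family;	/* family the socket was created with */
	bool mc_loop_v4;	/* flag in the IPv4 specific part */
	bool mc_loop_v6;	/* flag in the IPv6 specific part */
};

/* Problematic lookup: consult the part that belongs to the family
 * currently processing the packet (the outer tunnel family). */
bool mc_loop_by_outer_family(const struct sock_model *sk, enum family outer)
{
	assert(sk != NULL);
	return outer == FAMILY_IPV4 ? sk->mc_loop_v4 : sk->mc_loop_v6;
}

/* Alternative lookup: consult the part that setsockopt actually wrote,
 * i.e. the one matching the socket's own family. */
bool mc_loop_by_socket_family(const struct sock_model *sk)
{
	assert(sk != NULL);
	return sk->family == FAMILY_IPV4 ? sk->mc_loop_v4 : sk->mc_loop_v6;
}
```

For an IPv6 socket that disabled looping while its IPv4 flag is still at the
default, the first lookup reports "loop" on an IPv4 outer path and the
second reports "do not loop".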
I was very frustrated about the fact that I have to recompile the kernel
to change the hash size. So, I created this patch.
If IPVS is built-in you can append ip_vs.conn_tab_bits=?? to kernel
command line, or, if you built IPVS as modules, you can add
options ip_vs conn_tab_bits=??.
To keep everything backward compatible, you still can select the size at
compile time, and that will be used as default.
It has been about a year since this patch was originally posted
and subsequently dropped on the basis of insufficient test data.
Mark Bergsma has provided the following test results which seem
to strongly support the need for larger hash table sizes:
We do however run into the same problem with the default setting (2^12 =
4096 entries), as most of our LVS balancers handle around a million
connections/SLAB entries at any point in time (around 100-150 kpps
load). With only 4096 hash table entries this implies that each entry
consists of a linked list of 256 connections *on average*.
To provide some statistics, I did an oprofile run on an 2.6.31 kernel,
with both the default 4096 table size, and the same kernel recompiled
with IP_VS_CONN_TAB_BITS set to 18 (2^18 = 262144 entries). I built a
quick test setup with a part of Wikimedia/Wikipedia's live traffic
mirrored by the switch to the test host.
With the default setting, at ~ 120 kpps packet load we saw a typical %si
CPU usage of around 30-35%, and oprofile reported a hot spot in
ip_vs_conn_in_get:
samples   %        image name   app name   symbol name
1719761   42.3741  ip_vs.ko     ip_vs.ko   ip_vs_conn_in_get
302577     7.4554  bnx2         bnx2       /bnx2
181984     4.4840  vmlinux      vmlinux    __ticket_spin_lock
128636     3.1695  vmlinux      vmlinux    ip_route_input
74345      1.8318  ip_vs.ko     ip_vs.ko   ip_vs_conn_out_get
68482      1.6874  vmlinux      vmlinux    mwait_idle
After loading the recompiled kernel with 2^18 entries, %si CPU usage
dropped in half to around 12-18%, and oprofile looks much healthier,
with only 7% spent in ip_vs_conn_in_get:
samples   %        image name   app name   symbol name
265641    14.4616  bnx2         bnx2       /bnx2
143251     7.7986  vmlinux      vmlinux    __ticket_spin_lock
140661     7.6576  ip_vs.ko     ip_vs.ko   ip_vs_conn_in_get
94364      5.1372  vmlinux      vmlinux    mwait_idle
86267      4.6964  vmlinux      vmlinux    ip_route_input
[ horms@verge.net.au: trivial up-port and minor style fixes ]
Signed-off-by: Catalin(ux) M. BOIE <catab@embedromix.ro>
Cc: Mark Bergsma <mark@wikimedia.org>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
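The arithmetic behind the numbers quoted above can be checked with a short
sketch. The helper names are made up for this example; the only assumption
is that connections spread evenly over the hash buckets, so the average
chain length is simply connections divided by table entries.

```c
#include <assert.h>

/* Number of hash buckets for a given conn_tab_bits value. */
unsigned long table_entries(unsigned int bits)
{
	assert(bits < sizeof(unsigned long) * 8);
	return 1UL << bits;
}

/* Average number of connections per bucket, assuming an even spread. */
unsigned long avg_chain_len(unsigned long conns, unsigned int bits)
{
	return conns / table_entries(bits);
}
```

With about a million connections (2^20), 12 bits give chains of 256 entries
on average, while 18 bits bring that down to 4.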
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (74 commits)
Revert "b43: Enforce DMA descriptor memory constraints"
iwmc3200wifi: fix array out-of-boundary access
wl1251: timeout one too soon in wl1251_boot_run_firmware()
mac80211: fix propagation of failed hardware reconfigurations
mac80211: fix race with suspend and dynamic_ps_disable_work
ath9k: fix missed error codes in the tx status check
ath9k: wake hardware during AMPDU TX actions
ath9k: wake hardware for interface IBSS/AP/Mesh removal
ath9k: fix suspend by waking device prior to stop
cfg80211: fix error path in cfg80211_wext_siwscan
wl1271_cmd.c: cleanup char => u8
iwlwifi: Storage class should be before const qualifier
ath9k: Storage class should be before const qualifier
cfg80211: fix race between deauth and assoc response
wireless: remove remaining qual code
rt2x00: Add USB ID for Linksys WUSB 600N rev 2.
ath5k: fix SWI calibration interrupt storm
mac80211: fix ibss join with fixed-bssid
libertas: Remove carrier signaling from the scan code
orinoco: fix GFP_KERNEL in orinoco_set_key with interrupts disabled
...
To make it easier to notice cases of calling sleeping ops in atomic context,
annotate driver-ops.h with appropriate might_sleep() calls. At the same time,
also document in mac80211.h the op functions with missing contexts.
mac80211 doesn't seem to use get_tx_stats anywhere currently. Just to be on
the safe side, I documented it to be atomic, but hopefully the op can be
removed in the future.
Compile-tested only.
Signed-off-by: Kalle Valo <kalle.valo@iki.fi>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
All members of struct ieee80211_if_init_conf (vif, mac_addr, type) are
now available in the vif struct directly, so we can pass that instead
of the conf struct. I generated this patch (except the
mac80211 and header file changes) with this semantic
patch:
@@
identifier conf, fn, hw;
type tp;
@@
tp fn(struct ieee80211_hw *hw,
-struct ieee80211_if_init_conf *conf)
+struct ieee80211_vif *vif)
{
<...
(
-conf->type
+vif->type
|
-conf->mac_addr
+vif->addr
|
-conf->vif
+vif
)
...>
}
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When, for instance, a new IBSS peer is found, userspace
wants to be notified. Add events for all new stations
that mac80211 learns about.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add new commands for requesting the driver to remain awake
on a specified channel for the specified amount of time
(and another command to cancel such an operation). This
can be used to implement userspace-controlled off-channel
operations, like Public Action frame exchange on another
channel than the operation channel.
The off-channel operation should behave similarly to scan,
i.e. the local station (if associated) moves into power
save mode to request the AP to buffer frames for it and
then moves to the other channel to allow the off-channel
operation to be completed. The duration parameter can be
used to request enough time to receive a response from
the target station.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We've long lacked a good confirmation that frames
have really gone out, e.g. before going off-channel
for a scan. Add a flush() operation that drivers
can implement to provide that confirmation, and use
it in a few places:
* before scanning sends the nullfunc frames
* after scanning sends the nullfunc frames, if any
* when going idle, to send any pending frames
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This removes the remaining users of the rx status
'qual' field and the field itself.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
For the cases where a lot of interfaces are used in conjunction with a
lot of LLC sockets bound to the same SAP, the iteration of the socket
list becomes prohibitively expensive.
Replacing the list with a local address based hash significantly
improves the bind and listener lookup operations as well as the
datagram delivery.
Connected sockets delivery is also improved, but this patch does not
address the case where we have lots of sockets with the same local
address connected to different remote addresses.
In order to keep the socket sanity checks alive and fast a socket
counter was added to the SAP structure.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a per SAP device based hash table to solve the
multicast delivery scalability issue when we have a large number of
interfaces and a large number of sockets bound to the same SAP.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the reclamation phase we use the SLAB_DESTROY_BY_RCU mechanism,
which requires some extra checks in the lookup code:
a) If the current socket was released, reallocated & inserted in
another list it will short circuit the iteration for the current list,
thus we need to restart the lookup.
b) If the current socket was released, reallocated & inserted in the
same list we just need to recheck it matches the look-up criteria and
if not we can skip to the next element.
In this case there is no need to restart the lookup, since sockets are
inserted at the start of the list and the worst that will happen is
that we will iterate through some of the list elements more than
once.
Note that /proc and multicast delivery were not yet converted to
RCU; they still use spinlocks for protection.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add rtnetlink init_rcvwnd to set the TCP initial receive window size
advertised by passive and active TCP connections.
The current Linux TCP implementation limits the advertised TCP initial
receive window to the one prescribed by slow start. For short lived
TCP connections used for transaction type of traffic (e.g. HTTP
requests), bounding the advertised TCP initial receive window results
in increased latency to complete the transaction.
Setting the initial congestion window is already supported through
rtnetlink init_cwnd, but the feature is useless without the
ability to set a larger TCP initial receive window.
The rtnetlink init_rcvwnd allows increasing the TCP initial receive
window, so that a TCP connection can advertise a larger receive window
than the one bounded by slow start.
Signed-off-by: Laurent Chavey <chavey@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_push checks tcp_send_head and calls __tcp_push_pending_frames,
which again checks tcp_send_head, and this unnecessary check is
done for every other caller of __tcp_push_pending_frames.
Remove tcp_send_head check in __tcp_push_pending_frames and add
the check to tcp_push_pending_frames. Other functions call
__tcp_push_pending_frames only when tcp_send_head would evaluate
to true.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable spatial multiplexing in mac80211 by telling the
driver what to do and, where necessary, sending action
frames to the AP to update the requested SMPS mode.
Also includes a trivial implementation for hwsim that
just logs the requested mode.
For now, the userspace interface is in debugfs only,
and lets you toggle the requested mode at any time.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Move the A-MSDU handling code from mac80211 to cfg80211 so that more
drivers can use it. The newly created function ieee80211_amsdu_to_8023s
converts an A-MSDU frame to a list of 802.3 frames.
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
For bluetooth 3, we will most likely not have
a netdev for a virtual interface (sdata), so
prepare for that by reducing the reliance on
having a netdev. This patch moves the name
and address fields into the sdata struct and
uses them from there all over. Some work is
needed to keep them sync'ed, but that's not
a lot of work and in slow paths anyway.
In doing so, this also reduces the number of
pointer dereferences in many places, because
of things like sdata->dev->dev_addr becoming
sdata->vif.addr.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits)
net: sh_eth alignment fix for sh7724 using NET_IP_ALIGN V2
ixgbe: allow tx of pre-formatted vlan tagged packets
ixgbe: Fix 82598 premature copper PHY link indicatation
ixgbe: Fix tx_restart_queue/non_eop_desc statistics counters
bcm63xx_enet: fix compilation failure after get_stats_count removal
packet: dont call sleeping functions while holding rcu_read_lock()
tcp: Revert per-route SACK/DSACK/TIMESTAMP changes.
ipvs: zero usvc and udest
netfilter: fix crashes in bridge netfilter caused by fragment jumps
ipv6: reassembly: use seperate reassembly queues for conntrack and local delivery
sky2: leave PCI config space writeable
sky2: print Optima chip name
x25: Update maintainer.
ipvs: fix synchronization on connection close
netfilter: xtables: document minimal required version
drivers/net/bonding/: : use pr_fmt
can: CAN_MCP251X should depend on HAS_DMA
drivers/net/usb: Correct code taking the size of a pointer
drivers/net/cpmac.c: Correct code taking the size of a pointer
drivers/net/sfc: Correct code taking the size of a pointer
...
It creates a regression, triggering badness for SYN_RECV
sockets, for example:
[19148.022102] Badness at net/ipv4/inet_connection_sock.c:293
[19148.022570] NIP: c02a0914 LR: c02a0904 CTR: 00000000
[19148.023035] REGS: eeecbd30 TRAP: 0700 Not tainted (2.6.32)
[19148.023496] MSR: 00029032 <EE,ME,CE,IR,DR> CR: 24002442 XER: 00000000
[19148.024012] TASK = eee9a820[1756] 'privoxy' THREAD: eeeca000
This is likely caused by the change in the 'estab' parameter
passed to tcp_parse_options() when invoked by the functions
in net/ipv4/tcp_minisocks.c
But even if that is fixed, the ->conn_request() changes made in
this patch series are fundamentally wrong. They try to use the
listening socket's 'dst' to probe the route settings. The
listening socket doesn't even have a route, and you can't
get the right route (the child request one) until much later
after we setup all of the state, and it must be done by hand.
This stuff really isn't ready, so the best thing to do is a
full revert. This reverts the following commits:
f55017a93f022c3f7d821aba721ebacda42ebd67345cda2fd6dc343475ed05eaade2786a2a2d6bf8
Signed-off-by: David S. Miller <davem@davemloft.net>
When fragments from bridge netfilter are passed to IPv4 or IPv6 conntrack
and a reassembly queue with the same fragment key already exists from
reassembling a similar packet received on a different device (e.g. with
multicast fragments), the reassembled packet might continue on a different
codepath than where the head fragment originated. This can cause crashes
in bridge netfilter when a fragment received on a non-bridge device (and
thus with skb->nf_bridge == NULL) continues through the bridge netfilter
code.
Add a new reassembly identifier for packets originating from bridge
netfilter and use it to put those packets in isolated queues.
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=14805
Reported-and-Tested-by: Chong Qiao <qiaochong@loongson.cn>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Currently the same reassembly queue might be used for packets reassembled
by conntrack in different positions in the stack (PREROUTING/LOCAL_OUT),
as well as local delivery. This can cause "packet jumps" when the fragment
completing a reassembled packet is queued from a different position in the
stack than the previous ones.
Add a "user" identifier to the reassembly queue key to separate the queues
of each caller, similar to what we do for IPv4.
Signed-off-by: Patrick McHardy <kaber@trash.net>
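The effect of adding a "user" identifier to the reassembly queue key can be
sketched in userspace as follows. All names here (struct frag_key, the
frag_user values, frag_key_match()) are hypothetical and simplified for
illustration; the real keys contain more fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callers of the reassembly code. */
enum frag_user {
	FRAG_USER_LOCAL_DELIVER,
	FRAG_USER_CONNTRACK_IN,
	FRAG_USER_CONNTRACK_OUT,
	FRAG_USER_CONNTRACK_BRIDGE_IN,
};

/* Simplified reassembly queue key. */
struct frag_key {
	uint32_t id;
	uint32_t saddr;
	uint32_t daddr;
	enum frag_user user;	/* which caller owns the queue */
};

/* Two fragments share a queue only if the packet fields match AND they
 * were queued by the same caller. */
bool frag_key_match(const struct frag_key *a, const struct frag_key *b)
{
	assert(a != NULL && b != NULL);
	return a->id == b->id &&
	       a->saddr == b->saddr &&
	       a->daddr == b->daddr &&
	       a->user == b->user;
}
```

Fragments of the same packet seen by conntrack and by local delivery thus
end up in different queues, so a reassembled packet cannot jump from one
code path to another.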
Add a definition of the amount of TX headroom reserved by mac80211 itself
for its own purposes. Also add BUILD_BUG_ON to validate the value.
This define can then be used by drivers to request additional TX headroom
in the most efficient manner.
Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
m68k: rename global variable vmalloc_end to m68k_vmalloc_end
percpu: add missing per_cpu_ptr_to_phys() definition for UP
percpu: Fix kdump failure if booted with percpu_alloc=page
percpu: make misc percpu symbols unique
percpu: make percpu symbols in ia64 unique
percpu: make percpu symbols in powerpc unique
percpu: make percpu symbols in x86 unique
percpu: make percpu symbols in xen unique
percpu: make percpu symbols in cpufreq unique
percpu: make percpu symbols in oprofile unique
percpu: make percpu symbols in tracer unique
percpu: make percpu symbols under kernel/ and mm/ unique
percpu: remove some sparse warnings
percpu: make alloc_percpu() handle array types
vmalloc: fix use of non-existent percpu variable in put_cpu_var()
this_cpu: Use this_cpu_xx in trace_functions_graph.c
this_cpu: Use this_cpu_xx for ftrace
this_cpu: Use this_cpu_xx in nmi handling
this_cpu: Use this_cpu operations in RCU
this_cpu: Use this_cpu ops for VM statistics
...
Fix up trivial (famous last words) global per-cpu naming conflicts in
arch/x86/kvm/svm.c
mm/slab.c
compat_sys_recvmmsg has a compat_timespec parameter and not a
timespec parameter. This way we also get rid of an odd cast.
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves retransmits_timed_out() from include/net/tcp.h
to tcp_timer.c, where it is used.
Reported-by: Frederic Leroy <fredo@starox.org>
Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a problem in the TCP connection timeout calculation.
Currently, timeout decisions are made on the basis of the current
tcp_time_stamp and retrans_stamp, which is usually set at the first
retransmission.
However, if the retransmission fails in tcp_retransmit_skb(),
retrans_stamp is not updated and remains zero. This leads to wrong
decisions in retransmits_timed_out() if tcp_time_stamp is larger than
the specified timeout, which is very likely.
In this case, the TCP connection dies after the first attempted
(and unsuccessful) retransmission.
With this patch, tcp_skb_cb->when is used instead, when retrans_stamp
is not available.
This bug has been introduced together with retransmits_timed_out() in
2.6.32, as the number of retransmissions has been used for timeout
decisions before. The corresponding commit was
6fa12c8503 (Revert Backoff [v3]:
Calculate TCP's connection close threshold as a time value.).
Thanks to Ilpo Järvinen for code suggestions and Frederic Leroy for
testing.
Reported-by: Frederic Leroy <fredo@starox.org>
Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
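The failure mode and the fix described above can be reproduced with plain
integer arithmetic. The sketch below is a userspace model, not the kernel
function: the names are hypothetical, timestamps are 32-bit tick counters,
and the timeout is passed in directly instead of being derived from the
retransmission boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old behaviour: always measure from retrans_stamp, even when it is
 * still zero because the first retransmission failed. */
bool timed_out_old(uint32_t now, uint32_t retrans_stamp, uint32_t timeout)
{
	assert(timeout != 0);	/* a zero timeout would always fire */
	return (uint32_t)(now - retrans_stamp) >= timeout;
}

/* Fixed behaviour: fall back to the send time of the head skb when
 * retrans_stamp is not available. */
bool timed_out_fixed(uint32_t now, uint32_t retrans_stamp,
		     uint32_t skb_when, uint32_t timeout)
{
	uint32_t start = retrans_stamp ? retrans_stamp : skb_when;

	assert(timeout != 0);
	return (uint32_t)(now - start) >= timeout;
}
```

With now = 1000000, a segment sent at 999900, retrans_stamp still 0 and a
timeout of 5000 ticks, the old check reports a timeout immediately while the
fixed one does not.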
When we find a timewait connection in __inet_hash_connect() and reuse
it for a new connection request, we have a race window, releasing bind
list lock and reacquiring it in __inet_twsk_kill() to remove timewait
socket from list.
Another thread might find the timewait socket we already chose, leading to
list corruption and crashes.
Fix is to remove timewait socket from bind list before releasing the bind lock.
Note: This problem happens if sysctl_tcp_tw_reuse is set.
Reported-by: kapil dakhane <kdakhane@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
First patch changes __inet_hash_nolisten() and __inet6_hash()
to get a timewait parameter to be able to unhash it from ehash
at the same time the new socket is inserted in hash.
This makes sure the timewait socket won't be found by a concurrent
writer in __inet_check_established()
Reported-by: kapil dakhane <kdakhane@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
mac80211: fix reorder buffer release
iwmc3200wifi: Enable wimax core through module parameter
iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
iwmc3200wifi: Coex table command does not expect a response
iwmc3200wifi: Update wiwi priority table
iwlwifi: driver version track kernel version
iwlwifi: indicate uCode type when fail dump error/event log
iwl3945: remove duplicated event logging code
b43: fix two warnings
ipw2100: fix rebooting hang with driver loaded
cfg80211: indent regulatory messages with spaces
iwmc3200wifi: fix NULL pointer dereference in pmkid update
mac80211: Fix TX status reporting for injected data frames
ath9k: enable 2GHz band only if the device supports it
airo: Fix integer overflow warning
rt2x00: Fix padding bug on L2PAD devices.
WE: Fix set events not propagated
b43legacy: avoid PPC fault during resume
b43: avoid PPC fault during resume
tcp: fix a timewait refcnt race
...
Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
CTL_UNNUMBERED removed) in
kernel/sysctl_check.c
net/ipv4/sysctl_net_ipv4.c
net/ipv6/addrconf.c
net/sctp/sysctl.c
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits)
security/tomoyo: Remove now unnecessary handling of security_sysctl.
security/tomoyo: Add a special case to handle accesses through the internal proc mount.
sysctl: Drop & in front of every proc_handler.
sysctl: Remove CTL_NONE and CTL_UNNUMBERED
sysctl: kill dead ctl_handler definitions.
sysctl: Remove the last of the generic binary sysctl support
sysctl net: Remove unused binary sysctl code
sysctl security/tomoyo: Don't look at ctl_name
sysctl arm: Remove binary sysctl support
sysctl x86: Remove dead binary sysctl support
sysctl sh: Remove dead binary sysctl support
sysctl powerpc: Remove dead binary sysctl support
sysctl ia64: Remove dead binary sysctl support
sysctl s390: Remove dead sysctl binary support
sysctl frv: Remove dead binary sysctl support
sysctl mips/lasat: Remove dead binary sysctl support
sysctl drivers: Remove dead binary sysctl support
sysctl crypto: Remove dead binary sysctl support
sysctl security/keys: Remove dead binary sysctl support
sysctl kernel: Remove binary sysctl logic
...
It's currently possible that several threads issuing a connect() find
the same timewait socket and try to reuse it, leading to list
corruptions.
The condition for the bug is that these threads bound their sockets to
the same address/port as the to-be-found timewait socket, and connected
to the same target (SO_REUSEADDR needed).
To fix this problem, we could unhash the timewait socket while holding
the ehash lock, to make sure lookups/changes will be serialized. Only
the first thread finds the timewait socket; the other ones find the
established socket and return an EADDRNOTAVAIL error.
This second version takes into account Evgeniy's review and makes sure
inet_twsk_put() is called outside of locked sections.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function walks the whole hashtable so there is no point in
passing it a network namespace. Instead I purge all timewait
sockets from dead network namespaces that I find. If the namespace
is one of the ones I am trying to purge I am guaranteed no new timewait
sockets can be formed so this will get them all. If the namespace
is one I am not acting for it might form a few more but I will
call inet_twsk_purge again shortly to get rid of them. In
any event, if the network namespace is dead, timewait sockets are
useless.
Move the calls of inet_twsk_purge into batch_exit routines so
that if I am killing a bunch of namespaces at once I will just
call inet_twsk_purge once and save a lot of redundant unnecessary
work.
In my simple 4k network namespace exit test the cleanup time dropped from
roughly 8.2s to 1.6s, while the time spent running inet_twsk_purge fell
to about 2ms: 1ms for ipv4 and 1ms for ipv6.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor the code so fib_rules_register always takes a template instead
of the actual fib_rules_ops structure that will be used. This is
required for network namespace support so 2 out of the 3 callers already
do this, it allows the error handling to be made common, and it allows
fib_rules_unregister to free the template for the caller.
Modify fib_rules_unregister to use call_rcu instead of synchronize_rcu
to allow multiple namespaces to be cleaned up in the same rcu grace
period.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
xfrm.nlsk is provided by the xfrm_user module and is accessed via rcu from
other parts of the xfrm code. Add xfrm.nlsk_stash, a copy of xfrm.nlsk that
will never be set to NULL. This allows the synchronize_net and
netlink_kernel_release to be deferred until a whole batch of xfrm.nlsk sockets
have been set to NULL.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Add exit_list to struct net to support building lists of network
namespaces to cleanup.
- Add exit_batch to pernet_operations to allow running operations only
once during a network namespace exit, instead of once per network
namespace.
- Factor out ops_exit_list and ops_exit_free so the logic for cleaning
up a network namespace does not need to be duplicated.
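The difference between a per-namespace exit hook and a per-batch hook can
be sketched in user space as follows. This is a simplified model, not the
kernel implementation: the struct layouts, the exit_next linkage, the
counting callbacks and demo_batch() are assumptions made for illustration;
only the names exit, exit_batch and ops_exit_list come from the text above.

```c
#include <stddef.h>

/* Hypothetical user-space model of a network namespace. */
struct net {
    int id;
    struct net *exit_next;   /* stands in for the exit_list linkage */
};

/* Hypothetical model of the two kinds of exit hooks. */
struct pernet_ops {
    void (*exit)(struct net *net);          /* runs once per namespace */
    void (*exit_batch)(struct net *list);   /* runs once per batch */
};

/* Run one ops entry over a whole list of dying namespaces. */
void ops_exit_list(const struct pernet_ops *ops, struct net *list)
{
    if (ops->exit) {
        for (struct net *n = list; n != NULL; n = n->exit_next)
            ops->exit(n);
    }
    if (ops->exit_batch)
        ops->exit_batch(list);
}

int exit_calls, batch_calls;

void count_exit(struct net *net)   { (void)net;  exit_calls++; }
void count_batch(struct net *list) { (void)list; batch_calls++; }

/* Tear down three namespaces in one batch; encode both counters
 * as exit_calls * 10 + batch_calls. */
int demo_batch(void)
{
    struct net c = { 3, NULL }, b = { 2, &c }, a = { 1, &b };
    struct pernet_ops ops = { count_exit, count_batch };

    exit_calls = batch_calls = 0;
    ops_exit_list(&ops, &a);
    return exit_calls * 10 + batch_calls;
}
```

With three namespaces on the list the per-namespace hook runs three times
while the batch hook runs once, which is where expensive whole-table walks
are best placed.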
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 68144d350f4f6c348659c825cde6a82b34c27a91
Author: Patrick McHardy <kaber@trash.net>
Date: Thu Dec 3 12:05:25 2009 +0100
net: fib_rules: add oif classification
Support routing table lookup based on the flow's oif. This is useful to
classify packets originating from sockets bound to interfaces differently.
The route cache already includes the oif and needs no changes.
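A minimal sketch of interface-based rule matching, assuming that an
interface index of 0 in a rule means "match any interface". The struct
layouts, the table numbers and the helper names are hypothetical and are
much simpler than the real fib_rules code.

```c
/* Hypothetical, simplified flow key and rule. */
struct flow { int iif; int oif; };
struct rule { int iifindex; int oifindex; int table; };

/* A rule matches when every interface it specifies equals the flow's. */
int rule_matches(const struct rule *r, const struct flow *fl)
{
    if (r->iifindex && r->iifindex != fl->iif)
        return 0;
    if (r->oifindex && r->oifindex != fl->oif)
        return 0;
    return 1;
}

/* Return the table of the first matching rule, or -1 if none matches. */
int lookup_table(const struct rule *rules, int n, const struct flow *fl)
{
    for (int i = 0; i < n; i++) {
        if (rule_matches(&rules[i], fl))
            return rules[i].table;
    }
    return -1;
}

/* Flows leaving through interface 2 use table 100, all others table 254. */
int demo_oif_lookup(int oif)
{
    const struct rule rules[] = { { 0, 2, 100 }, { 0, 0, 254 } };
    struct flow fl = { 0, oif };

    return lookup_table(rules, 2, &fl);
}
```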
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 229e77eec406ad68662f18e49fda8b5d366768c5
Author: Patrick McHardy <kaber@trash.net>
Date: Thu Dec 3 12:05:23 2009 +0100
net: fib_rules: rename ifindex/ifname/FRA_IFNAME to iifindex/iifname/FRA_IIFNAME
The next patch will add oif classification, rename interface related members
and attributes to reflect that they're used for iif classification.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit b8952893d5d86f69c4e499d191b98c6658f64b0f
Author: Patrick McHardy <kaber@trash.net>
Date: Thu Dec 3 12:05:22 2009 +0100
net: fib_rules: rearrange struct fib_rule
The ifname member is only used to resolve interface names and is not needed
during rule lookups. The target and ctarget members however are used during
rule lookups and are currently located in a second cacheline.
Move ifname further to the end to make sure both target and ctarget are
located in the same cacheline as other members used during rule lookups.
The layout on 64 bit changes from:
struct fib_rule {
...
u32 table; /* 56 4 */
u8 action; /* 60 1 */
/* XXX 3 bytes hole, try to pack */
/* --- cacheline 1 boundary (64 bytes) --- */
u32 target; /* 64 4 */
/* XXX 4 bytes hole, try to pack */
struct fib_rule * ctarget; /* 72 8 */
struct rcu_head rcu; /* 80 16 */
struct net * fr_net; /* 96 8 */
};
to:
struct fib_rule {
...
u32 table; /* 40 4 */
u8 action; /* 44 1 */
/* XXX 3 bytes hole, try to pack */
u32 target; /* 48 4 */
/* XXX 4 bytes hole, try to pack */
struct fib_rule * ctarget; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
char ifname[16]; /* 64 16 */
struct rcu_head rcu; /* 80 16 */
struct net * fr_net; /* 96 8 */
};
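The effect of the reordering can be checked with offsetof(). The two
structs below are stand-ins, not the real struct fib_rule: the 40-byte
head array replaces the members elided as "..." above, and the trailing
rcu and fr_net members are left out. Only the relative order of the
members shown in the layouts is taken from the text.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64

/* Stand-in for the old layout: ifname sits before the lookup members. */
struct rule_old {
    char     head[40];     /* placeholder for the elided members */
    char     ifname[16];
    uint32_t table;
    uint8_t  action;
    uint32_t target;
    void    *ctarget;
};

/* Stand-in for the new layout: ifname moved behind the lookup members. */
struct rule_new {
    char     head[40];     /* placeholder for the elided members */
    uint32_t table;
    uint8_t  action;
    uint32_t target;
    void    *ctarget;
    char     ifname[16];
};

/* Index of the cache line that contains the given byte offset. */
int cacheline_of(size_t offset)
{
    return (int)(offset / CACHELINE);
}
```

With these stand-ins, target lands in cache line 1 in the old order and in
cache line 0, together with ctarget, in the new order.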
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
RejActioned is used to prevent retransmission when an entity is in the
WAIT_F state, i.e., waiting for a frame with the F-bit set due to a local
busy condition or an expired retransmission timer. (When either of these
events occurs, the entity sends a frame with the Poll bit set and enters
the WAIT_F state to wait for a frame with the Final bit set.)
The local entity doesn't send I-frames (the data frames) until the receipt
of a frame with the F-bit set. When that happens it also sets RejActioned
to false.
RejActioned is a mandatory feature of the ERTM spec.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
As specified by the ERTM spec, an ERTM channel can acknowledge received
I-frames (the data frames) by sending an I-frame with the proper ReqSeq
value (i.e. ReqSeq is set to BufferSeq). Until now we weren't setting the
ReqSeq value in the I-frame control bits. Setting it saves us from sending
S-frames (Supervisory frames) only to acknowledge receipt of I-frames. It
is very helpful for full-duplex channels.
ReqSeq is the packet sequence number sent in an acknowledgement frame to
acknowledge receipt of frames up to (ReqSeq - 1).
BufferSeq controls the receiver buffer; it is used to delay
acknowledgement of new frames so as not to cause buffer overflow. The
BufferSeq value is not increased until frames are pulled by the
reassembly function.
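Piggybacking the acknowledgement on a data frame amounts to filling one
more field of the I-frame control word. The sketch below assumes the
16-bit enhanced control field layout with TxSeq in bits 1-6 and ReqSeq in
bits 8-13; the macro and function names are made up for this example and
the bit positions should be checked against the L2CAP specification.

```c
#include <stdint.h>

/* Assumed bit positions within the 16-bit I-frame control field. */
#define CTRL_TXSEQ_SHIFT   1
#define CTRL_REQSEQ_SHIFT  8
#define SEQ_MASK           0x3F   /* sequence numbers are modulo 64 */

/* Build an I-frame control word that carries tx_seq and, as the
 * piggybacked acknowledgement, ReqSeq = buffer_seq. */
uint16_t iframe_control(uint8_t tx_seq, uint8_t buffer_seq)
{
    uint16_t control = 0;

    control |= (uint16_t)((tx_seq & SEQ_MASK) << CTRL_TXSEQ_SHIFT);
    control |= (uint16_t)((buffer_seq & SEQ_MASK) << CTRL_REQSEQ_SHIFT);
    return control;
}

/* Extract ReqSeq on the receiving side: frames up to
 * (ReqSeq - 1) are acknowledged. */
uint8_t control_reqseq(uint16_t control)
{
    return (uint8_t)((control >> CTRL_REQSEQ_SHIFT) & SEQ_MASK);
}
```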
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The tasklet schedule function helpers are just an obfuscation. So remove
them and call the schedule functions directly.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
For future simplification it is important that the hci_recv_frame
function is no longer an inline function. So move it into the module
itself and export it.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Eric Dumazet mentioned in a context of another problem:
"Well, it seems NFS reuses its socket, so maybe we miss some
cleaning as spotted in this old patch"
I've not checked under which conditions that actually happens but
if true, we need to make sure we don't accidentally leave stale
hints behind when the write queue had to be purged (whether reusing
with NFS can actually happen if purging took place is something I'm
not sure of).
...At least it compiles.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Parse incoming TCP_COOKIE option(s).
Calculate <SYN,ACK> TCP_COOKIE option.
Send optional <SYN,ACK> data.
This is a significantly revised implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):
http://thread.gmane.org/gmane.linux.network/102586
Requires:
TCPCT part 1a: add request_values parameter for sending SYNACK
TCPCT part 1b: generate Responder Cookie secret
TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
TCPCT part 1d: define TCP cookie option, extend existing struct's
TCPCT part 1e: implement socket option TCP_COOKIE_TRANSACTIONS
TCPCT part 1f: Initiator Cookie => Responder
Signed-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Data structures are carefully composed to require minimal additions.
For example, the struct tcp_options_received cookie_plus variable fits
between existing 16-bit and 8-bit variables, requiring no additional
space (taking alignment into consideration). There are no additions to
tcp_request_sock, and only 1 pointer in tcp_sock.
This is a significantly revised implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):
http://thread.gmane.org/gmane.linux.network/102586
The principal difference is using a TCP option to carry the cookie nonce,
instead of a user configured offset in the data. This is more flexible and
less subject to user configuration error. Such a cookie option has been
suggested for many years, and is also useful without SYN data, allowing
several related concepts to use the same extension option.
"Re: SYN floods (was: does history repeat itself?)", September 9, 1996.
http://www.merit.net/mail.archives/nanog/1996-09/msg00235.html
"Re: what a new TCP header might look like", May 12, 1998.
ftp://ftp.isi.edu/end2end/end2end-interest-1998.mail
These functions will also be used in subsequent patches that implement
additional features.
Requires:
TCPCT part 1a: add request_values parameter for sending SYNACK
TCPCT part 1b: generate Responder Cookie secret
TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS
Signed-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Define sysctl (tcp_cookie_size) to turn on and off the cookie option
default globally, instead of a compiled configuration option.
Define per socket option (TCP_COOKIE_TRANSACTIONS) for setting constant
data values, retrieving variable cookie values, and other facilities.
Move inline tcp_clear_options() unchanged from net/tcp.h to linux/tcp.h,
near its corresponding struct tcp_options_received (prior to changes).
This is a straightforward re-implementation of an earlier (year-old)
patch that no longer applies cleanly, with permission of the original
author (Adam Langley):
http://thread.gmane.org/gmane.linux.network/102586
These functions will also be used in subsequent patches that implement
additional features.
Requires:
net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED
Signed-off-by: William.Allen.Simpson@gmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
Define (missing) hash message size for SHA1.
Define hashing size constants specific to TCP cookies.
Add new function: tcp_cookie_generator().
Maintain global secret values for tcp_cookie_generator().
This is a significantly revised implementation of earlier (15-year-old)
Photuris [RFC-2522] code for the KA9Q cooperative multitasking platform.
Linux RCU technique appears to be well-suited to this application, though
neither of the circular queue items are freed.
These functions will also be used in subsequent patches that implement
additional features.
Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add optional function parameters associated with sending SYNACK.
These parameters are not needed after sending SYNACK, and are not
used for retransmission. Avoids extending struct tcp_request_sock,
and avoids allocating kernel memory.
Also affects DCCP as it uses common struct request_sock_ops,
but this parameter is currently reserved for future use.
Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that all of the callers have been updated to set fields in
struct pernet_operations, and simplified to let the network
namespace core handle the allocation and freeing of the storage
for them, remove the superfluous methods and update the docs
to the new style.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To get the full benefit of batched network namespace cleanup, network
device deletion needs to be performed by the generic code. When
using register_pernet_gen_device and freeing the data in exit_net
it is impossible to delay allocation until after exit_net has been called
as the device uninit methods are no longer safe.
To correct this, and to simplify working with per network namespace data
I have moved allocation and deletion of per network namespace data into
the network namespace core. The core now frees the data only after
all of the network namespace exit routines have run.
Now it is only required to set the new fields .id and .size
in the pernet_operations structure if you want network namespace
data to be managed for you automatically.
This makes the current register_pernet_gen_device and
register_pernet_gen_subsys routines unnecessary. For the moment
I have left them as compatibility wrappers in net_namespace.h.
They will be removed once all of the users have been updated.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is fairly common to kill several network namespaces at once, either
because they are nested one inside the other or because they are cooperating
in multiple machine networking experiments. As the network stack control logic
does not parallelize easily, batch up multiple network namespaces exiting
together.
To get the full benefit of batching, the virtual network devices to be
removed must all be removed in one batch. For that purpose I have added
a loop after the last network device operations have run that batches
up all remaining network devices and deletes them.
An extra benefit is that the reorganization slightly shrinks the size
of the per network namespace data structures, replacing a work_struct
with a list_head.
In a trivial test with 4K namespaces this change reduced the cost of
destroying 4K namespaces from 7+ minutes (at 12% cpu) to 44 seconds
(at 60% cpu). The bulk of that 44s was spent in inet_twsk_purge.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The motivation for an additional notifier in batched netdevice
notification is that rt_do_flush only needs to be called once per
batch, not once per namespace.
For further batching improvements I need a guarantee that the
netdevices are unregistered in order, allowing me to unregister all
of the network devices in a network namespace at the same time with
the guarantee that the loopback device is really and truly
unregistered last.
Additionally it appears that we moved the route cache flush after
the final synchronize_net, which seems wrong and there was no
explanation. So I have restored the original location of the final
synchronize_net.
Cc: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lennert Buytenhek noticed that delBA handling in mac80211
was broken and has remotely triggerable problems, some of
which are due to some code shuffling I did that ended up
changing the order in which things were done -- this was
commit d75636ef9c
Author: Johannes Berg <johannes@sipsolutions.net>
Date: Tue Feb 10 21:25:53 2009 +0100
mac80211: RX aggregation: clean up stop session
and other parts were already present in the original
commit d92684e660
Author: Ron Rindjunsky <ron.rindjunsky@intel.com>
Date: Mon Jan 28 14:07:22 2008 +0200
mac80211: A-MPDU Tx add delBA from recipient support
The first problem is that I moved a BUG_ON before various
checks -- thereby making it possible to hit. As the comment
indicates, the BUG_ON can be removed since the ampdu_action
callback must already exist when the state is != IDLE.
The second problem isn't easily exploitable but there's a
race condition due to unconditionally setting the state to
OPERATIONAL when a delBA frame is received, even when no
aggregation session was ever initiated. All the drivers
accept stopping the session even then, but that opens a
race window where crashes could happen before the driver
accepts it. Right now, a WARN_ON may happen with non-HT
drivers, while the race opens only for HT drivers.
For this case, there are two things necessary to fix it:
1) don't process spurious delBA frames, and be more careful
about the session state; don't drop the lock
2) HT drivers need to be prepared to handle a session stop
even before the session was really started -- this is
true for all drivers (that support aggregation) but
iwlwifi which can be fixed easily. The other HT drivers
(ath9k and ar9170) are behaving properly already.
Reported-by: Lennert Buytenhek <buytenh@marvell.com>
Cc: stable@kernel.org
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Moves the CONFIG_SYSCTL ifdefs in x25_init into header.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When retransmitting due to T3 timeout, retransmit all the
in-flight chunks for the corresponding transport/path, including
chunks sent less than 1 rto ago.
This is the correct behaviour according to rfc4960 section 6.3.3
E3 and
"Note: Any DATA chunks that were sent to the address for which the
T3-rtx timer expired but did not fit in one MTU (rule E3 above)
should be marked for retransmission and sent as soon as cwnd
allows (normally, when a SACK arrives). ".
This fixes problems when more than one path is present and the T3
retransmission of the first chunk that times out stops the T3 timer
for the initial active path, leaving all the other in-flight
chunks waiting forever or until a new chunk is transmitted on the
same path and times out (and this will happen only if the cwnd
allows sending new chunks, but since cwnd was dropped to MTU by
the timeout, it will wait until the first heartbeat).
Example: 10 packets in flight, sent at 0.1 s intervals on the
primary path. The primary path is down and the first packet
times out. The first packet is retransmitted on another path, the
T3 timer for the primary path is stopped and cwnd is set to MTU.
All the other 9 in-flight packets will not be retransmitted
(unless more new packets are sent on the primary path, which depends
on cwnd allowing it, and even in this case the 9 packets will be
retransmitted only after a new packet times out, which even in the
best case would take more than RTO).
This commit reverts d0ce92910b and
also removes the now unused transport->last_rto, introduced in
b6157d8e03.
P.S. The problem does not occur only when multiple paths are present. It
can also happen in a single-homed environment. If the application
stops sending data, it is possible to have a hung association.
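The rule described above, retransmit everything in flight on the path
whose T3 timer fired regardless of how recently it was sent, can be
sketched as follows. The chunk structure and both functions are
hypothetical simplifications and do not correspond to the real SCTP
outqueue code.

```c
#include <stddef.h>

/* Hypothetical, simplified view of a queued DATA chunk. */
struct chunk {
    int transport;    /* path the chunk was last sent on */
    int in_flight;    /* sent but not yet acknowledged */
    int marked_rtx;   /* scheduled for retransmission */
};

/* On a T3 timeout, mark every in-flight chunk of the affected transport,
 * no matter how long ago it was sent. Returns the number marked. */
int t3_mark_all(struct chunk *chunks, size_t n, int transport)
{
    int marked = 0;

    for (size_t i = 0; i < n; i++) {
        if (chunks[i].transport == transport && chunks[i].in_flight) {
            chunks[i].in_flight = 0;
            chunks[i].marked_rtx = 1;
            marked++;
        }
    }
    return marked;
}

/* The example from the text: 10 packets in flight on the primary path
 * (transport 0) and one on another path (transport 1). */
int demo_t3(void)
{
    struct chunk q[11];

    for (size_t i = 0; i < 11; i++) {
        q[i].transport = (i < 10) ? 0 : 1;
        q[i].in_flight = 1;
        q[i].marked_rtx = 0;
    }
    return t3_mark_all(q, 11, 0);
}
```

In this model all 10 chunks of the failed path become eligible for
retransmission as soon as cwnd allows, while the chunk on the other path
is left alone.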
Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is an interface to set, delete and flush PMKIDs through nl80211.
Main users would be fullmac devices whose firmware is capable of
generating the RSN IEs for the re-association requests, e.g. iwmc3200wifi.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The RX flags should soon be used only for flags
that cannot change within an a-MPDU, so move the
cooked monitor flag into the RX status flags.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Adding a xfrm_state requires an authentication algorithm specified
either as xfrm_algo or as xfrm_algo_auth with a specific truncation
length. For compatibility, both attributes are dumped to userspace,
and we also accept both attributes, but prefer the new syntax.
If no truncation length is specified, or the authentication algorithm
is specified using xfrm_algo, the truncation length from the algorithm
description in the kernel is used.
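The selection rule for the truncation length can be sketched as below.
The enum, the struct and the function name are hypothetical; the two
attribute kinds merely model "specified as xfrm_algo" versus "specified
as xfrm_algo_auth", and the bit lengths in the usage are example values.

```c
/* Hypothetical model of how the algorithm was passed in. */
enum auth_attr_kind {
    ATTR_ALGO,        /* old syntax: no truncation length field */
    ATTR_ALGO_AUTH    /* new syntax: carries a truncation length */
};

struct auth_request {
    enum auth_attr_kind kind;
    unsigned int trunc_len;   /* in bits; 0 means "not specified" */
};

/* Pick the truncation length: an explicit value in the new syntax wins,
 * otherwise fall back to the default from the algorithm description. */
unsigned int effective_trunc_len(const struct auth_request *req,
                                 unsigned int algo_default_bits)
{
    if (req->kind == ATTR_ALGO_AUTH && req->trunc_len != 0)
        return req->trunc_len;
    return algo_default_bits;
}
```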
Signed-off-by: Martin Willi <martin@strongswan.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current implementation of max.burst ends up limiting new
data during the cwnd decay period. The decay is happening because
the connection is idle and we are allowed to fill the congestion
window. The point of max.burst is to limit micro-bursts in response
to large acks. This still happens, as max.burst is still applied
to each transmit opportunity. It will also apply if a very large
send is made (greater than allowed by burst).
Tested-by: Florian Niederbacher <florian.niederbacher@student.uibk.ac.at>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Recent attempt to remove deprecated socket options demonstrated
that removing options from the enum space will have severe
binary compatibility issues. The reason is that it changes
the subsequent enum space and causes option values to be redefined.
To solve this, and to get rid of the ugly double statements for
every option, we simply convert to the #define scheme.
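The renumbering problem is easy to reproduce in isolation. The option
names below are placeholders invented for this example, not the real
socket option names.

```c
/* Placeholder option names. With an enum, each value depends on the
 * position of the entry in the list. */
enum opts_before { B_OPT_FIRST, B_OPT_DEPRECATED, B_OPT_LAST };  /* 0, 1, 2 */
enum opts_after  { A_OPT_FIRST, A_OPT_LAST };                    /* 0, 1 */

/* With #define, every option keeps the number it was given, and a
 * retired number is simply left unused. */
#define OPT_FIRST 0
/* value 1 is retired and must not be reused */
#define OPT_LAST  2

/* Returns 1 if dropping the deprecated enum entry changed the value of
 * the option that followed it. */
int enum_value_shifted(void)
{
    return B_OPT_LAST != A_OPT_LAST;
}

/* Returns 1 if the #define keeps the value old binaries were built with. */
int define_value_stable(void)
{
    return OPT_LAST == B_OPT_LAST;
}
```

A binary compiled against the old enum would pass 2 for the last option,
which the shortened enum no longer assigns to anything, whereas the
#define version still accepts it.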
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
The transport last_time_used variable is rather useless.
It was only used when determining if CWND needs to be updated
due to idle transport. However, idle transport detection was
based on a Heartbeat timer and last_time_used was not incremented
when sending Heartbeats. As a result the check for cwnd reduction
was always true. We can get rid of the variable and just base
our cwnd manipulation on the HB timer (like the code comment says).
We also have to call into the cwnd manipulation function regardless
of whether HBs are enabled or not. That way we will detect idle
transports if the user has disabled Heartbeats.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
The SCTP_GET_*_OLD options are scheduled to be removed.
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
We currently send window update SACKs every time we free up 1 PMTU
worth of data. That is a lot more SACKs than necessary. Instead, we'll
now send back the actual window every time we send a SACK, and do
window-update SACKs when a fraction of the receive buffer has been
opened. The fraction is controlled with a sysctl.
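One way to express "a fraction of the receive buffer" is a right shift of
the buffer size. That choice, the parameter names and the function below
are assumptions made for this sketch; the text above only states that the
fraction is tunable through a sysctl.

```c
#include <stdint.h>

/* Decide whether the window has opened far enough since the last value
 * we advertised to justify a window-update SACK.
 * shift is the assumed tunable: the threshold is rcvbuf / 2^shift. */
int should_send_window_update(uint32_t advertised_rwnd,
                              uint32_t current_rwnd,
                              uint32_t rcvbuf,
                              unsigned int shift)
{
    if (current_rwnd <= advertised_rwnd)
        return 0;
    return (current_rwnd - advertised_rwnd) >= (rcvbuf >> shift);
}
```

With a 64 KiB receive buffer and a shift of 2, an update is sent only once
16 KiB have been freed, instead of after every PMTU worth of data.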
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
The "Invalid Stream Identifier" error has a 16 bit reserved
field at the end, thus making the parameter length 8 bytes.
We've never supplied that reserved field, making wireshark
tag the packet as malformed.
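The 8-byte layout of this error cause, as defined in RFC 4960, can be
sketched with a small encoder. The function and macro names are invented
for the example; the fields are written byte by byte in network byte
order.

```c
#include <stddef.h>
#include <stdint.h>

#define CAUSE_INVALID_STREAM  1   /* cause code for Invalid Stream Identifier */
#define CAUSE_INVALID_STREAM_LEN 8

/* Encode the error cause: cause code (16 bits), cause length (16 bits),
 * stream identifier (16 bits) and the reserved field (16 bits, zero).
 * Returns the number of bytes written. */
size_t build_invalid_stream_cause(uint8_t out[CAUSE_INVALID_STREAM_LEN],
                                  uint16_t stream_id)
{
    out[0] = 0;
    out[1] = CAUSE_INVALID_STREAM;
    out[2] = 0;
    out[3] = CAUSE_INVALID_STREAM_LEN;
    out[4] = (uint8_t)(stream_id >> 8);
    out[5] = (uint8_t)(stream_id & 0xFF);
    out[6] = 0;   /* reserved */
    out[7] = 0;   /* reserved */
    return CAUSE_INVALID_STREAM_LEN;
}
```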
Reported-by: Chris Dischino <cdischino@sonusnet.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
This patch implements the sender side of the SACK-IMMEDIATELY
extension.
Section 4.1. Sender Side Considerations
Whenever the sender of a DATA chunk can benefit from the
corresponding SACK chunk being sent back without delay, the sender
MAY set the I-bit in the DATA chunk header.
Reasons for setting the I-bit include
o The sender is in the SHUTDOWN-PENDING state.
o The application requests to set the I-bit of the last DATA chunk
of a user message when providing the user message to the SCTP
implementation.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
With WEXT, it happens frequently that the SME
requests an authentication but then deauthenticates
right away because some new parameters came along.
Every time this happens we print a deauth message
and send a deauth frame, but both are rather
confusing. Avoid it by aborting the authentication
process silently, and telling cfg80211 about that.
The patch looks larger than it really is:
__cfg80211_auth_remove() is split out from
cfg80211_send_auth_timeout(), there's no new code
except __cfg80211_auth_canceled() (a one-liner) and
the mac80211 bits (7 new lines of code).
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Right now all frames mac80211 hands to the driver
have the IEEE80211_TX_CTL_REQ_TX_STATUS flag set to
request TX status. This isn't really necessary, only
the injected frames need TX status (the latter for
hostapd) so move setting this flag.
The rate control algorithms also need TX status, but
they don't require it.
Also, rt2x00 uses that bit for its own purposes and
seems to require it being set for all frames, but
that can be fixed in rt2x00.
This doesn't really change anything for any drivers
but in the future drivers using hw-rate control may
opt to not report TX status for frames that don't
have the IEEE80211_TX_CTL_REQ_TX_STATUS flag set.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com> [rt2x00 bits]
Signed-off-by: John W. Linville <linville@tuxdriver.com>
It's very likely that not many devices will support
four-address mode in station or AP mode so introduce
capability bits for both modes, set them in mac80211
and check them when userspace tries to use the mode.
Also, keep track of 4addr in cfg80211 (wireless_dev)
and not in mac80211 any more. mac80211 can also be
improved for the VLAN case by not looking at the
4addr flag but maintaining the station pointer for
it correctly. However, keep track of use_4addr for
station mode in mac80211 to avoid all the derefs.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
We've accumulated a number of options for wiphys
which make more sense as flags as we keep adding
more. Convert the existing ones.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits)
cxgb3: fix premature page unmap
ibm_newemac: Fix EMACx_TRTR[TRT] bit shifts
vlan: Fix register_vlan_dev() error path
gro: Fix illegal merging of trailer trash
sungem: Fix Serdes detection.
net: fix mdio section mismatch warning
ppp: fix BUG on non-linear SKB (multilink receive)
ixgbe: Fixing EEH handler to handle more than one error
net: Fix the rollback test in dev_change_name()
Revert "isdn: isdn_ppp: Use SKB list facilities instead of home-grown implementation."
TI Davinci EMAC : Fix Console Hang when bringing the interface down
smsc911x: Fix Console Hang when bringing the interface down.
mISDN: fix error return in HFCmulti_init()
forcedeth: mac address fix
r6040: fix version printing
Bluetooth: Fix regression with L2CAP configuration in Basic Mode
Bluetooth: Select Basic Mode as default for SOCK_SEQPACKET
Bluetooth: Set general bonding security for ACL by default
r8169: Fix receive buffer length when MTU is between 1515 and 1536
can: add the missing netlink get_xstats_size callback
...
Some devices implement the entire rate control in
firmware in some way, like wl1271 or like iwlwifi
which does some things in software but not a lot.
Therefore generic software rate control is rather
useless for them and just adds avoidable overhead
to the transmit path.
It's fairly simple to let drivers indicate that
they do not need rate control, but they need to
fulfil a number of conditions that we encode in
WARN_ONs.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The entire aggregation code currently operates on the
hw pointer and station addresses, but that needs to
change to make stations purely per-vif. As one step in
preparing for that, make the aggregation code callable
with the station, or by the combination of virtual
interface and station address.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Resolve the conflict between v2.6.32-rc7 where dn_def_dev_handler
gets a small bug fix and the sysctl tree where I am removing all
sysctl strategy routines.
While investigating network latencies, I found inet_getid() was a
contention point for some workloads, as inet_peer_idlock is shared
by all inet_getid() users regardless of peers.
One way to fix this is to make ip_id_count an atomic_t instead
of __u16, and use atomic_add_return().
In order to keep sizeof(struct inet_peer) = 64 on 64bit arches
tcp_ts_stamp is also converted to __u32 instead of "unsigned long".
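The lock-free idea can be illustrated in user space with C11 atomics.
This is a sketch, not the kernel code: struct peer, peer_getid() and
demo_ids() are hypothetical names, and atomic_fetch_add() (which returns
the old value) stands in for the kernel's atomic_add_return().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-peer state: the counter is atomic, so no shared
 * lock is needed to hand out IDs. */
struct peer {
    atomic_uint ip_id_count;
};

/* Reserve 1 + more consecutive IDs and return the first one.
 * The result is truncated to the 16 bits of an IP ID. */
uint16_t peer_getid(struct peer *p, unsigned int more)
{
    more++;
    return (uint16_t)atomic_fetch_add(&p->ip_id_count, more);
}

/* Hand out IDs starting from 10: one ID, then a block of three,
 * then one more. Returns the last ID handed out. */
uint16_t demo_ids(void)
{
    struct peer p;

    atomic_init(&p.ip_id_count, 10);
    (void)peer_getid(&p, 0);   /* reserves ID 10 */
    (void)peer_getid(&p, 2);   /* reserves IDs 11, 12 and 13 */
    return peer_getid(&p, 0);  /* next free ID */
}
```

Because each peer has its own counter, callers working on different peers
no longer contend on a single global lock.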
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The first "node" is supposed to be the cursor used in the for_each.
The second "node" is meant literally and should not be macro expanded:
it's the name of the hlist_node field from the inet_bind_bucket.
This currently works because when inet_bind_bucket_for_each is called
its argument is still "node".
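The pitfall and its fix can be reproduced with a small stand-alone list.
All types and macros below are simplified stand-ins written for this
example, not the kernel's hlist helpers. The cursor parameter is named
"pos", so the literal field name "node" can never be replaced by a macro
argument.

```c
#include <stddef.h>

/* Simplified stand-ins for a singly linked list node and a bucket. */
struct hnode { struct hnode *next; };
struct bucket {
    int port;
    struct hnode node;   /* "node" is the field name used below */
};

/* Recover the bucket that embeds the given list node. */
#define bucket_entry(ptr) \
    ((struct bucket *)((char *)(ptr) - offsetof(struct bucket, node)))

/* pos is the cursor; since no parameter is called "node", the field
 * name inside bucket_entry() is always taken literally. */
#define bucket_for_each(tb, pos, head)                         \
    for ((pos) = (head);                                       \
         (pos) != NULL && ((tb) = bucket_entry(pos), 1);       \
         (pos) = (pos)->next)

/* Sum the ports of all buckets; the cursor is deliberately not
 * named "node". */
int sum_ports(struct hnode *head)
{
    struct bucket *tb;
    struct hnode *cursor;
    int sum = 0;

    bucket_for_each(tb, cursor, head)
        sum += tb->port;
    return sum;
}

/* Build a two-element list and sum its ports. */
int demo_sum(void)
{
    struct bucket b2 = { 443, { NULL } };
    struct bucket b1 = { 80, { &b2.node } };

    return sum_ports(&b1.node);
}
```

Had the cursor parameter been called "node", calling the macro with any
other variable name would have rewritten the field name as well and the
code would not have compiled.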
Signed-off-by: Lucian Adrian Grijincu <lgrijincu@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define two symbols needed in both kernel and user space.
Remove old (somewhat incorrect) kernel variant that wasn't used in
most cases. Default should apply to both RMSS and SMSS (RFC2581).
Replace numeric constants with defined symbols.
Stand-alone patch, originally developed for TCPCT.
Signed-off-by: William.Allen.Simpson@gmail.com
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent commit 8da645e101
sctp: Get rid of an extra routing lookup when adding a transport
introduced a regression in the connection setup. The behavior was
different between IPv4 and IPv6. The IPv4 case ended up working because
the route lookup routine returned a NULL route, which triggered another
route lookup later in the output path that succeeded. In the IPv6 case,
a valid route was returned for the first call, but we could not find a valid
source address at the time since the source addresses were not set on the
association yet. This resulted in a hung connection.
The solution is to set the source addresses on the association prior to
adding peers.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the NL80211_CMD_GET_SURVEY command and a get_survey()
op that a driver can implement. The goal of this command is to allow
drivers to report channel survey data (e.g. channel noise, channel
occupation).
For now, only the mechanism to report back channel noise has been
implemented.
In future, there will either be a survey-trigger command --- or the existing
scan-trigger command will be enhanced. This will allow user-space to
request survey for arbitrary channels.
Note: any driver that cannot report channel noise should not report
any value at all, rather than a made-up one such as -92 dBm.
Signed-off-by: Holger Schurig <holgerschurig@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Resulting object files have the same MD5 as before.
Signed-off-by: Rui Paulo <rpaulo@gmail.com>
Signed-off-by: Javier Cardona <javier@cozybit.com>
Reviewed-by: Andrey Yurovsky <andrey@cozybit.com>
Tested-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Now that sys_sysctl is a compatibility wrapper around /proc/sys,
all sysctl strategy routines, and all ctl_name and strategy
entries in the sysctl tables, are unused and can be
removed.
In addition neigh_sysctl_register has been modified to no longer
take a strategy argument and its callers have been modified not
to pass one.
Cc: "David Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
UDP bind() can be O(N^2) in some pathological cases.
Thanks to secondary hash tables, we can make it O(N)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
net/fsl_pq_mdio: add module license GPL
can: fix WARN_ON dump in net/core/rtnetlink.c:rtmsg_ifinfo()
can: should not use __dev_get_by_index() without locks
hisax: remove bad udelay call to fix build error on ARM
ipip: Fix handling of DF packets when pmtudisc is OFF
qlge: Set PCIe reset type for EEH to fundamental.
qlge: Fix early exit from mbox cmd complete wait.
ixgbe: fix traffic hangs on Tx with ioatdma loaded
ixgbe: Fix checking TFCS register for TXOFF status when DCB is enabled
ixgbe: Fix gso_max_size for 82599 when DCB is enabled
macsonic: fix crash on PowerBook 520
NET: cassini, fix lock imbalance
ems_usb: Fix byte order issues on big endian machines
be2net: Bug fix to send config commands to hardware after netdev_register
be2net: fix to set proper flow control on resume
netfilter: xt_connlimit: fix regression caused by zero family value
rt2x00: Don't queue ieee80211 work after USB removal
Revert "ipw2200: fix oops on missing firmware"
decnet: netdevice refcount leak
netfilter: nf_nat: fix NAT issue in 2.6.30.4+
...
This fixes the following bug in the current implementation of
net/xfrm: SAD entry timeouts do not count the time spent by the machine
in the suspended state. This leads to connectivity problems because,
after resuming, the local machine thinks that the SAD entry is still valid, while
it has already expired on the remote server.
The cause of this is very simple: the timeouts in net/xfrm are bound to
the old mod_timer() timers. This patch reassigns them to the
CLOCK_REALTIME hrtimer.
I have been using this version of the patch for a few months on my
machines without any problems. I also ran a few stress tests without any
issues.
This version of the patch uses tasklet_hrtimer by Peter Zijlstra
(commit 9ba5f0).
This patch is against 2.6.31.4. Please CC me.
Signed-off-by: Yury Polyanskiy <polyanskiy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extends udp_table to contain a secondary hash table.
The socket anchor for this second hash is free, because UDP
doesn't use skc_bind_node: we define a union to hold
both skc_bind_node and a new hlist_nulls_node, udp_portaddr_node.
udp_lib_get_port() inserts sockets into the second hash chain
(additional cost of one atomic op).
udp_lib_unhash() deletes sockets from the second hash chain
(additional cost of one atomic op).
Note: no spinlock lockdep annotation is needed, because the
lock for the secondary hash chain is always taken after the
lock for the primary hash chain.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Union sk_hash with two u16 hashes for UDP (no extra memory taken):
One 16-bit hash on the (local port) value (the previous UDP 'hash').
One 16-bit hash on the (local address, local port) values, initialized
but not yet used. This second hash uses the Jenkins hash for better
distribution.
Because the 'port' is xored later, a partial hash is performed
on local address + net_hash_mix(net)
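The split between a partial hash and a later xor with the port can be sketched in user space as follows. This is only an illustration under stated assumptions: `mix32()` is a hypothetical stand-in for the kernel's Jenkins hash, and `partial_hash()` and `portaddr_hash()` are made-up names, not the functions added by the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the Jenkins hash; any decent 32-bit mixer
 * is enough to show the structure. */
static uint32_t mix32(uint32_t x)
{
	x ^= x >> 16;
	x *= 0x7feb352dU;
	x ^= x >> 15;
	x *= 0x846ca68bU;
	x ^= x >> 16;
	return x;
}

/* Partial hash over the local address and a per-namespace mix value.
 * The port is deliberately left out at this stage. */
static uint32_t partial_hash(uint32_t local_addr, uint32_t net_mix)
{
	return mix32(local_addr + net_mix);
}

/* Final hash: xor the port into the partial hash and keep 16 bits. */
static uint16_t portaddr_hash(uint32_t partial, uint16_t port)
{
	return (uint16_t)(partial ^ port);
}
```

Because the port is only xored in at the end, the partial hash for an address can be computed once and reused for every candidate port.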
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds a counter in udp_hslot to keep an accurate count
of sockets present in chain.
This will permit the upcoming UDP lookup algorithm to choose
the shortest chain when secondary hash is added.
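A minimal sketch of how a per-slot counter lets a lookup pick the shorter chain. The `struct hslot` here is a hypothetical, stripped-down stand-in (the real udp_hslot also carries the chain head and a lock), and `pick_shorter()` is an illustrative name only.

```c
/* Hypothetical simplified slot: only the counter matters for this sketch. */
struct hslot {
	int count;	/* number of sockets currently linked in this chain */
};

/* Return the slot with the shorter chain; ties go to the primary slot. */
static const struct hslot *pick_shorter(const struct hslot *primary,
					const struct hslot *secondary)
{
	return (secondary->count < primary->count) ? secondary : primary;
}
```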
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no good reason to not support userspace specifying the
network namespace during device creation, and it makes it easier
to create a network device and pass it to a child network namespace
with a well known name.
We have to be careful to ensure that the target network namespace
for the new device exists through the life of the call. To keep
that logic clear I have factored out the network namespace grabbing
logic into rtnl_link_get_net.
In addition we need to continue to pass the source network namespace
to the rtnl_link_ops.newlink method so that we can find the base
device source network namespace.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Some devices require that all frames to a station
are flushed when that station goes into powersave
mode before being able to send frames to that
station again when it wakes up or polls -- all in
order to avoid reordering and too many or too few
frames being sent to the station when it polls.
Normally, this is the case unless the station
goes to sleep and wakes up very quickly again.
But in that case, frames for it may be pending
on the hardware queues, and thus races could
happen in the case of multiple hardware queues
used for QoS/WMM. Normally this isn't a problem,
but with the iwlwifi mechanism we need to make
sure the race doesn't happen.
This makes mac80211 able to cope with the race
with driver help by a new WLAN_STA_PS_DRIVER
per-station flag that can be controlled by the
driver and tells mac80211 whether it can transmit
frames or not. This flag must be set according to
very specific rules outlined in the documentation
for the function that controls it.
When we buffer new frames for the station, we
normally set the TIM bit right away, but while
the driver has blocked transmission to that sta
we need to avoid that as well since we cannot
respond to the station if it wakes up due to the
TIM bit. Once the driver unblocks, we can set
the TIM bit.
Similarly, when the station just wakes up, we
need to wait until all other frames are flushed
before we can transmit frames to that station,
so the same applies here, we need to wait for
the driver to give the OK.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add support for two more NL802154 commands: ADD_IFACE and DEL_IFACE,
thus allowing creation and removal of logical WPAN interfaces on the top
of wpan-phy.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
ops->get_phy should increment reference to wpan-phy. As we return
the external structure, we should do refcounting correctly.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Follow the usual pattern of devices registration by adding new function
(wpan_phy_set_dev) that sets child->parent relationship and removing
parent argument from wpan_phy_register call.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
IEEE 802.15.4-2006 defines channel pages that hold channels (max 32 pages,
27 channels per page). Allow the driver to specify supported channels
on pages, other than the first one.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Conflicts:
drivers/net/usb/cdc_ether.c
All CDC ethernet devices of type USB_CLASS_COMM need to use
'&mbm_info'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Vitezslav Samel discovered that since 2.6.30.4+ active FTP can not work
over NAT. The "cause" of the problem was a fix of unacknowledged data
detection with NAT (commit a3a9f79e36).
However, actually, that fix uncovered a long standing bug in TCP conntrack:
when NAT was enabled, we simply updated the max of the right edge of
the segments we have seen (td_end), by the offset NAT produced with
changing IP/port in the data. However, we did not update the other parameter
(td_maxend) which is affected by the NAT offset. Thus that could drift
away from the correct value, which resulted in breaking active FTP.
The patch below fixes the issue by *not* updating the conntrack parameters
from NAT, but instead taking into account the NAT offsets in conntrack in a
consistent way. (Updating from NAT would be harder and more expensive because
it'd need to re-calculate parameters we already calculated in conntrack.)
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct can_proto had a capability field which wasn't ever used. It is
dropped entirely.
struct inet_protosw had a capability field which can be more clearly
expressed in the code by just checking if sock->type == SOCK_RAW.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Parsing the options of a SYN packet that we have
no route entry for should just use the global defaults
for route entry options.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Tested-by: Valdis.Kletnieks@vt.edu
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we have a TODO item to make all station
management dependent on virtual interfaces, I
figured I'd start with pushing such a change
to drivers before more drivers start using the
ieee80211_find_sta() API with a hw pointer and
cause us grief later on.
For now continue exporting the old API in form
of ieee80211_find_sta_by_hw(), but discourage
its use strongly.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This cleanup patch puts struct/union/enum opening braces
on the first line to ease grep games.
struct something
{
becomes :
struct something {
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch below also addresses a couple of other corner cases in readdir
seen with a large (e.g. 64k) msize. I'm not sure what people think of
my co-opting of fid->aux here. I'd be happy to rework if there's a better
way.
When the size of the user supplied buffer passed to readdir is smaller
than the data returned in one go by the 9P read request, v9fs_dir_readdir()
currently discards extra data so that, on the next call, a 9P read
request will be issued with offset < previous offset + bytes returned,
which violates the constraint described in paragraph 3 of read(5) description.
This patch preserves the leftover data in fid->aux for use in the next call.
Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
This value is unused by mac80211, because it was only
used by wireless extensions, and turned out to not
be useful there because the quality value needs to be
comparable between scan results and the current value
which is impossible when the qual value is calculated
taking into account noise, for example.
Since it is unused anyway, this patch deprecates it
in the hope that drivers will remove their sometimes
quite expensive calculations of the value.
I'm open to actual uses of the value, but the best
way of using it seems to be what the Intel drivers do
which should probably be generalised if we have noise
values from the hardware.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Compared to ieee80211_beacon_get(), the new function
ieee80211_beacon_get_tim() returns information on the
location and length of the TIM IE, which some drivers
need in order to generate the TIM on the device. The
old function, ieee80211_beacon_get(), becomes a small
static inline wrapper around the new one to not break
all drivers.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
While there may be a case for a driver adding its
own bits of radiotap information, none currently
does. Also, drivers would have to copy the code
to generate the radiotap bits that now mac80211
generates. If some driver in the future needs to
add some driver-specific information I'd expect
that to be in a radiotap vendor namespace and we
can add a different way of passing such data up
and having mac80211 include it.
Additionally, rename IEEE80211_CONF_RADIOTAP to
IEEE80211_CONF_MONITOR since it's still used by
b43(legacy) to obtain per-frame timestamps.
The purpose of this patch is to simplify the RX
code in mac80211 to make it easier to add paged
skb support.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In
commit 601ae7f25a
Author: Bruno Randolf <br1@einfach.org>
Date: Thu May 8 19:22:43 2008 +0200
mac80211: make rx radiotap header more flexible
code was added that tried to align the radiotap header
position in memory based on the radiotap header length.
Quite obviously, that is completely useless.
Instead of trying to do that, use unaligned accesses
to generate the radiotap header. To properly do that,
we also need to mark struct ieee80211_radiotap_header
packed, but that is fine since it's already packed
(and it should be marked packed anyway since it's a
wire format).
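As an illustration of writing header fields without relying on alignment, here is a portable user-space sketch that stores little-endian values byte by byte (radiotap fields are little-endian). The kernel has its own helpers for this, such as put_unaligned_le16(); `store_le16()` and `store_le32()` below are hypothetical stand-ins, not the code from the patch.

```c
#include <stdint.h>

/* Store a 16-bit value in little-endian order at any address,
 * regardless of its alignment. */
static void store_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

/* Same idea for a 32-bit value. */
static void store_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)(v >> 24);
}
```

Since each byte is written individually, the destination pointer may sit at an odd offset inside the buffer without triggering an unaligned access.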
Cc: Bruno Randolf <br1@einfach.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Policy routing is not looked up by mark on reverse path filtering.
This fixes it.
Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an accessor for the existing dst_entry features field and
refactor the only supported feature (allfrag) to use it.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Signed-off-by: Yony Amit <yony@comsleep.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need tcp_parse_options to be aware of dst_entry to
take into account per dst_entry TCP options settings
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Signed-off-by: Yony Amit <yony@comsleep.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding a list_head parameter to rtnl_link_ops->dellink() methods
allows us to queue devices on a list, in order to dismantle
them all at once.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ieee80211_rx() must be called with bottom halves disabled. To simplify
driver development implement ieee80211_rx_ni() which disables BH. This
function must be used when in process context.
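The wrapper pattern described above can be sketched with user-space stubs. Everything here is hypothetical: `fake_bh_disable()` and `fake_bh_enable()` stand in for the kernel's local_bh_disable() and local_bh_enable(), and `fake_rx()` and `fake_rx_ni()` stand in for the two receive entry points.

```c
/* Stand-in for the kernel's bottom-half disable depth. */
static int bh_depth;
/* Counts how many times the receive path ran with BH disabled. */
static int rx_calls_with_bh_off;

static void fake_bh_disable(void) { bh_depth++; }
static void fake_bh_enable(void)  { bh_depth--; }

/* Stand-in for the receive function that requires BH to be disabled. */
static void fake_rx(void)
{
	if (bh_depth > 0)
		rx_calls_with_bh_off++;
}

/* Process-context variant: disable BH around the call, then restore. */
static void fake_rx_ni(void)
{
	fake_bh_disable();
	fake_rx();
	fake_bh_enable();
}
```

A caller in process context would use the `_ni` variant and never touch the disable/enable pair itself.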
Signed-off-by: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Get rid of cookies in cfg80211_send_XXX() functions.
Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When handling a large number of netdevices, rtnl_dump_ifinfo()
is very slow because it has O(N^2) complexity.
Instead of scanning one single list, we can use the 256 sub lists
of the dev_index hash table.
This considerably speeds up "ip link" operations.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SIT tunnels use one rwlock to protect their prl entries.
This first patch adds RCU locking for prl management,
with standard call_rcu() calls.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dst_negative_advice() should check for changed dst and reset
sk_tx_queue_mapping accordingly. Pass sock to the callers of
dst_negative_advice.
(sk_reset_txq is defined just for use by dst_negative_advice. The
only way I could find to get around this is to move dst_negative_()
from dst.h to dst.c, include sock.h in dst.c, etc)
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce sk_tx_queue_mapping; and functions that set, test and
get this value. Reset sk_tx_queue_mapping to -1 whenever the dst
cache is set/reset, and in socket alloc. Setting txq to -1 and
using valid txq=<0 to n-1> allows the tx path to use the value
of sk_tx_queue_mapping directly instead of subtracting 1 on every
tx.
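A small sketch of the -1 sentinel convention, assuming a stripped-down socket structure. `struct mini_sock` and the `txq_*()` helpers are hypothetical names chosen for illustration, not the helpers introduced by the patch.

```c
/* Hypothetical, stripped-down socket holding only the field discussed. */
struct mini_sock {
	int tx_queue_mapping;	/* -1 = nothing recorded, otherwise 0..n-1 */
};

/* Record the queue chosen for this socket. */
static void txq_set(struct mini_sock *sk, int txq)
{
	sk->tx_queue_mapping = txq;
}

/* Forget the recorded queue, e.g. when the dst cache is set or reset. */
static void txq_reset(struct mini_sock *sk)
{
	sk->tx_queue_mapping = -1;
}

/* Test whether a queue has been recorded. */
static int txq_recorded(const struct mini_sock *sk)
{
	return sk->tx_queue_mapping >= 0;
}

/* The stored value is the queue index itself; no "- 1" on every tx. */
static int txq_get(const struct mini_sock *sk)
{
	return sk->tx_queue_mapping;
}
```

Had 0 been used as the "unset" value, valid queues would have to be stored as 1..n and decremented on each transmit.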
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 9e337b0f (net: annotate inet_timewait_sock bitfields)
added 4/8 bytes in struct inet_timewait_sock.
Fix this by declaring tw_ipv6_offset in the 'flags' bitfield
The 14 bits hole is named tw_pad to make it clearly apparent.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The i2400m driver uses two different bits to distinguish how much the
driver is up. i2400m->ready is used to denote that the infrastructure
to communicate with the device is up and running. i2400m->updown is
used to indicate if 'ready' and the device is up and running, ready to
take control and data traffic.
However, all this was pretty dirty and not clear, with many open spots
where race conditions were present.
This commit cleans up the situation by:
- documenting the usage of both bits
- setting them only in specific, well controlled places
(i2400m_dev_start, i2400m_dev_stop)
- ensuring the i2400m workqueue can't get in the middle of the
setting by flushing it when i2400m->ready is set to zero. This
allows the report hook not having to check again for the bit to be
set [rx.c:i2400m_report_hook_work()].
- using i2400m->updown to determine if the device is up and running
instead of the wimax state in i2400m_dev_reset_handle().
- not losing missed messages sent by the hardware before
i2400m->ready is set. In rx.c, whatever the device sends can be
sent to user space over the message pipes as soon as the wimax
device is registered, so don't wait for i2400m->ready to be set.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
The last users of skb_icv_walk are converted to ahash now,
so skb_icv_walk is unused and can be removed.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ah4 and ah6 are converted to ahash now, so we can remove the
code for the obsolete hash algorithm.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To support ahash algorithms, we add a pointer to a
crypto_ahash to ah_data.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to have better cache layouts of struct sock (separate zones
for rx/tx paths), we need this preliminary patch.
Goal is to transfer fields used at lookup time into the first
read-mostly cache line (inside struct sock_common) and move sk_refcnt
to a separate cache line (only written by rx path)
This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
sport and id fields. This allows a future patch to define these
fields as macros, like sk_refcnt, without name clashes.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Phonet "universe" only has 64 addresses, so we keep a trivial flat
routing table.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Storing the mask (size - 1) instead of the size allows fast path to be
a bit faster.
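A minimal sketch of why storing the mask helps, assuming a power-of-two table size. `struct table`, `slot_by_mask()` and `slot_by_size()` are hypothetical names used only for this comparison.

```c
#include <stdint.h>

/* Hypothetical table descriptor; the size must be a power of two
 * for the mask trick to be valid. */
struct table {
	uint32_t mask;	/* size - 1, precomputed once */
};

/* Fast path with a stored mask: a single AND. */
static uint32_t slot_by_mask(const struct table *t, uint32_t hash)
{
	return hash & t->mask;
}

/* Fast path with a stored size: same result, but a subtraction
 * is needed on every lookup. */
static uint32_t slot_by_size(uint32_t size, uint32_t hash)
{
	return hash & (size - 1);
}
```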
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Meaning receive multiple messages, reducing the number of syscalls and
net stack entry/exit operations.
Next patches will introduce mechanisms where protocols that want to
optimize this operation will provide an unlocked_recvmsg operation.
This takes into account comments made by:
. Paul Moore: sock_recvmsg is called only for the first datagram,
sock_recvmsg_nosec is used for the rest.
. Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
works in the same fashion as the ppoll one.
If the underlying protocol returns a datagram with MSG_OOB set, this
will make recvmmsg return right away with as many datagrams (+ the OOB
one) it has received so far.
. Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
datagrams and then recvmsg returns an error, recvmmsg will return
the successfully received datagrams, store the error and return it
in the next call.
This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
where we will be able to acquire the lock only at batch start and end, not at
every underlying recvmsg call.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create a new socket level option to report number of queue overflows
Recently I augmented the AF_PACKET protocol to report the number of frames lost
on the socket receive queue between any two enqueued frames. This value was
exported via a SOL_PACKET level cmsg. After I completed that work it was
requested that this feature be generalized so that any datagram oriented socket
could make use of this option. As such I've created this patch. It creates a
new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
SOL_SOCKET level cmsg that reports the number of times the sk_receive_queue
overflowed between any two given frames. It also augments the AF_PACKET
protocol to take advantage of this new feature (as it previously did not touch
sk->sk_drops, which this patch uses to record the overflow count). Tested
successfully by me.
Notes:
1) Unlike my previous patch, this patch simply records the sk_drops value, which
is not a number of drops between packets, but rather a total number of drops.
Deltas must be computed in user space.
2) While this patch currently works with datagram oriented protocols, it will
also be accepted by non-datagram oriented protocols. I'm not sure if that's
agreeable to everyone, but my argument in favor of doing so is that, for those
protocols which aren't applicable to this option, sk_drops will always be zero,
and reporting no drops on a receive queue that isn't used for those
non-participating protocols seems reasonable to me. This also saves us having
to code in a per-protocol opt in mechanism.
3) This applies cleanly to net-next assuming that commit
977750076d (my af packet cmsg patch) is reverted
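Note 1 leaves the delta computation to user space. A minimal sketch of that bookkeeping follows, assuming the cumulative counter arrives as an unsigned 32-bit value. `struct drop_tracker` and `drops_since_last()` are hypothetical application-side names, and extracting the value from the cmsg is not shown.

```c
#include <stdint.h>

/* Hypothetical per-socket bookkeeping kept by the application. */
struct drop_tracker {
	uint32_t last_total;	/* cumulative count seen with the previous frame */
};

/* Return the number of drops since the previous frame and remember
 * the new cumulative total. */
static uint32_t drops_since_last(struct drop_tracker *t, uint32_t total)
{
	/* Unsigned subtraction stays correct if the counter wraps around. */
	uint32_t delta = total - t->last_total;

	t->last_total = total;
	return delta;
}
```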
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ieee80211_rx() must be called with softirqs disabled
since the networking stack requires this for netif_rx()
and some code in mac80211 can assume that it can not
be processing its own tasklet and this call at the same
time.
It may be possible to remove this requirement after a
careful audit of mac80211 and doing any needed locking
improvements in it along with disabling softirqs around
netif_rx(). An alternative might be to push all packet
processing to process context in mac80211, instead of
to the tasklet, and add other synchronisation.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Since commit a98b65a3 (net: annotate struct sock bitfield), we lost
8 bytes in struct sock on 64bit arches because of
kmemcheck_bitfield_end(flags) misplacement.
Fix this by putting together sk_shutdown, sk_no_check, sk_userlocks,
sk_protocol and sk_type in the 'flags' 32bits bitfield
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP_HTABLE_SIZE was initially defined to 128, which is a bit small for
several setups.
4000 active UDP sockets -> 32 sockets per chain on average. An
incoming frame has to look up all sockets to find the best match, so long
chains hurt latency.
Instead of a fixed size hash table that can't be perfect for every
need, let the UDP stack choose its table size at boot time like tcp/ip
route, using the alloc_large_system_hash() helper.
Add an optional boot parameter, uhash_entries=x so that an admin can
force a size between 256 and 65536 if needed, like thash_entries and
rhash_entries.
dmesg logs two new lines :
[ 0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes)
[ 0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes)
Maximal size on 64bit arches would be 65536 slots, i.e. 1 MByte for non
debugging spinlocks.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's useful to provide firmware and hardware version to user space and have a
generic interface to retrieve them. Users can provide the version information
in bug reports etc.
Add fields for firmware and hardware version to struct wiphy.
(Dropped nl80211 bits for now and modified remaining bits in favor of
ethtool. -- JWL)
Cc: Kalle Valo <kalle.valo@nokia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Refactor wext to
* split out iwpriv handling
* split out iwspy handling
* split out procfs support
* allow cfg80211 to have wireless extensions compat code
w/o CONFIG_WIRELESS_EXT
After this, drivers need to
- select WIRELESS_EXT - for wext support
- select WEXT_PRIV - for iwpriv support
- select WEXT_SPY - for iwspy support
except cfg80211 -- which gets new hooks in wext-core.c
and can then get wext handlers without CONFIG_WIRELESS_EXT.
Wireless extensions procfs support is auto-selected
based on PROC_FS and anything that requires the wext core
(i.e. WIRELESS_EXT or CFG80211_WEXT).
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
All usages of structure net_proto_ops should be declared const.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarek Poplawski wrote:
>
>
> Hmm... So you made me to do some "real" work here, and guess what?:
> there is one serious checkpatch warning! ;-) Plus, this new parameter
> should be added to the function description. Otherwise:
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
>
> Thanks,
> Jarek P.
>
> PS: I guess full "Don't" would show we really mean it...
Okay :) Here is the last round, before the night !
Thanks again
[RFC] pkt_sched: gen_estimator: Don't report fake rate estimators
We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator
is running.
# tc -s -d qdisc
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
User has no way to tell if the "rate 0bit 0pps" is a real estimation, or a fake
one (because no estimator is active)
After this patch, tc command output is :
$ tc -s -d qdisc
qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
We add a parameter to gnet_stats_copy_rate_est() function so that
it can use gen_estimator_active(bstats, r), as suggested by Jarek.
This parameter can be NULL if check is not necessary, (htb for
example has a mandatory rate estimator)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 Rapid Deployment (6rd; draft-ietf-softwire-ipv6-6rd) builds upon
mechanisms of 6to4 (RFC3056) to enable a service provider to rapidly
deploy IPv6 unicast service to IPv4 sites to which it provides
customer premise equipment. Like 6to4, it utilizes stateless IPv6 in
IPv4 encapsulation in order to transit IPv4-only network
infrastructure. Unlike 6to4, a 6rd service provider uses an IPv6
prefix of its own in place of the fixed 6to4 prefix.
With this option enabled, the SIT driver offers 6rd functionality by
providing an additional ioctl API to configure the IPv6 prefix used
instead of the static 2002::/16 for 6to4.
Original patch was done by Alexandre Cassen <acassen@freebox.fr>
based on old Internet-Draft.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
An incoming datagram must bring a *lot* of cache lines into the cpu cache,
in particular : (other parts omitted (hash chains, ip route cache...))
On 32bit arches :
offsetof(struct sock, sk_rcvbuf) =0x30 (read)
offsetof(struct sock, sk_lock) =0x34 (rw)
offsetof(struct sock, sk_sleep) =0x50 (read)
offsetof(struct sock, sk_rmem_alloc) =0x64 (rw)
offsetof(struct sock, sk_receive_queue)=0x74 (rw)
offsetof(struct sock, sk_forward_alloc)=0x98 (rw)
offsetof(struct sock, sk_callback_lock)=0xcc (rw)
offsetof(struct sock, sk_drops) =0xd8 (read if we add dropcount support, rw if frame dropped)
offsetof(struct sock, sk_filter) =0xf8 (read)
offsetof(struct sock, sk_socket) =0x138 (read)
offsetof(struct sock, sk_data_ready) =0x15c (read)
We can avoid sk->sk_socket and socket->fasync_list referencing on sockets
with no fasync() structures. (socket->fasync_list ptr is probably already in cache
because it shares a cache line with socket->wait, ie location pointed by sk->sk_sleep)
This avoids one cache line load per incoming packet for common cases (no fasync())
We can leave (or even move in a future patch) sk->sk_socket in a cold location
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We currently dirty a cache line to update tunnel device stats
(tx_packets/tx_bytes). We better use the txq->tx_bytes/tx_packets
counters that already are present in cpu cache, in the cache
line shared with txq->_xmit_lock
This patch extends IPTUNNEL_XMIT() macro to use txq pointer
provided by the caller.
Also &tunnel->dev->stats can be replaced by &dev->stats
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The FIB algorithm for IPv4 is set at compile time, but the kernel goes through
the overhead of function call indirection at runtime. Save some
cycles by turning the indirect calls to direct calls to either
hash or trie code.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SNMP statistic macros can be significantly simplified.
This will also reduce code size if the arch supports these operations
in hardware.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the place, in
each and every implementation.
Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.
Signed-off-by: David S. Miller <davem@davemloft.net>
The move away from having drivers assign wireless handlers,
in favour of making cfg80211 assign them, broke the sysfs
registration (the wireless/ dir went missing) because the
handlers are now assigned only after registration, which is
too late.
Fix this by special-casing cfg80211-based devices, all
of which are required to have an ieee80211_ptr, in the
sysfs code, and also using get_wireless_stats() to have
the same values reported as in procfs.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reported-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Tested-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This reverts commit 645069299a.
While the code does not actually break anything, it does not completely follow
RFC5214 yet. After talking back with Fred L. Templin, I agree that completing the
ISATAP specific RS/RA code, would pollute the kernel a lot with code that is better
implemented in userspace.
The kernel should not send RS packets for ISATAP at all.
Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Acked-by: Fred L. Templin <Fred.L.Templin@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It seems the recursion field from "struct ip_tunnel" is no longer needed.
Recursion prevention is done at the upper level (in dev_queue_xmit()),
since we use HARD_TX_LOCK protection for tunnels.
This avoids a cache line ping pong on "struct ip_tunnel" : This structure
should be now mostly read on xmit and receive paths.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's unused.
It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.
It _was_ used in two places at arch/frv for some reason.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds a persistent, read-only caching facility for
9p clients using the FS-Cache caching backend.
When the fscache facility is enabled, each inode is associated
with a corresponding vcookie which is an index into the FS-Cache
indexing tree. The FS-Cache indexing tree is indexed at 3 levels:
- session object associated with each mount.
- inode/vcookie
- actual data (pages)
A cache tag is chosen randomly for each session. These tags can
be read off /sys/fs/9p/caches and can be passed as a mount-time
parameter to re-attach to the specified caching session.
Signed-off-by: Abhishek Kulkarni <adkulkar@umail.iu.edu>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
After the recent mq change there is the new select_queue qdisc class
method used in tc_modify_qdisc, but it works OK only for direct child
qdiscs of mq qdisc. Grandchildren always get the first tx queue, which
would give wrong qdisc_root etc. results (e.g. for sch_htb as child of
sch_prio). This patch fixes it by using parent's dev_queue for such
grandchildren qdiscs. The select_queue method's return type is changed
BTW.
With feedback from: Patrick McHardy <kaber@trash.net>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes commit e36b9d16c6. The approach
there is to call dev_close()/dev_open() whenever the device type is changed in
order to remap the device IP multicast addresses to HW multicast addresses.
This approach suffers from 2 drawbacks:
*. It assumes that the device is UP when calling dev_close(); otherwise
dev_close() has no effect. It is worth mentioning that initscripts (Redhat)
and sysconfig (Suse) don't act the same in this matter.
*. dev_close() has other side effects, like deleting entries from the routing
table, which might be unnecessary.
The fix here is to directly remap the IP multicast addresses to HW multicast
addresses for a bonding device that changes its type, and nothing else.
Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Moni Shoua <monis@voltaire.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Once upon a time snd_ssthresh was a 16-bit quantity.
That has not been true for a long time. I ran across
some ancient comparisons which still seem to rely on that legacy.
Put all that magic into a single place; I have hopefully found all
of them.
Compile tested, though linking allyesconfig is ridiculous
nowadays, it seems.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the declaration of the long-removed "inet_protocol_base".
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When new child qdiscs are attached to the mq qdisc, they are actually
attached as root qdiscs to the device queues. The lock selection for
new estimators incorrectly picks the root lock of the existing and
to be replaced qdisc, which results in a use-after-free once the old
qdisc has been destroyed.
Mark mq qdisc instances with a new flag and treat qdiscs attached to
mq as children similar to regular root qdiscs.
Additionally prevent estimators from being attached to the mq qdisc
itself since it only updates its byte and packet counters during dumps.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a classful dummy scheduler which can be used as root qdisc
for multiqueue devices and exposes each device queue as a child class.
This allows queues to be addressed individually and grafted like regular
classes. Additionally it presents an accumulated view of the statistics of
all real root qdiscs in the dummy root.
Two new callbacks are added to the qdisc_ops and qdisc_class_ops:
- cl_ops->select_queue selects the tx queue number for new child classes.
- qdisc_ops->attach() overrides root qdisc device grafting to attach
non-shared qdiscs to the queues.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It will be used in a following patch by the multiqueue qdisc.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This shrinks the size of struct sctp_association a little.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
This patch introduces a new sysctl option to make IPv4 Address Scoping
configurable <draft-stewart-tsvwg-sctp-ipv4-00.txt>.
In networking environments where DNAT rules in iptables prerouting
chains convert destination IP's to link-local/private IP addresses,
SCTP connections fail to establish as the INIT chunk is dropped by the
kernel due to address scope match failure.
For example to support overlapping IP addresses (same IP address with
different vlan id) a Layer-5 application listens on link local IP's,
and there is a DNAT rule that maps the destination IP to a link local
IP. Such applications never get the SCTP INIT if the address-scoping
draft is strictly followed.
This sysctl configuration allows SCTP to function in such
unconventional networking environments.
Sysctl options:
0 - Disable IPv4 address scoping draft altogether
1 - Enable IPv4 address scoping (default, current behavior)
2 - Enable address scoping but allow IPv4 private addresses in init/init-ack
3 - Enable address scoping but allow IPv4 link local address in init/init-ack
Signed-off-by: Bhaskar Dutta <bhaskar.dutta@globallogic.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
We had a bug in that we never stored the user-defined value for
MAXSEG when setting the value on an association. Thus future
PMTU events ended up rewriting the frag point and increasing
it past the user limit. Additionally, when setting the option on
the socket/endpoint, we affected all current associations, which
is against the spec.
Now, we store the user 'maxseg' value along with the computed
'frag_point'. We inherit 'maxseg' from the socket at association
creation and use it as an upper limit for 'frag_point' when it's
set.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
SCTP will delay the last part of a large write due to Nagle if that
part is smaller than the MTU. Since we are doing large writes, we might
as well send the last portion now instead of waiting until the next
large write happens. The small portion will be sent as is regardless,
so it's better not to delay it.
This is the result of much discussion with Wei Yongjun <yjwei@cn.fujitsu.com>
and Doug Graham <dgraham@nortel.com>. Many thanks go out to them.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
SCTP has a problem that when small chunks are used, it is possible
to exhaust the receive buffer without fully closing the receive window.
This happens due to all the overhead that we have to account for with small
messages. To fix this, when the receive buffer is exceeded, we'll drop
the window to 0 and save the 'drop' portion. When the application starts
reading data and freeing up receive buffer space, we'll wait until
we've reached the 'drop' window and then add back this 'drop' one
MTU at a time. This worked well in testing and under stress produced
rather even recovery.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Currently, SCTP breaks up user messages into fragments and
sends each fragment to the lower layer by itself. This means
that for each fragment we go all the way down the stack
and back up. This also discourages bundling of multiple
fragments when they can fit into a single packet (e.g. due
to the user setting a low fragmentation threshold).
We introduce a new command SCTP_CMD_SND_MSG and hand the
whole message down to the state machine. The state machine and
the side-effect parser will cork the queue, add all chunks
from the message to the queue, and then un-cork the queue,
thus causing the chunks to get transmitted.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
If a socket has a lot of associations that are in the process
of being closed/aborted, it is possible for a remote to establish
new associations during the time period that the old ones are shutting
down. If this was the result of a close() call, there will be no socket,
and this will cause a memory leak. We prevent this by setting the
socket state to CLOSING and disallowing new associations when in this state.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
This patch removes an unused union definition (sctp_cmsg_data_t)
from include/net/sctp/user.h.
Signed-off-by: Rami Rosen <rosenrami@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
This fixed a lockdep warning which appeared when doing stress
memory tests over NFS:
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
mount_root => nfs_root_data => tcp_close => lock sk_lock =>
tcp_send_fin => alloc_skb_fclone => page reclaim
David raised a concern that if the allocation fails in tcp_send_fin(), and it's
GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
for the allocation to succeed.
But the fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
weird, but it is no worse than the implicit sleep inside GFP_KERNEL. Both could
loop endlessly under memory pressure.
CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
CC: David S. Miller <davem@davemloft.net>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vlan devices are currently not multi-queue capable.
We can do that with a new rtnl_link_ops method,
get_tx_queues(), called from rtnl_create_link()
This new method gets num_tx_queues/real_num_tx_queues
from real device.
register_vlan_device() is also handled.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function block inet_connect_sock_af_ops contains no data;
make it constant.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These are full of unresolved problems, mainly that conversions don't
work 1-1 from hrtimers to tasklet_hrtimers because unlike hrtimers
tasklets can't be killed from softirq context.
And when a qdisc gets reset, that's exactly what we need to do here.
We'll work this out in the net-next-2.6 tree and if warranted we'll
backport that work to -stable.
This reverts the following 3 changesets:
a2cb6a4dd4
("pkt_sched: Fix bogon in tasklet_hrtimer changes.")
38acce2d79
("pkt_sched: Convert CBQ to tasklet_hrtimer.")
ee5f9757ea
("pkt_sched: Convert qdisc_watchdog to tasklet_hrtimer")
Signed-off-by: David S. Miller <davem@davemloft.net>
These tables are never modified at runtime. Move to read-only
section.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch affects the retransmits_timed_out() function.
Changes:
1) Variables have more meaningful names
2) retransmits_timed_out() has an introductory comment.
3) Small coding style changes.
Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct net::ipv6.ip6_dst_ops is separately dynamically allocated,
but there is no fundamental reason for it. Embed it directly into
struct netns_ipv6.
For that:
* move struct dst_ops into a separate header to fix circular dependencies
I honestly tried not to, it's pretty much impossible to do it any other way
* drop dynamic allocation, allocate together with netns
For a change, remove struct dst_ops::dst_net, it's deducible
by using container_of() given a dst_ops pointer.
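As a rough user-space illustration of why dst_net becomes redundant once
dst_ops is embedded: the sketch below recovers the owning namespace from a
pointer to the embedded member. The struct layouts, the simplified
container_of() macro and the net_from_dst_ops() helper are assumptions made
for this example, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct dst_ops {
	int family;			/* placeholder field */
};

/* Hypothetical layout: dst_ops is embedded, not allocated separately. */
struct netns_ipv6 {
	int sysctl_placeholder;
	struct dst_ops ip6_dst_ops;
};

struct net {
	int id;
	struct netns_ipv6 ipv6;
};

/* Recover the owning namespace from a pointer to its embedded dst_ops. */
static struct net *net_from_dst_ops(struct dst_ops *ops)
{
	return container_of(ops, struct net, ipv6.ip6_dst_ops);
}
```

Because the offset of the member is known at compile time, no back pointer
needs to be stored in struct dst_ops.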
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC 1122 specifies two threshold values R1 and R2 for connection timeouts,
which may represent a number of allowed retransmissions or a timeout value.
Currently Linux uses sysctl_tcp_retries{1,2} to specify the thresholds
as a number of allowed retransmissions.
For any desired threshold R2 (in terms of time) one can specify tcp_retries2
(in terms of number of retransmissions) such that TCP will not time out
earlier than R2. This is the case because the RTO schedule follows a fixed
pattern, namely exponential backoff.
However, the RTO behaviour is no longer predictable if RTO backoffs can be
reverted, as is the case in the draft
"Make TCP more Robust to Long Connectivity Disruptions"
(http://tools.ietf.org/html/draft-zimmermann-tcp-lcd).
In the worst case TCP would time out a connection after 3.2 seconds, if the
initial RTO equaled MIN_RTO and each backoff has been reverted.
This patch introduces a function retransmits_timed_out(N),
which calculates the timeout of a TCP connection, assuming an initial
RTO of MIN_RTO and N unsuccessful, exponentially backed-off retransmissions.
Whenever timeout decisions are made by comparing the retransmission counter
to some value N, this function can be used instead.
The meaning of tcp_retries2 will be changed, as many more RTO retransmissions
can occur than the value indicates. However, it yields a timeout which is
similar to the one of an unpatched, exponentially backing off TCP in the same
scenario. As no application could rely on an RTO greater than MIN_RTO, there
should be no risk of a regression.
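The time-based decision can be sketched in plain C as follows. This is only
an illustration of the arithmetic, not the patch itself: the 200 ms and 120 s
bounds, the function names and the choice to sum N + 1 timer intervals are
assumptions of this sketch.

```c
#include <assert.h>

/* Assumed values for illustration: 200 ms minimum and 120 s maximum RTO. */
#define RTO_MIN_MS 200ul
#define RTO_MAX_MS 120000ul

/*
 * Time in milliseconds until a connection that starts with RTO_MIN_MS and
 * doubles its RTO after every timeout (capped at RTO_MAX_MS) has seen
 * n unanswered retransmissions expire: the sum of n + 1 timer intervals.
 */
static unsigned long backoff_timeout_ms(unsigned int n)
{
	unsigned long total = 0;
	unsigned long rto = RTO_MIN_MS;
	unsigned int i;

	for (i = 0; i <= n; i++) {
		total += rto;
		rto = (rto * 2 > RTO_MAX_MS) ? RTO_MAX_MS : rto * 2;
	}
	return total;
}

/* Timeout decision based on elapsed time rather than a retransmit count. */
static int timed_out(unsigned long elapsed_ms, unsigned int n)
{
	return elapsed_ms >= backoff_timeout_ms(n);
}
```

With these assumed constants, n = 15 yields 924600 ms, whereas 16 intervals
of 200 ms each (every backoff reverted) add up to only 3200 ms, which is the
3.2 second worst case mentioned above.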
Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here, an ICMP host/network unreachable message whose payload matches
TCP's SND.UNA is taken as an indication that the RTO retransmission has
not been lost due to congestion, but because of a route failure
somewhere along the path.
With true congestion, a router won't trigger such a message and the
patched TCP will operate as standard TCP.
This patch reverts one RTO backoff if an ICMP host/network unreachable
message whose payload matches TCP's SND.UNA arrives.
Based on the new RTO, the retransmission timer is reset to reflect the
remaining time, or, if the revert clocked out the timer, a retransmission
is sent out immediately.
Backoffs are only reverted if TCP is in RTO loss recovery, i.e. if
there have already been retransmissions and reversible backoffs.
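The revert step can be modelled in a few lines of C. The struct, its field
names and revert_backoff() are hypothetical and exist only for this sketch;
RTO clamping to a maximum is left out for brevity.

```c
#include <assert.h>

struct rto_state {
	unsigned int base_rto_ms;	/* RTO before any backoff */
	unsigned int backoff;		/* number of doublings applied */
};

/*
 * Undo one backoff step (if there is one to undo) and report how long
 * the retransmission timer should still run, given the time that has
 * elapsed since it was armed. A return value of 0 means the timer has
 * already clocked out and the caller should retransmit immediately.
 */
static unsigned int revert_backoff(struct rto_state *st,
				   unsigned int elapsed_ms)
{
	unsigned int rto;

	if (st->backoff > 0)
		st->backoff--;	/* only revert while backoffs exist */

	rto = st->base_rto_ms << st->backoff;
	return elapsed_ms < rto ? rto - elapsed_ms : 0;
}
```

For example, with a 200 ms base RTO and three backoffs (1600 ms), a revert
after 500 ms leaves an 800 ms RTO and therefore 300 ms on the timer.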
Changes from v2:
1) Renaming of skb in tcp_v4_err() moved to another patch.
2) Reintroduced tcp_bound_rto() and __tcp_set_rto().
3) Fixed code comments.
Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds support of dcbnl setapp/getapp to dcbnl_rtnl_ops in netdev to allow
LLDs to implement their corresponding dcbnl setapp/getapp ops to support
the IEEE 802.1Q DCBX setapp/getapp commands.
Signed-off-by: Yi Zou <yi.zou@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce keepalive_probes(tp) helper, and use it, like
keepalive_time_when(tp) and keepalive_intvl_when(tp)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the libipw naming scheme change, it is no longer necessary for
mac80211 to avoid the ieee80211_rx name clash.
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This eliminates the dual definition of ieee80211_channel (and possibly
others), further clarifying who defines what and paving the way for
inclusion of cfg80211.h.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Implement all issues related to RemoteBusy in the RECV state table.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Constify nlmsghdr arguments to a couple of functions as preparation
for the next patch, which will constify the netlink message data in
all nfnetlink users.
Signed-off-by: Patrick McHardy <kaber@trash.net>
None of this stuff should execute in hw IRQ context, therefore
use a tasklet_hrtimer so that it runs in softirq context.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
When using DEFER_SETUP on a RFCOMM socket, a SABM frame triggers
authorization which, when rejected, sends a DM response. This is fine
according to the RFCOMM spec:
the responding implementation may replace the "proper" response
on the Multiplexer Control channel with a DM frame, sent on the
referenced DLCI to indicate that the DLCI is not open, and that
the responder would not grant a request to open it later either.
But some stacks don't seem to cope with this, leaving DLCI 0 open after
receiving the DM frame.
To fix it properly, a timer was introduced to rfcomm_session which is used
to set a timeout when the last active DLC of a session is unlinked. This
will give the remote stack some time to reply with a proper DISC frame on
DLCI 0, avoiding both sides sending DISC to each other on stacks that
follow the specification, and taking care of those that don't by taking
down DLCI 0.
Signed-off-by: Luiz Augusto von Dentz <luiz.dentz@openbossa.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Support for receiving of SREJ frames as specified by the state table.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When L2CAP loses an I-frame we send a SREJ frame to the transmitter side
requesting the lost packet. This patch implements all Recv I-frame events
on the SREJ_SENT state table except the ones that deal with SendRej (the REJ
exception at the receiver side is not yet implemented).
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Implement CRC16 check for L2CAP packets. FCS is used by Streaming Mode and
Enhanced Retransmission Mode and is an extra check for the packet content.
Using CRC16 is the default; L2CAP omits the FCS only when both sides send
a "No FCS" request.
Initially based on a patch from Nathan Holstein <nathan@lampreynetworks.com>
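For readers unfamiliar with the check, here is a small self-contained sketch
of a CRC-16 frame check and of the "No FCS" rule. It assumes the FCS is the
common CRC-16 variant with reflected polynomial 0xA001 and initial value 0,
stored low byte first at the end of the frame; the function names are made up
for this example and the Bluetooth specification remains the authority.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-16 with the reflected polynomial 0xA001 (often called
 * CRC-16/ARC). Assumed here to correspond to the FCS variant.
 */
static uint16_t crc16_update(uint16_t crc, const uint8_t *buf, size_t len)
{
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= buf[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001)
					: (uint16_t)(crc >> 1);
	}
	return crc;
}

/* FCS stays enabled unless both sides asked for "No FCS". */
static int fcs_enabled(int local_no_fcs, int remote_no_fcs)
{
	return !(local_no_fcs && remote_no_fcs);
}

/* Check a frame whose last two bytes hold the FCS, low byte first. */
static int frame_fcs_ok(const uint8_t *frame, size_t len)
{
	uint16_t rx;

	if (len < 2)
		return 0;
	rx = (uint16_t)(frame[len - 2] | (frame[len - 1] << 8));
	return crc16_update(0, frame, len - 2) == rx;
}
```

A receiver would drop any frame for which frame_fcs_ok() returns 0 whenever
fcs_enabled() is true for the channel.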
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
L2CAP uses retransmission and monitor timers to query the other side
about unacked I-frames. After sending each I-frame we (re)start the
retransmission timer. If it expires, we start a monitor timer that sends an
S-frame with the P bit set and waits for an S-frame with the F bit set. If the
monitor timer expires, we try again, up to a maximum of L2CAP_DEFAULT_MAX_TX
times.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
When receiving an I-frame with an unexpected txSeq, the receiver side starts
the recovery procedure by sending a REJ S-frame to the transmitter side, so
the transmitter can re-send the lost I-frame.
This patch just adds basic support for retransmission; it doesn't
mean that ERTM now has full support for packet retransmission.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
ERTM should use Segmentation and Reassembly to break down an SDU into many
PDUs when sending data to the other side.
On sending packets we queue all 'segments' until the end of segmentation and
just then add them to the queue for sending. On receiving we create a new
SKB with the SDU reassembled.
Initially based on a patch from Nathan Holstein <nathan@lampreynetworks.com>
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
This patch adds support for ERTM transfers, without retransmission, with
txWindow up to 63 and with acknowledgement of packets received. Now the
packets are queued before calling l2cap_do_send(), so packets are not
necessarily sent at the time we call l2cap_sock_sendmsg(). They will be sent
asynchronously on later calls of l2cap_ertm_send(). Besides, if an
error occurs when calling l2cap_do_send(), we disconnect the channel.
Initially based on a patch from Nathan Holstein <nathan@lampreynetworks.com>
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Add support to config_req and config_rsp to configure ERTM and Streaming
mode. If the remote device specifies ERTM or Streaming mode, then the
same mode is proposed. Otherwise ERTM or Basic mode is used. And in case
of a state 2 device, the remote device should propose the same mode. If
not, then the channel gets disconnected.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
To enable Enhanced Retransmission mode it needs to be set via a socket
option. A different mode can be set on a socket, but on listen() and
connect() the mode is checked and ERTM is only allowed if it is enabled
via the module parameter.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
hdev->req_lock is used as a mutex, so make it a mutex.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The device model itself has no real usable reference counting at the
moment and this causes problems if parents are deleted before their
children. The device model itself handles the memory details of this
correctly, but the uevent order is not consistent. This causes various
problems for systems like HAL or even X.
So until device_put() does a proper cleanup, the device for a Bluetooth
connection will be protected with an extra reference count to ensure
the correct order of uevents when connections are terminated.
This is not an automatic feature. Higher Bluetooth layers like HIDP or
BNEP should grab this new reference to ensure that their uevents are
sent before the ones from the parent device.
Based on a report by Brian Rogers <brian@xyzw.org>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
My patch "cfg80211: fix deadlock" broke the code it
was supposed to fix, the scan request checking. But
it's not trivial to put it back the way it was, since
the original patch had a deadlock.
Now do it in a completely new way: queue the check
off to a work struct, where we can freely lock. But
that has some more complications, like needing to
wait for it to be done before the wiphy/rdev can be
destroyed, so some code is required to handle that.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
All but two drivers have now stopped using the two
deprecated members radio_enabled and beacon_int,
so it's about time to remove them for good.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Kalle Valo <kalle.valo@iki.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Over time, a whole bunch of drivers have come up
with their own scheme to delay the configure_filter
operation to a workqueue. To be able to simplify
things, allow configure_filter to sleep, and add
a new prepare_multicast callback that drivers that
need the multicast address list implement. This new
callback must be atomic, but most drivers either
don't care or just calculate a hash which can be
done atomically and then uploaded to the hardware
non-atomically.
A cursory look suggests that at76c50x-usb, ar9170,
mwl8k (which is actually very broken now), rt2x00,
wl1251, wl1271 and zd1211 should make use of this
new capability.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
IEEE 802.15.4-2006 adds a new concept: channel pages, which can contain several
channels. Add support for channel pages in the API and in the fakehard driver.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
In 5e140dfc1f "net: reorder struct Qdisc
for better SMP performance" the definition of struct gnet_stats_basic
changed incompatibly, as copies of this struct are shipped to
userland via netlink.
Restoring the old behavior is not welcome, for performance reasons.
The fix is to use a private structure for the kernel, and
teach gnet_stats_copy_basic() to convert from kernel to userland,
using the legacy structure (struct gnet_stats_basic).
Based on a report and initial patch from Michael Spang.
Reported-by: Michael Spang <mspang@csclub.uwaterloo.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sometimes drivers might have a good reason to override
the PS default, like iwlwifi right now where it affects
RX performance significantly at this point. This will
allow them to override the default, if desired, in a
way that users can still change it according to their
trade-off choices, not the driver's, like would happen
if the driver just disabled PS completely then.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This removes the max_bandwidth attribute. It is only ever
written to, and is duplicated by max_bandwidth_khz in the
regulatory code.
Signed-off-by: Pat Erley <pat-lkml@erley.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The memory layout for scan requests was rather wrong,
we put the scan SSIDs before the channels which could
lead to the channel pointers being unaligned in memory.
It turns out that using a pointer to the channel array
isn't necessary anyway since we can embed a zero-length
array into the struct.
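A minimal sketch of the embedded-array layout, using a C99 flexible array
member where kernel code of this era would typically write a zero-length
array. The trimmed-down structs and scan_request_alloc() are hypothetical and
are not the cfg80211 definitions.

```c
#include <assert.h>
#include <stdlib.h>

struct channel {
	int center_freq;
};

/*
 * Hypothetical, trimmed-down scan request: the channel pointers live in
 * a trailing array inside the same allocation, so the compiler takes
 * care of their alignment and no separate pointer field is needed.
 */
struct scan_request {
	unsigned int n_channels;
	struct channel *channels[];	/* C99 flexible array member */
};

/* Allocate a request with room for n_channels channel pointers. */
static struct scan_request *scan_request_alloc(unsigned int n_channels)
{
	struct scan_request *req;

	req = calloc(1, sizeof(*req) + n_channels * sizeof(req->channels[0]));
	if (req)
		req->n_channels = n_channels;
	return req;
}
```

Any variable-length data with weaker alignment requirements, such as SSIDs,
would then be placed after the channel array rather than before it.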
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
If we have a lot of frames to transmit at once, for
instance with fragmentation, it can be an optimisation
to only tell the DMA engine about them on the last
fragment/frame to avoid banging the IO too much. This
patch allows implementing such an optimisation by
telling the driver when more frames can be expected.
Currently, this is used by mac80211 only on fragmented
frames, but could also be used in the future on other
frames when the queue was full and there are multiple
frames pending.
Note that drivers need to be careful when using this
flag, they need to kick their DMA engines not just
when this flag is clear, but also when the queue gets
full so that progress can be made.
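The rule in the note above can be sketched as a toy model in C. The ring
structure, RING_SIZE and tx_frame() are invented for this illustration and do
not correspond to any real driver or to the mac80211 flag name.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 4	/* hypothetical TX ring depth */

struct tx_ring {
	unsigned int pending;	/* descriptors queued but not yet kicked */
	unsigned int kicks;	/* how often the DMA engine was kicked */
};

/* Queue one frame; more_frames says whether another frame will follow. */
static void tx_frame(struct tx_ring *ring, bool more_frames)
{
	ring->pending++;

	/* Kick when nothing more is expected OR when the ring is full. */
	if (!more_frames || ring->pending == RING_SIZE) {
		ring->kicks++;
		ring->pending = 0;
	}
}
```

Without the second condition, a full ring of frames that all announce a
successor would never be handed to the hardware.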
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This documents what's required to implement that TX powersave
filter properly wrt. handling hardware queues.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Add some more documentation including an example so that
it's clearer what should be done for TX retries.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In order for userspace to be able to figure out whether
it obtained a consistent snapshot of data or not when
using netlink dumps, we need to have a generation number
in each dump message that indicates whether the list has
changed or not -- its value is arbitrary.
This patch adds such a number to all dumps, this needs
some mac80211 involvement to keep track of a generation
number to start with when adding/removing mesh paths or
stations.
The wiphy and netdev lists can be fully handled within
cfg80211, of course, but generation numbers need to be
stored there as well.
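On the userspace side, the consistency check could look roughly like the
sketch below. It assumes the generation attribute has already been parsed out
of each dump message into a plain struct; struct dump_msg and
dump_is_consistent() are hypothetical names, not part of nl80211.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parsed dump message: only the generation matters here. */
struct dump_msg {
	unsigned int generation;
};

/*
 * A dump is a consistent snapshot if every message carries the same
 * generation number. The value itself is arbitrary; only a change
 * between messages is significant.
 */
static bool dump_is_consistent(const struct dump_msg *msgs, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++)
		if (msgs[i].generation != msgs[0].generation)
			return false;
	return true;
}
```

If the check fails, userspace would typically discard the result and restart
the dump.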
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
With the move of everything related to the SME from
mac80211 to cfg80211, we lost the ability to send
reassociation frames. This adds them back, but only
for wireless extensions. With the userspace SME, it
shall control assoc vs. reassoc (it already can do
so with the nl80211 interface).
I haven't touched the connect() implementation, so
it is not possible to reassociate with the nl80211
connect primitive. I think that should be done with
the NL80211_CMD_ROAM command, but we'll have to see
how that can be handled in the future, especially
with fullmac chips.
This patch addresses only the immediate regression
we had in mac80211, which previously sent reassoc.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
IEEE802154_SIOC_ADD_SLAVE was used to allocate 802.15.4 interfaces
on the top of radio. It's not used anymore, drop it.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_queue_xmit enqueues an skb and calls qdisc_run, which
dequeues the skb and xmits it. In most cases, the skb that
is enqueued is the same one that is dequeued (unless the
queue gets stopped or multiple CPUs write to the same queue
and end up in a race with qdisc_run). For default qdiscs, we
can remove the redundant enqueue/dequeue and simply xmit the
skb since the default qdisc is work-conserving.
The patch uses a new flag - TCQ_F_CAN_BYPASS to identify the
default fast queue. The controversial part of the patch is
incrementing qlen when a skb is requeued - this is to avoid
checks like the second line below:
+ } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
>> !q->gso_skb &&
+ !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) {
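The bypass condition can be modelled outside the kernel as follows. This is a
simplified sketch: struct qdisc_model, the flag value and the boolean standing
in for __QDISC_STATE_RUNNING are assumptions of this example. Because a
requeued skb is counted in qlen, the model needs no separate gso_skb test.

```c
#include <assert.h>
#include <stdbool.h>

#define TCQ_F_CAN_BYPASS 0x4	/* flag value assumed for this sketch */

struct qdisc_model {
	unsigned int flags;
	unsigned int qlen;	/* includes a requeued skb, if any */
	bool running;		/* stands in for __QDISC_STATE_RUNNING */
};

/*
 * Return true if the skb may be transmitted directly, skipping the
 * enqueue/dequeue pair. On success the qdisc is marked as running,
 * mirroring the test_and_set_bit() in the quoted condition.
 */
static bool can_xmit_directly(struct qdisc_model *q)
{
	if (!(q->flags & TCQ_F_CAN_BYPASS) || q->qlen != 0 || q->running)
		return false;
	q->running = true;
	return true;
}
```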
Results of 2 hours of testing with multiple netperf sessions (1,
2, 4, 8, 12 sessions on a 4-CPU System-X). The BW numbers are
aggregate Mb/s across iterations tested with this version on
System-X boxes with Chelsio 10 Gbps cards:
----------------------------------
Size | ORG BW NEW BW |
----------------------------------
128K | 156964 159381 |
256K | 158650 162042 |
----------------------------------
Changes from ver1:
1. Move sch_direct_xmit declaration from sch_generic.h to
pkt_sched.h
2. Update qdisc basic statistics for direct xmit path.
3. Set qlen to zero in qdisc_reset.
4. Changed some function names to more meaningful ones.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
String literals are constant, and usually, we can also tag the array
of pointers const too, moving it to the .rodata section.
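As a small illustration of the double const: in the sketch below both the
strings and the array of pointers are const, so the whole table is eligible
for a read-only section. The table contents and state_name() are made up for
this example.

```c
#include <assert.h>
#include <string.h>

/*
 * First const: the characters are read-only.
 * Second const: the pointers themselves cannot be changed either.
 */
static const char *const state_names[] = { "idle", "busy", "done" };

#define N_STATE_NAMES (sizeof(state_names) / sizeof(state_names[0]))

/* Bounds-checked lookup into the constant table. */
static const char *state_name(unsigned int i)
{
	return i < N_STATE_NAMES ? state_names[i] : "unknown";
}
```

With only the first const, the pointer array would still be writable data.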
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an interface is configured in AP mode, the mac80211
implementation doesn't tell the driver to receive PS Poll frames.
This makes it impossible to communicate reliably with power-saving
stations.
The FIF_CONTROL flag isn't passed by mac80211 to
ieee80211_ops.configure_filter when an interface is in the AP mode.
And it's ok, because we don't want to receive ACK frames and other
control ones, but only PS Poll ones.
This patch introduces the FIF_PSPOLL filter flag in addition to
FIF_CONTROL, which means for the driver "pass PS Poll frames".
This flag is passed to the driver:
A) When an interface is configured in the AP mode.
B) In all cases, when the FIF_CONTROL flag was passed earlier (in
addition to it).
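Rules A and B can be expressed as a short C sketch. The bit values and the
filter_flags() helper are placeholders chosen for this illustration; they are
not the mac80211 definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values are placeholders for this sketch only. */
#define FIF_CONTROL (1u << 0)
#define FIF_PSPOLL  (1u << 1)

/*
 * Compute the filter flags handed to the driver: PS Poll frames are
 * requested in AP mode (rule A) and whenever all control frames are
 * requested anyway (rule B).
 */
static unsigned int filter_flags(unsigned int requested, bool ap_mode)
{
	unsigned int flags = requested;

	if (ap_mode || (requested & FIF_CONTROL))
		flags |= FIF_PSPOLL;
	return flags;
}
```

Note that AP mode alone does not set FIF_CONTROL, so ACKs and other control
frames stay filtered.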
Signed-off-by: Igor Perminov <igor.perminov@inbox.ru>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Since the bss is always set now once we are connected, if the
bss has its own information element we refer to it and pass that
instead of relying on mac80211's parsing.
Now all cfg80211 drivers get country IE support automatically, and
we reduce the call overhead that we had in mac80211, which called this
upon every beacon; instead we now call this only upon a successful
connection by a STA in cfg80211.
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The mac80211 workqueue exists to enable mac80211 and drivers
to queue their own work on a single threaded workqueue. mac80211
takes care to flush the workqueue during suspend but we never
really had requirements on drivers for how they should use
the workqueue in consideration for suspend.
We extend mac80211 to document how the mac80211 workqueue should
be used, how it should not be used and finally move raw access to
the workqueue to mac80211 only. Drivers and mac80211 use helpers
to queue work onto the mac80211 workqueue:
* ieee80211_queue_work()
* ieee80211_queue_delayed_work()
These helpers will now warn if mac80211 already completed its
suspend cycle and someone is trying to queue work. mac80211
flushes the mac80211 workqueue prior to suspend a few times,
but we haven't taken the care to ensure drivers won't add more
work after suspend. To help with this we add a warning when
someone tries to add work and mac80211 already completed the
suspend cycle.
Drivers should ensure they cancel any work or delayed work
in the mac80211 stop() callback.
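A user-space sketch of the guard described above, under stated assumptions: the struct, field and function names are hypothetical, and refusing the work after warning is an assumption of the sketch rather than something this message states.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-device state. */
struct sketch_hw {
	bool suspended;      /* set once the suspend cycle has completed */
	unsigned int queued; /* number of work items accepted so far */
};

/* Queue one work item, warning if suspend has already completed. */
bool sketch_queue_work(struct sketch_hw *hw)
{
	if (hw->suspended) {
		fprintf(stderr, "warning: work queued after suspend\n");
		return false; /* assumption: the late work is dropped */
	}
	hw->queued++;
	return true;
}
```

A driver that cancels its work in stop() never reaches the warning path.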
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
A regression was added through patch a4ed90d6:
"cfg80211: respect API on orig_flags on channel for beacon hint"
We did indeed respect _orig flags but the intention was not clearly
stated in the commit log. This patch fixes firmware issues picked
up by iwlwifi when we lift passive scan of beaconing restrictions
on channels its EEPROM has been configured to always enable.
By doing so, though, we also disallowed beacon hints on devices
registering their wiphy with custom world regulatory domains
enabled; these are currently ath5k, ath9k and ar9170.
The passive scan and beacon restrictions on those devices would
never be lifted even if we did find a beacon and the hardware did
support such enhancements when world roaming.
Since Johannes indicates iwlwifi firmware cannot be changed to
allow beacon hinting we set up a flag now to specifically allow
drivers to disable beacon hints for devices which cannot use them.
We enable the flag on iwlwifi to disable beacon hints and by default
enable it for all other drivers. It should be noted beacon hints lift
passive scan flags and beacon restrictions when we receive a beacon from
an AP on any 5 GHz non-DFS channels, and channels 12-14 on the 2.4 GHz
band. We don't bother with channels 1-11 as those channels are allowed
world wide.
This should fix world roaming for ath5k, ath9k and ar9170, thereby
improving scan time when we receive the first beacon from any AP,
and also enabling beaconing operation (AP/IBSS/Mesh) on cards which
would otherwise not be allowed to do so. Drivers not using custom
regulatory stuff (wiphy_apply_custom_regulatory()) were not affected
by this as the orig_flags for the channels would have been cleared
upon wiphy registration.
I tested this with a world roaming ath5k card.
Cc: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
rfcomm tty may be used before rfcomm_tty_driver is initialized.
The problem is that the socket layer is now initialized before the tty
layer; if a userspace program makes a socket call at that point, an
oops will happen.
Reported in:
http://marc.info/?l=linux-bluetooth&m=124404919324542&w=2
Make 3 changes:
1. Remove the #ifdef in rfcomm/core.c and make the tty functions
blank in rfcomm.h when rfcomm tty is not selected.
2. Tune the rfcomm_init error path to ensure the tty driver is
initialized before rfcomm socket usage.
3. Remove __exit from rfcomm_cleanup_sockets, because the above
change needs to call it from an __init function.
Reported-by: Oliver Hartkopp <oliver@hartkopp.net>
Tested-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current neigh_periodic_timer() function is fired by timer IRQ, and
scans one hash bucket each round (very little work in fact).
As we are supposed to scan whole hash table in 15 seconds, this means
neigh_periodic_timer() can be fired very often. (depending on the number
of concurrent hash entries we stored in this table)
Converting this to a workqueue permits scanning whole table, minimizing
icache pollution, and firing this work every 15 seconds, independently
of hash table size.
This 15 seconds delay is not a hard number, as work is a deferrable one.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since pr_err and friends are used instead of printk there is no point
in keeping IP_VS_ERR and friends. Furthermore make use of '__func__'
instead of hard coded function names.
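A small user-space sketch of why `__func__` is preferable to a hard-coded name: the prefix follows the function automatically if it is renamed. The macro and function names are hypothetical, and snprintf stands in for the kernel's pr_err so the sketch can run outside the kernel.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: prefix a message with the calling function's name.
 * __func__ expands inside whichever function uses the macro. */
#define sketch_format_err(buf, len, msg) \
	snprintf((buf), (len), "%s(): %s", __func__, (msg))

/* Example caller; the function name is never spelled out in the message. */
int sketch_probe(char *buf, size_t len)
{
	return sketch_format_err(buf, len, "device not found");
}
```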
Signed-off-by: Hannes Eder <heder@google.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Choose saner defaults for xfrm[4|6] gc_thresh values on init
Currently, the xfrm[4|6] code has hard-coded initial gc_thresh values
(set to 1024). Given that the ipv4 and ipv6 routing caches are sized
dynamically at boot time, the static selections can be nonsensical.
This patch dynamically selects an appropriate gc threshold based on
the corresponding main routing table size, using the assumption that
we should in the worst case be able to handle as many connections as
the routing table can.
For ipv4, the maximum route cache size is 16 * the number of hash
buckets in the route cache. Given that xfrm4 starts garbage
collection at the gc_thresh and prevents new allocations at 2 *
gc_thresh, we set gc_thresh to half the maximum route cache size.
For ipv6, it's a bit trickier. There is no maximum route cache size,
but the ipv6 dst_ops gc_thresh is statically set to 1024. It seems
sane to select a similar gc_thresh for the xfrm6 code that is half
the number of hash buckets in the v6 route cache times 16 (like the v4
code does).
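The arithmetic described above, written out as a sketch (function names are hypothetical; the factor of 16 and the halving come from the description):

```c
/* Maximum route cache size: 16 entries per hash bucket. */
unsigned long sketch_max_route_cache(unsigned long hash_buckets)
{
	return 16UL * hash_buckets;
}

/* gc_thresh is half the maximum, so the point at which new allocations
 * are refused (2 * gc_thresh) coincides with the maximum cache size. */
unsigned long sketch_xfrm_gc_thresh(unsigned long hash_buckets)
{
	return sketch_max_route_cache(hash_buckets) / 2;
}
```

For example, with 1024 hash buckets the maximum is 16384 entries and gc_thresh becomes 8192.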
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we now have IWESSID handlers for all modes, we can
combine them into one.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Since we now have IWAP handlers for all modes, we can
combine them into one.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Until we implemented iwfreq for managed mode, we
needed to keep the implementations separate, but now
that we have all versions implemented we can combine
them and export just one handler.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When a station queries us for a PS-poll response, we wrongly
queue the frame on the virtual interface's queue rather than
the pending queue.
Additionally, fix a race condition where we could potentially
send multiple frames to the sleeping station due to using a
station flag rather than a packet flag. When converting to a
packet flag, we can also convert p54 and remove the filter
clearing we added for it.
(Also remove a now dead function)
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reported-by: Bob Copeland <me@bobcopeland.com>
Tested-by: Bob Copeland <me@bobcopeland.com>
Cc: Christian Lamparter <chunkeey@web.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In order to make cfg80211/nl80211 aware of network namespaces,
we have to do the following things:
* del_virtual_intf method takes an interface index rather
than a netdev pointer - simply change this
* nl80211 uses init_net a lot; change it to use the sender's
network namespace
* scan requests use the interface index; hold a netdev pointer
and reference instead
* we want a wiphy and its associated virtual interfaces to be
in one netns together, so
- we need to be able to change ns for a given interface, so
export dev_change_net_namespace()
- for each virtual interface set the NETIF_F_NETNS_LOCAL
flag, and clear that flag only when the wiphy changes ns,
to disallow breaking this invariant
* when a network namespace goes away, we need to reparent the
wiphy to init_net
* cfg80211 users that support creating virtual interfaces must
create them in the wiphy's namespace, currently this affects
only mac80211
The end result is that you can now switch an entire wiphy into
a different network namespace with the new command
iw phy#<idx> set netns <pid>
and all virtual interfaces will follow (or the operation fails).
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The use of a static buffer in rose2asc() to return its result is not
threadproof and can result in corruption if multiple threads are trying
to use one of the procfs files based on rose2asc().
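One common way to remove this kind of hazard is to have the caller supply the buffer. The sketch below illustrates that pattern with hypothetical names and an assumed 5-byte address; it is not claimed to be the exact change made here.

```c
#include <stdio.h>

#define SKETCH_ADDR_LEN 5                         /* assumed address length */
#define SKETCH_ASC_LEN  (2 * SKETCH_ADDR_LEN + 1) /* hex digits plus NUL */

/* Format addr as hex digits into a caller-supplied buffer and return it.
 * No static storage is involved, so concurrent callers cannot overwrite
 * each other's result. */
char *sketch_addr2asc(char buf[SKETCH_ASC_LEN],
		      const unsigned char addr[SKETCH_ADDR_LEN])
{
	snprintf(buf, SKETCH_ASC_LEN, "%02X%02X%02X%02X%02X",
		 addr[0], addr[1], addr[2], addr[3], addr[4]);
	return buf;
}
```

Each caller declares its own `char buf[SKETCH_ASC_LEN]` on the stack and passes it in.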
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the internal 'pending' queue system in place, we can simply
put packets there instead of pushing them off to the master dev,
getting rid of the master interface completely.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
All current rate control algorithms agree to send management and no-ack
frames at the lowest rate. They also agree to do this when sta
and the private rate control data are NULL. We add a helper to mac80211
for this and simplify the rate control algorithm code.
Developers wishing to enhance rate control algorithms for
broadcast/multicast can opt not to use this in their get_rate()
mac80211 callback.
Cc: Zhu Yi <yi.zhu@intel.com>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Cc: ipw3945-devel@lists.sourceforge.net
Cc: Gabor Juhos <juhosg@openwrt.org>
Acked-by: Felix Fietkau <nbd@openwrt.org>
Cc: Derek Smithies <derek@indranet.co.nz>
Cc: Chittajit Mitra <Chittajit.Mitra@Atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When we're associated we should be able to send data to
target sta. If we cannot we may be trying to use the incorrect
band to talk to the sta. Let's catch any such cases, warn, and
drop the frames to not invalidate assumptions being made on
rate control algorithms when they have a valid sta to
communicate with. Any such cases should be handled and fixed.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
In cfg80211_scan_request n_channels refers to the total number
of channels to scan. Update the misleading comment accordingly.
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This reworks the key operation in cfg80211, and now only
allows, from userspace, configuring keys (via nl80211)
after the connection has been established (in managed
mode), the IBSS has been joined (in IBSS mode), at any time
(in AP[_VLAN] modes) or never for all the other modes.
In order to do shared key authentication correctly, it
is now possible to give a WEP key to the AUTH command.
To configure static WEP keys, these are given to the
CONNECT or IBSS_JOIN command directly; for a userspace
SME it is assumed that it will configure them properly after
the connection has been established.
Since mac80211 used to check the default key in IBSS
mode to see whether or not the network is protected,
it needs an update in that area, as well as an update
to make use of the WEP key passed to auth() for shared
key authentication.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This provides a list of sockets with their Phonet bind addresses and
some socket debug information through /proc/net/phonet.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
include/net/ieee802154/af_ieee802154.h (and others) naming seems to be too long
and redundant. Drop one level of subdirectories.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>