* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
NR_CPUS: Replace NR_CPUS in speedstep-centrino.c
cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP
NR_CPUS: Replace NR_CPUS in cpufreq userspace routines
NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c
cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix
cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target
cpumask: Provide a generic set of CPUMASK_ALLOC macros
cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c
cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c
cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c
cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c
cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c
cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr
Revert "cpumask: introduce new APIs"
cpumask: make for_each_cpu_mask a bit smaller
net: Pass reference to cpumask variable in net/sunrpc/svc.c
...
Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually
This is based upon an excellent bug report from Eric Dumazet.
tcp_ack() should clear ->icsk_probes_out even if there are packets
outstanding. Otherwise if we get a sequence of ACKs while we do have
packets outstanding over and over again, we'll never clear the
probes_out value and eventually think the connection is too sick and
we'll reset it.
This appears to be some "optimization" added to tcp_ack() in the 2.4.x
timeframe. In 2.2.x, probes_out is pretty much always cleared by
tcp_ack().
Here is Eric's original report:
----------------------------------------
Apparently, we can in some situations reset TCP connections in a couple of seconds when some frames are lost.
In order to reproduce the problem, please try the following program on linux-2.6.25.*
Setup some iptables rules to allow two frames per second sent on loopback interface to tcp destination port 12000
iptables -N SLOWLO
iptables -A SLOWLO -m hashlimit --hashlimit 2 --hashlimit-burst 1 --hashlimit-mode dstip --hashlimit-name slow2 -j ACCEPT
iptables -A SLOWLO -j DROP
iptables -A OUTPUT -o lo -p tcp --dport 12000 -j SLOWLO
Then run the attached program and see the output :
# ./loop
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,1)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,3)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,5)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,7)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,9)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,200ms,11)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,201ms,13)
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 40 127.0.0.1:54455 127.0.0.1:12000 timer:(persist,188ms,15)
write(): Connection timed out
wrote 890 bytes but was interrupted after 9 seconds
ESTAB 0 0 127.0.0.1:12000 127.0.0.1:54455
Exiting read() because no data available (4000 ms timeout).
read 860 bytes
While this tcp session makes progress (sending frames with 50 bytes of payload, every 500ms), linux tcp stack decides to reset it, when tcp_retries 2 is reached (default value : 15)
tcpdump :
15:30:28.856695 IP 127.0.0.1.56554 > 127.0.0.1.12000: S 33788768:33788768(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
15:30:28.856711 IP 127.0.0.1.12000 > 127.0.0.1.56554: S 33899253:33899253(0) ack 33788769 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
15:30:29.356947 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 1:61(60) ack 1 win 257
15:30:29.356966 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 61 win 257
15:30:29.866415 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 61:111(50) ack 1 win 257
15:30:29.866427 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 111 win 257
15:30:30.366516 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 111:161(50) ack 1 win 257
15:30:30.366527 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 161 win 257
15:30:30.876196 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 161:211(50) ack 1 win 257
15:30:30.876207 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 211 win 257
15:30:31.376282 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 211:261(50) ack 1 win 257
15:30:31.376290 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 261 win 257
15:30:31.885619 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 261:311(50) ack 1 win 257
15:30:31.885631 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 311 win 257
15:30:32.385705 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 311:361(50) ack 1 win 257
15:30:32.385715 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 361 win 257
15:30:32.895249 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 361:411(50) ack 1 win 257
15:30:32.895266 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 411 win 257
15:30:33.395341 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 411:461(50) ack 1 win 257
15:30:33.395351 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 461 win 257
15:30:33.918085 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 461:511(50) ack 1 win 257
15:30:33.918096 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 511 win 257
15:30:34.418163 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 511:561(50) ack 1 win 257
15:30:34.418172 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 561 win 257
15:30:34.927685 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 561:611(50) ack 1 win 257
15:30:34.927698 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 611 win 257
15:30:35.427757 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 611:661(50) ack 1 win 257
15:30:35.427766 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 661 win 257
15:30:35.937359 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 661:711(50) ack 1 win 257
15:30:35.937376 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 711 win 257
15:30:36.437451 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 711:761(50) ack 1 win 257
15:30:36.437464 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 761 win 257
15:30:36.947022 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 761:811(50) ack 1 win 257
15:30:36.947039 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 811 win 257
15:30:37.447135 IP 127.0.0.1.56554 > 127.0.0.1.12000: P 811:861(50) ack 1 win 257
15:30:37.447203 IP 127.0.0.1.12000 > 127.0.0.1.56554: . ack 861 win 257
15:30:41.448171 IP 127.0.0.1.12000 > 127.0.0.1.56554: F 1:1(0) ack 861 win 257
15:30:41.448189 IP 127.0.0.1.56554 > 127.0.0.1.12000: R 33789629:33789629(0) win 0
Source of program :
/*
* small producer/consumer program.
* setup a listener on 127.0.0.1:12000
* Forks a child
* child connect to 127.0.0.1, and sends 10 bytes on this tcp socket every 100 ms
* Father accepts connection, and read all data
*/
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <stdio.h>
#include <time.h>
#include <sys/poll.h>
int port = 12000;
char buffer[4096];
int main(int argc, char *argv[])
{
int lfd = socket(AF_INET, SOCK_STREAM, 0);
struct sockaddr_in socket_address;
time_t t0, t1;
int on = 1, sfd, res;
unsigned long total = 0;
socklen_t alen = sizeof(socket_address);
pid_t pid;
time(&t0);
socket_address.sin_family = AF_INET;
socket_address.sin_port = htons(port);
socket_address.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
if (lfd == -1) {
perror("socket()");
return 1;
}
setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(int));
if (bind(lfd, (struct sockaddr *)&socket_address, sizeof(socket_address)) == -1) {
perror("bind");
close(lfd);
return 1;
}
if (listen(lfd, 1) == -1) {
perror("listen()");
close(lfd);
return 1;
}
pid = fork();
if (pid == 0) {
int i, cfd = socket(AF_INET, SOCK_STREAM, 0);
close(lfd);
if (connect(cfd, (struct sockaddr *)&socket_address, sizeof(socket_address)) == -1) {
perror("connect()");
return 1;
}
for (i = 0 ; ;) {
res = write(cfd, "blablabla\n", 10);
if (res > 0) total += res;
else if (res == -1) {
perror("write()");
break;
} else break;
usleep(100000);
if (++i == 10) {
system("ss -on dst 127.0.0.1:12000");
i = 0;
}
}
time(&t1);
fprintf(stderr, "wrote %lu bytes but was interrupted after %g seconds\n", total, difftime(t1, t0));
system("ss -on | grep 127.0.0.1:12000");
close(cfd);
return 0;
}
sfd = accept(lfd, (struct sockaddr *)&socket_address, &alen);
if (sfd == -1) {
perror("accept");
return 1;
}
close(lfd);
while (1) {
struct pollfd pfd[1];
pfd[0].fd = sfd;
pfd[0].events = POLLIN;
if (poll(pfd, 1, 4000) == 0) {
fprintf(stderr, "Exiting read() because no data available (4000 ms timeout).\n");
break;
}
res = read(sfd, buffer, sizeof(buffer));
if (res > 0) total += res;
else if (res == 0) break;
else perror("read()");
}
fprintf(stderr, "read %lu bytes\n", total);
close(sfd);
return 0;
}
----------------------------------------
Signed-off-by: David S. Miller <davem@davemloft.net>
In the merge phase of the CAN subsystem the
af_family_clock_key_strings[] have been added to sock.c in commit
443aef0edd
(lockdep: fixup sk_callback_lock annotation). This trivial patch adds
the missing name for address family 29 (AF_CAN).
Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
It isn't helping anything and we aren't going to be able to change all
the drivers that do queue wakeups in strange situations.
Just letting a noop_qdisc get scheduled will work because when
qdisc_run() executes via net_tx_work() it will simply find no packets
pending when it makes the ->dequeue() call in qdisc_restart.
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (24 commits)
I/OAT: I/OAT version 3.0 support
I/OAT: tcp_dma_copybreak default value dependent on I/OAT version
I/OAT: Add watchdog/reset functionality to ioatdma
iop_adma: cleanup iop_chan_xor_slot_count
iop_adma: document how to calculate the minimum descriptor pool size
iop_adma: directly reclaim descriptors on allocation failure
async_tx: make async_tx_test_ack a boolean routine
async_tx: remove depend_tx from async_tx_sync_epilog
async_tx: export async_tx_quiesce
async_tx: fix handling of the "out of descriptor" condition in async_xor
async_tx: ensure the xor destination buffer remains dma-mapped
async_tx: list_for_each_entry_rcu() cleanup
dmaengine: Driver for the Synopsys DesignWare DMA controller
dmaengine: Add slave DMA interface
dmaengine: add DMA_COMPL_SKIP_{SRC,DEST}_UNMAP flags to control dma unmap
dmaengine: Add dma_client parameter to device_alloc_chan_resources
dmatest: Simple DMA memcpy test client
dmaengine: DMA engine driver for Marvell XOR engine
iop-adma: fix platform driver hotplug/coldplug
dmaengine: track the number of clients using a channel
...
Fixed up conflict in drivers/dca/dca-sysfs.c manually
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (82 commits)
ipw2200: Call netif_*_queue() interfaces properly.
netxen: Needs to include linux/vmalloc.h
[netdrvr] atl1d: fix !CONFIG_PM build
r6040: rework init_one error handling
r6040: bump release number to 0.18
r6040: handle RX fifo full and no descriptor interrupts
r6040: change the default waiting time
r6040: use definitions for magic values in descriptor status
r6040: completely rework the RX path
r6040: call napi_disable when puting down the interface and set lp->dev accordingly.
mv643xx_eth: fix NETPOLL build
r6040: rework the RX buffers allocation routine
r6040: fix scheduling while atomic in r6040_tx_timeout
r6040: fix null pointer access and tx timeouts
r6040: prefix all functions with r6040
rndis_host: support WM6 devices as modems
at91_ether: use netstats in net_device structure
sfc: Create one RX queue and interrupt per CPU package by default
sfc: Use a separate workqueue for resets
sfc: I2C adapter initialisation fixes
...
I/OAT DMA performance tuning showed different optimal values of
tcp_dma_copybreak for different I/OAT versions (4096 for 1.2 and 2048
for 2.0). This patch lets ioatdma driver set tcp_dma_copybreak value
according to these results.
[dan.j.williams@intel.com: remove some ifdefs]
Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Change icmp6_dst_gc to return the one value the caller cares about rather
than using call by reference.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Th fib_table_hash is an array, so use kcalloc.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now there is spin_trylock_bh, use it rather than open coding.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This timer normally happens once a minute, there is no need to cause an
early wakeup for it, so align it to next second boundary to safe power.
It can't be deferred because then it could take too long on cleanup or DoS.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FIB timer list is a trivial size structure, avoid indirection and just
put it in existing ns.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 20c2c1fd6c
(sctp: add sctp/remaddr table to complete RFC remote address table OID)
added an unused sctp_assoc_proc_exit() function that seems to have been
unintentionally created when copying the assocs code.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_outq_flush() can now become static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes the needlessly global qdisc_class_hash_alloc() static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new address list lock needs to handle the same device layering
issues that the _xmit_lock one does.
This integrates work done by Patrick McHardy.
Signed-off-by: David S. Miller <davem@davemloft.net>
The function header comments have to go with the functions
they are documenting, or things go horribly wrong when we
try to process them with the docbook tools.
Warning(include/linux/netdevice.h:1006): No description found for parameter 'dev_queue'
Warning(include/linux/netdevice.h:1033): No description found for parameter 'dev_queue'
Warning(include/linux/netdevice.h:1067): No description found for parameter 'dev_queue'
Warning(include/linux/netdevice.h:1093): No description found for parameter 'dev_queue'
Warning(include/linux/netdevice.h:1474): No description found for parameter 'txq'
Error(net/core/dev.c:1674): cannot understand prototype: 'u32 simple_tx_hashrnd; '
Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't create symlinks in a class to a device that is not owned by the
class. If the bluetooth subsystem really wants to point to all of the
devices it controls, it needs to create real devices, not fake symlinks.
Cc: Maxim Krasnyansky <maxk@qualcomm.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
As reported by Alexey Dobriyan:
CHECK net/ipv4/tcp_output.c
net/ipv4/tcp_output.c:475:7: warning: dubious: !x & y
And sparse is damn right!
if (unlikely(!OPTION_TS & opts->options))
^^^
size += TCPOLEN_SACKPERM_ALIGNED;
OPTION_TS is (1 << 1), so condition will never trigger.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch clamps the cscov setsockopt values to a maximum of 0xFFFF.
Setsockopt values greater than 0xffff can cause an unwanted
wrap-around. Further, IPv6 jumbograms are not supported (RFC 3838,
3.5), so that values greater than 0xffff are not even useful.
Further changes: fixed a typo in the documentation.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
As suggested by Dave:
This patch adds a function to get the driver name from a struct net_device,
and consequently uses this in the watchdog timeout handler to print as
part of the message.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minor nit, use size_t for allocation size and kcalloc to allocate
an array. Probably makes no actual code difference.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes the bridge reference count problem and cleanups ipv6 FIB
timer management. Don't use expires field, because it is not a proper
way to test, instead use timer_pending().
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduced by a258860e (netfilter: ctnetlink: add full support for SCTP to ctnetlink):
net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: cast from restricted type
net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: incorrect type in argument 1 (different base types)
net/netfilter/nf_conntrack_proto_sctp.c:483:2: expected unsigned int [unsigned] [usertype] x
net/netfilter/nf_conntrack_proto_sctp.c:483:2: got restricted unsigned int const <noident>
net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: cast from restricted type
net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: cast from restricted type
net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: cast from restricted type
net/netfilter/nf_conntrack_proto_sctp.c:483:2: warning: cast from restricted type
net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: cast from restricted type
net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: incorrect type in argument 1 (different base types)
net/netfilter/nf_conntrack_proto_sctp.c:487:2: expected unsigned int [unsigned] [usertype] x
net/netfilter/nf_conntrack_proto_sctp.c:487:2: got restricted unsigned int const <noident>
net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: cast from restricted type
net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: cast from restricted type
net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: cast from restricted type
net/netfilter/nf_conntrack_proto_sctp.c:487:2: warning: cast from restricted type
net/netfilter/nf_conntrack_proto_sctp.c:532:42: warning: incorrect type in assignment (different base types)
net/netfilter/nf_conntrack_proto_sctp.c:532:42: expected restricted unsigned int <noident>
net/netfilter/nf_conntrack_proto_sctp.c:532:42: got unsigned int
net/netfilter/nf_conntrack_proto_sctp.c:534:39: warning: incorrect type in assignment (different base types)
net/netfilter/nf_conntrack_proto_sctp.c:534:39: expected restricted unsigned int <noident>
net/netfilter/nf_conntrack_proto_sctp.c:534:39: got unsigned int
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to RFC2327, the connection information is optional
in the session description since it can be specified in the
media description instead.
My provider does exactly that and does not provide any connection
information in the session description. As a result the new
kernel drops all invite responses.
This patch makes it optional as documented.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds some fields to NFLOG to be able to send the complete
hardware header with all necessary informations.
It sends to userspace:
* the type of hardware link
* the lenght of hardware header
* the hardware header
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix netfilter xt_time's time_mt()'s use of do_div() on an s64 by using
div_s64() instead.
This was introduced by patch ee4411a1b1
("[NETFILTER]: x_tables: add xt_time match").
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Initially netfilter has had 64bit counters for conntrack-based accounting, but
it was changed in 2.6.14 to save memory. Unfortunately in-kernel 64bit counters are
still required, for example for "connbytes" extension. However, 64bit counters
waste a lot of memory and it was not possible to enable/disable it runtime.
This patch:
- reimplements accounting with respect to the extension infrastructure,
- makes one global version of seq_print_acct() instead of two seq_print_counters(),
- makes it possible to enable it at boot time (for CONFIG_SYSCTL/CONFIG_SYSFS=n),
- makes it possible to enable/disable it at runtime by sysctl or sysfs,
- extends counters from 32bit to 64bit,
- renames ip_conntrack_counter -> nf_conn_counter,
- enables accounting code unconditionally (no longer depends on CONFIG_NF_CT_ACCT),
- set initial accounting enable state based on CONFIG_NF_CT_ACCT
- removes buggy IPCT_COUNTER_FILLING event handling.
If accounting is enabled newly created connections get additional acct extend.
Old connections are not changed as it is not possible to add a ct_extend area
to confirmed conntrack. Accounting is performed for all connections with
acct extend regardless of a current state of "net.netfilter.nf_conntrack_acct".
Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit a0c80b80e0.
After discussions with Jamal and Herbert on netdev, we should
provide at least minimal prioritization at the qdisc level
even in multiqueue situations.
Signed-off-by: David S. Miller <davem@davemloft.net>
Based upon feedback from Eric Dumazet and Andi Kleen.
Cure several deficiencies in simple_tx_hash() by using
jhash + reciprocol multiply.
1) Eliminates expensive modulus operation.
2) Makes hash less attackable by using random seed.
3) Eliminates endianness hash distribution issues.
Signed-off-by: David S. Miller <davem@davemloft.net>
Removed unused variable 'skb' in the dev_deactivate_queue function
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-2.6.27' of git://linux-nfs.org/~bfields/linux: (51 commits)
nfsd: nfs4xdr.c do-while is not a compound statement
nfsd: Use C99 initializers in fs/nfsd/nfs4xdr.c
lockd: Pass "struct sockaddr *" to new failover-by-IP function
lockd: get host reference in nlmsvc_create_block() instead of callers
lockd: minor svclock.c style fixes
lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_lock
lockd: eliminate duplicate nlmsvc_lookup_host call from nlmsvc_testlock
lockd: nlm_release_host() checks for NULL, caller needn't
file lock: reorder struct file_lock to save space on 64 bit builds
nfsd: take file and mnt write in nfs4_upgrade_open
nfsd: document open share bit tracking
nfsd: tabulate nfs4 xdr encoding functions
nfsd: dprint operation names
svcrdma: Change WR context get/put to use the kmem cache
svcrdma: Create a kmem cache for the WR contexts
svcrdma: Add flush_scheduled_work to module exit function
svcrdma: Limit ORD based on client's advertised IRD
svcrdma: Remove unused wait q from svcrdma_xprt structure
svcrdma: Remove unneeded spin locks from __svc_rdma_free
svcrdma: Add dma map count and WARN_ON
...
The stab bits can't be referenced uniless the full
packet scheduler layer is enabled.
Reported by Stephen Rothwell.
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (1232 commits)
iucv: Fix bad merging.
net_sched: Add size table for qdiscs
net_sched: Add accessor function for packet length for qdiscs
net_sched: Add qdisc_enqueue wrapper
highmem: Export totalhigh_pages.
ipv6 mcast: Omit redundant address family checks in ip6_mc_source().
net: Use standard structures for generic socket address structures.
ipv6 netns: Make several "global" sysctl variables namespace aware.
netns: Use net_eq() to compare net-namespaces for optimization.
ipv6: remove unused macros from net/ipv6.h
ipv6: remove unused parameter from ip6_ra_control
tcp: fix kernel panic with listening_get_next
tcp: Remove redundant checks when setting eff_sacks
tcp: options clean up
tcp: Fix MD5 signatures for non-linear skbs
sctp: Update sctp global memory limit allocations.
sctp: remove unnecessary byteshifting, calculate directly in big-endian
sctp: Allow only 1 listening socket with SO_REUSEADDR
sctp: Do not leak memory on multiple listen() calls
sctp: Support ipv6only AF_INET6 sockets.
...
Move the line disciplines towards a conventional ->ops arrangement. For
the moment the actual 'tty_ldisc' struct in the tty is kept as part of
the tty struct but this can then be changed if it turns out that when it
all settles down we want to refcount ldiscs separately to the tty.
Pull the ldisc code out of /proc and put it with our ldisc code.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add size table functions for qdiscs and calculate packet size in
qdisc_enqueue().
Based on patch by Patrick McHardy
http://marc.info/?l=linux-netdev&m=115201979221729&w=2
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
The caller has alredy checked for them.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use sockaddr_storage{} for generic socket address storage
and ensures proper alignment.
Use sockaddr{} for pointers to omit several casts.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without CONFIG_NET_NS, namespace is always &init_net.
Compiler will be able to omit namespace comparisons with this patch.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
# BUG: unable to handle kernel NULL pointer dereference at
0000000000000038
IP: [<ffffffff821ed01e>] listening_get_next+0x50/0x1b3
PGD 11e4b9067 PUD 11d16c067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
CPU 3
Modules linked in: bridge ipv6 button battery ac loop dm_mod tg3 ext3
jbd edd fan thermal processor thermal_sys hwmon sg sata_svw libata dock
serverworks sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table]
Pid: 3368, comm: slpd Not tainted 2.6.26-rc2-mm1-lxc4 #1
RIP: 0010:[<ffffffff821ed01e>] [<ffffffff821ed01e>]
listening_get_next+0x50/0x1b3
RSP: 0018:ffff81011e1fbe18 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8100be0ad3c0 RCX: ffff8100619f50c0
RDX: ffffffff82475be0 RSI: ffff81011d9ae6c0 RDI: ffff8100be0ad508
RBP: ffff81011f4f1240 R08: 00000000ffffffff R09: ffff8101185b6780
R10: 000000000000002d R11: ffffffff820fdbfa R12: ffff8100be0ad3c8
R13: ffff8100be0ad6a0 R14: ffff8100be0ad3c0 R15: ffffffff825b8ce0
FS: 00007f6a0ebd16d0(0000) GS:ffff81011f424540(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000038 CR3: 000000011dc20000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process slpd (pid: 3368, threadinfo ffff81011e1fa000, task
ffff81011f4b8660)
Stack: 00000000000002ee ffff81011f5a57c0 ffff81011f4f1240
ffff81011e1fbe90
0000000000001000 0000000000000000 00007fff16bf2590 ffffffff821ed9c8
ffff81011f5a57c0 ffff81011d9ae6c0 000000000000041a ffffffff820b0abd
Call Trace:
[<ffffffff821ed9c8>] ? tcp_seq_next+0x34/0x7e
[<ffffffff820b0abd>] ? seq_read+0x1aa/0x29d
[<ffffffff820d21b4>] ? proc_reg_read+0x73/0x8e
[<ffffffff8209769c>] ? vfs_read+0xaa/0x152
[<ffffffff82097a7d>] ? sys_read+0x45/0x6e
[<ffffffff8200bd2b>] ? system_call_after_swapgs+0x7b/0x80
Code: 31 a9 25 00 e9 b5 00 00 00 ff 45 20 83 7d 0c 01 75 79 4c 8b 75 10
48 8b 0e eb 1d 48 8b 51 20 0f b7 45 08 39 02 75 0e 48 8b 41 28 <4c> 39
78 38 0f 84 93 00 00 00 48 8b 09 48 85 c9 75 de 8b 55 1c
RIP [<ffffffff821ed01e>] listening_get_next+0x50/0x1b3
RSP <ffff81011e1fbe18>
CR2: 0000000000000038
This kernel panic appears with CONFIG_NET_NS=y.
How to reproduce ?
On the buggy host (host A)
* ip addr add 1.2.3.4/24 dev eth0
On a remote host (host B)
* ip addr add 1.2.3.5/24 dev eth0
* iptables -A INPUT -p tcp -s 1.2.3.4 -j DROP
* ssh 1.2.3.4
On host A:
* netstat -ta or cat /proc/net/tcp
This bug happens when reading /proc/net/tcp[6] when there is a req_sock
at the SYN_RECV state.
When a SYN is received the minisock is created and the sk field is set to
NULL. In the listening_get_next function, we try to look at the field
req->sk->sk_net.
When looking at how to fix this bug, I noticed that is useless to do
the check for the minisock belonging to the namespace. A minisock belongs
to a listen point and this one is per namespace, so when browsing the
minisock they are always per namespace.
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove redundant checks when setting eff_sacks and make the number of SACKs a
compile time constant. Now that the options code knows how many SACK blocks can
fit in the header, we don't need to have the SACK code guessing at it.
Signed-off-by: Adam Langley <agl@imperialviolet.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This should fix the following bugs:
* Connections with MD5 signatures produce invalid packets whenever SACK
options are included
* MD5 signatures are counted twice in the MSS calculations
Behaviour changes:
* A SYN with MD5 + SACK + TS elicits a SYNACK with MD5 + SACK
This is because we can't fit any SACK blocks in a packet with MD5 + TS
options. There was discussion about disabling SACK rather than TS in
order to fit in better with old, buggy kernels, but that was deemed to
be unnecessary.
* SYNs with MD5 don't include a TS option
See above.
Additionally, it removes a bunch of duplicated logic for calculating options,
which should help avoid these sort of issues in the future.
Signed-off-by: Adam Langley <agl@imperialviolet.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the MD5 code assumes that the SKBs are linear and, in the case
that they aren't, happily goes off and hashes off the end of the SKB and
into random memory.
Reported by Stephen Hemminger in [1]. Advice thanks to Stephen and Evgeniy
Polyakov. Also includes a couple of missed route_caps from Stephen's patch
in [2].
[1] http://marc.info/?l=linux-netdev&m=121445989106145&w=2
[2] http://marc.info/?l=linux-netdev&m=121459157816964&w=2
Signed-off-by: Adam Langley <agl@imperialviolet.org>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update sctp global memory limit allocations to be the same as TCP.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When multiple socket bind to the same port with SO_REUSEADDR,
only 1 can be listining.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SCTP permits multiple listen call and on subsequent calls
we leak he memory allocated for the crypto transforms.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
valgrind reports uninizialized memory accesses when running
sctp inside the network simulation cradle simulator:
Conditional jump or move depends on uninitialised value(s)
at 0x570E34A: sctp_assoc_sync_pmtu (associola.c:1324)
by 0x57427DA: sctp_packet_transmit (output.c:403)
by 0x5710EFF: sctp_outq_flush (outqueue.c:824)
by 0x5710B88: sctp_outq_uncork (outqueue.c:701)
by 0x5745262: sctp_cmd_interpreter (sm_sideeffect.c:1548)
by 0x57444B7: sctp_side_effects (sm_sideeffect.c:976)
by 0x5744460: sctp_do_sm (sm_sideeffect.c:945)
by 0x572157D: sctp_primitive_ASSOCIATE (primitive.c:94)
by 0x5725C04: __sctp_connect (socket.c:1094)
by 0x57297DC: sctp_connect (socket.c:3297)
Conditional jump or move depends on uninitialised value(s)
at 0x575D3A5: mod_timer (timer.c:630)
by 0x5752B78: sctp_cmd_hb_timers_start (sm_sideeffect.c:555)
by 0x5754133: sctp_cmd_interpreter (sm_sideeffect.c:1448)
by 0x5753607: sctp_side_effects (sm_sideeffect.c:976)
by 0x57535B0: sctp_do_sm (sm_sideeffect.c:945)
by 0x571E9AE: sctp_endpoint_bh_rcv (endpointola.c:474)
by 0x573347F: sctp_inq_push (inqueue.c:104)
by 0x572EF93: sctp_rcv (input.c:256)
by 0x5689623: ip_local_deliver_finish (ip_input.c:230)
by 0x5689759: ip_local_deliver (ip_input.c:268)
by 0x5689CAC: ip_rcv_finish (dst.h:246)
#1 is due to "if (t->pmtu_pending)".
8a4794914f "[SCTP] Flag a pmtu change request"
suggests it should be initialized to 0.
#2 is the heartbeat timer 'expires' value, which is uninizialised, but
test by mod_timer().
T3_rtx_timer seems to be affected by the same problem, so initialize it, too.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This puts CONFIG_PROC_FS defines around the proc init/exit functions
and also avoids compiling proc.c if procfs is not supported.
Also make SCTP_DBG_OBJCNT depend on procfs.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the metrics (RTT, RTTVAR and RTAX_RTO_MIN) are stored in
kernel units (jiffies) and this leaks out through the netlink API to
user space where the units for jiffies are unknown.
This patches changes the kernel to convert to/from milliseconds. This
changes the ABI, but milliseconds seemed like the most natural unit
for these parameters. Values available via syscall in
/proc/net/rt_cache and netlink will be in milliseconds.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like noop_qdisc, it needs a dummy backpointer and
explicit qdisc->q.lock initialization.
Based upon a report by Stephen Hemminger.
Signed-off-by: David S. Miller <davem@davemloft.net>
Idea is from Patrick McHardy.
Instead of managing the list of qdiscs on the device level, manage it
in the root qdisc of a netdev_queue. This solves all kinds of
visibility issues during qdisc destruction.
The way to iterate over all qdiscs of a netdev_queue is to visit
the netdev_queue->qdisc, and then traverse it's list.
The only special case is to ignore builting qdiscs at the root when
dumping or doing a qdisc_lookup(). That was not needed previously
because builtin qdiscs were not added to the device's qdisc_list.
Signed-off-by: David S. Miller <davem@davemloft.net>
The u32_list is just an indirect way of maintaining a reference
to a U32 node on a per-qdisc basis.
Just add an explicit node pointer for u32 to struct Qdisc an do
away with this global list.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new sockopt to reserve some headroom in the mmaped ring frames in
front of the packet payload. This can be used f.i. when the VLAN header
needs to be (re)constructed to avoid moving the entire payload.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* This patch replaces the dangerous lvalue version of cpumask_of_cpu
with new cpumask_of_cpu_ptr macros. These are patterned after the
node_to_cpumask_ptr macros.
In general terms, if there is a cpumask_of_cpu_map[] then a pointer to
the cpumask_of_cpu_map[cpu] entry is used. The cpumask_of_cpu_map
is provided when there is a large NR_CPUS count, reducing
greatly the amount of code generated and stack space used for
cpumask_of_cpu(). The pointer to the cpumask_t value is needed for
calling set_cpus_allowed_ptr() to reduce the amount of stack space
needed to pass the cpumask_t value.
If there isn't a cpumask_of_cpu_map[], then a temporary variable is
declared and filled in with value from cpumask_of_cpu(cpu) as well as
a pointer variable pointing to this temporary variable. Afterwards,
the pointer is used to reference the cpumask value. The compiler
will optimize out the extra dereference through the pointer as well
as the stack space used for the pointer, resulting in identical code.
A good example of the orthogonal usages is in net/sunrpc/svc.c:
case SVC_POOL_PERCPU:
{
unsigned int cpu = m->pool_to[pidx];
cpumask_of_cpu_ptr(cpumask, cpu);
*oldmask = current->cpus_allowed;
set_cpus_allowed_ptr(current, cpumask);
return 1;
}
case SVC_POOL_PERNODE:
{
unsigned int node = m->pool_to[pidx];
node_to_cpumask_ptr(nodecpumask, node);
*oldmask = current->cpus_allowed;
set_cpus_allowed_ptr(current, nodecpumask);
return 1;
}
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There are already 7 of them - time to kill some duplicate code.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
After all this stuff is moved outside, this function can look better.
Besides, I tuned the error path in ip_proc_init_net to make it have
only 2 exit points, not 3.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This one has become per-net long ago, but the appropriate file
is per-net only now.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
All the statistics shown in this file have been made per-net already.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now all the shown in it statistics is netnsizated, time to
show it in appropriate net.
The appropriate net init/exit ops already exist - they make
the sockstat file per net - so just extend them.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
After moving all the stuff outside this function it looks
a bit ugly - make it look better.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Proc temporary uses stats from init_net.
BTW, TCP_XXX_STATS are beautiful (w/o do { } while (0) facing) again :)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
These ones are currently empty, but stuff from init_ipv4_mibs will
sequentially migrate there.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of 'pfifo_fast' we have just plain 'fifo_fast'.
No priority queues, just a straight FIFO.
This is necessary in order to legally have a seperate
qdisc per queue in multi-TX-queue setups, and thus get
full parallelization.
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the destruction of the old queue into qdisc_graft().
When operating on a root qdisc (ie. "parent == NULL"), apply
the operation to all queues. The caller has grabbed a single
implicit reference for this graft, therefore when we apply the
change to more than one queue we must grab additional qdisc
references.
Otherwise, we are operating on a class of a specific parent qdisc, and
therefore no multiqueue handling is necessary.
Signed-off-by: David S. Miller <davem@davemloft.net>
Lock the root of the qdisc being operated upon.
All explicit references to qdisc_tree_lock() are now gone.
The only remaining uses are via the sch_tree_{lock,unlock}()
and tcf_tree_{lock,unlock}() macros.
Signed-off-by: David S. Miller <davem@davemloft.net>
It just wants the qdisc tree for the filter to be synchronized.
So just BH lock qdisc_root_lock(q) instead.
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows less strict control of access to the qdisc attached to a
netdev_queue. It is even allowed to enqueue into a qdisc which is
in the process of being destroyed. The RCU handler will toss out
those packets.
We will need this to handle sharing of a qdisc amongst multiple
TX queues. In such a setup the lock has to be shared, so will
be inside of the qdisc itself. At which point the netdev_queue
lock cannot be used to hard synchronize access to the ->qdisc
pointer.
One operation we have to keep inside of qdisc_destroy() is the list
deletion. It is the only piece of state visible after the RCU quiesce
period, so we have to undo it early and under the appropriate locking.
The operations in the RCU handler do not need any looking because the
qdisc tree is no longer visible to anything at that point.
Signed-off-by: David S. Miller <davem@davemloft.net>
We are registering the device, there is no way anyone can get
at this object's qdiscs yet in any meaningful way.
Signed-off-by: David S. Miller <davem@davemloft.net>
When we have shared qdiscs, packets come out of the qdiscs
for multiple transmit queues.
Therefore it doesn't make any sense to schedule the transmit
queue when logically we cannot know ahead of time the TX
queue of the SKB that the qdisc->dequeue() will give us.
Just for sanity I added a BUG check to make sure we never
get into a state where the noop_qdisc is scheduled.
Signed-off-by: David S. Miller <davem@davemloft.net>
When code wants to lock the qdisc tree state, the logic
operation it's doing is locking the top-level qdisc that
sits of the root of the netdev_queue.
Add qdisc_root_lock() to represent this and convert the
easiest cases.
In order for this to work out in all cases, we have to
hook up the noop_qdisc to a dummy netdev_queue.
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently it is associated with a netdev_queue, but when we have
qdisc sharing that no longer makes any sense.
Signed-off-by: David S. Miller <davem@davemloft.net>
We liberate any dangling gso_skb during qdisc destruction.
It really only matters for the root qdisc. But when qdiscs
can be shared by multiple netdev_queue objects, we can't
have the gso_skb in the netdev_queue any more.
Signed-off-by: David S. Miller <davem@davemloft.net>
It just xor hashes over IPv4/IPv6 addresses and ports of transport.
The only assumption it makes is that skb_network_header() is set
correctly.
With bug fixes from Eric Dumazet.
Signed-off-by: David S. Miller <davem@davemloft.net>
The only behavior change is that we do not drop packets under any
circumstances. If that is absolutely needed, we could easily add it
back.
With cleanups and help from Johannes Berg.
Signed-off-by: David S. Miller <davem@davemloft.net>
Devices or device layers can set this to control the queue selection
performed by dev_pick_tx().
This function runs under RCU protection, which allows overriding
functions to have some way of synchronizing with things like dynamic
->real_num_tx_queues adjustments.
This makes the spinlock prefetch in dev_queue_xmit() a little bit
less effective, but that's the price right now for correctness.
Signed-off-by: David S. Miller <davem@davemloft.net>
This effectively "flips the switch" by making the core networking
and multiqueue-aware drivers use the new TX multiqueue structures.
Non-multiqueue drivers need no changes. The interfaces they use such
as netif_stop_queue() degenerate into an operation on TX queue zero.
So everything "just works" for them.
Code that really wants to do "X" to all TX queues now invokes a
routine that does so, such as netif_tx_wake_all_queues(),
netif_tx_stop_all_queues(), etc.
pktgen and netpoll required a little bit more surgery than the others.
In particular the pktgen changes, whilst functional, could be largely
improved. The initial check in pktgen_xmit() will sometimes check the
wrong queue, which is mostly harmless. The thing to do is probably to
invoke fill_packet() earlier.
The bulk of the netpoll changes is to make the code operate solely on
the TX queue indicated by by the SKB queue mapping.
Setting of the SKB queue mapping is entirely confined inside of
net/core/dev.c:dev_pick_tx(). If we end up needing any kind of
special semantics (drops, for example) it will be implemented here.
Finally, we now have a "real_num_tx_queues" which is where the driver
indicates how many TX queues are actually active.
With IGB changes from Jeff Kirsher.
Signed-off-by: David S. Miller <davem@davemloft.net>
This actually fixes a bug added by the RR scheduler changes. The
->bands and ->prio2band parameters were being set outside of the
sch_tree_lock() and thus could result in strange behavior and
inconsistencies.
It might be possible, in the new design (where there will be one qdisc
per device TX queue) to allow similar functionality via a TX hash
algorithm for RR but I really see no reason to export this aspect of
how these multiqueue cards actually implement the scheduling of the
the individual DMA TX rings and the single physical MAC/PHY port.
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need for a feature bit for something that
can be tested by simply checking the TX queue count.
Signed-off-by: David S. Miller <davem@davemloft.net>
alloc_netdev_mq() now allocates an array of netdev_queue
structures for TX, based upon the queue_count argument.
Furthermore, all accesses to the TX queues are now vectored
through the netdev_get_tx_queue() and netdev_for_each_tx_queue()
interfaces. This makes it easy to grep the tree for all
things that want to get to a TX queue of a net device.
Problem spots which are not really multiqueue aware yet, and
only work with one queue, can easily be spotted by grepping
for all netdev_get_tx_queue() calls that pass in a zero index.
Signed-off-by: David S. Miller <davem@davemloft.net>
Increase reliability by retrying to send JoinIn messages after memory
allocation failures on each TRANSMIT_PDU event until it succeeds.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
in __neigh_event_send, if we have a neighbour entry which is in
NUD_INCOMPLETE state, we enqueue any outbound frames to that neighbour
to the neighbours arp_queue, which is default capped to a length of 3
skbs. If that queue exceeds its set length, it will drop an skb on
the queue to enqueue the newly arrived skb. This results in a drop
for which we have no statistics incremented. This patch adds an
unresolved_discards stat to /proc/net/stat/ndisc_cache to track these
lost frames.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Done with NET_XXX_STATS macros :)
To be continued...
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This one is tricky.
The thing is that this macro is only used when killing tw buckets,
but since this killer is promiscuous wrt to which net each particular
tw belongs to, I have to use it only when NET_NS is off. When the net
namespaces are on, I use the INET_INC_STATS_BH for each bucket.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
These places have a tcp_sock, but we'd prefer the sock itself to
get net from it. Fortunately, tcp_sk macro is just a type cast, so
this replace is really cheap.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tcp_enter_memory_pressure calls NET_INC_STATS, but doesn't
have where to get the net from.
I decided to add a sk argument, not the net itself, only to factor
all the required sock_net(sk) calls inside the enter_memory_pressure
callback itself.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Same as before - the sock is always there to get the net from,
but there are also some places with the net already saved on
the stack.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fortunately (almost) all the TCP code has a sock to get the net from :)
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This one sets TCP MIBs after zeroing them, and thus requires
the net.
The existing single caller can use init_net (temporarily).
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the same as the first patch in the set, but preparing
the net for TCP_XXX_STATS - save the struct net on the stack
where required and possible.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Very simple - only ip_evictor (fragments) requires such.
This patch ends up the IP_XXX_STATS patching.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
All the callers already have either the net itself, or the place
where to get it from.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some places, that deal with IP statistics already have where to
get a struct net from, but use it directly, without declaring
a separate variable on the stack.
So, save this net on the stack for future IP_XXX_STATS macros.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change PULLHUP to POLLHUP in tcp_poll comments and clean up another
comment for grammar and coding style.
Signed-off-by: Will Newton <will.newton@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/core/skbuff.c:1335:5: warning: symbol '__skb_splice_bits' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enhances the synchronization of the closing connections
between the master and the backup director. It prevents the closed
connections to expire with the 15 min timeout of the ESTABLISHED
state on the backup and makes them expire as they would do on the
master with much shorter timeouts.
Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
So that kthread_stop() can wake up the thread and we don't have to wait one
second in the worst case for the daemon to actually stop.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
Instead of doing an endless loop with sleeping for one second, we now put the
backup thread onto the mcast socket wait queue and it gets woken up as soon as
we have data to process.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
This also moves the setup code out of the daemons, so that we're able to
return proper error codes to user space. The current code will return success
to user space when the daemon is started with an invald mcast interface. With
these changes we get an appropriate "No such device" error.
We longer need our own completion to be sure the daemons are actually running,
because they no longer contain code that can fail and kthread_run() takes care
of the rest.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
The additional information we now return to the caller is currently not used,
but will be used to return errors to user space.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
There's no need to do it at runtime, the values are constant.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
Push it into those callback functions that actually need it.
Note that all the NFS operations use their own locking, so don't need the
BKL. Ditto for the rpcbind client.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Introduce a new API to register RPC services on IPv6 interfaces to allow
the NFS server and lockd to advertise on IPv6 networks.
Unlike rpcb_register(), the new rpcb_v4_register() function uses rpcbind
protocol version 4 to contact the local rpcbind daemon. The version 4
SET/UNSET procedures allow services to register address families besides
AF_INET, register at specific network interfaces, and register transport
protocols besides UDP and TCP. All of this functionality is exposed via
the new rpcb_v4_register() kernel API.
A user-space rpcbind daemon implementation that supports version 4 of the
rpcbind protocol is required in order to make use of this new API.
Note that rpcbind version 3 is sufficient to support the new rpcbind
facilities listed above, but most extant implementations use version 4.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
rpcbind version 4 registration will reuse part of rpcb_register, so just
split it out into a separate function now.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: Callers that required a privileged source port now use
rpcb_create_local(), so we can remove the @privileged argument from
rpcb_create().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add rpcb_create_local() for use by rpcb_register() and upcoming IPv6
registration functions.
Ensure any errors encountered by rpcb_create_local() are properly
reported.
We can also use a statically allocated constant loopback socket address
instead of one allocated on the stack and initialized every time the
function is called.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The rpcbind versions 3 and 4 SET and UNSET procedures use the same
arguments as the GETADDR procedure.
While definitely a bug, this hasn't been a problem so far since the
kernel hasn't used version 3 or 4 SET and UNSET. But this will change
in just a moment.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* 'generic-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits)
generic-ipi: more merge fallout
generic-ipi: merge fix
x86, visws: use mach-default/entry_arch.h
x86, visws: fix generic-ipi build
generic-ipi: fixlet
generic-ipi: fix s390 build bug
generic-ipi: fix linux-next tree build failure
fix: "smp_call_function: get rid of the unused nonatomic/retry argument"
fix: "smp_call_function: get rid of the unused nonatomic/retry argument"
fix "smp_call_function: get rid of the unused nonatomic/retry argument"
on_each_cpu(): kill unused 'retry' parameter
smp_call_function: get rid of the unused nonatomic/retry argument
sh: convert to generic helpers for IPI function calls
parisc: convert to generic helpers for IPI function calls
mips: convert to generic helpers for IPI function calls
m32r: convert to generic helpers for IPI function calls
arm: convert to generic helpers for IPI function calls
alpha: convert to generic helpers for IPI function calls
ia64: convert to generic helpers for IPI function calls
powerpc: convert to generic helpers for IPI function calls
...
Fix trivial conflicts due to rcu updates in kernel/rcupdate.c manually
Fix memory leak in error path in CPU_UP_PREPARE notifier.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- move all of the details on offsets, lengths and buffers into a
single function instead of doing these operation from multiple places
- use a bottom up approach: try to avoid details in the high level
functions, introduce them gradually as we go deeper in the function
call stack
With helpful feedback from Jarek Poplawski.
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Acked-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have a specific lock to protect the network
device unicast and multicast lists, remove extraneous
grabs of the TX lock in cases where the code only needs
address list protection.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add netif_addr_{lock,unlock}{,_bh}() helpers.
Use them to protect operations that operate on or read
the network device unicast and multicast address lists.
Also use them in cases where the code simply wants to
block calls into the driver's ->set_rx_mode() and
->set_multicast_list() methods.
Signed-off-by: David S. Miller <davem@davemloft.net>
This will be used to protect the per-device unicast and multicast
address lists, as well as the callbacks into the drivers which
configure such state such as ->set_rx_mode() and ->set_multicast_list().
Signed-off-by: David S. Miller <davem@davemloft.net>
Some places, that deal with ICMP statistics already have where
to get a struct net from, but use it directly, without declaring
a separate variable on the stack.
Since I will need this net soon, I declare a struct net on the
stack and use it in the existing places in a separate patch not
to spoil the future ones.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This routine deals with ICMP statistics, but doesn't have a
struct net at hands, so add one.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove excessive comments and debugging, use NETDEV_TX codes,
remove some empty lines.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove some debugging and excessive comments, merge the two
dev_hard_header calls into one.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow to query LRO settings of underlying device when VLAN RX
acceleration is used.
Suggested by Ben Hutchings <bhutchings@solarflare.com>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Store the VLAN tag in the auxillary data/tpacket2_hdr so userspace can
properly deal with hardware VLAN tagging/stripping.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tpacket_hdr is not 64 bit clean due to use of an unsigned long
and can't be extended because the following struct sockaddr_ll needs
to be at a fixed offset.
Add support for a version 2 tpacket protocol that removes these
limitations.
Userspace can query the header size through a new getsockopt option
and change the protocol version through a setsockopt option. The
changes needed to switch to the new protocol version are:
1. replace struct tpacket_hdr by struct tpacket2_hdr
2. query header len and save
3. set protocol version to 2
- set up ring as usual
4. for getting the sockaddr_ll, use (void *)hdr + TPACKET_ALIGN(hdrlen)
instead of (void *)hdr + TPACKET_ALIGN(sizeof(struct tpacket_hdr))
Steps 2 and 4 can be omitted if the struct sockaddr_ll isn't needed.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When VLAN header stripping is used, packets currently bypass packet
sockets (and other network taps) completely. For locally existing
VLANs, they appear directly on the VLAN device, for unknown VLANs
they are silently dropped.
Add a new function netif_nit_deliver() to deliver incoming packets
to all network interface taps and use it in __vlan_hwaccel_rx() to
make VLAN packets visible on the underlying device.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use a real skb member to store the skb to avoid clashes with qdiscs,
which are allowed to use the cb area themselves. As currently only real
devices that consume the skb set the NETIF_F_HW_VLAN_TX flag, no explicit
invalidation is neccessary.
The new member fills a hole on 64 bit, the skb layout changes from:
__u32 mark; /* 172 4 */
sk_buff_data_t transport_header; /* 176 4 */
sk_buff_data_t network_header; /* 180 4 */
sk_buff_data_t mac_header; /* 184 4 */
sk_buff_data_t tail; /* 188 4 */
/* --- cacheline 3 boundary (192 bytes) --- */
sk_buff_data_t end; /* 192 4 */
/* XXX 4 bytes hole, try to pack */
to
__u32 mark; /* 172 4 */
__u16 vlan_tci; /* 176 2 */
/* XXX 2 bytes hole, try to pack */
sk_buff_data_t transport_header; /* 180 4 */
sk_buff_data_t network_header; /* 184 4 */
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch simplifies and speeds up TIPC's algorithm for identifying
on-node and off-node destinations that overlap a multicast name
sequence range. Rather than traversing the list of all known name
publications within the cluster, it now traverses the (potentially
much shorter) list of name publications made by the node itself, and
determines if any off-node destinations exist by comparing the sizes
of the two lists. (Since the node list must be a subset of the
cluster list, a difference in sizes means that at least one off-node
destination must exist.)
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch ensures that TIPC configuration commands that display info
about neighboring nodes and their links take the spinlocks that
protect the node list and link lists from changing while the lists
are being traversed.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch ensures that TIPC's multicast message name lookup
algorithm does individualized scope checking for each published
name it examines. Previously, scope checking was only done for
the first name in a name table publication list, which could
result in incoming multicast messages being delivered to ports
publishing names with "node" scope, or not being delivered to
ports publishing names with "cluster" or "zone" scope.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch corrects many places where TIPC routines indicated
successful completion by returning TIPC_OK instead of 0.
(The TIPC_OK symbol has the value 0, but it should only be used
in contexts that deal with the error code field of a TIPC
message header.)
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch ensurs that accept() returns successfully even when
the newly created socket is immediately disconnected by its peer.
Previously, accept() would fail if it was unable to pass back
the optional address info for the socket's peer before the
socket became disconnected; TIPC now allows accept() to gather
peer address information after disconnection. As a bonus, the
revised code accesses the socket's port more efficiently, without
the overhead incurred by a reference table lookup.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch eliminates an unnecessary pointer dereference when
accessing a stream-based socket's receive queue.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch eliminates an unneeded parameter when creating a low-level
TIPC port object. Instead of returning both the pointer to the port
structure and the port's reference ID, it now returns only the pointer
since the port structure contains the reference ID as one of its fields.
Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we are trying to place the information from the kernel to
1, 2, 3 and 4 pages sequentially. These pages are allocated via slab.
Though, from the slab point of view steps 3 and 4 are equivalent on
most architectures. So, lets skip 3 pages attempt.
By the way, should we switch from .doit to .dumpit interface here?
The amount of data seems quite big for me.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes references in drivers/net/wan/Kconfig and
net/wanrouter/Kconfig to Documentation/networking/wan-router.txt
which was removed in commit 99971e70fd
("[WANPIPE]: Forgotten bits of Sangoma drivers removal.").
Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_set_promiscuity/allmulti might overflow.
Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes
dev_set_promiscuity/allmulti return error number if overflow happened.
Here, we check all positive increment for promiscuity and allmulti
to get error return.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
allmulti might overflow.
Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes
dev_set_promiscuity/allmulti return error number if overflow happened.
Here, we check the positive increment for allmulti to get error return.
PS: For unwinding tunnel creating, we let ipip->ioctl() to handle it.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
allmulti might overflow.
Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes
dev_set_promiscuity/allmulti return error number if overflow happened.
Here, we check the positive increment for allmulti to get error return.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_set_promiscuity/allmulti might overflow.
Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes
dev_set_promiscuity/allmulti return error number if overflow happened.
Here, we check the positive increment for promiscuity to get error return.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_set_promiscuity/allmulti might overflow. Commit: "netdevice: Fix
promiscuity and allmulti overflow" in net-next makes
dev_set_promiscuity/allmulti return error number if overflow happened.
In af_packet, we check all positive increment for promiscuity and
allmulti to get error return.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'core/softirq' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
softirq: remove irqs_disabled warning from local_bh_enable
softirq: remove initialization of static per-cpu variable
Remove argument from open_softirq which is always NULL
This patch avoids adding STAs that don't belong to our IBSS
ieee80211_bssid_match matches also bcast address so also APs
were added
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Multiple issues:
- there are no "default" values needed
- cw_min/cw_max can be larger than documented
- restructure to decrease size
- use get_unaligned_le16
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch makes mac80211 assign proper sequence numbers to
QoS-data frames. It also removes the old sequence number code
because we noticed that only the driver or hardware can assign
sequence numbers to non-QoS-data and especially management
frames in a race-free manner because beacons aren't passed
through mac80211's TX path.
This patch also adds temporary code to the rt2x00 drivers to
not break them completely, that code will have to be reworked
for proper sequence numbers on beacons.
It also moves sequence number assignment down in the TX path
so no sequence numbers are assigned to frames that are dropped.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The /sys/class/net/*/wireless/ direcory is, as far as I know, not
used by anyone. Additionally, the same data is available via wext
ioctls. Hence the sysfs files are pretty much useless. This patch
makes them optional and schedules them for removal.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Jean Tourrilhes <jt@hpl.hp.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
According to 802.11-2007, we are doing the wrong thing in the
sequence number checks when receiving frames. This fixes it.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>