skb->truesize might be big even for a small packet.
Its even bigger after commit 87fb4b7b53 (net: more accurate skb
truesize) and big MTU.
We should allow queueing at least one packet per receiver, even with a
low RCVBUF setting.
Reported-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
flow_cach_flush() might sleep but can be called from
atomic context via the xfrm garbage collector. So add
a flow_cache_flush_deferred() function and use this if
the xfrm garbage colector is invoked from within the
packet path.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 8ffd3208 voids the previous patches f6778aab and 810c0719 for
limiting the autoclose value. If userspace passes in -1 on 32-bit
platform, the overflow check didn't work and autoclose would be set
to 0xffffffff.
This patch defines a max_autoclose (in seconds) for limiting the value
and exposes it through sysctl, with the following intentions.
1) Avoid overflowing autoclose * HZ.
2) Keep the default autoclose bound consistent across 32- and 64-bit
platforms (INT_MAX / HZ in this patch).
3) Keep the autoclose value consistent between setsockopt() and
getsockopt() calls.
Suggested-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit a4a710c4a7 (pkt_sched: Change PSCHED_SHIFT from 10 to
6) it seems RED/GRED are broken.
red_calc_qavg_from_idle_time() computes a delay in us units, but this
delay is now 16 times bigger than real delay, so the final qavg result
smaller than expected.
Use standard kernel time services since there is no need to obfuscate
them.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now inetpeer is the place where we cache redirect information for ipv4
destinations, we must be able to invalidate informations when a route is
added/removed on host.
As inetpeer is not yet namespace aware, this patch adds a shared
redirect_genid, and a per inetpeer redirect_genid. This might be changed
later if inetpeer becomes ns aware.
Cache information for one inerpeer is valid as long as its
redirect_genid has the same value than global redirect_genid.
Reported-by: Arkadiusz Miśkiewicz <a.miskiewicz@gmail.com>
Tested-by: Arkadiusz Miśkiewicz <a.miskiewicz@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We move all mtu handling from dst_mtu() down to the protocol
layer. So each protocol can implement the mtu handling in
a different manner.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We plan to invoke the dst_opt->default_mtu() method unconditioally
from dst_mtu(). So rename the method to dst_opt->mtu() to match
the name with the new meaning.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can not update iph->daddr in ip_options_rcv_srr(), It is too early.
When some exception ocurred later (eg. in ip_forward() when goto
sr_failed) we need the ip header be identical to the original one as
ICMP need it.
Add a field 'nexthop' in struct ip_options to save nexthop of LSRR
or SSRR option.
Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes an oops that can be triggered following this recipe:
0) make sure nf_conntrack_netlink and nf_conntrack_ipv4 are loaded.
1) container is started.
2) connect to it via lxc-console.
3) generate some traffic with the container to create some conntrack
entries in its table.
4) stop the container: you hit one oops because the conntrack table
cleanup tries to report the destroy event to user-space but the
per-netns nfnetlink socket has already gone (as the nfnetlink
socket is per-netns but event callback registration is global).
To fix this situation, we make the ctnl_notifier per-netns so the
callback is registered/unregistered if the container is
created/destroyed.
Alex Bligh and Alexey Dobriyan originally proposed one small patch to
check if the nfnetlink socket is gone in nfnetlink_has_listeners,
but this is a very visited path for events, thus, it may reduce
performance and it looks a bit hackish to check for the nfnetlink
socket only to workaround this situation. As a result, I decided
to follow the bigger path choice, which seems to look nicer to me.
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Reported-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Two new struct members were not documented, fix that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Timers set by __set_chan_timer() should use miliseconds instead of
jiffies. Commit 942ecc9c46 updated
l2cap_set_timer() so it expects timeout to be specified in msecs
instead of jiffies. This makes timeouts unreliable when CONFIG_HZ
is not set to 1000.
Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@tieto.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
forcedeth: fix a few sparse warnings (variable shadowing)
forcedeth: Improve stats counters
forcedeth: remove unneeded stats updates
forcedeth: Acknowledge only interrupts that are being processed
forcedeth: fix race when unloading module
MAINTAINERS/rds: update maintainer
wanrouter: Remove kernel_lock annotations
usbnet: fix oops in usbnet_start_xmit
ixgbe: Fix compile for kernel without CONFIG_PCI_IOV defined
etherh: Add MAINTAINERS entry for etherh
bonding: comparing a u8 with -1 is always false
sky2: fix regression on Yukon Optima
netlink: clarify attribute length check documentation
netlink: validate NLA_MSECS length
i825xx:xscale:8390:freescale: Fix Kconfig dependancies
macvlan: receive multicast with local address
tg3: Update version to 3.121
tg3: Eliminate timer race with reset_task
tg3: Schedule at most one tg3_reset_task run
tg3: Obtain PCI function number from device
...
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include <linux/module.h>
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include <linux/module.h>
net: sch_generic remove redundant use of <linux/module.h>
net: inet_timewait_sock doesnt need <linux/module.h>
...
Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
The documentation for how the length of attributes
is checked is wrong ("Exact length" isn't true, the
policy checks are for "minimum length") and a bit
misleading. Make it more complete and explain what
really happens.
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The warning really shouldn't happen, but until we
find the reason why it does don't spew it all the
time, just once is enough to know we've hit it.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
the tcp and udp code creates a set of struct file_operations at runtime
while it can also be done at compile time, with the added benefit of then
having these file operations be const.
the trickiest part was to get the "THIS_MODULE" reference right; the naive
method of declaring a struct in the place of registration would not work
for this reason.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is to address the following error during the compilation:
In file included from kernel/sysctl_binary.c:6:
include/net/ip_vs.h:1406: error: expected identifier or ‘(’ before ‘{’ token
make[1]: *** [kernel/sysctl_binary.o] Error 1
make[1]: *** Waiting for unfinished jobs....
That manifests itself when CONFIG_IP_VS_NFCT is undefined in .config file.
Signed-off-by: Krzysztof Wilczynski <krzysztof.wilczynski@linux.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
This patch exports several definitions that used to live under
include/net/netfilter/nf_nat.h. These definitions, although not
exported, have been used by iptables and other userspace
applications like miniupnpd since long time. Basically, these
userspace tools included some internal definition of the required
structures and they assume no changes in the binary representation
(which is OK indeed).
To resolve this situation, this patch makes public the required
structure and install them in INSTALL_HDR_PATH.
See: https://bugs.gentoo.org/376873, for more information.
This patch is heavily based on the initial patch sent by:
Anthony G. Basile <blueness@gentoo.org>
Which was entitled:
netfilter: export sanitized nf_nat.h to INSTALL_HDR_PATH
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
These two small inlines make calls to try_module_get() and
module_put() which would force us to keep module.h present
within yet another common include header. We can avoid this
by turning them into macros. The hci_dev_hold construct
is patterned off of raw_spin_trylock_irqsave() in spinlock.h
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This file was using the module get/put functions in two simple inline
functions. But module_get/put were only within scope because of
the implicit presence of module.h being everywhere.
Rather than add module.h to another file in include/ -- which is
exactly the thing we are trying to avoid, simply convert these
one-line functions into a define, as per what was done for the
device_schedule_callback() in commit 523ded71de.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The implicit presence of module.h everywhere meant that this header
also was getting moduleparam.h which defines struct kernel_param.
Since it only needs to know that kernel_param is a struct, call that
out instead of adding an include of moduleparam.h -- to get rid of this:
include/net/netfilter/nf_conntrack.h:316: warning: 'struct kernel_param' declared inside parameter list
include/net/netfilter/nf_conntrack.h:316: warning: its scope is only this definition or declaration, which is probably not what you want
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The <linux/module.h> pretty much brings in the kitchen sink along
with it, so it should be avoided wherever reasonably possible in
terms of being included from other commonly used <linux/something.h>
files, as it results in a measureable increase on compile times.
The worst culprit was probably device.h since it is used everywhere.
This file also had an implicit dependency/usage of mutex.h which was
masked by module.h, and is also fixed here at the same time.
There are over a dozen other headers that simply declare the
struct instead of pulling in the whole file, so follow their lead
and simply make it a few more.
Most of the implicit dependencies on module.h being present by
these headers pulling it in have been now weeded out, so we can
finally make this change with hopefully minimal breakage.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This file has modular references, but they are limited to
those which are covered by the simple "struct module;"
declaration used in dozens of other places. In fact that
declaration is already there (just outside of the context
of this commit) so simply remove the include line.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
There is nothing module specific in this header, and removing
it doesn't seem to uncover any implicit dependencies either.
Must be simply a vestige of an ancient legacy.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
commit 66b13d99d9 (ipv4: tcp: fix TOS value in ACK messages sent from
TIME_WAIT) fixed IPv4 only.
This part is for the IPv6 side, adding a tclass param to ip6_xmit()
We alias tw_tclass and tw_tos, if socket family is INET6.
[ if sockets is ipv4-mapped, only IP_TOS socket option is used to fill
TOS field, TCLASS is not taken into account ]
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'for-linus' of git://github.com/ericvh/linux:
9p: fix 9p.txt to advertise msize instead of maxdata
net/9p: Convert net/9p protocol dumps to tracepoints
fs/9p: change an int to unsigned int
fs/9p: Cleanup option parsing in 9p
9p: move dereference after NULL check
fs/9p: inode file operation is properly initialized init_special_inode
fs/9p: Update zero-copy implementation in 9p
It was enabled by default and the messages guarded
by the define are useful.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without this msize=4294967295 will result in a crash
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* remove lot of update to different data structure
* add a seperate callback for zero copy request.
* above makes non zero copy code path simpler
* remove conditionalizing TREAD/TREADDIR/TWRITE in the zero copy path
* Fix the dotu p9_check_errors with zero copy. Add sufficient doc around
* Add support for both in and output buffers in zero copy callback
* pin and unpin pages in the same context
* use helpers instead of defining page offset and rest of page ourself
* Fix mem leak in p9_check_errors
* Remove 'E' and 'F' in p9pdu_vwritef
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
There is a long standing bug in linux tcp stack, about ACK messages sent
on behalf of TIME_WAIT sockets.
In the IP header of the ACK message, we choose to reflect TOS field of
incoming message, and this might break some setups.
Example of things that were broken :
- Routing using TOS as a selector
- Firewalls
- Trafic classification / shaping
We now remember in timewait structure the inet tos field and use it in
ACK generation, and route lookup.
Notes :
- We still reflect incoming TOS in RST messages.
- We could extend MuraliRaja Muniraju patch to report TOS value in
netlink messages for TIME_WAIT sockets.
- A patch is needed for IPv6
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now tcp_md5_hash_header() has a const tcphdr argument, we can add more
const attributes to callers.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_md5_hash_header() writes into skb header a temporary zero value,
this might confuse other users of this area.
Since tcphdr is small (20 bytes), copy it in a temporary variable and
make the change in the copy.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
INET_ECN_encapsulate() is better understood if we can read the official
statement.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding const qualifiers to pointers can ease code review, and spot some
bugs. It might allow compiler to optimize code further.
For example, is it legal to temporary write a null cksum into tcphdr
in tcp_md5_hash_header() ? I am afraid a sniffer could catch the
temporary null value...
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added recovery check of CA wake status in case of wake up timeout.
Added check of CA wake status in case of wake down timeout.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added sanity check for length of CAIF frames, and tear down of
CAIF link-layer device upon protocol error.
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>