This change adds a tracking value for the maximum suffix length of all
prefixes stored in any given tnode. With this value we can determine if we
need to backtrace or not based on if the suffix is greater than the pos
value.
By doing this we can reduce the CPU overhead for lookups in the local table
as many of the prefixes there are 32b long and have a suffix length of 0
meaning we can immediately backtrace to the root node without needing to
test any of the nodes between it and where we ended up.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For some reason the compiler doesn't seem to understand that when we are in
a loop that runs from tnode_child_length - 1 to 0 we don't expect the value
of tn->bits to change. As such every call to tnode_get_child was rerunning
tnode_chile_length which ended up consuming quite a bit of space in the
resultant assembly code.
I have gone though and verified that in all cases where tnode_get_child
is used we are either winding though a fixed loop from tnode_child_length -
1 to 0, or are in a fastpath case where we are verifying the value by
either checking for any remaining bits after shifting index by bits and
testing for leaf, or by using tnode_child_length.
size net/ipv4/fib_trie.o
Before:
text data bss dec hex filename
15506 376 8 15890 3e12 net/ipv4/fib_trie.o
After:
text data bss dec hex filename
14827 376 8 15211 3b6b net/ipv4/fib_trie.o
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change pulls the node_set_parent functionality out of put_child_reorg
and instead leaves that to the function to take care of as well. By doing
this we can fully construct the new cluster of tnodes and all of the
pointers out of it before we start routing pointers into it.
I am suspecting this will likely fix some concurency issues though I don't
have a good test to show as such.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change pushes the tnode freeing down into the inflate and halve
functions. It makes more sense here as we have a better grasp of what is
going on and when a given cluster of nodes is ready to be freed.
I believe this may address a bug in the freeing logic as well. For some
reason if the freelist got to a certain size we would call
synchronize_rcu(). I'm assuming that what they meant to do is call
synchronize_rcu() after they had handed off that much memory via
call_rcu(). As such that is what I have updated the behavior to be.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change makes it so that the assignment of the tnode to the parent is
handled directly within whatever function is currently handling the node be
it inflate, halve, or resize. By doing this we can avoid some of the need
to set NULL pointers in the tree while we are resizing the subnodes.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change pulls the logic for if we should inflate/halve the nodes out
into separate functions. It also addresses what I believe is a bug where 1
full node is all that is needed to keep a node from ever being halved.
Simple script to reproduce the issue:
modprobe dummy; ifconfig dummy0 up
for i in `seq 0 255`; do ifconfig dummy0:$i 10.0.${i}.1/24 up; done
ifconfig dummy0:256 10.0.255.33/16 up
for i in `seq 0 254`; do ifconfig dummy0:$i down; done
Results from /proc/net/fib_triestat
Before:
Local:
Aver depth: 3.00
Max depth: 4
Leaves: 17
Prefixes: 18
Internal nodes: 11
1: 8 2: 2 10: 1
Pointers: 1048
Null ptrs: 1021
Total size: 11 kB
After:
Local:
Aver depth: 3.41
Max depth: 5
Leaves: 17
Prefixes: 18
Internal nodes: 12
1: 8 2: 3 3: 1
Pointers: 36
Null ptrs: 8
Total size: 3 kB
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change consists of a cut/paste of resize to behind inflate and halve
so that I could remove the two function prototypes.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change is to start cleaning up some of the rcu_read_lock/unlock
handling. I realized while reviewing the code there are several spots that
I don't believe are being handled correctly or are masking warnings by
locally calling rcu_read_lock/unlock instead of calling them at the correct
level.
A common example is a call to fib_get_table followed by fib_table_lookup.
The rcu_read_lock/unlock ought to wrap both but there are several spots where
they were not wrapped.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change makes it so that anything that can be shifted by, or compared
to a value shifted by bits is updated to be an unsigned long. This is
mostly a precaution against an insanely huge address space that somehow
starts coming close to the 2^32 root node size which would require
something like 1.5 billion addresses.
I chose unsigned long instead of unsigned long long since I do not believe
it is possible to allocate a 32 bit tnode on a 32 bit system as the memory
consumed would be 16GB + 28B which exceeds the addressible space for any
one process.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change moves the pos value to the other side of the "bits" field. By
doing this it actually simplifies a significant amount of code in the trie.
For example when halving a tree we know that the bit lost exists at
oldnode->pos, and if we inflate the tree the new bit being add is at
tn->pos. Previously to find those bits you would have to subtract pos and
bits from the keylength or start with a value of (1 << 31) and then shift
that.
There are a number of spots throughout the code that benefit from this. In
the case of the hot-path searches the main advantage is that we can drop 2
or more operations from the search path as we no longer need to compute the
value for the index to be shifted by and can instead just use the raw pos
value.
In addition the tkey_extract_bits is now defunct and can be replaced by
get_index since the two operations were doing the same thing, but now
get_index does it much more quickly as it is only an xor and shift versus a
pair of shifts and a subtraction.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates the fib_table_insert function to take advantage of the
changes made to improve the performance of fib_table_lookup. As a result
the code should be smaller and run faster then the original.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes use of the same features I made use of for
fib_table_lookup to streamline fib_find_node. The resultant code should be
smaller and run faster than the original.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is meant to reduce the complexity of fib_table_lookup by reducing
the number of variables to the bare minimum while still keeping the same if
not improved functionality versus the original.
Most of this change was started off by the desire to rid the function of
chopped_off and current_prefix_length as they actually added very little to
the function since they only applied when computing the cindex. I was able
to replace them mostly with just a check for the prefix match. As long as
the prefix between the key and the node being tested was the same we know
we can search the tnode fully versus just testing cindex 0.
The second portion of the change ended up being a massive reordering.
Originally the calls to check_leaf were up near the start of the loop, and
the backtracing and descending into lower levels of tnodes was later. This
didn't make much sense as the structure of the tree means the leaves are
always the last thing to be tested. As such I reordered things so that we
instead have a loop that will delve into the tree and only exit when we
have either found a leaf or we have exhausted the tree. The advantage of
rearranging things like this is that we can fully inline check_leaf since
there is now only one reference to it in the function.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change makes it so that leaf and tnode are the same struct. As a
result there is no need for rt_trie_node anymore since everyting can be
merged into tnode.
On 32b systems this results in the leaf being 4 bytes larger, however I
don't know if that is really an issue as this and an eariler patch that
added bits & pos have increased the size from 20 to 28. If I am not
mistaken slub/slab allocate on power of 2 sizes so 20 was likely being
rounded up to 32 anyway.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both the leaf and the tnode had an rcu_head in them, but they had them in
slightly different places. Since we now have them in the same spot and
know that any node with bits == 0 is a leaf and the rest are either vmalloc
or kmalloc tnodes depending on the value of bits it makes it easy to combine
the functions and reduce overhead.
In addition I have taken advantage of the rcu_head pointer to go ahead and
put together a simple linked list instead of using the tnode pointer as
this way we can merge either type of structure for freeing.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change makes some fundamental changes to the way leaves and tnodes are
constructed. The big differences are:
1. Leaves now populate pos and bits indicating their full key size.
2. Trie nodes now mask out their lower bits to be consistent with the leaf
3. Both structures have been reordered so that rt_trie_node now consisists
of a much larger region including the pos, bits, and rcu portions of
the tnode structure.
On 32b systems this will result in the leaf being 4B larger as the pos and
bits values were added to a hole created by the key as it was only 4B in
length.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The trie usage stats were currently being shared by all threads that were
calling fib_table_lookup. As a result when multiple threads were
performing lookups simultaneously the trie would begin to cache bounce
between those threads.
In order to prevent this I have updated the usage stats to use a set of
percpu variables. By doing this we should be able to avoid the cache
bouncing and still make use of these stats.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The GRE tap device supports Ethernet over GRE, but doesn't
care about the source address of the tunnel, therefore it
can be changed without bring device down.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously l2tp module did not provide any means for the user space to
get notified when tunnels/sessions are added/modified/deleted.
This change contains the following
- create a multicast group for the listeners to register.
- notify the registered listeners when the tunnels/sessions are
created/modified/deleted.
Signed-off-by: Bill Hong <bhong@brocade.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Sven-Thorsten Dietrich <sven@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix sparse warning:
net/tipc/link.c:1924:40: warning: Using plain integer as NULL pointer
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix double SKB free in bluetooth 6lowpan layer, from Jukka Rissanen.
2) Fix receive checksum handling in enic driver, from Govindarajulu
Varadarajan.
3) Fix NAPI poll list corruption in virtio_net and caif_virtio, from
Herbert Xu. Also, add code to detect drivers that have this mistake
in the future.
4) Fix doorbell endianness handling in mlx4 driver, from Amir Vadai.
5) Don't clobber IP6CB() before xfrm6_policy_check() is called in TCP
input path,f rom Nicolas Dichtel.
6) Fix MPLS action validation in openvswitch, from Pravin B Shelar.
7) Fix double SKB free in vxlan driver, also from Pravin.
8) When we scrub a packet, which happens when we are switching the
context of the packet (namespace, etc.), we should reset the
secmark. From Thomas Graf.
9) ->ndo_gso_check() needs to do more than return true/false, it also
has to allow the driver to clear netdev feature bits in order for
the caller to be able to proceed properly. From Jesse Gross.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
genetlink: A genl_bind() to an out-of-range multicast group should not WARN().
netlink/genetlink: pass network namespace to bind/unbind
ne2k-pci: Add pci_disable_device in error handling
bonding: change error message to debug message in __bond_release_one()
genetlink: pass multicast bind/unbind to families
netlink: call unbind when releasing socket
netlink: update listeners directly when removing socket
genetlink: pass only network namespace to genl_has_listeners()
netlink: rename netlink_unbind() to netlink_undo_bind()
net: Generalize ndo_gso_check to ndo_features_check
net: incorrect use of init_completion fixup
neigh: remove next ptr from struct neigh_table
net: xilinx: Remove unnecessary temac_property in the driver
net: phy: micrel: use generic config_init for KSZ8021/KSZ8031
net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding
openvswitch: fix odd_ptr_err.cocci warnings
Bluetooth: Fix accepting connections when not using mgmt
Bluetooth: Fix controller configuration with HCI_QUIRK_INVALID_BDADDR
brcmfmac: Do not crash if platform data is not populated
ipw2200: select CFG80211_WEXT
...
Users can request to bind to arbitrary multicast groups, so warning
when the requested group number is out of range is not appropriate.
And with the warning removed, and the 'err' variable properly given
an initial value, we can remove 'found' altogether.
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netlink families can exist in multiple namespaces, and for the most
part multicast subscriptions are per network namespace. Thus it only
makes sense to have bind/unbind notifications per network namespace.
To achieve this, pass the network namespace of a given client socket
to the bind/unbind functions.
Also do this in generic netlink, and there also make sure that any
bind for multicast groups that only exist in init_net is rejected.
This isn't really a problem if it is accepted since a client in a
different namespace will never receive any notifications from such
a group, but it can confuse the family if not rejected (it's also
possible to silently (without telling the family) accept it, but it
would also have to be ignored on unbind so families that take any
kind of action on bind/unbind won't do unnecessary work for invalid
clients like that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to make the newly fixed multicast bind/unbind
functionality in generic netlink, pass them down to the
appropriate family.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, netlink_unbind() is only called when the socket
explicitly unbinds, which limits its usefulness (luckily
there are no users of it yet anyway.)
Call netlink_unbind() also when a socket is released, so it
becomes possible to track listeners with this callback and
without also implementing a netlink notifier (and checking
netlink_has_listeners() in there.)
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code is now confusing to read - first in one function down
(netlink_remove) any group subscriptions are implicitly removed
by calling __sk_del_bind_node(), but the subscriber database is
only updated far later by calling netlink_update_listeners().
Move the latter call to just after removal from the list so it
is easier to follow the code.
This also enables moving the locking inside the kernel-socket
conditional, which improves the normal socket destruction path.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no point to force the caller to know about the internal
genl_sock to use inside struct net, just have them pass the network
namespace. This doesn't really change code generation since it's
an inline, but makes the caller less magic - there's never any
reason to pass another socket.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new name is more expressive - this isn't a generic unbind
function but rather only a little undo helper for use only in
netlink_bind().
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
Here's one more bluetooth pull request for 3.19. We've got two fixes:
- Fix for accepting connections with old user space versions of BlueZ
- Fix for Bluetooth controllers that don't have a public address
Both of these are regressions that were introduced in 3.17, so the
appropriate Cc: stable annotations are provided.
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
GSO isn't the only offload feature with restrictions that
potentially can't be expressed with the current features mechanism.
Checksum is another although it's a general issue that could in
theory apply to anything. Even if it may be possible to
implement these restrictions in other ways, it can result in
duplicate code or inefficient per-packet behavior.
This generalizes ndo_gso_check so that drivers can remove any
features that don't make sense for a given packet, similar to
netif_skb_features(). It also converts existing driver
restrictions to the new format, completing the work that was
done to support tunnel protocols since the issues apply to
checksums as well.
By actually removing features from the set that are used to do
offloading, it solves another problem with the existing
interface. In these cases, GSO would run with the original set
of features and not do anything because it appears that
segmentation is not required.
CC: Tom Herbert <therbert@google.com>
CC: Joe Stringer <joestringer@nicira.com>
CC: Eric Dumazet <edumazet@google.com>
CC: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Tom Herbert <therbert@google.com>
Fixes: 04ffcb255f ("net: Add ndo_gso_check")
Tested-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using VXLAN tunnels and a sky2 device, I have experienced
checksum failures of the following type:
[ 4297.761899] eth0: hw csum failure
[...]
[ 4297.765223] Call Trace:
[ 4297.765224] <IRQ> [<ffffffff8172f026>] dump_stack+0x46/0x58
[ 4297.765235] [<ffffffff8162ba52>] netdev_rx_csum_fault+0x42/0x50
[ 4297.765238] [<ffffffff8161c1a0>] ? skb_push+0x40/0x40
[ 4297.765240] [<ffffffff8162325c>] __skb_checksum_complete+0xbc/0xd0
[ 4297.765243] [<ffffffff8168c602>] tcp_v4_rcv+0x2e2/0x950
[ 4297.765246] [<ffffffff81666ca0>] ? ip_rcv_finish+0x360/0x360
These are reliably reproduced in a network topology of:
container:eth0 == host(OVS VXLAN on VLAN) == bond0 == eth0 (sky2) -> switch
When VXLAN encapsulated traffic is received from a similarly
configured peer, the above warning is generated in the receive
processing of the encapsulated packet. Note that the warning is
associated with the container eth0.
The skbs from sky2 have ip_summed set to CHECKSUM_COMPLETE, and
because the packet is an encapsulated Ethernet frame, the checksum
generated by the hardware includes the inner protocol and Ethernet
headers.
The receive code is careful to update the skb->csum, except in
__dev_forward_skb, as called by dev_forward_skb. __dev_forward_skb
calls eth_type_trans, which in turn calls skb_pull_inline(skb, ETH_HLEN)
to skip over the Ethernet header, but does not update skb->csum when
doing so.
This patch resolves the problem by adding a call to
skb_postpull_rcsum to update the skb->csum after the call to
eth_type_trans.
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/openvswitch/vport-gre.c:188:5-11: inconsistent IS_ERR and PTR_ERR, PTR_ERR on line 189
PTR_ERR should access the value just tested by IS_ERR
Semantic patch information:
There can be false positives in the patch case, where it is the call
IS_ERR that is wrong.
Generated by: scripts/coccinelle/tests/odd_ptr_err.cocci
CC: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When connectable mode is enabled (page scan on) through some non-mgmt
method the HCI_CONNECTABLE flag will not be set. For backwards
compatibility with user space versions not using mgmt we should not
require HCI_CONNECTABLE to be set if HCI_MGMT is not set.
Reported-by: Pali Rohár <pali.rohar@gmail.com>
Tested-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 3.17+
When controllers set the HCI_QUIRK_INVALID_BDADDR flag, it is required
by userspace to program a valid public Bluetooth device address into
the controller before it can be used.
After successful address configuration, the internal state changes and
the controller runs the complete initialization procedure. However one
small difference is that this is no longer the HCI_SETUP stage. The
HCI_SETUP stage is only valid during initial controller setup. In this
case the stack runs the initialization as part of the HCI_CONFIG stage.
The controller version information, default name and supported commands
are only stored during HCI_SETUP. While these information are static,
they are not read initially when HCI_QUIRK_INVALID_BDADDR is set. So
when running in HCI_CONFIG state, these information need to be updated
as well.
This especially impacts Bluetooth 4.1 and later controllers using
extended feature pages and second event mask page.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Cc: stable@vger.kernel.org # 3.17+
skb_scrub_packet() is called when a packet switches between a context
such as between underlay and overlay, between namespaces, or between
L3 subnets.
While we already scrub the packet mark, connection tracking entry,
and cached destination, the security mark/context is left intact.
It seems wrong to inherit the security context of a packet when going
from overlay to underlay or across forwarding paths.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When vlan tags are stacked, it is very likely that the outer tag is stored
in skb->vlan_tci and skb->protocol shows the inner tag's vlan_proto.
Currently netif_skb_features() first looks at skb->protocol even if there
is the outer tag in vlan_tci, thus it incorrectly retrieves the protocol
encapsulated by the inner vlan instead of the inner vlan protocol.
This allows GSO packets to be passed to HW and they end up being
corrupted.
Fixes: 58e998c6d2 ("offloading: Force software GSO for multiple vlan tags.")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Today vport-send has complex error handling because it involves
freeing skb and updating stats depending on return value from
vport send implementation.
This can be simplified by delegating responsibility of freeing
skb to the vport implementation for all cases. So that
vport-send needs just update stats.
Fixes: 91b7514cdf ("openvswitch: Unify vport error stats
handling")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MPLS GSO needs to know inner most protocol to process GSO packets.
Fixes: 25cd9ba0ab ("openvswitch: Add basic MPLS support to
kernel").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linux stack does not implement GSO for packet with multiple
encapsulations. Therefore there was check in MPLS action
validation to detect such case, But this check introduced
bug which deleted one or more actions from actions list.
Following patch removes this check to fix the validation.
Fixes: 25cd9ba0ab ("openvswitch: Add basic MPLS support to
kernel").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Reported-by: Srinivas Neginhal <sneginha@vmware.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MPLS and Tunnel GSO does not work together. Reject packet which
request such GSO.
Fixes: 0d89d2035f ("MPLS: Add limited GSO support").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes MPLS GSO for case when mpls is compiled as kernel module.
Fixes: 0d89d2035f ("MPLS: Add limited GSO support").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch rearranges the loop in net_rx_action to reduce the
amount of jumping back and forth when reading the code.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should only perform the softnet_break check after we have polled
at least one device in net_rx_action. Otherwise a zero or negative
setting of netdev_budget can lock up the whole system.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit d75b1ade56 (net: less
interrupt masking in NAPI) required drivers to leave poll_list
empty if the entire budget is consumed.
We have already had two broken drivers so let's add a check for
this.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch creates a new function napi_poll and moves the napi
polling code from net_rx_action into it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gateway having bandwidth_down equal to zero are not accepted
at all and so never added to the Gateway list.
For this reason checking the bandwidth_down member in
batadv_gw_out_of_range() is useless.
This is probably a copy/paste error and this check was supposed
to be "!gw_node" only. Moreover, the way the check is written
now may also lead to a NULL dereference.
Fix this by rewriting the if-condition properly.
Introduced by 414254e342
("batman-adv: tvlv - gateway download/upload bandwidth container")
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fragmentation code was replaced in 610bfc6bc9
("batman-adv: Receive fragmented packets and merge") by an implementation which
can handle up to 16 fragments of a packet. The packet is prepared for the split
in fragments by the function batadv_frag_send_packet and the actual split is
done by batadv_frag_create.
Both functions calculate the size of a fragment themself. But their calculation
differs because batadv_frag_send_packet also subtracts ETH_HLEN. Therefore,
the check in batadv_frag_send_packet "can a full fragment can be created?" may
return true even when batadv_frag_create cannot create a full fragment.
The function batadv_frag_create doesn't check the size of the skb before
splitting it and therefore might try to create a larger fragment than the
remaining buffer. This creates an integer underflow and an invalid len is given
to skb_split.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fragmentation code was replaced in 610bfc6bc9
("batman-adv: Receive fragmented packets and merge"). The new code provided a
mostly unused parameter skb for the merging function. It is used inside the
function to calculate the additionally needed skb tailroom. But instead of
increasing its own tailroom, it is only increasing the tailroom of the first
queued skb. This is not correct in some situations because the first queued
entry can be a different one than the parameter.
An observed problem was:
1. packet with size 104, total_size 1464, fragno 1 was received
- packet is queued
2. packet with size 1400, total_size 1464, fragno 0 was received
- packet is queued at the end of the list
3. enough data was received and can be given to the merge function
(1464 == (1400 - 20) + (104 - 20))
- merge functions gets 1400 byte large packet as skb argument
4. merge function gets first entry in queue (104 byte)
- stored as skb_out
5. merge function calculates the required extra tail as total_size - skb->len
- pskb_expand_head tail of skb_out with 64 bytes
6. merge function tries to squeeze the extra 1380 bytes from the second queued
skb (1400 byte aka skb parameter) in the 64 extra tail bytes of skb_out
Instead calculate the extra required tail bytes for skb_out also using skb_out
instead of using the parameter skb. The skb parameter is only used to get the
total_size from the last received packet. This is also the total_size used to
decide that all fragments were received.
Reported-by: Philipp Psurek <philipp.psurek@gmail.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit cecda693a9 ("net: keep original skb
which only needs header checking during software GSO") keeps the original
skb for packets that only needs header check, but it doesn't drop the
packet if software segmentation or header check were failed.
Fixes cecda693a9 ("net: keep original skb which only needs header checking during software GSO")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When xfrm6_policy_check() is used, _decode_session6() is called after some
intermediate functions. This function uses IP6CB(), thus TCP_SKB_CB() must be
prepared after the call of xfrm6_policy_check().
Before this patch, scenarii with IPv6 + TCP + IPsec Transport are broken.
Fixes: 971f10eca1 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Reported-by: Huaibin Wang <huaibin.wang@6wind.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>