Commit Graph

314 Commits

Author SHA1 Message Date
Jiri Benc
7d34fa75d3 vxlan: fix too large pskb_may_pull with remote checksum
vxlan_remcsum is called after iptunnel_pull_header and thus the skb has
vxlan header already pulled. Don't include vxlan header again in the
calculation.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-21 13:32:19 -04:00
Daniel Borkmann
eaa93bf4c6 vxlan: fix populating tclass in vxlan6_get_route
Jiri mentioned that flowi6_tos of struct flowi6 is never used/read
anywhere. In fact, rest of the kernel uses the flowi6's flowlabel,
where the traffic class _and_ the flowlabel (aka flowinfo) is encoded.

For example, for policy routing, fib6_rule_match() uses ip6_tclass()
that is applied on the flowlabel member for matching on tclass. Similar
fix is needed for geneve, where flowi6_tos is set as well. Installing
a v6 blackhole rule that f.e. matches on tos is now working with vxlan.

Fixes: 1400615d64 ("vxlan: allow setting ipv6 traffic class")
Reported-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-20 13:44:34 -04:00
Alexander Duyck
c194cf93c1 gro: Defer clearing of flush bit in tunnel paths
This patch updates the GRO handlers for GRE, VXLAN, GENEVE, and FOU so that
we do not clear the flush bit until after we have called the next level GRO
handler.  Previously this was being cleared before parsing through the list
of frames, however this resulted in several paths where either the bit
needed to be reset but wasn't as in the case of FOU, or cases where it was
being set as in GENEVE.  By just deferring the clearing of the bit until
after the next level protocol has been parsed we can avoid any unnecessary
bit twiddling and avoid bugs.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-13 15:01:00 -04:00
Daniel Borkmann
e7f70af111 vxlan: support setting IPv6 flow label
This work adds support for setting the IPv6 flow label for vxlan per
device and through collect metadata (ip_tunnel_key) frontends. The
vxlan dst cache does not need any special considerations here, for
the cases where caches can be used, the label is static per cache.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-11 15:14:26 -05:00
Daniel Borkmann
134611446d ip_tunnel: add support for setting flow label via collect metadata
This patch extends udp_tunnel6_xmit_skb() to pass in the IPv6 flow label
from call sites. Currently, there's no such option and it's always set to
zero when writing ip6_flow_hdr(). Add a label member to ip_tunnel_key, so
that flow-based tunnels via collect metadata frontends can make use of it.
vxlan and geneve will be converted to add flow label support separately.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-11 15:14:26 -05:00
Daniel Borkmann
1400615d64 vxlan: allow setting ipv6 traffic class
We can already do that for IPv4, but IPv6 support was missing. Add
it for vxlan, so it can be used with collect metadata frontends.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 13:58:47 -05:00
Daniel Borkmann
db3c6139e6 bpf, vxlan, geneve, gre: fix usage of dst_cache on xmit
The assumptions from commit 0c1d70af92 ("net: use dst_cache for vxlan
device"), 468dfffcd7 ("geneve: add dst caching support") and 3c1cb4d260
("net/ipv4: add dst cache support for gre lwtunnels") on dst_cache usage
when ip_tunnel_info is used is unfortunately not always valid as assumed.

While it seems correct for ip_tunnel_info front-ends such as OVS, eBPF
however can fill in ip_tunnel_info for consumers like vxlan, geneve or gre
with different remote dsts, tos, etc, therefore they cannot be assumed as
packet independent.

Right now vxlan, geneve, gre would cache the dst for eBPF and every packet
would reuse the same entry that was first created on the initial route
lookup. eBPF doesn't store/cache the ip_tunnel_info, so each skb may have
a different one.

Fix it by adding a flag that checks the ip_tunnel_info. Also the !tos test
in vxlan needs to be handeled differently in this context as it is currently
inferred from ip_tunnel_info as well if present. ip_tunnel_dst_cache_usable()
helper is added for the three tunnel cases, which checks if we can use dst
cache.

Fixes: 0c1d70af92 ("net: use dst_cache for vxlan device")
Fixes: 468dfffcd7 ("geneve: add dst caching support")
Fixes: 3c1cb4d260 ("net/ipv4: add dst cache support for gre lwtunnels")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 13:58:47 -05:00
David S. Miller
810813c47a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several cases of overlapping changes, as well as one instance
(vxlan) of a bug fix in 'net' overlapping with code movement
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-08 12:34:12 -05:00
Zhang Shengju
6297b91c7f vxlan: use reset to set header pointers
Since offset is zero, it's not necessary to use set function. Reset
function is straightforward, and will remove the unnecessary add operation
in set function.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-04 22:45:13 -05:00
Daniel Borkmann
4024fcf705 vxlan: fix missing options_len update on RX with collect metadata
When signalling to metadata consumers that the metadata_dst entry
carries additional GBP extension data for vxlan (TUNNEL_VXLAN_OPT),
the dst's vxlan_metadata information is populated, but options_len
is left to zero. F.e. in ovs, ovs_flow_key_extract() checks for
options_len before extracting the data through ip_tunnel_info_opts_get().

Geneve uses ip_tunnel_info_opts_set() helper in receive path, which
sets options_len internally, vxlan however uses ip_tunnel_info_opts(),
so when filling vxlan_metadata, we do need to update options_len.

Fixes: 4c22279848 ("ip-tunnel: Use API to access tunnel metadata options.")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-03-03 17:10:31 -05:00
MINOURA Makoto / 箕浦 真
472681d57a net: ndo_fdb_dump should report -EMSGSIZE to rtnl_fdb_dump.
When the send skbuff reaches the end, nlmsg_put and friends returns
-EMSGSIZE but it is silently thrown away in ndo_fdb_dump. It is called
within a for_each_netdev loop and the first fdb entry of a following
netdev could fit in the remaining skbuff.  This breaks the mechanism
of cb->args[0] and idx to keep track of the entries that are already
dumped, which results missing entries in bridge fdb show command.

Signed-off-by: Minoura Makoto <minoura@valinux.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-26 15:04:02 -05:00
Jiri Benc
10a5af238c vxlan: simplify metadata_dst usage in vxlan_rcv
Now when the packet is scrubbed early, the metadata_dst can be assigned to
the skb as soon as it is allocated. This simplifies the error cleanup path,
as the dst will be freed by kfree_skb. It is also not necessary to pass it
as a parameter to functions anymore.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-25 15:17:12 -05:00
Jiri Benc
f2d1968ec8 vxlan: consolidate rx handling to a single function
Now when both vxlan_udp_encap_recv and vxlan_rcv are much shorter, combine
them into a single function.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-25 15:17:12 -05:00
Jiri Benc
760c68054e vxlan: move ECN decapsulation to a separate function
It simplifies the vxlan_rcv function.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-25 15:17:12 -05:00
Jiri Benc
1ab016e237 vxlan: move inner L2 header processing to a separate function
This code will be different for VXLAN-GPE, so move it to a separate
function. It will also make the rx path less spaghetti-like.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-25 15:17:11 -05:00
Jiri Benc
64f87d3616 vxlan: consolidate GBP handling even more
Now when the packet is scrubbed early, skb->mark can be set in the GBP
handling code.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-25 15:17:11 -05:00
David S. Miller
b633353115 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/phy/bcm7xxx.c
	drivers/net/phy/marvell.c
	drivers/net/vxlan.c

All three conflicts were cases of simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-23 00:09:14 -05:00
Alexander Duyck
6ceb31ca5f VXLAN: Support outer IPv4 Tx checksums by default
This change makes it so that if UDP CSUM is not specified we will default
to enabling it.  The main motivation behind this is the fact that with the
use of outer checksum we can greatly improve the performance for VXLAN
tunnels on devices that don't know how to parse tunnel headers.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-21 22:05:50 -05:00
Jiri Benc
f468a729a2 vxlan: do not use fdb in metadata mode
In metadata mode, the vxlan interface is not supposed to use the fdb control
plane but an external one (openvswitch or static routes). With the current
code, packets may leak into the fdb handling code which usually causes them
to be dropped anyway but may have strange side effects.

Just drop the packets directly when in metadata mode if the destination data
are not correctly provided on egress.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 15:01:14 -05:00
Jiri Benc
82a0f6b4ab vxlan: clear IFF_TX_SKB_SHARING
ether_setup sets IFF_TX_SKB_SHARING but this is not supported by vxlan
as it modifies the skb on xmit.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:43:47 -05:00
Jiri Benc
7f290c9435 iptunnel: scrub packet in iptunnel_pull_header
Part of skb_scrub_packet was open coded in iptunnel_pull_header. Let it call
skb_scrub_packet directly instead.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:34:54 -05:00
Jiri Benc
c9e78efb6f vxlan: move vxlan device lookup before iptunnel_pull_header
This is in preparation for iptunnel_pull_header calling skb_scrub_packet.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 14:34:54 -05:00
Jiri Benc
07dabf20d9 vxlan: tun_id is 64bit, not 32bit
The tun_id field in struct ip_tunnel_key is __be64, not __be32. We need to
convert the vni to tun_id correctly.

Fixes: 54bfd872bf ("vxlan: keep flags and vni in network byte order")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-18 13:55:24 -05:00
Jiri Benc
b9167b2e77 vxlan: treat vni in metadata based tunnels consistently
For metadata based tunnels, VNI is ignored when doing vxlan device lookups
(because such tunnel receives all VNIs). However, this was not honored by
vxlan_xmit_one when doing encapsulation bypass. Move the check for metadata
based tunnel to the common place where it belongs.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 23:52:12 -05:00
Jiri Benc
288b01c8c4 vxlan: clean up rx error path
When there are unrecognized flags present in the vxlan header, it doesn't
make much sense to return the packet for further UDP processing, especially
considering that for other invalid flag combinations we drop the packet
because of previous checks.

This means we return positive value only at the beginning of the function
where tun_dst is not yet allocated. This allows us to get rid of the
bad_flags and error jump labels.

When we're dropping packet, we need to free tun_dst now.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 23:52:12 -05:00
Jiri Benc
f14ecebb3a vxlan: clean up extension handling on rx
Bring the extension handling to a single place and move the actual handling
logic out of vxlan_udp_encap_recv as much as possible.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 23:52:12 -05:00
Jiri Benc
3288af0892 vxlan: move GBP header parsing to a separate function
To make vxlan_udp_encap_recv shorter and more comprehensible.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 23:52:12 -05:00
Jiri Benc
be5cfeab8f vxlan: simplify vxlan_remcsum
Part of the parameters is not needed. Simplify the caller of this function
in preparation of making vxlan rx more comprehensible.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 23:52:11 -05:00
Jiri Benc
54bfd872bf vxlan: keep flags and vni in network byte order
Prevent repeated conversions from and to network order in the fast path.

To achieve this, define all flag constants in big endian order and store VNI
as __be32. To prevent confusion between the actual VNI value and the VNI
field from the header (which contains additional reserved byte), strictly
distinguish between "vni" and "vni_field".

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 23:52:11 -05:00
Jiri Benc
d4ac05ff36 vxlan: introduce vxlan_hdr
Currently, pointer to the vxlan header is kept in a local variable. It has
to be reloaded whenever the pskb pull operations are performed which usually
happens somewhere deep in called functions.

Create a vxlan_hdr function and use it to reference the vxlan header
instead.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-17 23:52:11 -05:00
Paolo Abeni
d71785ffc7 net: add dst_cache to ovs vxlan lwtunnel
In case of UDP traffic with datagram length
below MTU this give about 2% performance increase
when tunneling over ipv4 and about 60% when tunneling
over ipv6

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 20:21:48 -05:00
Paolo Abeni
0c1d70af92 net: use dst_cache for vxlan device
In case of UDP traffic with datagram length
below MTU this give about 3% performance increase
when tunneling over ipv4 and about 70% when
tunneling over ipv6.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-16 20:21:48 -05:00
Edward Cree
6fa79666e2 net: ip_tunnel: remove 'csum_help' argument to iptunnel_handle_offloads
All users now pass false, so we can remove it, and remove the code that
 was conditional upon it.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-12 05:52:16 -05:00
Edward Cree
b57085019d net: vxlan: enable local checksum offload
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-12 05:52:15 -05:00
stephen hemminger
40d29af057 vxlan: udp_tunnel duplicate include net/udp_tunnel.h
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11 09:45:24 -05:00
David Wragg
7e059158d5 vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices
Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could
transmit vxlan packets of any size, constrained only by the ability to
send out the resulting packets.  4.3 introduced netdevs corresponding
to tunnel vports.  These netdevs have an MTU, which limits the size of
a packet that can be successfully encapsulated.  The default MTU
values are low (1500 or less), which is awkwardly small in the context
of physical networks supporting jumbo frames, and leads to a
conspicuous change in behaviour for userspace.

Instead, set the MTU on openvswitch-created netdevs to be the relevant
maximum (i.e. the maximum IP packet size minus any relevant overhead),
effectively restoring the behaviour prior to 4.3.

Signed-off-by: David Wragg <david@weave.works>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-10 05:50:03 -05:00
David Wragg
72564b59ff vxlan: Relax MTU constraints
Allow the MTU of vxlan devices without an underlying device to be set
to larger values (up to a maximum based on IP packet limits and vxlan
overhead).

Previously, their MTUs could not be set to higher than the
conventional ethernet value of 1500.  This is a very arbitrary value
in the context of vxlan, and prevented vxlan devices from being able
to take advantage of jumbo frames etc.

The default MTU remains 1500, for compatibility.

Signed-off-by: David Wragg <david@weave.works>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-10 05:50:03 -05:00
Jiri Benc
f491e56dba vxlan: consolidate vxlan_xmit_skb and vxlan6_xmit_skb
There's a lot of code duplication. Factor out the duplicate code to a new
function shared between IPv4 and IPv6 xmit path.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-07 13:51:00 -05:00
Jiri Benc
b4ed5cad24 vxlan: consolidate csum flag handling
The flag for tx checksumming for tunneling over IPv4 and IPv6 is different.
Decide whether to do tx checksumming in vxlan_xmit_one and pass it on as
a separate flag. This will allow for tx path consolidation in the next
patch.

Unfortunately, gcc is not clever enough to see that udp_sum is always
initialized and gives an uninitialized variable warning. Set it to false to
silence the warning.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-07 13:51:00 -05:00
Jiri Benc
1a8496ba40 vxlan: consolidate output route calculation
The code for output route lookup is duplicated for ndo_start_xmit and
ndo_fill_metadata_dst. Move it to a common function.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-07 13:51:00 -05:00
Li RongQing
7256eac13b vxlan: fix a out of bounds access in __vxlan_find_mac
The size of all_zeros_mac is 6 byte, but eth_hash() will access the
8 byte, and KASan reported the below bug:

[ 8596.479031] BUG: KASan: out of bounds access in __vxlan_find_mac+0x24/0x100 at addr ffffffff841514c0
[ 8596.487647] Read of size 8 by task ip/52820
[ 8596.490818] Address belongs to variable all_zeros_mac+0x0/0x40
[ 8596.496051] CPU: 0 PID: 52820 Comm: ip Tainted: G WC 4.1.15 #1
[ 8596.503520] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 02/10/2014
[ 8596.509365] ffffffff841514c0 ffff88007450f0b8 ffffffff822fa5e1 0000000000000032
[ 8596.516112] ffff88007450f150 ffff88007450f138 ffffffff812dd58c ffff88007450f1d8
[ 8596.522856] ffffffff81113b80 0000000000000282 0000000000000001 ffffffff8101ee4d
[ 8596.529599] Call Trace:
[ 8596.530858] [<ffffffff822fa5e1>] dump_stack+0x4f/0x7b
[ 8596.535080] [<ffffffff812dd58c>] kasan_report_error+0x3bc/0x3f0
[ 8596.540258] [<ffffffff81113b80>] ? __lock_acquire+0x90/0x2140
[ 8596.545245] [<ffffffff8101ee4d>] ? save_stack_trace+0x2d/0x80
[ 8596.550234] [<ffffffff812dda70>] kasan_report+0x40/0x50
[ 8596.554647] [<ffffffff81b211e4>] ? __vxlan_find_mac+0x24/0x100
[ 8596.559729] [<ffffffff812dc399>] __asan_load8+0x69/0xa0
[ 8596.564141] [<ffffffff81b211e4>] __vxlan_find_mac+0x24/0x100
[ 8596.569033] [<ffffffff81b2683d>] vxlan_fdb_create+0x9d/0x570

it can be fixed by enlarging the all_zeros_mac to 8 byte, although it is
harmless; eth_hash() will be called in other place with the memory which
is larger and equal to 8 byte.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-29 20:11:16 -08:00
Jesse Gross
35e2d1152b tunnels: Allow IPv6 UDP checksums to be correctly controlled.
When configuring checksums on UDP tunnels, the flags are different
for IPv4 vs. IPv6 (and reversed). However, when lightweight tunnels
are enabled the flags used are always the IPv4 versions, which are
ignored in the IPv6 code paths. This uses the correct IPv6 flags, so
checksums can be controlled appropriately.

Fixes: a725e514 ("vxlan: metadata based tunneling for IPv6")
Fixes: abe492b4 ("geneve: UDP checksum configuration via netlink")
Signed-off-by: Jesse Gross <jesse@kernel.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-21 11:10:40 -08:00
David S. Miller
9d367eddf3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/bonding/bond_main.c
	drivers/net/ethernet/mellanox/mlxsw/spectrum.h
	drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c

The bond_main.c and mellanox switch conflicts were cases of
overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-11 23:55:43 -05:00
Hannes Frederic Sowa
787d7ac308 udp: restrict offloads to one namespace
udp tunnel offloads tend to aggregate datagrams based on inner
headers. gro engine gets notified by tunnel implementations about
possible offloads. The match is solely based on the port number.

Imagine a tunnel bound to port 53, the offloading will look into all
DNS packets and tries to aggregate them based on the inner data found
within. This could lead to data corruption and malformed DNS packets.

While this patch minimizes the problem and helps an administrator to find
the issue by querying ip tunnel/fou, a better way would be to match on
the specific destination ip address so if a user space socket is bound
to the same address it will conflict.

Cc: Tom Herbert <tom@herbertland.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-10 17:28:24 -05:00
Nicolas Dichtel
07b9b37c22 vxlan: fix test which detect duplicate vxlan iface
When a vxlan interface is created, the driver checks that there is not
another vxlan interface with the same properties. To do this, it checks
the existing vxlan udp socket. Since commit 1c51a9159d, the creation of
the vxlan socket is done only when the interface is set up, thus it breaks
that test.

Example:
$ ip l a vxlan10 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0
$ ip l a vxlan11 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0
$ ip -br l | grep vxlan
vxlan10          DOWN           f2:55:1c:6a:fb:00 <BROADCAST,MULTICAST>
vxlan11          DOWN           7a:cb:b9:38:59:0d <BROADCAST,MULTICAST>

Instead of checking sockets, let's loop over the vxlan iface list.

Fixes: 1c51a9159d ("vxlan: fix race caused by dropping rtnl_unlock")
Reported-by: Thomas Faivre <thomas.faivre@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-09 21:04:20 -05:00
Pravin B Shelar
039f50629b ip_tunnel: Move stats update to iptunnel_xmit()
By moving stats update into iptunnel_xmit(), we can simplify
iptunnel_xmit() usage. With this change there is no need to
call another function (iptunnel_xmit_stats()) to update stats
in tunnel xmit code path.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-25 23:32:23 -05:00
Jiri Benc
ce212d0f6f vxlan: interpret IP headers for ECN correctly
When looking for outer IP header, use the actual socket address family, not
the address family of the default destination which is not set for metadata
based interfaces (and doesn't have to match the address family of the
received packet even if it was set).

Fix also the misleading comment.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-07 17:06:33 -05:00
Jiri Benc
239e944ff5 vxlan: support ndo_fill_metadata_dst also for IPv6
Fill the metadata correctly even when tunneling over IPv6. Also, check that
the provided metadata is of an address family that is supported by the
tunnel.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-07 16:31:25 -05:00
Jiri Benc
e5d4b29fe8 vxlan: move IPv6 outpute route calculation to a function
Will be used also for ndo_fill_metadata_dst.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-07 16:31:24 -05:00
David S. Miller
ba3e2084f2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/ipv6/xfrm6_output.c
	net/openvswitch/flow_netlink.c
	net/openvswitch/vport-gre.c
	net/openvswitch/vport-vxlan.c
	net/openvswitch/vport.c
	net/openvswitch/vport.h

The openvswitch conflicts were overlapping changes.  One was
the egress tunnel info fix in 'net' and the other was the
vport ->send() op simplification in 'net-next'.

The xfrm6_output.c conflicts was also a simplification
overlapping a bug fix.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-24 06:54:12 -07:00