Currently the VRF driver uses the rx_handler to switch the skb device
to the VRF device. Switching the dev prior to the ip / ipv6 layer
means the VRF driver has to duplicate IP/IPv6 processing which adds
overhead and makes features such as retaining the ingress device index
more complicated than necessary.
This patch moves the hook to the L3 layer just after the first NF_HOOK
for PRE_ROUTING. This location makes exposing the original ingress device
trivial (next patch) and allows adding other NF_HOOKs to the VRF driver
in the future.
dev_queue_xmit_nit is exported so that the VRF driver can cycle the skb
with the switched device through the packet taps to maintain current
behavior (tcpdump can be used on either the vrf device or the enslaved
devices).
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move l3mdev_rt6_dst_by_oif and l3mdev_get_saddr to l3mdev.c. Collapse
l3mdev_get_rt6_dst into l3mdev_rt6_dst_by_oif since it is the only
user and keep the l3mdev_get_rt6_dst name for consistency with other
hooks.
A follow-on patch adds more code to these functions making them long
for inlined functions.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Lamparter noted a use case where the source address selection fails
to pick an address from a VRF interface - unnumbered interfaces.
Relevant commands from his script:
ip addr add 9.9.9.9/32 dev lo
ip link set lo up
ip link add name vrf0 type vrf table 101
ip rule add oif vrf0 table 101
ip rule add iif vrf0 table 101
ip link set vrf0 up
ip addr add 10.0.0.3/32 dev vrf0
ip link add name dummy2 type dummy
ip link set dummy2 master vrf0 up
--> note dummy2 has no address - unnumbered device
ip route add 10.2.2.2/32 dev dummy2 table 101
ip neigh add 10.2.2.2 dev dummy2 lladdr 02:00:00:00:00:02
tcpdump -ni dummy2 &
And using ping instead of his socat example:
$ ping -I vrf0 -c1 10.2.2.2
ping: Warning: source address might be selected on device other than vrf0.
PING 10.2.2.2 (10.2.2.2) from 9.9.9.9 vrf0: 56(84) bytes of data.
>From tcpdump:
12:57:29.449128 IP 9.9.9.9 > 10.2.2.2: ICMP echo request, id 2491, seq 1, length 64
Note the source address is from lo and is not a VRF local address. With
this patch:
$ ping -I vrf0 -c1 10.2.2.2
PING 10.2.2.2 (10.2.2.2) from 10.0.0.3 vrf0: 56(84) bytes of data.
>From tcpdump:
12:59:25.096426 IP 10.0.0.3 > 10.2.2.2: ICMP echo request, id 2113, seq 1, length 64
Now the source address comes from vrf0.
The ipv4 function for selecting source address takes a const argument.
Removing the const requires touching a lot of places, so instead
l3mdev_master_ifindex_rcu is changed to take a const argument and then
do the typecast to non-const as required by netdev_master_upper_dev_get_rcu.
This is similar to what l3mdev_fib_table_rcu does.
IPv6 for unnumbered interfaces appears to be selecting the addresses
properly.
Cc: David Lamparter <david@opensourcerouting.org>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commands run in a vrf context are not failing as expected on a route lookup:
root@kenny:~# ip ro ls table vrf-red
unreachable default
root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254
ping: Warning: source address might be selected on device other than vrf-red.
PING 10.100.1.254 (10.100.1.254) from 0.0.0.0 vrf-red: 56(84) bytes of data.
--- 10.100.1.254 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms
Since the vrf table does not have a route for 10.100.1.254 the ping
should have failed. The saddr lookup causes a full VRF table lookup.
Propogating a lookup failure to the user allows the command to fail as
expected:
root@kenny:~# ping -I vrf-red -c1 -w1 10.100.1.254
connect: No route to host
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add helper to lookup l3mdev master index given a device index.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add operations to retrieve cached IPv6 dst entry from l3mdev device
and lookup IPv6 source address.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add operation to l3mdev to lookup source address for a given flow.
Add support for the operation to VRF driver and convert existing
IPv4 hooks to use the new lookup.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change CONFIG dependency to CONFIG_NET_L3_MASTER_DEV as well.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
L3 master devices allow users of the abstraction to influence FIB lookups
for enslaved devices. Current API provides a means for the master device
to return a specific FIB table for an enslaved device, to return an
rtable/custom dst and influence the OIF used for fib lookups.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>