When a parent macvlan device is destroyed we end up purging its
broadcast queue without dropping the device reference count on
the packet source device. This causes the source device to linger.
This patch drops that reference count.
Fixes: 260916dfb4 ("macvlan: Fix potential use-after free for...")
Reported-by: Joe Ghalam <Joe.Ghalam@dell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
macvlan object is re-structured to hold tap related elements in a separate
entity, tap_dev. Upon NETDEV_REGISTER device_event, tap_dev is registered with
idr and fetched again on tap_open. Few of the tap functions are modified to
accepted tap_dev as argument. tap_dev object includes callbacks to be used by
underlying virtual interface to take care of tx and rx accounting.
Signed-off-by: Sainath Grandhi <sainath.grandhi@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netdev_is_rx_handler_busy() check is a superset of netif_is_ipvlan_port()
check and hence should be preferred.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The network device operation for reading statistics is only called
in one place, and it ignores the return value. Having a structure
return value is potentially confusing because some future driver could
incorrectly assume that the return value was used.
Fix all drivers with ndo_get_stats64 to have a void function.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When free macvlan_port in macvlan_port_destroy, it is safe to free
directly because netdev_rx_handler_unregister could enforce one
grace period.
So it is unnecessary to use kfree_rcu for macvlan_port.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
udplite conflict is resolved by taking what 'net-next' did
which removed the backlog receive method assignment, since
it is no longer necessary.
Two entries were added to the non-priv ethtool operations
switch statement, one in 'net' and one in 'net-next, so
simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
When dev_set_promiscuity failed in macvlan_open, it always invokes
dev_set_allmulti without checking if necessary.
Now check the IFF_ALLMULTI flag firstly before rollback the multicast
setting in the error handler.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function macvlan_forward_source_one has already checked the flag
IFF_UP, so needn't check it outside in macvlan_forward_source too.
Signed-off-by: Gao Feng <gfree.wind@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The return value of function macvlan_addr_busy is used as bool value,
so use bool value instead of integer number "1" and "0".
Signed-off-by: Gao Feng <gfree.wind@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When there is no existing macvlan port in lowdev, one new macvlan port
would be created. But it doesn't be destoried when something failed later.
It casues some memleak.
Now add one flag to indicate if new macvlan port is created.
Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
geneve:
- Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu
- This one isn't quite as straight-forward as others, could use some
closer inspection and testing
macvlan:
- set min/max_mtu
tun:
- set min/max_mtu, remove tun_net_change_mtu
vxlan:
- Merge __vxlan_change_mtu back into vxlan_change_mtu
- Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in
change_mtu function
- This one is also not as straight-forward and could use closer inspection
and testing from vxlan folks
bridge:
- set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in
change_mtu function
openvswitch:
- set min/max_mtu, remove internal_dev_change_mtu
- note: max_mtu wasn't checked previously, it's been set to 65535, which
is the largest possible size supported
sch_teql:
- set min/max_mtu (note: max_mtu previously unchecked, used max of 65535)
macsec:
- min_mtu = 0, max_mtu = 65535
macvlan:
- min_mtu = 0, max_mtu = 65535
ntb_netdev:
- min_mtu = 0, max_mtu = 65535
veth:
- min_mtu = 68, max_mtu = 65535
8021q:
- min_mtu = 0, max_mtu = 65535
CC: netdev@vger.kernel.org
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Tom Herbert <tom@herbertland.com>
CC: Daniel Borkmann <daniel@iogearbox.net>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Paolo Abeni <pabeni@redhat.com>
CC: Jiri Benc <jbenc@redhat.com>
CC: WANG Cong <xiyou.wangcong@gmail.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Pravin B Shelar <pshelar@ovn.org>
CC: Sabrina Dubroca <sd@queasysnail.net>
CC: Patrick McHardy <kaber@trash.net>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: Pravin Shelar <pshelar@nicira.com>
CC: Maxim Krasnyansky <maxk@qti.qualcomm.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The idea for type_check in dev_get_nest_level() was to count the number
of nested devices of the same type (currently, only macvlan or vlan
devices).
This prevented the false positive lockdep warning on configurations such
as:
eth0 <--- macvlan0 <--- vlan0 <--- macvlan1
However, this doesn't prevent a warning on a configuration such as:
eth0 <--- macvlan0 <--- vlan0
eth1 <--- vlan1 <--- macvlan1
In this case, all the locks end up with a nesting subclass of 1, so
lockdep thinks that there is still a deadlock:
- in the first case we have (macvlan_netdev_addr_lock_key, 1) and then
take (vlan_netdev_xmit_lock_key, 1)
- in the second case, we have (vlan_netdev_xmit_lock_key, 1) and then
take (macvlan_netdev_addr_lock_key, 1)
By removing the linktype check in dev_get_nest_level() and always
incrementing the nesting depth, lockdep considers this configuration
valid.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case a qdisc is used on a macvlan device, we need to use different
lockdep classes to avoid false positives.
Use the new netdev_lockdep_set_classes() generic helper.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we always queue a multicast packet for further processing,
even if none of the macvlan devices are subscribed to the address.
This patch optimises this by adding a global multicast filter for
a macvlan_port.
Note that this patch doesn't handle the broadcast addresses of the
individual macvlan devices correctly, if they are not all identical
to vlan->lowerdev. However, this is already broken because there
is no mechanism in place to update the individual multicast filters
when you change the broadcast address.
If someone cares enough they should fix this by collecting all
broadcast addresses for a macvlan as we do for multicast and unicast.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we postpone a broadcast packet we save the source port in
the skb if it is local. However, the source port can disappear
before we get a chance to process the packet.
This patch fixes this by holding a ref count on the netdev.
It also delays the skb->cb modification until after we allocate
the new skb as you should not modify shared skbs.
Fixes: 412ca1550c ("macvlan: Move broadcasts into a work queue")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
If macvlan_common_newlink fails in register_netdevice after macvlan_init
then it decrements port->count twice, first in macvlan_uninit (from
register_netdevice or rollback_registered) and then again in
macvlan_common_newlink.
A similar problem may exist in the ipvlan driver.
This patch consolidates modifications to port->count into macvlan_init
and macvlan_uninit (thanks to Eric Biederman for suggesting this approach).
v3: remove macvtap specific bits.
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vlan drivers lack proper propagation of gso_max_segs from
lower device.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use IFF_NO_QUEUE to indicate that a device can run without a qdisc.
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when a macvlan is being initialized and the lower device is
netif_carrier_ok(), the macvlan device doesn't run through
rfc2863_policy() and is left with UNKNOWN operstate. Fix it by adding an
unconditional linkwatch event for the new macvlan device. Similar fix is
already used by the 8021q device (see register_vlan_dev()). Also fix the
inconsistent state when the lower device has been down and its carrier
was changed (when a device is down NETDEV_CHANGE doesn't get generated).
The second issue can be seen f.e. when we have a macvlan on top of a 8021q
device which has been down and its real device has been changing carrier
states, after setting the 8021q device up, the macvlan device will have
the same carrier state as it was before even though the 8021q can now
have a different state.
Example for case 1:
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 1000
$ ip l add l eth2 macvl0 type macvlan
$ ip l set macvl0 up
$ ip l sh macvl0
72: macvl0@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc
noqueue state UNKNOWN mode DEFAULT group default
link/ether f6:0b:54:0a:9d:a3 brd ff:ff:ff:ff:ff:ff
Example for case 2 (order is important):
Prestate: eth2 UP/CARRIER, vlan1 down, vlan1-macvlan down
$ ip l set vlan1-macvlan up
$ ip l sh vlan1-macvlan
71: vlan1-macvlan@vlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
qdisc noqueue state UNKNOWN mode DEFAULT group default
link/ether 4a:b8:44:56:b9:b9 brd ff:ff:ff:ff:ff:ff
[ eth2 loses CARRIER before vlan1 has been UP-ed ]
$ ip l sh eth2
4: eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast
state DOWN mode DEFAULT group default qlen 1000
link/ether 52:54:00:bf:57:16 brd ff:ff:ff:ff:ff:ff
$ ip l sh vlan1-macvlan
71: vlan1-macvlan@vlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
qdisc noqueue state UNKNOWN mode DEFAULT group default
link/ether 4a:b8:44:56:b9:b9 brd ff:ff:ff:ff:ff:ff
$ ip l set vlan1 up
$ ip l sh vlan1
70: vlan1@eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc
noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
link/ether 52:54:00:bf:57:16 brd ff:ff:ff:ff:ff:ff
$ ip l sh vlan1-macvlan
71: vlan1-macvlan@vlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
qdisc noqueue state UNKNOWN mode DEFAULT group default
link/ether 4a:b8:44:56:b9:b9 brd ff:ff:ff:ff:ff:ff
vlan1-macvlan is still UP, still has carrier and is still in the same
operstate as before. After the patch in case 1 macvl0 has state UP as it
should and in case 2 vlan1-macvlan has state LOWERLAYERDOWN again as it
should. Note that while the lower macvlan device is down their carrier
and thus operstate can go out of sync but that will be fixed once the
lower device goes up again.
This behaviour seems to have been present since beginning of git history.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These netif flags are unnecessary convolutions. It is more
straightforward to just use NETIF_F_HW_CSUM, NETIF_F_IP_CSUM,
and NETIF_F_IPV6_CSUM directly.
This patch also:
- Cleans up can_checksum_protocol
- Simplifies netdev_intersect_features
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The name NETIF_F_ALL_CSUM is a misnomer. This does not correspond to the
set of features for offloading all checksums. This is a mask of the
checksum offload related features bits. It is incorrect to set both
NETIF_F_HW_CSUM and NETIF_F_IP_CSUM or NETIF_F_IPV6 at the same time for
features of a device.
This patch:
- Changes instances of NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK (where
NETIF_F_ALL_CSUM is being used as a mask).
- Changes bonding, sfc/efx, ipvlan, macvlan, vlan, and team drivers to
use NEITF_F_HW_CSUM in features list instead of NETIF_F_ALL_CSUM.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reset pskb in macvlan_handle_frame in case skb_share_check returned a
clone.
Fixes: 8a4eb5734e ("net: introduce rx_handler results and logic around that")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function ip_defrag is called on both the input and the output
paths of the networking stack. In particular conntrack when it is
tracking outbound packets from the local machine calls ip_defrag.
So add a struct net parameter and stop making ip_defrag guess which
network namespace it needs to defragment packets in.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Macvlan/macvtap devices don't need to segment multiple tagged packets
since the lower devices can segment them.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a macvlan device is placed in promiscuous mode, it currently
just sets it's multicast mask to permissive, but doesn't change
the state of the lower device. As a result, not all multicast
traffic can be received on such device. Additionally, none of
a vlan traffic can be received on such device as well.
This patch propagates the promiscuous mode setting to lower device
so that lower device may receive all packets that macvlan may
be interested in.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't use dev->iflink anymore.
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that there are no more users kill dev_rebuild_header and all of it's
implementations.
This is long overdue.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Assign rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If device is already used as an ipvlan port then refuse to
use it as a macvlan port at early stage of port creation.
thost1:~# ip link add link eth0 ipvl0 type ipvlan
thost1:~# echo $?
0
thost1:~# ip link add link eth0 mvl0 type macvlan
RTNETLINK answers: Device or resource busy
thost1:~# echo $?
2
thost1:~#
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit fbe168ba91 ("net: generic dev_disable_lro() stacked
device handling"), dev_disable_lro() zeroes NETIF_F_LRO feature flag
first for a macvlan device and then for its lower device. As an attempt
to set NETIF_F_LRO to zero is ignored, dev_disable_lro() issues a
warning and taints kernel.
Allowing NETIF_F_LRO to be set independently of the lower device
consists of three parts:
- add the flag to hw_features to allow toggling it
- allow setting it to 0 even if lower device has the flag set
- add the flag to MACVLAN_FEATURES to restore copying from lower
device on macvlan creation
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do the work of parsing NDA_VLAN directly in rtnetlink code, pass simple
u16 vid to drivers from there.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We do header check twice for a dodgy packet. One is done before
macvlan_start_xmit(), another is done before lower device's
ndo_start_xmit(). The first one seems redundant so this patch tries to
delay header check until a packet reaches its lower device (or macvtap)
through always enabling NETIF_F_GSO_ROBUST for macvlan device.
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to cancel the work queue after rcu grace period,
otherwise it can be rescheduled by incoming packets.
We need to purge queue if some skbs are still in it.
We can use __skb_queue_head_init() variant in
macvlan_process_broadcast()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 412ca1550c ("macvlan: Move broadcasts into a work queue")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netif_rx() call on the fast path of macvlan_handle_frame() appears to
be there to ensure that we properly throttle incoming packets. However, it
would appear as though the proper throttling is already in place for all
possible ingress paths, and that the call is redundant. If packets are arriving
from the physical NIC, we've already throttled them by this point. Otherwise,
if they are coming via macvlan_queue_xmit(), it calls either
'dev_forward_skb()', which ends up calling netif_rx_internal(), or else in
the broadcast case, we are throttling via macvlan_broadcast_enqueue().
The test results below are from off the box to an lxc instance running macvlan.
Once the tranactions/sec stop increasing, the cpu idle time has gone to 0.
Results are from a quad core Intel E3-1270 V2@3.50GHz box with bnx2x 10G card.
for i in {10,100,200,300,400,500};
do super_netperf $i -H $ip -t TCP_RR; done
Average of 5 runs.
trans/sec trans/sec
(3.17-rc7-net-next) (3.17-rc7-net-next + this patch)
---------- ----------
208101 211534 (+1.6%)
839493 850162 (+1.3%)
845071 844053 (-.12%)
816330 819623 (+.4%)
778700 789938 (+1.4%)
735984 754408 (+2.5%)
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pass last argument to macvlan_count_rx() as the correct bool type.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Testing xmit_more support with netperf and connected UDP sockets,
I found strange dst refcount false sharing.
Current handling of IFF_XMIT_DST_RELEASE is not optimal.
Dropping dst in validate_xmit_skb() is certainly too late in case
packet was queued by cpu X but dequeued by cpu Y
The logical point to take care of drop/force is in __dev_queue_xmit()
before even taking qdisc lock.
As Julian Anastasov pointed out, need for skb_dst() might come from some
packet schedulers or classifiers.
This patch adds new helper to cleanly express needs of various drivers
or qdiscs/classifiers.
Drivers that need skb_dst() in their ndo_start_xmit() should call
following helper in their setup instead of the prior :
dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
->
netif_keep_dst(dev);
Instead of using a single bit, we use two bits, one being
eventually rebuilt in bonding/team drivers.
The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
rebuilt in bonding/team. Eventually, we could add something
smarter later.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a new mode of operation to macvlan, called "source".
It allows one to set a list of allowed mac address, which is used
to match against source mac address from received frames on underlying
interface.
This enables creating mac based VLAN associations, instead of standard
port or tag based. The feature is useful to deploy 802.1x mac based
behavior, where drivers of underlying interfaces doesn't allows that.
Configuration is done through the netlink interface using e.g.:
ip link add link eth0 name macvlan0 type macvlan mode source
ip link add link eth0 name macvlan1 type macvlan mode source
ip link set link dev macvlan0 type macvlan macaddr add 00:11:11:11:11:11
ip link set link dev macvlan0 type macvlan macaddr add 00:22:22:22:22:22
ip link set link dev macvlan0 type macvlan macaddr add 00:33:33:33:33:33
ip link set link dev macvlan1 type macvlan macaddr add 00:33:33:33:33:33
ip link set link dev macvlan1 type macvlan macaddr add 00:44:44:44:44:44
This allows clients with MAC addresses 00:11:11:11:11:11,
00:22:22:22:22:22 to be part of only VLAN associated with macvlan0
interface. Clients with MAC addresses 00:44:44:44:44:44 with only VLAN
associated with macvlan1 interface. And client with MAC address
00:33:33:33:33:33 to be associated with both VLANs.
Based on work of Stefan Gula <steweg@gmail.com>
v8: last version of Stefan Gula for Kernel 3.2.1
v9: rework onto linux-next 2014-03-12 by Michael Braun
add MACADDR_SET command, enable to configure mac for source mode
while creating interface
v10:
- reduce indention level
- rename source_list to source_entry
- use aligned 64bit ether address
- use hash_64 instead of addr[5]
v11:
- rebase for 3.14 / linux-next 20.04.2014
v12
- rebase for linux-next 2014-09-25
Signed-off-by: Michael Braun <michael-dev@fami-braun.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 412ca1550c ("macvlan: Move broadcasts into a work queue"), the
driver uses tx_queue_len of the master device as the limit of packets enqueuing.
Problem is that virtual drivers have this value set to 0, thus all broadcast
packets were rejected.
Because tx_queue_len was arbitrarily chosen, I replace it with a static limit
of 1000 (also arbitrarily chosen).
CC: Herbert Xu <herbert@gondor.apana.org.au>
Reported-by: Thibaut Collet <thibaut.collet@6wind.com>
Suggested-by: Thibaut Collet <thibaut.collet@6wind.com>
Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
I cannot move a macvlan interface created on top of a bonding interface
to a different namespace:
% ip netns add dummy0
% ip link add link bond0 mac0 type macvlan
% ip link set mac0 netns dummy0
RTNETLINK answers: Invalid argument
%
The problem seems to be that commit f939981492 ("bonding: Don't allow
bond devices to change network namespaces.") sets NETIF_F_NETNS_LOCAL
on bonding interfaces, and commit 797f87f83b ("macvlan: fix netdev
feature propagation from lower device") causes macvlan interfaces
to inherit its features from the lower device.
NETIF_F_NETNS_LOCAL should not be inherited from the lower device
by a macvlan.
Patch tested on 3.16.
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, macvlan code restricts multicast and unicast
filter setting only to passthru devices. As a result,
if a guest using macvtap wants to receive multicast
traffic, it has to set IFF_ALLMULTI or IFF_PROMISC.
This patch makes it possible to use the fdb interface
to add multicast addresses to the filter thus allowing
a guest to receive only targeted multicast traffic.
CC: John Fastabend <john.r.fastabend@intel.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Jason Wang <jasowang@redhat.com>
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Macvlan devices do not initialize vlan_features. As a result,
any vlan devices configured on top of macvlans perform very poorly.
Initialize vlan_features based on the vlan features of the lower-level
device.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
net/core/rtnetlink.c
net/core/skbuff.c
Both conflicts were very simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
unregister_netdevice_many() API is error prone and we had too
many bugs because of dangling LIST_HEAD on stacks.
See commit f87e6f4793 ("net: dont leave active on stack LIST_HEAD")
In fact, instead of making sure no caller leaves an active list_head,
just force a list_del() in the callee. No one seems to need to access
the list after unregister_netdevice_many()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bonding and team drivers generate specific events during failover
that trigger switch updates. When a macvlan device is configured
on top of bonding, we want switches to learn about the macvlan
devices as well. This patch adds a handler to macvlan driver to
propagate these events to all macvlan devices. We let the generic
inetdev event handler do the work.
This allows macvlan to operated correctly over active-backup
mode bond.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add netpoll support to macvlan devices. Based on the netpoll support in the 802.1q vlan code.
Tested and macvlan could work well with netconsole.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The macvlan dev should always have the same mac address like lowerdev
when in the passthru mode, change the mac address alone will break the
work mechanism, so when the lowerdev or macvlan mac address changes,
we should propagate the changes to another dev.
v1->v2: Allow macvlan dev to change mac address for passthru mode and propagate to
lowerdev.
v2->v3: Don't set the mac address to the lower dev's unicast address for
passthru mode when mac address changes.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/bonding/bond_alb.c
drivers/net/ethernet/altera/altera_msgdma.c
drivers/net/ethernet/altera/altera_sgdma.c
net/ipv6/xfrm6_output.c
Several cases of overlapping changes.
The xfrm6_output.c has a bug fix which overlaps the renaming
of skb->local_df to skb->ignore_df.
In the Altera TSE driver cases, the register access cleanups
in net-next overlapped with bug fixes done in net.
Similarly a bug fix to send ALB packets in the bonding driver using
the right source address overlaps with cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
The port->count was used to count the number of macvlan devs
in the same port, but the list vlans could play the same role
to do that, so free the port if the list vlans is empty and
no need to use the parameter count.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the physical MTU changes we should ensure that all existing MACVLAN
dev MTU do not exceed the new lowerdev MTU. This patch adds that
propagation.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clearing the IFF_ALLMULTI flag on a down interface could cause an allmulti
overflow on the underlying interface.
Attempting the set IFF_ALLMULTI on the underlying interface would cause an
error and the log message:
"allmulti touches root, set allmulti failed."
Signed-off-by: Peter Christensen <pch@ordbogen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/altera/altera_sgdma.c
net/netlink/af_netlink.c
net/sched/cls_api.c
net/sched/sch_api.c
The netlink conflict dealt with moving to netlink_capable() and
netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations
in non-init namespaces. These were simple transformations from
netlink_capable to netlink_ns_capable.
The Altera driver conflict was simply code removal overlapping some
void pointer cast cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 12a2856b60.
The commit above doesn't appear to be necessary any more as the
checksums appear to be correctly computed/validated.
Additionally the above commit breaks kvm configurations where
one VM is using a device that support checksum offload (virtio) and
the other VM does not.
In this case, packets leaving virtio device will have CHECKSUM_PARTIAL
set. The packets is forwarded to a macvtap that has offload features
turned off. Since we use CHECKSUM_UNNECESSARY, the host does does not
update the checksum and thus a bad checksum is passed up to
the guest.
CC: Daniel Lezcano <daniel.lezcano@free.fr>
CC: Patrick McHardy <kaber@trash.net>
CC: Andrian Nord <nightnord@gmail.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Michael S. Tsirkin <mst@redhat.com>
CC: Jason Wang <jasowang@redhat.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent patch that moved broadcasts to process context added
a couple of bugs on the error path where we may dereference NULL
or leak an skb. This patch fixes them.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently broadcasts are handled in network RX context, where
the packets are sent through netif_rx. This means that the number
of macvlans will be constrained by the capacity of netif_rx.
For example, setting up 4096 macvlans practically causes all
broadcast packets to be dropped as the default netif_rx queue
size simply can't handle 4096 skbs being stuffed into it all
at once.
Fundamentally, we need to ensure that the amount of work handled
in each netif_rx backlog run is constrained. As broadcasts are
anything but constrained, it either needs to be limited per run
or moved to process context.
This patch picks the second option and moves all broadcast handling
bar the trivial case of packets going to a single interface into
a work queue. Obviously there also needs to be a limit on how
many broadcast packets we postpone in this way. I've arbitrarily
chosen tx_queue_len of the master device as the limit (act_mirred
also happens to use this parameter in a similar way).
In order to ensure we don't exceed the backlog queue we will use
netif_rx_ni instead of netif_rx for broadcast packets.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Thanks,
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace the bh safe variant with the hard irq safe variant.
We need a hard irq safe variant to deal with netpoll transmitting
packets from hard irq context, and we need it in most if not all of
the places using the bh safe variant.
Except on 32bit uni-processor the code is exactly the same so don't
bother with a bh variant, just have a hard irq safe variant that
everyone can use.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/wireless/ath/ath9k/recv.c
drivers/net/wireless/mwifiex/pcie.c
net/ipv6/sit.c
The SIT driver conflict consists of a bug fix being done by hand
in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper
was created (netdev_alloc_pcpu_stats()) which takes care of this.
The two wireless conflicts were overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Macvlan currently inherits all of its features from the lower
device. When lower device disables offload support, this causes
macvlan to disable offload support as well. This causes
performance regression when using macvlan/macvtap in bridge
mode.
It can be easily demonstrated by creating 2 namespaces using
macvlan in bridge mode and running netperf between them:
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 20.00 1204.61
To restore the performance, we add software offload features
to the list of "always_on" features for macvlan. This way
when a namespace or a guest using macvtap initially sends a
packet, this packet will not be segmented at macvlan level.
It will only be segmented when macvlan sends the packet
to the lower device.
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 20.00 5507.35
Fixes: 6acf54f1cf (macvtap: Add support of packet capture on macvtap device.)
Fixes: 797f87f83b (macvlan: fix netdev feature propagation from lower device)
CC: Florian Westphal <fw@strlen.de>
CC: Christian Borntraeger <borntraeger@de.ibm.com>
CC: Jason Wang <jasowang@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/bonding/bond_3ad.h
drivers/net/bonding/bond_main.c
Two minor conflicts in bonding, both of which were overlapping
changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
There are many drivers calling alloc_percpu() to allocate pcpu stats
and then initializing ->syncp. So just introduce a helper function for them.
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rtnl_newlink() doesn't unregister it for us on failure.
Cc: Patrick McHardy <kaber@trash.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
will cause several issues:
- NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan
instead of lower device which misses the necessary txq synchronization for
lower device such as txq stopping or frozen required by dev watchdog or
control path.
- dev_hard_start_xmit() was called with NULL txq which bypasses the net device
watchdog.
- dev_hard_start_xmit() does not check txq everywhere which will lead a crash
when tso is disabled for lower device.
Fix this by explicitly introducing a new param for .ndo_select_queue() for just
selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also
extended to accept this parameter and dev_queue_xmit_accel() was used to do l2
forwarding transmission.
With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep
a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of
dev_queue_xmit() to do the transmission.
In the future, it was also required for macvtap l2 forwarding support since it
provides a necessary synchronization method.
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: e1000-devel@lists.sourceforge.net
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
L2 fowarding offload will bypass the rx handler of real device. This will make
the packet could not be forwarded to macvtap device. Another problem is the
dev_hard_start_xmit() called for macvtap does not have any synchronization.
Fix this by forbidding L2 forwarding for macvtap.
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
net/ipv6/ip6_tunnel.c
net/ipv6/ip6_vti.c
ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.
qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.
Signed-off-by: David S. Miller <davem@davemloft.net>
They are same, so unify them as one; since macvlan is a kind of vlan,
vlan_pcpu_stats should be a proper name for vlan and macvlan.
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only used in one file, no need to expose
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are inconsistencies wrt. feature propagation/inheritance between
macvlan and the underlying interface.
When a feature is turned off on the real device before a macvlan is
created on top, these will remain enabled on the macvlan device, whereas
turning off the feature on the lower device after macvlan creation the
kernel will propagate the changes to the macvlan.
The second issue is that, when propagating changes from underlying device
to the macvlan interface, macvlan can erronously lose its NETIF_F_LLTX flag,
as features are anded with the underlying device.
However, LLTX should be kept since it has no dependencies on physical
hardware (LLTX is set on macvlan creation regardless of the lower
device properties, see 8ffab51b3d
(macvlan: lockless tx path).
The LLTX flag is now forced regardless of user settings in absence of
layer2 hw acceleration (a6cc0cfa72,
net: Add layer 2 hardware acceleration operations for macvlan devices).
Use netdev_increment_features to rebuild the feature set on capability
changes on either the lower device or on the macvlan interface.
As pointed out by Ben Hutchings, use netdev_update_features on
NETDEV_FEAT_CHANGE event (it calls macvlan_fix_features/netdev_features_change
if needed).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since now macvlan and macvtap use the same receive and
forward handlers, we can remove them completely and use
netif_rx and dev_forward_skb() directly.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When running in a network namespace whose only link to the outside
world is a macvlan device, not being able to create a macvtap off of
it is a real pain.
So modify macvtap creation to automatically forward a creation of a
macvtap on a macvlan to become a creation of a macvtap on the
underlying network device, just like is currently done with
macvlan-on-macvlan devices.
v2: Use netif_is_macvlan and macvlan_dev_real_dev helpers to make it
more clear what we're doing.
Signed-off-by: Kevin Wallace <kevin@pentabarf.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull core locking changes from Ingo Molnar:
"The biggest changes:
- add lockdep support for seqcount/seqlocks structures, this
unearthed both bugs and required extra annotation.
- move the various kernel locking primitives to the new
kernel/locking/ directory"
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
block: Use u64_stats_init() to initialize seqcounts
locking/lockdep: Mark __lockdep_count_forward_deps() as static
lockdep/proc: Fix lock-time avg computation
locking/doc: Update references to kernel/mutex.c
ipv6: Fix possible ipv6 seqlock deadlock
cpuset: Fix potential deadlock w/ set_mems_allowed
seqcount: Add lockdep functionality to seqcount/seqlock structures
net: Explicitly initialize u64_stats_sync structures for lockdep
locking: Move the percpu-rwsem code to kernel/locking/
locking: Move the lglocks code to kernel/locking/
locking: Move the rwsem code to kernel/locking/
locking: Move the rtmutex code to kernel/locking/
locking: Move the semaphore core to kernel/locking/
locking: Move the spinlock code to kernel/locking/
locking: Move the lockdep code to kernel/locking/
locking: Move the mutex code to kernel/locking/
hung_task debugging: Add tracepoint to report the hang
x86/locking/kconfig: Update paravirt spinlock Kconfig description
lockstat: Report avg wait and hold times
lockdep, x86/alternatives: Drop ancient lockdep fixup message
...
Add a operations structure that allows a network interface to export
the fact that it supports package forwarding in hardware between
physical interfaces and other mac layer devices assigned to it (such
as macvlans). This operaions structure can be used by virtual mac
devices to bypass software switching so that forwarding can be done
in hardware more efficiently.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to enable lockdep on seqcount/seqlock structures, we
must explicitly initialize any locks.
The u64_stats_sync structure, uses a seqcount, and thus we need
to introduce a u64_stats_init() function and use it to initialize
the structure.
This unfortunately adds a lot of fairly trivial initialization code
to a number of drivers. But the benefit of ensuring correctness makes
this worth while.
Because these changes are required for lockdep to be enabled, and the
changes are quite trivial, I've not yet split this patch out into 30-some
separate patches, as I figured it would be better to get the various
maintainers thoughts on how to best merge this change along with
the seqcount lockdep enablement.
Feedback would be appreciated!
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Mirko Lindner <mlindner@marvell.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Roger Luethi <rl@hellgate.ch>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Simon Horman <horms@verge.net.au>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
After the commit below attempting to create macvlan devices was
resulting in ENOENT errors,
# ip link add link p3p2 type macvlan
RTNETLINK answers: Invalid argument
This happens because netdev_upper_dev_link() is called before
register_netdevice() in the macvlan code. Through a call chain
this results in a call to __netdev_adjacent_dev_insert() and
finally a sysfs_create_link(). This requires the kobject of
the macvlan to be registered which is done in register_netdevice().
If there is no kobject which is the case here the ENOENT error
is seen on the command line.
To resolve this move the netdev_upper_dev_link() call below
the register_netdevice() call. This aligns with vlan driver
flow.
Regression introduced here,
commit 5831d66e80
Author: Veaceslav Falico <vfalico@redhat.com>
Date: Wed Sep 25 09:20:32 2013 +0200
net: create sysfs symlinks for neighbour devices
CC: Veaceslav Falico <vfalico@redhat.com>
CC: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently macvlan calls skb_clone in macvlan_broadcast but checks
for a NULL return in macvlan_broadcast_one instead. This is
needlessly confusing and may lead to bugs introduced later.
This patch moves the error check to where the skb_clone call is.
The only other caller of macvlan_broadcast_one never passes in a
NULL value so it doesn't need the check either.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Thanks,
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
A device inheriting a random or set address should reflect this in
its addr_assign_type.
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit 3b04ddde02
"[NET]: Move hardware header operations out of netdevice."
moved the handling into macvlan setup adding
dev->header_ops = &macvlan_hard_header_ops,
At the end of the line the ',' should have been a ';'
Signed-off-by: Lutz Jaenicke <ljaenicke@innominate.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit df8ef8f3aa
macvlan: add FDB bridge ops and macvlan flags
added a flags field to macvlan, which can be
controlled from userspace.
The idea is to make the interface future-proof
so we can add flags and not new fields.
However, flags value isn't validated, as a result,
userspace can't detect which flags are supported.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's quite unlikely that dev_set_promiscuity will fail,
but worth checking just in case.
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
macvlan passthrough mode is special: it's not possible to switch to or
from it through a netlink command.
But if you try, the command will succeed, which is
confusing.
Validate input and return error to user.
Cc: Sridhar Samudrala <sri@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for iproute2 command 'bridge fdb replace ...'.
The rtnletlink call back function ndo_fdb_add will be called
with the NLM_F_REPLACE flag set.
Simply return -EOPNOTSUP.
Resubmitted because net-next was closed last week.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the user issues TUNSETOFFLOAD ioctl, macvtap does not do
anything other then to verify arguments. This patch adds
functionality to allow users to actually control offload features.
NETIF_F_GSO and NETIF_F_GRO are always on, but the rest of the
features can be controlled.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/wireless/ath/ath9k/Kconfig
drivers/net/xen-netback/netback.c
net/batman-adv/bat_iv_ogm.c
net/wireless/nl80211.c
The ath9k Kconfig conflict was a change of a Kconfig option name right
next to the deletion of another option.
The xen-netback conflict was overlapping changes involving the
handling of the notify list in xen_netbk_rx_action().
Batman conflict resolution provided by Antonio Quartulli, basically
keep everything in both conflict hunks.
The nl80211 conflict is a little more involved. In 'net' we added a
dynamic memory allocation to nl80211_dump_wiphy() to fix a race that
Linus reported. Meanwhile in 'net-next' the handlers were converted
to use pre and post doit handlers which use a flag to determine
whether to hold the RTNL mutex around the operation.
However, the dump handlers to not use this logic. Instead they have
to explicitly do the locking. There were apparent bugs in the
conversion of nl80211_dump_wiphy() in that we were not dropping the
RTNL mutex in all the return paths, and it seems we very much should
be doing so. So I fixed that whilst handling the overlapping changes.
To simplify the initial returns, I take the RTNL mutex after we try
to allocate 'tb'.
Signed-off-by: David S. Miller <davem@davemloft.net>
commit df8ef8f3aa
"macvlan: add FDB bridge ops and macvlan flags"
added a way to control NOPROMISC macvlan flag through netlink.
However, with a non passthrough device we never set promisc on open,
even if NOPROMISC is off. As a result:
If userspace clears NOPROMISC on open, then does not clear it on a
netlink command, promisc counter is not decremented on stop and there
will be no way to clear it once macvlan is detached.
If userspace does not clear NOPROMISC on open, then sets NOPROMISC on a
netlink command, promisc counter will be decremented from 0 and overflow
to fffffffff with no way to clear promisc.
To fix, simply ignore NOPROMISC flag in a netlink command for
non-passthrough devices, same as we do at open/close.
Since we touch this code anyway - check dev_set_promiscuity return code
and pass it to users (though an error here is unlikely).
Cc: "David S. Miller" <davem@davemloft.net>
Reviewed-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far, only net_device * could be passed along with netdevice notifier
event. This patch provides a possibility to pass custom structure
able to provide info that event listener needs to know.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
v2->v3: fix typo on simeth
shortened dev_getter
shortened notifier_info struct name
v1->v2: fix notifier_call parameter in call_netdevice_notifier()
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, if macvlan in passthru mode is created and data are rxed and
you remove this device, following panic happens:
NULL pointer dereference at 0000000000000198
IP: [<ffffffffa0196058>] macvlan_handle_frame+0x153/0x1f7 [macvlan]
I'm using following script to trigger this:
<script>
while [ 1 ]
do
ip link add link e1 name macvtap0 type macvtap mode passthru
ip link set e1 up
ip link set macvtap0 up
IFINDEX=`ip link |grep macvtap0 | cut -f 1 -d ':'`
cat /dev/tap$IFINDEX >/dev/null &
ip link del dev macvtap0
done
</script>
I run this script while "ping -f" is running on another machine to send
packets to e1 rx.
Reason of the panic is that list_first_entry() is blindly called in
macvlan_handle_frame() even if the list was empty. vlan is set to
incorrect pointer which leads to the crash.
I'm fixing this by protecting port->vlans list by rcu and by preventing
from getting incorrect pointer in case the list is empty.
Introduced by: commit eb06acdc85 "macvlan: Introduce 'passthru' mode to takeover the underlying device"
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- macvlan: propagate STAG filtering capabilities from underlying device
- ifb: announce STAG tagging support in addition to CTAG tagging support
- veth: announce STAG tagging/stripping support in addition to CTAG support
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the rx_{add,kill}_vid callbacks to take a protocol argument in
preparation of 802.1ad support. The protocol argument used so far is
always htons(ETH_P_8021Q).
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename the hardware VLAN acceleration features to include "CTAG" to indicate
that they only support CTAGs. Follow up patches will introduce 802.1ad
server provider tagging (STAGs) and require the distinction for hardware not
supporting acclerating both.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure we use proper API to fetch dev->rx_handler_data,
instead of ugly casts.
Rename macvlan_port_get() to macvlan_port_get_rtnl() to document fact
that we hold RTNL when needed, with lockdep support.
This removes sparse warnings as well (CONFIG_SPARSE_RCU_POINTER=y)
CHECK drivers/net/macvlan.c
drivers/net/macvlan.c:706:37: warning: cast removes address space of expression
drivers/net/macvlan.c:775:16: warning: cast removes address space of expression
drivers/net/macvlan.c:924:16: warning: cast removes address space of expression
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Macvlan already supports hw address filters. Set the IFF_UNICAST_FLT
so that it doesn't needlesly enter PROMISC mode when macvlans are
stacked.
Signed-of-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When a user adds bridge neighbors, allow him to specify VLAN id.
If the VLAN id is not specified, the neighbor will be added
for VLANs currently in the ports filter list. If no VLANs are
configured on the port, we use vlan 0 and only add 1 entry.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some multicast addresses are common to all macvlans,
so if a multicast message has a hash value collision, we
have to deliver a copy to all macvlans, adding significant
latency and possible packet drops if netdev_max_backlog
limit is hit.
Having a per macvlan hash function permits to reduce the
impact of hash collisions.
Suggested-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit cd431e7385 (macvlan: add multicast filter) forgot
the broadcast case.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
SIgned-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>