Only use the primary address of the bond device
for master_ip. This will prevent changing the ARP source
address in Active-Backup mode whenever a secondry address
is added to the bond device.
Signed-off-by: Henrik Saavedra Persson <henrik.e.persson@ericsson.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@drr.davemloft.net>
When changing mode via bonding's sysfs, the slaves are not initialized
correctly. Forbid to change modes with slaves present to ensure that every
slave is initialized correctly via bond_enslave().
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
forcedeth: fix a few sparse warnings (variable shadowing)
forcedeth: Improve stats counters
forcedeth: remove unneeded stats updates
forcedeth: Acknowledge only interrupts that are being processed
forcedeth: fix race when unloading module
MAINTAINERS/rds: update maintainer
wanrouter: Remove kernel_lock annotations
usbnet: fix oops in usbnet_start_xmit
ixgbe: Fix compile for kernel without CONFIG_PCI_IOV defined
etherh: Add MAINTAINERS entry for etherh
bonding: comparing a u8 with -1 is always false
sky2: fix regression on Yukon Optima
netlink: clarify attribute length check documentation
netlink: validate NLA_MSECS length
i825xx:xscale:8390:freescale: Fix Kconfig dependancies
macvlan: receive multicast with local address
tg3: Update version to 3.121
tg3: Eliminate timer race with reset_task
tg3: Schedule at most one tg3_reset_task run
tg3: Obtain PCI function number from device
...
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include <linux/module.h>
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include <linux/module.h>
net: sch_generic remove redundant use of <linux/module.h>
net: inet_timewait_sock doesnt need <linux/module.h>
...
Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h
slave->duplex is a u8 type so the in bond_info_show_slave() when we
check "if (slave->duplex == -1)", it's always false.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zheng Liang(lzheng@redhat.com) found a bug that if we config bonding with
arp monitor, sometimes bonding driver cannot get the speed and duplex from
its slaves, it will assume them to be 100Mb/sec and Full, please see
/proc/net/bonding/bond0.
But there is no such problem when uses miimon.
(Take igb for example)
I find that the reason is that after dev_open() in bond_enslave(),
bond_update_speed_duplex() will call igb_get_settings()
, but in that function,
it runs ethtool_cmd_speed_set(ecmd, -1); ecmd->duplex = -1;
because igb get an error value of status.
So even dev_open() is called, but the device is not really ready to get its
settings.
Maybe it is safe for us to call igb_get_settings() only after
this message shows up, that is "igb: p4p1 NIC Link is Up 1000 Mbps Full Duplex,
Flow Control: RX".
So I prefer to update the speed and duplex for a slave when reseices
NETDEV_CHANGE/NETDEV_UP event.
Changelog
V2:
1 remove the "fake 100/Full" logic in bond_update_speed_duplex(),
set speed and duplex to -1 when it gets error value of speed and duplex.
2 delete the warning in bond_enslave() if bond_update_speed_duplex() returns
error.
3 make bond_info_show_slave() handle bad values of speed and duplex.
Signed-off-by: Weiping Pan <wpan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These were getting the macros from an implicit module.h
include via device.h, but we are planning to clean that up.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
drivers/net: Add export.h to wireless/brcm80211/brcmfmac/bcmsdh.c
This relatively recently added file uses EXPORT_SYMBOL and hence
needs export.h included so that it is compatible with the module.h
split up work.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This patch resolves two sets of race conditions.
Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> reported the
first, as follows:
The bond_close() calls cancel_delayed_work() to cancel delayed works.
It, however, cannot cancel works that were already queued in workqueue.
The bond_open() initializes work->data, and proccess_one_work() refers
get_work_cwq(work)->wq->flags. The get_work_cwq() returns NULL when
work->data has been initialized. Thus, a panic occurs.
He included a patch that converted the cancel_delayed_work calls
in bond_close to flush_delayed_work_sync, which eliminated the above
problem.
His patch is incorporated, at least in principle, into this
patch. In this patch, we use cancel_delayed_work_sync in place of
flush_delayed_work_sync, and also convert bond_uninit in addition to
bond_close.
This conversion to _sync, however, opens new races between
bond_close and three periodically executing workqueue functions:
bond_mii_monitor, bond_alb_monitor and bond_activebackup_arp_mon.
The race occurs because bond_close and bond_uninit are always
called with RTNL held, and these workqueue functions may acquire RTNL to
perform failover-related activities. If bond_close or bond_uninit is
waiting in cancel_delayed_work_sync, deadlock occurs.
These deadlocks are resolved by having the workqueue functions
acquire RTNL conditionally. If the rtnl_trylock() fails, the functions
reschedule and return immediately. For the cases that are attempting to
perform link failover, a delay of 1 is used; for the other cases, the
normal interval is used (as those activities are not as time critical).
Additionally, the bond_mii_monitor function now stores the delay
in a variable (mimicing the structure of activebackup_arp_mon).
Lastly, all of the above renders the kill_timers sentinel moot,
and therefore it has been removed.
Tested-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes a network namespace misfeature that bonding_masters looked at
current instead of the remembering the context where in which
/sys/class/net/bonding_masters was opened in to see which network
namespace to act upon.
This removes the need for sysfs to handle tagged directories with
untagged members allowing for a conceptually simpler sysfs
implementation.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The port shouldn't be enabled unless its current MUX
state is DISTRIBUTING which is correctly handled by
ad_mux_machine(), otherwise the packet sent can be
lost because the other end may not be ready.
The issue happens on every port initialization, but
as the ports are expected to move quickly to DISTRIBUTING,
it doesn't cause much problem. However, it does cause
constant packet loss if the other peer has the port
configured to stay in STANDBY (i.e. SYNC set to OFF).
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond->recv_probe is called in bond_handle_frame() when
a packet is received, but bond_close() sets it to NULL. So,
a panic occurs when both functions work in parallel.
Why this happen:
After null pointer check of bond->recv_probe, an sk_buff is
duplicated and bond->recv_probe is called in bond_handle_frame.
So, a panic occurs when bond_close() is called between the
check and call of bond->recv_probe.
Patch:
This patch uses a local function pointer of bond->recv_probe
in bond_handle_frame(). So, it can avoid the null pointer
dereference.
Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During a test where a pair of bonding interfaces using ARP monitoring
were both brought up and torn down (with an rmmod) repeatedly, a panic
in the timer code was noticed. I tracked this down and determined that
any of the bonding functions that ran as workqueue handlers and requeued
more work might not properly exit when the module was removed.
There was a flag protected by the bond lock called kill_timers that is
set when the interface goes down or the module is removed, but many of
the functions that monitor link status now unlock the bond lock to take
rtnl first. There is a chance that another CPU running the rmmod could
get the lock and set kill_timers after the first check has passed.
This patch does not allow any function to queue work that will make
itself run unless kill_timers is not set. I also noticed while doing
this work that bond_resend_igmp_join_requests did not have a check for
kill_timers, so I added the needed call there as well.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Reported-by: Liang Zheng <lzheng@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch does several things:
- introduces __ethtool_get_settings which is called from ethtool code and
from drivers as well. Put ASSERT_RTNL there.
- dev_ethtool_get_settings() is replaced by __ethtool_get_settings()
- changes calling in drivers so rtnl locking is respected. In
iboe_get_rate was previously ->get_settings() called unlocked. This
fixes it. Also prb_calc_retire_blk_tmo() in af_packet.c had the same
problem. Also fixed by calling __dev_get_by_index() instead of
dev_get_by_index() and holding rtnl_lock for both calls.
- introduces rtnl_lock in bnx2fc_vport_create() and fcoe_vport_create()
so bnx2fc_if_create() and fcoe_if_create() are called locked as they
are from other places.
- use __ethtool_get_settings() in bonding code
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
v2->v3:
-removed dev_ethtool_get_settings()
-added ASSERT_RTNL into __ethtool_get_settings()
-prb_calc_retire_blk_tmo - use __dev_get_by_index() and lock
around it and __ethtool_get_settings() call
v1->v2:
add missing export_symbol
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> [except FCoE bits]
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benefit from use of ndo_change_rx_flags in handling change of promisc
and allmulti. No need to store previous state locally.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eduard Sinelnikov (eduard.sinelnikov@gmail.com) found that if we change
bonding mode from active backup to round robin, some slaves are still keeping
"backup", and won't transmit packets.
As Jay Vosburgh(fubar@us.ibm.com) pointed out that we can work around that by
removing the bond_is_active_slave() check, because the "backup" flag is only
meaningful for active backup mode.
But if we just simply ignore the bond_is_active_slave() check,
the transmission will work fine, but we can't maintain the correct value of
"backup" flag for each slaves, though it is meaningless for other mode than
active backup.
I'd like to reset "backup" and "inactive" flag in bond_open,
thus we can keep the correct value of them.
As for bond_is_active_slave(), I'd like to prepare another patch to handle it.
V2:
Use C style comment.
Move read_lock(&bond->curr_slave_lock).
Replace restore with reset, for active backup mode, it means "restore",
but for other modes, it means "reset".
Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If bonding device is created via rtnl, it is created with default number
of rx/tx queues. This patch implements callback in bonding so the
correct value (previously specified by bonding module param) is used.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On Tue, Jul 26, 2011 at 05:40:27PM -0700, Joe Perches wrote:
> On Tue, 2011-07-26 at 17:37 -0700, Jay Vosburgh wrote:
> > Joe Perches <joe@perches.com> wrote:
> > >I'd prefer you don't separate the format string
> > >into multiple pieces.
> > Why not? To me, it looks easier to read split into sections
> > that don't wrap lines.
>
> Harder to grep for a dmesg and the
> defect rate of these split formats is
> typically higher than single strings
> because of bad spacing between string
> segments.
>
I noticed that you took some time back in late 2009 to 'consolidate' the
split format-strings present in the bonding driver at the time and I've
decided I'm fine to leave them the way they are. The main point of my
patch was to change the output and I would like to get that included.
Here is my updated patch...
Subject: [PATCH net-next-2.6 v2] bonding: reduce noise during init
Many are using sysfs to configure bonding rather than module options, so
there is no need for bonding to throw this warning in normal cases.
Keep the message around when debugging is enabled as it might be useful
for someone desperate enough to enable debugging, but eliminate it
otherwise.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a bond contains a device where one name is the subset of another
(eth1 and eth10, for example), one cannot properly set the primary
device or the currently active device.
This was reported and based on work by Takuma Umeya. I also verified
the problem and tested that this fix resolves it.
V2: A few did not like the the current code or my changes, so I
refactored bonding_store_primary and bonding_store_active_slave to be a
bit cleaner, dropped the use of strnicmp since we did not really need
the comparison to be case insensitive, and formatted the input string
from sysfs so a comparison to IFNAMSIZ could be used.
I also discovered an error in bonding_store_active_slave that would
modify bond->primary_slave rather than bond->curr_active_slave before
forcing the bonding driver to choose a new active slave.
V3: Actually sending the proper patch....
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Reported-by: Takuma Umeya <tumeya@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the last patch, We are left in a state in which only drivers calling
ether_setup have IFF_TX_SKB_SHARING set (we assume that drivers touching real
hardware call ether_setup for their net_devices and don't hold any state in
their skbs. There are a handful of drivers that violate this assumption of
course, and need to be fixed up. This patch identifies those drivers, and marks
them as not being able to support the safe transmission of skbs by clearning the
IFF_TX_SKB_SHARING flag in priv_flags
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Karsten Keil <isdn@linux-pingi.de>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Patrick McHardy <kaber@trash.net>
CC: Krzysztof Halasa <khc@pm.waw.pl>
CC: "John W. Linville" <linville@tuxdriver.com>
CC: Greg Kroah-Hartman <gregkh@suse.de>
CC: Marcel Holtmann <marcel@holtmann.org>
CC: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now when all devices are cleaned up, bond can be cleaned up as well
- remove bond->vlgrp
- remove bond_vlan_rx_register
- substitute necessary occurences of vlan_group_get_device
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no software fallback implemented for SCTP or FCoE checksumming,
and so it should not be passed on by software devices like bridge or bonding.
For VLAN devices, this is different. First, the driver for underlying device
should be prepared to get offloaded packets even when the feature is disabled
(especially if it advertises it in vlan_features). Second, devices under
VLANs do not get replaced without tearing down the VLAN first.
This fixes a mess I accidentally introduced while converting bonding to
ndo_fix_features.
NETIF_F_SOFT_FEATURES are removed from BOND_VLAN_FEATURES because they
are unused as of commit 712ae51afd.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for a configuring the minimum number of links that
must be active before asserting carrier. It is similar to the Cisco
EtherChannel min-links feature. This allows setting the minimum number
of member ports that must be up (link-up state) before marking the
bond device as up (carrier on). This is useful for situations where
higher level services such as clustering want to ensure a minimum
number of low bandwidth links are active before switchover.
See:
http://bugzilla.vyatta.com/show_bug.cgi?id=7196
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are enough instances of this:
iph->frag_off & htons(IP_MF | IP_OFFSET)
that a helper function is probably warranted.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise we will not see the name of the slave dev in error
message:
[ 388.469446] (null): doesn't support polling, aborting.
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter found that there was a dereference before a check,
added in 56d00c677de0(bonding:delete lacp_fast from ad_bond_info).
Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@conan.davemloft.net>
1) the setting of NETIF_F_VLAN_CHALLENGED in bond_del_vlan() is
useless since commit b2a103e6 because bond_fix_features() now
sets NETIF_F_VLAN_CHALLENGED whenever the last slave is being
removed.
2) the code never triggers anyway as vlan_list is never empty
since ad1afb00.
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now all received packets are handled by bond_handle_frame,
and arp_mon_pt isn't used any more.
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now we use agg_select_timer and ad_work.
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bond_params->ad_select and ad_bond_info->agg_select_mode have the same
meaning, they are duplicate and need extra synchronization.
__get_agg_selection_mode() get ad_select from bond_params directly.
Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These is also a bug, that if you modify lacp_rate via sysfs,
and add new slaves in bonding, new slaves won't use the latest lacp_rate,
since ad_bond_info->lacp_fast is initialized only once,
in bond_3ad_initialize().
Since both struct bond_params and ad_bond_info have lacp_fast,
they are duplicate and need extra synchronization.
bond_3ad_bind_slave() can use bond_params->lacp_fast to initialize port.
So we can just remove lacp_fast from struct ad_bond_info.
Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is bug that when you modify lacp_rate via sysfs,
802.3ad won't use the new value of lacp_rate to transmit packets.
This is because port->actor_oper_port_state isn't changed.
Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bonding driver is multiqueue enabled, in which each queue represents a slave
to enable optional steering of output frames to given slaves against the default
output policy. However, it needs to reset the skb->queue_mapping prior to
queuing to the physical device or the physical slave (if it is multiqueue) could
wind up transmitting on an unintended tx queue
Change Notes:
v2) Based on first pass review, updated the patch to restore the origional queue
mapping that was found in bond_select_queue, rather than simply resetting to
zero. This preserves the value of queue_mapping when it was set on receive in
the forwarding case which is desireable.
v3) Fixed spelling an casting error in skb->cb
v4) fixed to store raw queue_mapping to avoid double decrement
v5) Eric D requested that ->cb access be wrapped in a macro.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to check for 10, 100, 1000, 10000 explicitly. Just make this
generic and check for invalid values only (similar check is in ethtool
userspace app). This enables correct speed handling for slave devices
with "nonstandard" speeds.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Weiping Pan noticed that the module option description for
xmit_hash_policy was incorrect and was nice enough to post a patch to
fix it. The text was correct, but created a line over 80 characters and
I would rather not add those. I realized I could take a few minutes and
clean up all the descriptions and things would look much better. This
is the result.
Based on patch from Weiping Pan <panweiping3@gmail.com>.
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
CC: Weiping Pan <panweiping3@gmail.com>
Reviewed-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Improves the documentation about how IGMP resend parameter
works, fix two missing checks and coding style issues.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This soft lockup was recently reported:
[root@dell-per715-01 ~]# echo +bond5 > /sys/class/net/bonding_masters
[root@dell-per715-01 ~]# echo +eth1 > /sys/class/net/bond5/bonding/slaves
bonding: bond5: doing slave updates when interface is down.
bonding bond5: master_dev is not up in bond_enslave
[root@dell-per715-01 ~]# echo -eth1 > /sys/class/net/bond5/bonding/slaves
bonding: bond5: doing slave updates when interface is down.
BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
CPU 12:
Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
be2d
Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
RIP: 0010:[<ffffffff80064bf0>] [<ffffffff80064bf0>]
.text.lock.spinlock+0x26/00
RSP: 0018:ffff810113167da8 EFLAGS: 00000286
RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025
RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8
RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c
R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000
R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282
FS: 00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0
Call Trace:
[<ffffffff80064af9>] _spin_lock_bh+0x9/0x14
[<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
[<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
[<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
[<ffffffff8006457b>] __down_write_nested+0x12/0x92
[<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
[<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
[<ffffffff80016b87>] vfs_write+0xce/0x174
[<ffffffff80017450>] sys_write+0x45/0x6e
[<ffffffff8005d28d>] tracesys+0xd5/0xe0
It occurs because we are able to change the slave configuarion of a bond while
the bond interface is down. The bonding driver initializes some data structures
only after its ndo_open routine is called. Among them is the initalization of
the alb tx and rx hash locks. So if we add or remove a slave without first
opening the bond master device, we run the risk of trying to lock/unlock a
spinlock that has garbage for data in it, which results in our above softlock.
Note that sometimes this works, because in many cases an unlocked spinlock has
the raw_lock parameter initialized to zero (meaning that the kzalloc of the
net_device private data is equivalent to calling spin_lock_init), but thats not
true in all cases, and we aren't guaranteed that condition, so we need to pass
the relevant spinlocks through the spin_lock_init function.
Fix it by moving the spin_lock_init calls for the tx and rx hashtable locks to
the ndo_init path, so they are ready for use by the bond_store_slaves path.
Change notes:
v2) Based on conversation with Jay and Nicolas it seems that the ability to
enslave devices while the bond master is down should be safe to do. As such
this is an outlier bug, and so instead we'll just initalize the errant spinlocks
in the init path rather than the open path, solving the problem. We'll also
remove the warnings about the bond being down during enslave operations, since
it should be safe
v3) Fix spelling error
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: jtluka@redhat.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: nicolas.2p.debian@gmail.com
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
s/NETDEV_BONDING_DESLAVE/NETDEV_RELEASE/ as Andy suggested.
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: Neil Horman <nhorman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
V3: rename NETDEV_ENSLAVE to NETDEV_JOIN
Currently we do nothing when we enslave a net device which is running netconsole.
Neil pointed out that we may get weird results in such case, so let's disable
netpoll on the device being enslaved. I think it is too harsh to prevent
the device being ensalved if it is running netconsole.
By the way, this patch also removes the NETDEV_GOING_DOWN from netconsole
netdev notifier, because netpoll will check if the device is running or not
and we don't handle NETDEV_PRE_UP neither.
This patch is based on net-next-2.6.
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With some combinations of arch/compiler (e.g. arm-linux-gcc) the sizeof
operator on structure returns value greater than expected. In cases when the
structure is used for mapping PDU fields it may lead to unexpected results
(such as holes and alignment problems in skb data). __packed prevents this
undesired behavior.
Signed-off-by: Vitalii Demianets <vitas@nppfactor.kiev.ua>
Signed-off-by: David S. Miller <davem@davemloft.net>
This should also fix updating of vlan_features and propagating changes to
VLAN devices on the bond.
Side effect: it allows user to force-disable some offloads on the bond
interface.
Note: NETIF_F_VLAN_CHALLENGED is managed by bond_fix_features() now.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>