linux_dsm_epyc7002/drivers/net/ethernet
Jon Maxwell f37bd0cced cnic: call cp->stop_hw() in cnic_start_hw() on allocation failure
We recently had a system crash in the cnic module. Vmcore analysis confirmed
that "ip link up" was executed which failed due to an allocation failure
because of memory fragmentation. Further analysis revealed that the cnic irq
vector was still allocated after the "ip link up" that failed. When
"ip link down" was executed it called free_msi_irqs() which crashed the system
because the cnic irq was still in use.

PANIC: "kernel BUG at drivers/pci/msi.c:411!"

The code execution was:

cnic_netdev_event()
	if (event == NETDEV_UP) {
	.
	.
		if (!cnic_start_hw(dev))
cnic_start_hw()
calls cnic_cm_open() which failed with -ENOMEM
cnic_start_hw() then took the err1 path:

err1:
	cp->free_resc(dev);  <---- frees resources but not irq vector
	pci_dev_put(dev->pcidev);
	return err;
}

This returns control back to cnic_netdev_event() but now the cnic irq vector
is still allocated even though cnic_cm_open() failed. The next
"ip link down" will trigger the crash.

The cnic_start_hw() routine is not handling the allocation failure correctly.
Fix this by checking whether the CNIC_DRV_STATE_HANDLES_IRQ flag is set,
indicating that the hardware has been started in cnic_start_hw(). If it has,
then call cp->stop_hw(), which frees the cnic irq vector and cnic resources.
Otherwise just maintain the previous behaviour and free cnic resources.
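
The error-path change described above can be sketched as a small user-space
model. This is a simplified illustration, not the driver code: struct mock_dev,
mock_start_hw(), mock_stop_hw(), mock_free_resc() and DRV_STATE_HANDLES_IRQ are
hypothetical stand-ins for the cnic structures, callbacks and the
CNIC_DRV_STATE_HANDLES_IRQ flag, and the "patched" switch exists only so the
old and new behaviour can be compared side by side.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for CNIC_DRV_STATE_HANDLES_IRQ. */
#define DRV_STATE_HANDLES_IRQ 0x1u

/* Hypothetical, heavily simplified model of the device state. */
struct mock_dev {
	unsigned int drv_state;  /* models the driver state flags */
	bool irq_allocated;      /* true while the irq vector is held */
	bool resc_allocated;     /* true while other resources are held */
	bool fail_cm_open;       /* fault injection for the cm_open step */
};

/* Models cp->free_resc(): releases resources but not the irq vector. */
static void mock_free_resc(struct mock_dev *d)
{
	d->resc_allocated = false;
}

/* Models cp->stop_hw(): releases the irq vector and the resources. */
static void mock_stop_hw(struct mock_dev *d)
{
	d->irq_allocated = false;
	d->drv_state &= ~DRV_STATE_HANDLES_IRQ;
	mock_free_resc(d);
}

/*
 * Models cnic_start_hw(). With patched == false the err1 path only frees
 * resources (old behaviour); with patched == true it checks the flag and
 * stops the hardware if it had already been started.
 */
static int mock_start_hw(struct mock_dev *d, bool patched)
{
	int err;

	d->resc_allocated = true;               /* resource allocation */
	d->irq_allocated = true;                /* hardware start takes the irq */
	d->drv_state |= DRV_STATE_HANDLES_IRQ;

	if (d->fail_cm_open) {                  /* cm_open step fails */
		err = -ENOMEM;
		goto err1;
	}
	return 0;

err1:
	if (patched && (d->drv_state & DRV_STATE_HANDLES_IRQ))
		mock_stop_hw(d);
	else
		mock_free_resc(d);
	return err;
}
```

In this model, a failed mock_start_hw() with patched set to false leaves
irq_allocated true, which corresponds to the leaked vector that the later
"ip link down" trips over. With patched set to true, both the irq vector and
the resources are released before the error is returned.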

I reproduced this by injecting an ENOMEM error into the return code of
cnic_cm_alloc_mem().

# ip link set dev enpX down
# ip link set dev enpX up <--- hits allocation failure
# ip link set dev enpX down <--- crashes here

With this patch I confirmed there was no crash in the reproducer.

Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 15:44:54 -04:00
..
3com treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
8390 treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
adaptec treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
adi treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
aeroflex
agere treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
allwinner treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
alteon
altera net: eth: altera: do not free array priv->mdio->irq 2016-03-06 22:59:18 -05:00
amd treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
apm drivers: net: xgene: constify xgene_cle_ops structure 2016-05-03 13:03:05 -04:00
apple
arc net: arc: trivial: cleanup the emac driver 2016-03-16 19:28:01 -04:00
atheros treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
aurora net: ethernet: nb8800: support fixed-link DT node 2016-02-24 11:32:11 -05:00
broadcom cnic: call cp->stop_hw() in cnic_start_hw() on allocation failure 2016-05-06 15:44:54 -04:00
brocade bna: fix list corruption 2016-03-01 15:19:43 -05:00
cadence Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-05-04 00:52:29 -04:00
calxeda
cavium treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
chelsio treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
cirrus
cisco enic: set netdev->vlan_features 2016-04-18 14:53:21 -04:00
davicom treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
dec treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
dlink treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
emulex benet: be_resume needs to protect be_open with rtnl_lock 2016-04-21 15:35:07 -04:00
ezchip net: ezchip: adapt driver to little endian architecture 2016-03-03 17:20:08 -05:00
faraday net: ethernet: faraday: Use phy_find_first() instead of open coding it 2016-01-10 22:05:30 -05:00
freescale treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
fujitsu treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
hisilicon treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
hp treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
i825xx treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
ibm treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
intel Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 2016-05-04 17:13:34 -04:00
marvell treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
mediatek net: mediatek: do not set the QID field in the TX DMA descriptors 2016-04-12 22:41:33 -04:00
mellanox net/mlx4: Avoid wrong virtual mappings 2016-05-05 23:23:05 -04:00
micrel treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
microchip treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
moxa treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
myricom myri10ge: fix sleeping with bh disabled 2016-04-28 14:21:14 -04:00
natsemi treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
neterion drivers: net: remove NETDEV_TX_LOCKED 2016-04-26 15:53:05 -04:00
netronome nfp: add async reconfiguration mechanism 2016-04-16 22:34:40 -04:00
nuvoton treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
nvidia forcedeth: Use setup_timer() 2016-02-25 16:51:05 -05:00
nxp net: lpc_eth: Remove unused variables 2016-01-10 22:50:14 -05:00
oki-semi pch_gbe: replace private tx ring lock with common netif_tx_lock 2016-04-28 17:19:58 -04:00
packetengines treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
pasemi pasemi_mac: Replace LRO with GRO 2016-02-17 16:15:45 -05:00
qlogic treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
qualcomm treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
rdc
realtek treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
renesas ravb: Remove rx buffer ALIGN 2016-05-03 12:45:13 -04:00
rocker rocker: move ageing_time from struct rocker to struct ofdpa 2016-03-12 20:11:13 -05:00
samsung net: sxgbe: fix error paths in sxgbe_platform_probe() 2016-03-27 22:39:22 -04:00
seeq treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
sfc sfc: disable RSS when unsupported 2016-04-28 14:21:15 -04:00
sgi treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
silan
sis treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
smsc treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
stmicro stmmac: dwmac-socfpga: kill init() and rename setup() to set_phy_mode() 2016-05-03 15:22:20 -04:00
sun treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
synopsys treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
tehuti treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
ti treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
tile drivers: fix dev->trans_start removal fallout 2016-05-04 17:07:14 -04:00
toshiba treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
tundra net: tsi108: use NULL for pointer-typed argument 2016-04-26 01:10:26 -04:00
via treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
wiznet treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
xilinx treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
xircom treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
xscale
dnet.c
dnet.h
ec_bhf.c
ethoc.c net/ethoc: do not free array priv->mdio->irq 2016-03-06 22:58:51 -05:00
fealnx.c treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
jme.c drivers/net/ethernet/jme.c: Deinline jme_reset_mac_processor, save 2816 bytes 2016-04-13 22:57:00 -04:00
jme.h
Kconfig netdev: Move octeon/octeon_mgmt driver to cavium directory. 2016-03-18 18:25:30 -04:00
korina.c treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
lantiq_etop.c treewide: replace dev->trans_start update with helper 2016-05-04 14:16:49 -04:00
Makefile netdev: Move octeon/octeon_mgmt driver to cavium directory. 2016-03-18 18:25:30 -04:00
netx-eth.c