Commit Graph

18793 Commits

Author SHA1 Message Date
Arkadi Sharshevsky
af06137892 mlxsw: spectrum_switchdev: Add support for learning FDB through notification
Add support for learning FDB entries through notifications. The driver
defers the hardware update via an ordered work queue. Support for
stacked devices is also provided. In case of a successful FDB add, a
notification is sent back to the bridge.
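
As a rough illustration of the deferral pattern described above (the
structure, handler and "mlxsw_owq" workqueue names here are
illustrative, not the exact driver symbols):

struct mlxsw_fdb_work {
        struct work_struct work;
        unsigned char addr[ETH_ALEN];
        u16 vid;
        struct net_device *dev;
};

static void mlxsw_fdb_work_handler(struct work_struct *work)
{
        struct mlxsw_fdb_work *fdb_work =
                container_of(work, struct mlxsw_fdb_work, work);

        rtnl_lock();
        /* program fdb_work->addr/vid into the ASIC here and, on
         * success, notify the bridge about the offloaded entry
         */
        rtnl_unlock();

        dev_put(fdb_work->dev);
        kfree(fdb_work);
}

/* called from atomic (notifier) context: defer the HW update */
static int mlxsw_fdb_defer(struct net_device *dev,
                           const unsigned char *addr, u16 vid)
{
        struct mlxsw_fdb_work *fdb_work;

        fdb_work = kzalloc(sizeof(*fdb_work), GFP_ATOMIC);
        if (!fdb_work)
                return -ENOMEM;

        INIT_WORK(&fdb_work->work, mlxsw_fdb_work_handler);
        ether_addr_copy(fdb_work->addr, addr);
        fdb_work->vid = vid;
        fdb_work->dev = dev;
        dev_hold(dev);

        /* mlxsw_owq stands in for an ordered workqueue created
         * elsewhere with alloc_ordered_workqueue()
         */
        queue_work(mlxsw_owq, &fdb_work->work);
        return 0;
}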

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 14:16:27 -04:00
Arkadi Sharshevsky
1b40dc3d86 mlxsw: spectrum_switchdev: Change switchdev notifier API
The current API for sending switchdev notifications covers only FDB
add/del. In order to support notifications about successful FDB offload,
the API is changed.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 14:16:27 -04:00
Arkadi Sharshevsky
10e23eb299 mlxsw: spectrum: Remove support for bypass bridge port attributes/vlan set
The bridge port attributes/vlans for mlxsw devices should be set only
from bridge code. The vlans are fully synced with the bridge, so there
is no need for special dump support.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 14:16:27 -04:00
Arkadi Sharshevsky
c7b566cd2e mlxsw: spectrum_switchdev: Add support for querying supported bridge flags
Add support for querying supported bridge flags.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 14:16:26 -04:00
Arkadi Sharshevsky
a989cdb473 mlxsw: spectrum: Remove support for bridge FDB learning sync
Currently the mlxsw driver supports an option for disabling the syncing
of hardware-learned FDBs with the software bridge. This behavior breaks
the bridge offload model and is therefore removed.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 14:16:26 -04:00
Arkadi Sharshevsky
6b26b51b1d net: bridge: Add support for notifying devices about FDB add/del
Currently the bridge doesn't notify the underlying devices about newly
learned FDBs. The FDB sync is placed on the switchdev notifier chain
because devices may potentially learn FDBs that are not directly related
to their ports, for example:

1. Mixed SW/HW bridge - FDBs that point to the ASIC's external devices
                        should be offloaded as CPU traps in order to
                        perform forwarding in the slow path.
2. EVPN - Externally learned FDBs for the vtep device.

Notifications are sent only for static FDB add/del. This is done because
this is currently the only scenario supported by switch
drivers.
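
A hedged sketch of a driver-side consumer of these notifications on the
switchdev notifier chain (the port check helper is hypothetical, and the
event constants shown follow later kernels and may not match the exact
names used by this series):

static int my_switchdev_event(struct notifier_block *nb,
                              unsigned long event, void *ptr)
{
        struct net_device *dev = switchdev_notifier_info_to_dev(ptr);
        struct switchdev_notifier_fdb_info *fdb_info = ptr;

        if (!my_port_dev_check(dev))    /* hypothetical: is this our port? */
                return NOTIFY_DONE;

        switch (event) {
        case SWITCHDEV_FDB_ADD_TO_DEVICE:
                /* defer programming fdb_info->addr / fdb_info->vid into
                 * the ASIC; the notifier runs in atomic context
                 */
                break;
        case SWITCHDEV_FDB_DEL_TO_DEVICE:
                /* defer removing the entry from the ASIC */
                break;
        }
        return NOTIFY_DONE;
}

static struct notifier_block my_switchdev_nb = {
        .notifier_call = my_switchdev_event,
};

/* at driver init: register_switchdev_notifier(&my_switchdev_nb); */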

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 14:16:25 -04:00
Jiri Pirko
a5fcf8a6c9 net: propagate tc filter chain index down the ndo_setup_tc call
We need to push the chain index down to the drivers, so they know to
which chain a rule belongs. For now, no driver supports multichain
offload, so only chain 0 is supported. This is needed to prevent chains
from being squashed during offload for now. Later this will be used
to implement multichain offload.
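
A minimal sketch of what this enables on the driver side (the callback
name is illustrative; the chain_index argument is the one added here):

static int my_ndo_setup_tc(struct net_device *dev, u32 handle,
                           u32 chain_index, __be16 proto,
                           struct tc_to_netdev *tc)
{
        /* No multichain offload yet: accept only chain 0 so rules from
         * other chains are not silently squashed into chain 0.
         */
        if (chain_index)
                return -EOPNOTSUPP;

        /* ... dispatch on tc->type as before ... */
        return 0;
}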

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-08 09:55:53 -04:00
Tariq Toukan
808df6a209 net/mlx4_en: Bump driver version
Remove date and bump version for mlx4_en driver.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 15:33:01 -04:00
Tariq Toukan
cea2a6d81d net/mlx4_core: Bump driver version
Remove date and bump version for mlx4_core driver.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 15:33:00 -04:00
Ganesh Goudar
1dec4cec9f cxgb4: Fix tids count for ipv6 offload connection
The adapter consumes two tids for every ipv6 offload connection, be it
active or passive; calculate the tid usage count accordingly.

Also change the signatures of the relevant functions to take the
address family.

Signed-off-by: Rizwan Ansari <rizwana@chelsio.com>
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 15:14:34 -04:00
Jakub Kicinski
f9380629fa nfp: advertise support for NFD ABI 0.5
NFD ABI 0.5 is equivalent to NFD ABI 3.0 but requires that the
driver check the APP id symbol and make sure it can support the
given app.  Most advanced apps will likely require a control vNIC
(the ability to exchange control messages between the driver and
app FW).  Detailed app version checking and capability exchange
is left to app-specific code.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 12:51:42 -04:00
Jakub Kicinski
02082701b9 nfp: create control vNICs and wire up rx/tx
When the driver encounters an nfp_app which has a control message
handler defined, allocate a control vNIC.  This control channel will be
used to exchange data with the application FW, such as flow table
programming, statistics and global datapath control.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 12:51:42 -04:00
Jakub Kicinski
2c7e41c0b2 nfp: allow non-equal distribution of IRQs
Thus far the code assumed all vNICs will request a similar number of
IRQs.  This will no longer be true with control vNICs (where 1 IRQ will
suffice).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 12:51:41 -04:00
Jakub Kicinski
6d4b0d8ed6 nfp: slice the netdev spawning function
We want to be able to create a special vNIC for control messages.
This vNIC should be created before any netdev is registered to allow
nfp_app logic to exchange messages with the FW app before any netdev
is visible to user space.  Unfortunately we can't enable IRQs until
we know how many vNICs we will need to spawn.

Divide the function which spawns netdevs for vNICs into three parts:
 - vNIC/memory allocation;
 - IRQ allocation;
 - netdev init and register.

This will help us insert the initialization of the control channel
after IRQ allocation but before netdev init and register.
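
The resulting ordering, roughly (all function names below are
hypothetical, not the actual nfp symbols):

static int nfp_net_pf_spawn(struct nfp_pf *pf)
{
        int err;

        err = nfp_net_pf_alloc_vnics(pf);       /* vNIC/memory allocation */
        if (err)
                return err;

        err = nfp_net_pf_alloc_irqs(pf);        /* IRQ allocation */
        if (err)
                goto err_free_vnics;

        /* the control channel can be brought up here, before any netdev
         * becomes visible to user space
         */

        err = nfp_net_pf_init_netdevs(pf);      /* netdev init and register */
        if (err)
                goto err_free_irqs;

        return 0;

err_free_irqs:
        nfp_net_pf_free_irqs(pf);
err_free_vnics:
        nfp_net_pf_free_vnics(pf);
        return err;
}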

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 12:51:41 -04:00
Jakub Kicinski
21537bc701 nfp: don't clutter init code passing fw_ver around
Reading the fw version from the BAR is trivial.  Don't pass it around
through layers of init functions; simply read it again where needed.

This commit has the side effect of each vNIC having the exact NFD
version from its own control memory, rather than all data vNICs
assuming the version of the first one.  This should not result in
user-visible changes, though.  Capabilities of data vNICs of trivial
apps are identical.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 12:51:41 -04:00
Jakub Kicinski
73e253f0e5 nfp: map all queue controllers at once
RX and TX queue controllers are interleaved.  Instead of creating
two mappings which map the same area at slightly different offset,
create only one mapping.  Always map all queue controllers to simplify
the code and allow reusing the mapping for non-data vNICs.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 12:51:40 -04:00
Jakub Kicinski
c24ca95ff6 nfp: make vNIC ctrl memory mapping function reusable
We will soon need to map control vNIC PCI memory as well as data vNIC
memory.  Make the function for mapping areas pointed to by an RTsym
reusable.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 12:51:40 -04:00
Jakub Kicinski
77ece8d5f1 nfp: add control vNIC datapath
Since control vNICs don't have a netdev, they can't use NAPI or the
queuing the stack provides.  Add simple tasklet-based receive and send
of control messages, with queuing on an skb list.
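
A rough sketch of the shape of such a datapath (names are illustrative,
not the actual nfp symbols):

struct my_ctrl_vnic {
        struct tasklet_struct rx_tasklet;
        struct sk_buff_head tx_queue;   /* control messages awaiting TX */
};

static void my_ctrl_rx_tasklet(unsigned long data)
{
        struct my_ctrl_vnic *ctrl = (struct my_ctrl_vnic *)data;

        /* poll the RX descriptor ring and hand each received control
         * message to the app-specific handler; no NAPI/netdev involved
         */
}

static void my_ctrl_init(struct my_ctrl_vnic *ctrl)
{
        skb_queue_head_init(&ctrl->tx_queue);
        tasklet_init(&ctrl->rx_tasklet, my_ctrl_rx_tasklet,
                     (unsigned long)ctrl);
}

static void my_ctrl_xmit(struct my_ctrl_vnic *ctrl, struct sk_buff *skb)
{
        skb_queue_tail(&ctrl->tx_queue, skb);   /* serializes senders */
        /* kick the TX side to drain tx_queue into the descriptor ring,
         * e.g. by scheduling a TX tasklet
         */
}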

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 12:51:40 -04:00
Jakub Kicinski
5c0dbe9ecf nfp: prepare config and enable for working without netdevs
Out of the three stages of ifup/ifdown (allocate, configure, start),
this commit prepares the configuration stage for working with control
vNICs.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 12:51:39 -04:00
Jakub Kicinski
a7b1ad0875 nfp: allow allocation and initialization of netdev-less vNICs
vNICs used for sending and receiving control messages shouldn't
really have a netdev.  Add the ability to initialize vNICs for
netdev-less operation.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 12:51:39 -04:00
Jakub Kicinski
042f4ba62f nfp: make sure debug accesses don't depend on netdevs
We want to be able to inspect the state of descriptor rings of
the control vNIC, so it will use the same interface as data vNICs.

Make sure the code doesn't use netdevs to determine the state of the
rings, and name things appropriately.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 12:51:39 -04:00
Jakub Kicinski
c821e61789 nfp: prepare print macros for use without netdev
To be able to reuse the print macros easily with control vNICs, make
the macros check whether the netdev pointer is populated and use the
dev_* print functions otherwise.
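
For illustration, such a macro might look roughly like this (field and
macro names are illustrative, not the exact driver definitions):

#define nn_warn(nn, fmt, args...)                                  \
        do {                                                       \
                if ((nn)->netdev)                                  \
                        netdev_warn((nn)->netdev, fmt, ## args);   \
                else                                               \
                        dev_warn((nn)->dev, fmt, ## args);         \
        } while (0)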

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 12:51:38 -04:00
Jakub Kicinski
cd083ce158 nfp: move nfp_net_vecs_init()
Move nfp_net_vecs_init() after all datapath functions.  We will need
to init poll() callbacks from this function soon.

No functional changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 12:51:38 -04:00
Jakub Kicinski
4621199dbf nfp: reuse ring free code on close
On the close path reuse the ring free helpers introduced for runtime
reconfiguration.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 12:51:38 -04:00
Jakub Kicinski
ee26756d01 nfp: split out the allocation part of open
Our open/close implementations have 3 stages:
 - allocation/freeing of ring resources, irqs etc.,
 - device config,
 - device/stack enable (can't fail).

Right now all of those stages are placed in separate functions,
apart from allocation during open.  Fix that.  It will make it
easier for us to allocate resources for netdev-less vNICs.
Because we want to reuse the allocation code in netdev-less vNICs,
leave the netif_set_real_num_[rt]x_queues() calls inside open.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 12:51:37 -04:00
Jakub Kicinski
d00ca2f378 nfp: reorder open and close functions
We will soon reuse parts of .ndo_stop() for clean up after errors
in .ndo_open().  Reorder the associated functions to make that possible.

No functional changes.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 12:51:37 -04:00
Andrew Lunn
2b30842b23 net: fec: Clear and enable MIB counters on imx51
Both the IMX51 and IMX53 datasheets indicate that the MIB counters
should be cleared during setup.  Otherwise random numbers are returned
via ethtool -S.  Add a quirk and a function to do this.

Tested on an IMX51.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 10:06:52 -04:00
David S. Miller
216fe8f021 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Just some simple overlapping changes in marvell PHY driver
and the DSA core code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 22:20:08 -04:00
Colin Ian King
1d3028f4c1 net: stmmac: fix a broken u32 less than zero check
The check that queue is less than or equal to zero is always true
because queue is a u32; queue is decremented and will wrap around,
never going negative.  Fix this by making queue an int.

Detected by CoverityScan, CID#1428988 ("Unsigned compared against 0")
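
A generic illustration of this class of bug (the helper and loop body
are made up; only the unsigned-counter problem matters here):

        u32 queue;

        /* Broken: queue can never be negative, so a descending loop
         * guarded by "queue >= 0" never terminates; the decrement wraps
         * around to a huge value instead.
         */
        for (queue = rx_cnt - 1; queue >= 0; queue--)
                disable_rx_queue(priv, queue);

        /* Fixed: with a signed counter the comparison can actually fail. */
        int q;

        for (q = rx_cnt - 1; q >= 0; q--)
                disable_rx_queue(priv, q);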

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 16:26:28 -04:00
Niklas Cassel
426849e661 net: stmmac: fix completely hung TX when using TSO
stmmac_tso_allocator can fail to set the Last Descriptor bit
on a descriptor that actually was the last descriptor.

This happens when the buffer of the last descriptor ends
up having a size of exactly TSO_MAX_BUFF_SIZE.

When the IP eventually reaches the next last descriptor,
which actually has the bit set, the DMA will hang.

When the DMA hangs, we get a tx timeout; however, since stmmac does
not do a complete reset of the IP in stmmac_tx_timeout, we end up in
a state with completely hung TX.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 16:24:09 -04:00
Max Filippov
d220b942a4 net: ethoc: enable NAPI before poll may be scheduled
ethoc_reset enables device interrupts, so ethoc_interrupt may schedule
a NAPI poll before NAPI is enabled in ethoc_open, which results in the
device being unable to send or receive anything until it is closed and
reopened.  In case the device is flooded with ingress packets it may be
unable to recover at all.
Move napi_enable above ethoc_reset in ethoc_open to fix that.
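
A condensed sketch of the resulting ordering in ethoc_open() (details
such as ring setup are omitted):

static int ethoc_open(struct net_device *dev)
{
        struct ethoc *priv = netdev_priv(dev);
        int ret;

        ret = request_irq(dev->irq, ethoc_interrupt, IRQF_SHARED,
                          dev->name, dev);
        if (ret)
                return ret;

        /* Enable NAPI before ethoc_reset(): the reset unmasks device
         * interrupts, and ethoc_interrupt() may schedule a poll as soon
         * as that happens.
         */
        napi_enable(&priv->napi);

        ethoc_reset(priv);
        netif_start_queue(dev);

        return 0;
}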

Fixes: a170285772 ("net: Add support for the OpenCores 10/100 Mbps Ethernet MAC.")
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 16:22:51 -04:00
Eugeniu Rosca
79514ef670 ravb: Fix use-after-free on ifconfig eth0 down
Commit a47b70ea86 ("ravb: unmap descriptors when freeing rings")
introduced the issue seen in [1], reproduced on the H3ULCB board.

Fix this by relocating the RX skb ring buffer free operation, so that
swiotlb page unmapping can be done first.  Freeing of aligned TX buffers
is not relevant to the issue seen in [1].  Still, reposition the TX free
calls as well, so that all kfree() operations are performed consistently
_after_ dma_unmap_*()/dma_free_*().
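
The required ordering, sketched (the descriptor accessor and buffer
size field here are illustrative):

        /* unmap the DMA buffer first, then free the skb that owns it */
        for (i = 0; i < priv->num_rx_ring[q]; i++) {
                if (priv->rx_skb[q] && priv->rx_skb[q][i]) {
                        dma_unmap_single(ndev->dev.parent,
                                         rx_desc_dma_addr(priv, q, i),
                                         priv->rx_buf_sz,
                                         DMA_FROM_DEVICE);
                        dev_kfree_skb(priv->rx_skb[q][i]);
                }
        }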

[1] Console screenshot with the problem reproduced:

salvator-x login: root
root@salvator-x:~# ifconfig eth0 up
Micrel KSZ9031 Gigabit PHY e6800000.ethernet-ffffffff:00: \
       attached PHY driver [Micrel KSZ9031 Gigabit PHY]   \
       (mii_bus:phy_addr=e6800000.ethernet-ffffffff:00, irq=235)
IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
root@salvator-x:~#
root@salvator-x:~# ifconfig eth0 down

==================================================================
BUG: KASAN: use-after-free in swiotlb_tbl_unmap_single+0xc4/0x35c
Write of size 1538 at addr ffff8006d884f780 by task ifconfig/1649

CPU: 0 PID: 1649 Comm: ifconfig Not tainted 4.12.0-rc4-00004-g112eb07287d1 #32
Hardware name: Renesas H3ULCB board based on r8a7795 (DT)
Call trace:
[<ffff20000808f11c>] dump_backtrace+0x0/0x3a4
[<ffff20000808f4d4>] show_stack+0x14/0x1c
[<ffff20000865970c>] dump_stack+0xf8/0x150
[<ffff20000831f8b0>] print_address_description+0x7c/0x330
[<ffff200008320010>] kasan_report+0x2e0/0x2f4
[<ffff20000831eac0>] check_memory_region+0x20/0x14c
[<ffff20000831f054>] memcpy+0x48/0x68
[<ffff20000869ed50>] swiotlb_tbl_unmap_single+0xc4/0x35c
[<ffff20000869fcf4>] unmap_single+0x90/0xa4
[<ffff20000869fd14>] swiotlb_unmap_page+0xc/0x14
[<ffff2000080a2974>] __swiotlb_unmap_page+0xcc/0xe4
[<ffff2000088acdb8>] ravb_ring_free+0x514/0x870
[<ffff2000088b25dc>] ravb_close+0x288/0x36c
[<ffff200008aaf8c4>] __dev_close_many+0x14c/0x174
[<ffff200008aaf9b4>] __dev_close+0xc8/0x144
[<ffff200008ac2100>] __dev_change_flags+0xd8/0x194
[<ffff200008ac221c>] dev_change_flags+0x60/0xb0
[<ffff200008ba2dec>] devinet_ioctl+0x484/0x9d4
[<ffff200008ba7b78>] inet_ioctl+0x190/0x194
[<ffff200008a78c44>] sock_do_ioctl+0x78/0xa8
[<ffff200008a7a128>] sock_ioctl+0x110/0x3c4
[<ffff200008365a70>] vfs_ioctl+0x90/0xa0
[<ffff200008365dbc>] do_vfs_ioctl+0x148/0xc38
[<ffff2000083668f0>] SyS_ioctl+0x44/0x74
[<ffff200008083770>] el0_svc_naked+0x24/0x28

The buggy address belongs to the page:
page:ffff7e001b6213c0 count:0 mapcount:0 mapping:          (null) index:0x0
flags: 0x4000000000000000()
raw: 4000000000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: 0000000000000000 ffff7e001b6213e0 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8006d884f680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff8006d884f700: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>ffff8006d884f780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                   ^
 ffff8006d884f800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff8006d884f880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
==================================================================
Disabling lock debugging due to kernel taint
root@salvator-x:~#

Fixes: a47b70ea86 ("ravb: unmap descriptors when freeing rings")
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 16:02:22 -04:00
Ganesh Goudar
8ea4fae926 cxgb4: implement ndo_set_vf_rate()
Implement ndo_set_vf_rate() for the mgmt interface to support
rate-limiting of VF traffic using the 'ip' command.

Based on the original work of Kumar Sanghvi <kumaras@chelsio.com>

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 15:17:10 -04:00
Colin Ian King
594238158b net: stmmac: ensure jumbo_frm error return is correctly checked for -ve value
The current comparison of entry < 0 will never be true since entry is
an unsigned integer.  Make entry an int to ensure negative error return
values from the call to jumbo_frm are correctly caught.

Detected by CoverityScan, CID#1238760 ("Macro compares unsigned to 0")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 15:13:46 -04:00
Jiri Pirko
bd5ddba52d spectrum_flower: Implement gact trap TC action offload
Just use the previously prepared infrastructure and offload the gact
trap action to ACL.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 12:45:24 -04:00
Jiri Pirko
df7eea963e acl: Introduce ACL trap action
Use trap/discard flex action to implement trap.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 12:45:24 -04:00
Jiri Pirko
0db7b386f5 mlxsw: spectrum: Introduce ACL trap
Introduce an ACL trap and put it into ip2me trap group.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 12:45:24 -04:00
Jiri Pirko
be8408e144 mlxsw: pci: Fix size of trap_id field in CQE
The "trap_id" is 9bits long. So far, this was not a problem since we
used only traps with ids that fit into 8bits. But the ACL traps that are
going to be introduced use the 9th bit.

Fixes: eda6500a98 ("mlxsw: Add PCI bus implementation")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 12:45:23 -04:00
Colin Ian King
928a759593 net/mlxfw: remove redundant goto on error check
The check to see if err is set and the subsequent goto are extraneous,
as the next statement is where the goto jumps to. Remove this redundant
check and goto.

Detected by CoverityScan, CID#1437734 ("Identical code for
different branches")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 12:21:29 -04:00
Björn Töpel
2aae918c7a i40e/i40evf: proper update of the page_offset field
In f8b45b74cc ("i40e/i40evf: Use build_skb to build frames")
i40e_build_skb updates the page_offset field with an incorrect offset,
which can lead to data corruption. This patch updates page_offset
correctly, by properly setting truesize.

Note that the bug only appears on architectures where PAGE_SIZE is
8192 or larger.

Fixes: f8b45b74cc ("i40e/i40evf: Use build_skb to build frames")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 02:49:15 -07:00
Mauro S. M. Rodrigues
9e6c9c0f2c i40e: Fix state flags for bit set and clean operations of PF
Commit 0da36b9774 ("i40e: use DECLARE_BITMAP for state fields")
introduced changes in the way i40e works with state flags, converting
them to bitmaps using the kernel bitmap API. This change introduced a
regression due to a mistaken substitution using __I40E_VSI_DOWN instead
of __I40E_DOWN when testing the state of a PF in the i40e_reset_subtask()
function. This caused a flood in the kernel log with the following
message:

[49.013] i40e 0002:01:00.0: bad reset request 0x00000020

Commit d19cb64b92 ("i40e: separate PF and VSI state flags")
also introduced some misuse of the VSI and PF flags, so both could be
considered offenders.

This patch simply fixes the flags where it makes sense by changing
__I40E_VSI_DOWN to __I40E_DOWN.
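
The fix boils down to a one-bit change in the PF state test
(surrounding context simplified):

        /* before: VSI flag mistakenly tested against the PF state bitmap */
        if (test_bit(__I40E_VSI_DOWN, pf->state))
                return;

        /* after */
        if (test_bit(__I40E_DOWN, pf->state))
                return;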

Fixes: 0da36b9774 ("i40e: use DECLARE_BITMAP for state fields")
Fixes: d19cb64b92 ("i40e: separate PF and VSI state flags")

Reviewed-by: "Guilherme G. Piccoli" <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: "Mauro S. M. Rodrigues" <maurosr@linux.vnet.ibm.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 02:45:32 -07:00
Konstantin Khlebnikov
fd8e597ba4 e1000e: use disable_hardirq() also for MSIX vectors in e1000_netpoll()
Replace disable_irq(), which waits for threaded irq handlers, with
disable_hardirq(), which waits only for the hardirq part.
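
Sketch of the MSI-X branch of e1000_netpoll() after the change (field
names and surrounding code are condensed/illustrative):

        for (vector = 0; vector < adapter->num_vectors; vector++) {
                /* disable_hardirq() waits only for the hardirq part to
                 * finish; disable_irq() would also wait for threaded
                 * handlers, which is not safe from netpoll context.
                 */
                disable_hardirq(adapter->msix_entries[vector].vector);
                /* ... run the corresponding MSI-X handler ... */
                enable_irq(adapter->msix_entries[vector].vector);
        }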

Fixes: 3111912971 ("e1000: use disable_hardirq() for e1000_netpoll()")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 01:06:06 -07:00
Benjamin Poirier
24ad2a9209 e1000e: Don't return uninitialized stats
Some statistics passed to ethtool are garbage because
e1000e_get_stats64() doesn't write them, for example
tx_heartbeat_errors. This leaks kernel memory to userspace and confuses
users.

Do like ixgbe and use dev_get_stats(), which first zeroes out the
rtnl_link_stats64 structure.
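
Sketch of the intended call in the ethtool stats path:

        struct rtnl_link_stats64 net_stats;

        /* dev_get_stats() zeroes the structure before filling it, so
         * counters the driver never writes read back as 0 rather than
         * uninitialized stack memory.
         */
        dev_get_stats(netdev, &net_stats);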

Fixes: 5944701df9 ("net: remove useless memset's in drivers get_stats64")
Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 01:05:13 -07:00
Benjamin Poirier
81e3f64a9b igb: Remove useless argument
Given that all callers of igb_update_stats() pass the same two arguments:
(adapter, &adapter->stats64), the second argument can be removed.

Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 01:04:13 -07:00
Jacob Keller
e5f36ad14c igb: check for Tx timestamp timeouts during watchdog
The igb driver has logic to handle only one Tx timestamp at a time,
using a state bit lock to avoid multiple requests at once.

It may be possible, if incredibly unlikely, that a Tx timestamp event is
requested but never completes. Since we use an interrupt scheme to
determine when the Tx timestamp occurred we would never clear the state
bit in this case.

Add an igb_ptp_tx_hang() function similar to the already existing
igb_ptp_rx_hang() function. This function runs in the watchdog routine
and makes sure we eventually recover from this case instead of
permanently disabling Tx timestamps.

Note: there is no currently known way to cause this without hacking the
driver code to force it.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 01:03:17 -07:00
Jacob Keller
c3b8f85ec2 igb: add statistic indicating number of skipped Tx timestamps
The igb driver can only handle one Tx timestamp request at a time.
This means it is possible for an application timestamp request to be
ignored.

There is no easy way for an administrator to determine if this occurred.
Add a new statistic which tracks this, tx_hwtstamp_skipped.
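
Roughly, in the transmit path (flag and field names follow the
description above but are illustrative):

        if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)) {
                if (!test_and_set_bit_lock(__IGB_PTP_TX_IN_PROGRESS,
                                           &adapter->state)) {
                        skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
                        /* arm hardware Tx timestamping for this skb */
                } else {
                        /* a request is already in flight; record it */
                        adapter->tx_hwtstamp_skipped++;
                }
        }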

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 01:02:05 -07:00
Jacob Keller
cff5714145 e1000e: add statistic indicating number of skipped Tx timestamps
The e1000e driver can only handle one Tx timestamp request at a time.
This means it is possible for an application timestamp request to be
ignored.

There is no easy way for an administrator to determine if this occurred.
Add a new statistic which tracks this, tx_hwtstamp_skipped.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 01:01:27 -07:00
Jacob Keller
74344e32fc igb: avoid permanent lock of *_PTP_TX_IN_PROGRESS
The igb driver uses a state bit lock to avoid handling more than one Tx
timestamp request at once. This is required because hardware is limited
to a single set of registers for Tx timestamps.

The state bit lock is not properly cleaned up during
igb_xmit_frame_ring() if the transmit fails, such as due to a DMA or
TSO failure. In some hardware this results in blocking timestamps until the
service task times out. In other hardware this results in a permanent
lock of the timestamp bit because we never receive an interrupt
indicating the timestamp occurred, since indeed the packet was never
transmitted.

Fix this by checking for DMA and TSO errors in igb_xmit_frame_ring() and
properly cleaning up after ourselves when these occur.

Reported-by: David Mirabito <davidm@metamako.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 00:53:48 -07:00
Jacob Keller
4ccdc013b0 igb: fix race condition with PTP_TX_IN_PROGRESS bits
Hardware related to the igb driver has a limitation of only handling one
Tx timestamp at a time. Thus, the driver uses a state bit lock to
enforce that only one timestamp request is honored at a time.

Unfortunately this suffers from a simple race condition. The bit lock is
not cleared until after skb_tstamp_tx() is called notifying the stack of
a new Tx timestamp. Even a well behaved application which sends only one
timestamp request at once and waits for a response might wake up and
send a new packet before the bit lock is cleared. This results in
needlessly dropping some Tx timestamp requests.

We can fix this by unlocking the state bit as soon as we read the
Timestamp register, as this is the first point at which it is safe to
unlock.

To avoid issues with the skb pointer, we'll use a copy of the pointer
and set the global variable in the driver structure to NULL first. This
ensures that the next timestamp request does not modify our local copy
of the skb pointer.

This ensures that well behaved applications do not accidentally race
with the unlock bit. Obviously an application which sends multiple Tx
timestamp requests at once will still only timestamp one packet at
a time. Unfortunately there is nothing we can do about this.
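
A simplified sketch of the resulting unlock ordering (register and
helper names follow the igb driver but details are trimmed):

static void igb_ptp_tx_hwtstamp(struct igb_adapter *adapter)
{
        struct e1000_hw *hw = &adapter->hw;
        struct skb_shared_hwtstamps shhwtstamps;
        struct sk_buff *skb = adapter->ptp_tx_skb;
        u64 regval;

        regval = rd32(E1000_TXSTMPL);
        regval |= (u64)rd32(E1000_TXSTMPH) << 32;
        igb_ptp_systim_to_hwtstamp(adapter, &shhwtstamps, regval);

        /* Drop our reference and release the "in progress" bit before
         * notifying the stack, so an application that wakes up right
         * away can already start the next timestamp request.
         */
        adapter->ptp_tx_skb = NULL;
        clear_bit_unlock(__IGB_PTP_TX_IN_PROGRESS, &adapter->state);

        skb_tstamp_tx(skb, &shhwtstamps);
        dev_kfree_skb_any(skb);
}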

Reported-by: David Mirabito <davidm@metamako.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 00:53:07 -07:00
Jacob Keller
5012863b73 e1000e: fix race condition around skb_tstamp_tx()
The e1000e driver and related hardware have a limitation on Tx PTP
packets which requires that we timestamp only a single packet at once.
We do this by verifying that we never request a new Tx timestamp while
we still have a tx_hwtstamp_skb pointer.

Unfortunately the driver suffers from a race condition around this. The
tx_hwtstamp_skb pointer is not set to NULL until after skb_tstamp_tx()
is called. This function notifies the stack and applications of a new
timestamp. Even a well behaved application that only sends a new request
when the first one is finished might be woken up and possibly send
a packet before we can free the timestamp in the driver again. The
result is that we needlessly ignore some Tx timestamp requests in this
corner case.

Fix this by assigning the tx_hwtstamp_skb pointer prior to calling
skb_tstamp_tx() and use a temporary pointer to hold the timestamped skb
until that function finishes. This ensures that the application is not
woken up until the driver is ready to begin timestamping a new packet.

This ensures that well behaved applications do not accidentally race
with this condition and get their Tx timestamp requests skipped.
Obviously an application which
sends multiple Tx timestamp requests at once will still only timestamp
one packet at a time. Unfortunately there is nothing we can do about
this.

Reported-by: David Mirabito <davidm@metamako.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 00:52:17 -07:00