Commit Graph

4547 Commits

Author SHA1 Message Date
Jack Morgenstein
ffe4cfc3da net/mlx4_core: Fix error handling when initializing CQ bufs in the driver
Procedure mlx4_init_user_cqes() handles returns by copy_to_user
incorrectly. copy_to_user() returns the number of bytes not copied.
Thus, a non-zero return should be treated as a -EFAULT error
(as is done elsewhere in the kernel). However, mlx4_init_user_cqes()
error handling simply returns the number of bytes not copied
(instead of -EFAULT).

Note, though, that this is a harmless bug: procedure mlx4_alloc_cq()
(which is the only caller of mlx4_init_user_cqes()) treats any
non-zero return as an error, but that returned error value is processed
internally, and not passed further up the call stack.

In addition, fixes the following sparse warning:
warning: incorrect type in argument 1 (different address spaces)
   expected void [noderef] <asn:1>*to
   got void *buf

Fixes: e45678973d ("{net, IB}/mlx4: Initialize CQ buffers in the driver when possible")
Reported by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 21:48:26 -08:00
Aya Levin
a40ded6043 net/mlx4_core: Add masking for a few queries on HCA caps
Driver reads the query HCA capabilities without the corresponding masks.
Without the correct masks, the base addresses of the queues are
unaligned.  In addition some reserved bits were wrongly read.  Using the
correct masks, ensures alignment of the base addresses and allows future
firmware versions safe use of the reserved bits.

Fixes: ab9c17a009 ("mlx4_core: Modify driver initialization flow to accommodate SRIOV for Ethernet")
Fixes: 0ff1fb654b ("{NET, IB}/mlx4: Add device managed flow steering firmware API")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-24 21:48:26 -08:00
David S. Miller
8a7fa0c350 mlx5-fixes-2019-01-18
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcQmwjAAoJEEg/ir3gV/o+aBEIAIE8q3gEonMEPLnRoYSBlgUy
 ZBl7/51yzC8CVdsdO7hx5dyyiI+gYq10Jp6TVpqA2i/V8P871T/u6+xRhP3T7dOc
 Fq2YE70NdkCqFkpyc0QONzMC7ypwVqIEJrX7KnuLi5Ybsm9+tDia8AgC0NX/0H3U
 bRgfzpupfK0TiLgtBe7kqC2WTo+bPzu0cEUu7xjQqUgZZie5QPyzW/6cEtn/V75+
 A3btMNS5w0cfYK4mR4MMuc+UT4rSbsdyIZQrzICo75C2UnVMlojDf90+RuW4JFVw
 tuOMA1LCB5l3lphgEEEerV2dZvs0ZJEuoYoFUkfgu2QEFSOMhod71t8/rurih0s=
 =f23b
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2019-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2019-01-18

This series introduces some fixes to mlx5 driver.

Please pull and let me know if there is any problem.

For -stable v4.18
('net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames')

The patch doesn't apply cleanly to 4.18.y, but it is very simple to
resolve, what should be the procedure here ?
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18 18:23:23 -08:00
Eli Britstein
25f2d0e779 net/mlx5e: Fix cb_ident duplicate in indirect block register
Previously the identifier used for indirect block callback registry
and for block rule cb registry (when done via indirect blocks) was the
pointer to the tunnel netdev we were interested in receiving updates on.
This worked fine if a single PF existed that registered one callback for
the tunnel netdev of interest. However, if multiple PFs are in place then
the 2nd PF tries to register with the same tunnel netdev identifier. This
leads to EEXIST errors and/or incorrect cb deletions.

Prevent this conflict by using the rpriv pointer as the identifier for
netdev indirect block cb registry, allowing each PF to register a unique
callback per tunnel netdev. For block cb registry, the same PF may
register multiple cbs to the same block if using TC shared blocks.
Instead of the rpriv, use the pointer to the allocated indr_priv data as
the identifier here. This means that there can be a unique block callback
for each PF/tunnel netdev combo.

Fixes: f5bc2c5de1 ("net/mlx5e: Support TC indirect block notifications
for eswitch uplink reprs")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-18 16:15:31 -08:00
Tariq Toukan
7fdc1adc52 net/mlx5e: Fix wrong (zero) TX drop counter indication for representor
For representors, the TX dropped counter is not folded from the
per-ring counters. Fix it.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-18 16:15:31 -08:00
Shay Agroskin
2eb1e42551 net/mlx5e: Fix wrong error code return on FEC query failure
Advertised and configured FEC query failure resulted in printing
wrong error code.

Fixes: 6cfa946050 ("net/mlx5e: Ethtool driver callback for query/set FEC policy")
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-18 16:15:30 -08:00
Cong Wang
e8c8b53cca net/mlx5e: Force CHECKSUM_UNNECESSARY for short ethernet frames
When an ethernet frame is padded to meet the minimum ethernet frame
size, the padding octets are not covered by the hardware checksum.
Fortunately the padding octets are usually zero's, which don't affect
checksum. However, we have a switch which pads non-zero octets, this
causes kernel hardware checksum fault repeatedly.

Prior to:
commit '88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE ...")'
skb checksum was forced to be CHECKSUM_NONE when padding is detected.
After it, we need to keep skb->csum updated, like what we do for RXFCS.
However, fixing up CHECKSUM_COMPLETE requires to verify and parse IP
headers, it is not worthy the effort as the packets are so small that
CHECKSUM_COMPLETE can't save anything.

Fixes: 88078d98d1 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"),
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Nikola Ciprich <nikola.ciprich@linuxbox.cz>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-18 16:15:30 -08:00
Ido Schimmel
64254a2054 mlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky
The driver currently treats static FDB entries as both static and
sticky. This is incorrect and prevents such entries from being roamed to
a different port via learning.

Fix this by configuring static entries with ageing disabled and roaming
enabled.

In net-next we can add proper support for the newly introduced 'sticky'
flag.

Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18 15:12:16 -08:00
Nir Dotan
a11dcd6497 mlxsw: spectrum_fid: Update dummy FID index
When using a tc flower action of egress mirred redirect, the driver adds
an implicit FID setting action. This implicit action sets a dummy FID to
the packet and is used as part of a design for trapping unmatched flows
in OVS.  While this implicit FID setting action is supposed to be a NOP
when a redirect action is added, in Spectrum-2 the FID record is
consulted as the dummy FID index is an 802.1D FID index and the packet
is dropped instead of being redirected.

Set the dummy FID index value to be within 802.1Q range. This satisfies
both Spectrum-1 which ignores the FID and Spectrum-2 which identifies it
as an 802.1Q FID and will then follow the redirect action.

Fixes: c3ab435466 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18 15:12:16 -08:00
Nir Dotan
67c14cc9b3 mlxsw: pci: Return error on PCI reset timeout
Return an appropriate error in the case when the driver timeouts on waiting
for firmware to go out of PCI reset.

Fixes: 233fa44bd6 ("mlxsw: pci: Implement reset done check")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18 15:12:16 -08:00
Nir Dotan
d2f372ba09 mlxsw: pci: Increase PCI SW reset timeout
Spectrum-2 PHY layer introduces a calibration period which is a part of the
Spectrum-2 firmware boot process. Hence increase the SW timeout waiting for
the firmware to come out of boot. This does not increase system boot time
in cases where the firmware PHY calibration process is done quickly.

Fixes: c3ab435466 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18 15:12:16 -08:00
Ido Schimmel
c9ebea04cb mlxsw: pci: Ring CQ's doorbell before RDQ's
When a packet should be trapped to the CPU the device consumes a WQE
(work queue element) from an RDQ (receive descriptor queue) and copies
the packet to the address specified in the WQE. The device then tries to
post a CQE (completion queue element) that contains various metadata
(e.g., ingress port) about the packet to a CQ (completion queue).

In case the device managed to consume a WQE, but did not manage to post
the corresponding CQE, it will get stuck. This unlikely situation can be
triggered due to the scheme the driver is currently using to process
CQEs.

The driver will consume up to 512 CQEs at a time and after processing
each corresponding WQE it will ring the RDQ's doorbell, letting the
device know that a new WQE was posted for it to consume. Only after
processing all the CQEs (up to 512), the driver will ring the CQ's
doorbell, letting the device know that new ones can be posted.

Fix this by having the driver ring the CQ's doorbell for every processed
CQE, but before ringing the RDQ's doorbell. This guarantees that
whenever we post a new WQE, there is a corresponding CQE available. Copy
the currently processed CQE to prevent the device from overwriting it
with a new CQE after ringing the doorbell.

Note that the driver still arms the CQ only after processing all the
pending CQEs, so that interrupts for this CQ will only be delivered
after the driver finished its processing.

Before commit 8404f6f2e8 ("mlxsw: pci: Allow to use CQEs of version 1
and version 2") the issue was virtually impossible to trigger since the
number of CQEs was twice the number of WQEs and the number of CQEs
processed at a time was equal to the number of available WQEs.

Fixes: 8404f6f2e8 ("mlxsw: pci: Allow to use CQEs of version 1 and version 2")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Semion Lisyansky <semionl@mellanox.com>
Tested-by: Semion Lisyansky <semionl@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-18 15:12:16 -08:00
Linus Torvalds
e8746440bf Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix regression in multi-SKB responses to RTM_GETADDR, from Arthur
    Gautier.

 2) Fix ipv6 frag parsing in openvswitch, from Yi-Hung Wei.

 3) Unbounded recursion in ipv4 and ipv6 GUE tunnels, from Stefano
    Brivio.

 4) Use after free in hns driver, from Yonglong Liu.

 5) icmp6_send() needs to handle the case of NULL skb, from Eric
    Dumazet.

 6) Missing rcu read lock in __inet6_bind() when operating on mapped
    addresses, from David Ahern.

 7) Memory leak in tipc-nl_compat_publ_dump(), from Gustavo A. R. Silva.

 8) Fix PHY vs r8169 module loading ordering issues, from Heiner
    Kallweit.

 9) Fix bridge vlan memory leak, from Ido Schimmel.

10) Dev refcount leak in AF_PACKET, from Jason Gunthorpe.

11) Infoleak in ipv6_local_error(), flow label isn't completely
    initialized. From Eric Dumazet.

12) Handle mv88e6390 errata, from Andrew Lunn.

13) Making vhost/vsock CID hashing consistent, from Zha Bin.

14) Fix lack of UMH cleanup when it unexpectedly exits, from Taehee Yoo.

15) Bridge forwarding must clear skb->tstamp, from Paolo Abeni.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
  bnxt_en: Fix context memory allocation.
  bnxt_en: Fix ring checking logic on 57500 chips.
  mISDN: hfcsusb: Use struct_size() in kzalloc()
  net: clear skb->tstamp in bridge forwarding path
  net: bpfilter: disallow to remove bpfilter module while being used
  net: bpfilter: restart bpfilter_umh when error occurred
  net: bpfilter: use cleanup callback to release umh_info
  umh: add exit routine for UMH process
  isdn: i4l: isdn_tty: Fix some concurrency double-free bugs
  vhost/vsock: fix vhost vsock cid hashing inconsistent
  net: stmmac: Prevent RX starvation in stmmac_napi_poll()
  net: stmmac: Fix the logic of checking if RX Watchdog must be enabled
  net: stmmac: Check if CBS is supported before configuring
  net: stmmac: dwxgmac2: Only clear interrupts that are active
  net: stmmac: Fix PCI module removal leak
  tools/bpf: fix bpftool map dump with bitfields
  tools/bpf: test btf bitfield with >=256 struct member offset
  bpf: fix bpffs bitfield pretty print
  net: ethernet: mediatek: fix warning in phy_start_aneg
  tcp: change txhash on SYN-data timeout
  ...
2019-01-16 05:13:36 +12:00
Ido Schimmel
674bed5df4 mlxsw: spectrum_switchdev: Set PVID correctly during VLAN deletion
When a VLAN is deleted from a bridge port we should not change the PVID
unless the deleted VLAN is the PVID.

Fixes: fe9ccc785d ("mlxsw: spectrum_switchdev: Don't batch VLAN operations")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-08 16:53:54 -05:00
Ido Schimmel
412283eedc mlxsw: spectrum_nve: Replace error code with EINVAL
Adding a VLAN on a port can trigger the offload of a VXLAN tunnel which
is already a member in the VLAN. In case the configuration of the VXLAN
is not supported, the driver would return -EOPNOTSUPP.

This is problematic since bridge code does not interpret this as error,
but rather that it should try to setup the VLAN using the 8021q driver
instead of switchdev.

Fixes: d70e42b22d ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-08 16:53:54 -05:00
Ido Schimmel
457e20d659 mlxsw: spectrum_switchdev: Avoid returning errors in commit phase
Drivers are not supposed to return errors in switchdev commit phase if
they returned OK in prepare phase. Otherwise, a WARNING is emitted.
However, when the offloading of a VXLAN tunnel is triggered by the
addition of a VLAN on a local port, it is not possible to guarantee that
the commit phase will succeed without doing a lot of work.

In these cases, the artificial division between prepare and commit phase
does not make sense, so simply do the work in the prepare phase.

Fixes: d70e42b22d ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-08 16:53:54 -05:00
Ido Schimmel
143a8e038a mlxsw: spectrum: Add VXLAN dependency for spectrum
When VXLAN is a loadable module, MLXSW_SPECTRUM must not be built-in:

drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:2547: undefined
reference to `vxlan_fdb_find_uc'

Add Kconfig dependency to enforce usable configurations.

Fixes: 1231e04f5b ("mlxsw: spectrum_switchdev: Add support for VxLAN encapsulation")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-08 16:53:54 -05:00
Jiri Pirko
8adbe212a1 mlxsw: spectrum: Disable lag port TX before removing it
Make sure that lag port TX is disabled before mlxsw_sp_port_lag_leave()
is called and prevent from possible EMAD error.

Fixes: 0d65fc1304 ("mlxsw: spectrum: Implement LAG port join/leave")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-08 16:53:54 -05:00
Nir Dotan
04d075b7aa mlxsw: spectrum_acl: Remove ASSERT_RTNL()s in module removal flow
Removal of the mlxsw driver on Spectrum-2 platforms hits an ASSERT_RTNL()
in Spectrum-2 ACL Bloom filter and in ERP removal paths. This happens
because the multicast router implementation in Spectrum-2 relies on ACLs.
Taking the RTNL lock upon driver removal is useless since the driver first
removes its ports and unregisters from notifiers so concurrent writes
cannot happen at that time. The assertions were originally put as a
reminder for future work involving ERP background optimization, but having
these assertions only during addition serves this purpose as well.

Therefore remove the ASSERT_RTNL() in both places related to ERP and Bloom
filter removal.

Fixes: cf7221a4f5 ("mlxsw: spectrum_router: Add Multicast routing support for Spectrum-2")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-08 16:53:53 -05:00
Nir Dotan
ff0db43cd6 mlxsw: spectrum_acl: Add cleanup after C-TCAM update error condition
When writing to C-TCAM, mlxsw driver uses cregion->ops->entry_insert().
In case of C-TCAM HW insertion error, the opposite action should take
place.
Add error handling case in which the C-TCAM region entry is removed, by
calling cregion->ops->entry_remove().

Fixes: a0a777b940 ("mlxsw: spectrum_acl: Start using A-TCAM")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-08 16:53:53 -05:00
Luis Chamberlain
750afb08ca cross-tree: phase out dma_zalloc_coherent()
We already need to zero out memory for dma_alloc_coherent(), as such
using dma_zalloc_coherent() is superflous. Phase it out.

This change was generated with the following Coccinelle SmPL patch:

@ replace_dma_zalloc_coherent @
expression dev, size, data, handle, flags;
@@

-dma_zalloc_coherent(dev, size, handle, flags)
+dma_alloc_coherent(dev, size, handle, flags)

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
[hch: re-ran the script on the latest tree]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-01-08 07:58:37 -05:00
Stephen Warren
01cd364a15 net/mlx4: replace pci_{,un}map_sg with dma_{,un}map_sg
pci_{,un}map_sg are deprecated and replaced by dma_{,un}map_sg. This is
especially relevant since the rest of the driver uses the DMA API. Fix
the driver to use the replacement APIs.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-07 05:14:17 -08:00
Stephen Warren
f65e192af3 net/mlx4: Get rid of page operation after dma_alloc_coherent
This patch solves a crash at the time of mlx4 driver unload or system
shutdown. The crash occurs because dma_alloc_coherent() returns one
value in mlx4_alloc_icm_coherent(), but a different value is passed to
dma_free_coherent() in mlx4_free_icm_coherent(). In turn this is because
when allocated, that pointer is passed to sg_set_buf() to record it,
then when freed it is re-calculated by calling
lowmem_page_address(sg_page()) which returns a different value. Solve
this by recording the value that dma_alloc_coherent() returns, and
passing this to dma_free_coherent().

This patch is roughly equivalent to commit 378efe798e ("RDMA/hns: Get
rid of page operation after dma_alloc_coherent").

Based-on-code-from: Christoph Hellwig <hch@lst.de>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-07 05:14:17 -08:00
Linus Torvalds
5d24ae67a9 4.21 merge window pull request
This has been a fairly typical cycle, with the usual sorts of driver
 updates. Several series continue to come through which improve and
 modernize various parts of the core code, and we finally are starting to
 get the uAPI command interface cleaned up.
 
 - Various driver fixes for bnxt_re, cxgb3/4, hfi1, hns, i40iw, mlx4, mlx5,
   qib, rxe, usnic
 
 - Rework the entire syscall flow for uverbs to be able to run over
   ioctl(). Finally getting past the historic bad choice to use write()
   for command execution
 
 - More functional coverage with the mlx5 'devx' user API
 
 - Start of the HFI1 series for 'TID RDMA'
 
 - SRQ support in the hns driver
 
 - Support for new IBTA defined 2x lane widths
 
 - A big series to consolidate all the driver function pointers into
   a big struct and have drivers provide a 'static const' version of the
   struct instead of open coding initialization
 
 - New 'advise_mr' uAPI to control device caching/loading of page tables
 
 - Support for inline data in SRPT
 
 - Modernize how umad uses the driver core and creates cdev's and sysfs
   files
 
 - First steps toward removing 'uobject' from the view of the drivers
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlwhV2oACgkQOG33FX4g
 mxpF8A/9EkRCg6wCDC59maA53b5PjuNmD//9hXbycQPQSlxntI2PyYtxrzBqc0+2
 yIaFFMehL41XNN6y1zfkl7ndl62McCH2TpiidU8RyTxVw/e3KsDD5sU6++atfHRo
 M82RNfedDtxPG8TcCPKVLof6JHADApGSR1r4dCYfAnu7KFMyvlLmeYyx4r/2E6yC
 iQPmtKVOdbGkuWGeX+brGEA0vg7FUOAvaysnxddjyh9hyem4h0SUR3Af/Ik0N5ME
 PYzC+hMKbkPVBLoCWyg7QwUaqK37uWwguMQLtI2byF7FgbiK/lBQt6TsidR4Fw3p
 EalL7uqxgCTtLYh918vxLFjdYt6laka9j7xKCX8M8d06sy/Lo8iV4hWjiTESfMFG
 usqs7D6p09gA/y1KISji81j6BI7C92CPVK2drKIEnfyLgY5dBNFcv9m2H12lUCH2
 NGbfCNVaTQVX6bFWPpy2Bt2y/Litsfxw5RviehD7jlG0lQjsXGDkZzsDxrMSSlNU
 S79iiTJyK4kUZkXzrSSlN58pLBlbupJwm5MDjKmM+irsrsCHjGIULvc902qtnC3/
 8ImiTtW6XvqLbgWXyy2Th8/ZgRY234p1ybhog+DFaGKUch0XqB7VXTV2OZm0GjcN
 Fp4PUeBt+/gBgYqjpuffqQc1rI4uwXYSoz7wq9RBiOpw5zBFT1E=
 =T0p1
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This has been a fairly typical cycle, with the usual sorts of driver
  updates. Several series continue to come through which improve and
  modernize various parts of the core code, and we finally are starting
  to get the uAPI command interface cleaned up.

   - Various driver fixes for bnxt_re, cxgb3/4, hfi1, hns, i40iw, mlx4,
     mlx5, qib, rxe, usnic

   - Rework the entire syscall flow for uverbs to be able to run over
     ioctl(). Finally getting past the historic bad choice to use
     write() for command execution

   - More functional coverage with the mlx5 'devx' user API

   - Start of the HFI1 series for 'TID RDMA'

   - SRQ support in the hns driver

   - Support for new IBTA defined 2x lane widths

   - A big series to consolidate all the driver function pointers into a
     big struct and have drivers provide a 'static const' version of the
     struct instead of open coding initialization

   - New 'advise_mr' uAPI to control device caching/loading of page
     tables

   - Support for inline data in SRPT

   - Modernize how umad uses the driver core and creates cdev's and
     sysfs files

   - First steps toward removing 'uobject' from the view of the drivers"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (193 commits)
  RDMA/srpt: Use kmem_cache_free() instead of kfree()
  RDMA/mlx5: Signedness bug in UVERBS_HANDLER()
  IB/uverbs: Signedness bug in UVERBS_HANDLER()
  IB/mlx5: Allocate the per-port Q counter shared when DEVX is supported
  IB/umad: Start using dev_groups of class
  IB/umad: Use class_groups and let core create class file
  IB/umad: Refactor code to use cdev_device_add()
  IB/umad: Avoid destroying device while it is accessed
  IB/umad: Simplify and avoid dynamic allocation of class
  IB/mlx5: Fix wrong error unwind
  IB/mlx4: Remove set but not used variable 'pd'
  RDMA/iwcm: Don't copy past the end of dev_name() string
  IB/mlx5: Fix long EEH recover time with NVMe offloads
  IB/mlx5: Simplify netdev unbinding
  IB/core: Move query port to ioctl
  RDMA/nldev: Expose port_cap_flags2
  IB/core: uverbs copy to struct or zero helper
  IB/rxe: Reuse code which sets port state
  IB/rxe: Make counters thread safe
  IB/mlx5: Use the correct commands for UMEM and UCTX allocation
  ...
2018-12-28 14:57:10 -08:00
Julia Lawall
61988bd281 net/mlx4_core: drop useless LIST_HEAD
Drop LIST_HEAD where the variable it declares has never
been used.

The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier x;
@@
- LIST_HEAD(x);
  ... when != x
// </smpl>

Fixes: c82e9aa0a8 ("mlx4_core: resource tracking for HCA resources used by guests")
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-24 14:22:14 -08:00
Julia Lawall
d0863792f8 mlxsw: spectrum: drop useless LIST_HEAD
Drop LIST_HEAD where the variable it declares is never used.

The uses were removed in 244cd96adb ("net_sched: remove
list_head from tc_action"), but not the declaration.

The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier x;
@@
- LIST_HEAD(x);
  ... when != x
// </smpl>

Fixes: 244cd96adb ("net_sched: remove list_head from tc_action")
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-24 14:22:14 -08:00
Julia Lawall
2534f14a94 net/mlx5e: drop useless LIST_HEAD
Drop LIST_HEAD where the variable it declares is never used.

These became useless in 244cd96adb ("net_sched: remove list_head
from tc_action")

The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier x;
@@
- LIST_HEAD(x);
  ... when != x
// </smpl>

Fixes: 244cd96adb ("net_sched: remove list_head from tc_action")
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-24 14:22:14 -08:00
kbuild test robot
5d1f7354fa net/mlx5e: fix semicolon.cocci warnings
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:1339:57-58: Unneeded semicolon

 Remove unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

Fixes: 4c8fb2986d ("net/mlx5e: Increase VF representors' SQ size to 128")
CC: Gavi Teitz <gavi@mellanox.com>
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-24 14:15:43 -08:00
Tariq Toukan
6277053afa net/mlx5e: XDP, Add user control for XDP TX MPWQE feature
Add ethtool private flag 'xdp_tx_mpwqe' to control the feature
from userspace.
Feature is set ON by default, if supported.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 22:54:20 -08:00
Tariq Toukan
5e0d2eef77 net/mlx5e: XDP, Support Enhanced Multi-Packet TX WQE
Add support for the HW feature of multi-packet WQE in XDP
xmit flow.

The conventional TX descriptor (WQE, Work Queue Element) serves
a single packet. Our HW has support for multi-packet WQE (MPWQE)
in which a single descriptor serves multiple TX packets.

This reduces both the PCI overhead and the CPU cycles wasted on
writing them.

In this patch we add support for the HW feature, which is supported
starting from ConnectX-5.

Performance:
Tested packet rate for UDP 64Byte multi-stream over ConnectX-5 NICs.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

XDP_TX:
We see a huge gain on single port ConnectX-5, and reach the 100 Mpps
milestone.
* Single-port HCA:
	Before:   70 Mpps
	After:   100 Mpps (+42.8%)

* Dual-port HCA:
	Before: 51.7 Mpps
	After:  57.3 Mpps (+10.8%)

* In both cases we tested traffic on one port and for now On Dual-port HCAs
  we see only small gain, we are working to overcome this bottleneck, but
  for the moment only with experimental firmware on dual port HCAs we can
  reach the wanted numbers as seen on Single-port HCAs.

XDP_REDIRECT:
Redirect from (A) ConnectX-5 to (B) ConnectX-5.
Due to a setup limitation, (A) and (B) are on different NUMA nodes,
so absolute performance numbers are not optimal.
Note:
  Below is the transmit rate of (B), not the redirect rate of (A)
  which is in some cases higher.

* (B) is single-port:
	Before:   77 Mpps
	After:    90 Mpps (+16.8%)

* (B) is dual-port:
	Before:  61 Mpps
	After:   72 Mpps (+18%)

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 22:54:19 -08:00
Tariq Toukan
1feeab8007 net/mlx5e: XDP, Add array for WQE info descriptors
Each xdp_wqe_info instance describes the number of data-segments
and WQEBBs of the WQE.
This is useful for a downstream patch that adds support for
Multi-Packet TX WQE feature.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 22:54:19 -08:00
Tariq Toukan
fea28dd6a2 net/mlx5e: XDP, Maintain a FIFO structure for xdp_info instances
This provides infrastructure to have multiple xdp_info instances
for the same consumer index.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 22:54:19 -08:00
Tariq Toukan
b8180392ed net/mlx5e: XDP, Replace boolean doorbell indication with segment pointer
Instead of calculating the control segment to be used upon an
XDP xmit doorbell, save it in SQ structure.
Nullify when no pending doorbell.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 22:54:18 -08:00
Tariq Toukan
db02a308cd net/mlx5e: XDP, Warn upon polling an error CQE
Do not ignore the CQE opcode.
This helps expose issues and debug them.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 22:54:18 -08:00
Tariq Toukan
feb2ff9d74 net/mlx5e: XDP, Change the XDP SQ redirect indication
Do not maintain an SQ state bit to indicate whether an
XDP SQ serves redirect operations.

Instead, rely on the fact that such an XDP SQ doesn't reside
in an RQ instance, while the others do.
This info is not known to the XDP SQ functions themselves,
and they rely on their callers to distinguish between the cases.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 22:54:18 -08:00
Tariq Toukan
4fb2f51618 net/mlx5e: XDP, Precede XDP-related operations in RQ poll by a loaded program check
At the end of the RQ polling loop, some XDP-related operations
might be required. Before checking them one by one, check if
an XDP program is even loaded.
Combine all the checks and operations in a single function in xdp files.

This saves unnecessary checks for non-XDP flows.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 22:54:17 -08:00
Tariq Toukan
e05b8d4fc3 net/mlx5e: TX, Print opcode in error CQE warning
The opcode indicates about the error reason.
Printing it helps in debug.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 22:54:17 -08:00
David S. Miller
e716431356 mlx5-updates-2018-12-19
This series adds some misc updates and the support for tunnels over VLAN
 tc offloads.
 
 From Miroslav Lichvar, patches #1,2
 1) Update timecounter at least twice per counter overflow
 2) Extend PTP gettime function to read system clock
 
 From Gavi Teitz, patch #3
 3) Increase VF representors' SQ size to 128
 
 From Eli Britstein and Or Gerlitz, patches #4-10
 4) Adds the capability to support tunnels over VLAN device.
 
 Patch 4 avoids crash for TC flow with egress upper devices
 
 Patch 5 refactors tunnel routing devs into a helper function
 
 Patch 6 avoids crash for TC encap flows with vlan on underlay
 
 Patches 7-8 refactor encap tunnel header preparing code.
 
 Patch 9 adds support for building VLAN tagged ETH header.
 
 Patch 10 adds support for tunnel routing to VLAN device.
 
 From Aviv, patches 11,12 to fix earlier VF lag series
 5) Fix query_nic_sys_image_guid() error during init
 6) Fix LAG requirement when CONFIG_MLX5_ESWITCH is off
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcG5O8AAoJEEg/ir3gV/o+v0AH/ja1bJ6PkAlw2bCRMIuI6HK/
 1vh9n8D74tIOAlvUi6QiLOgJ7CLdAgiAVFppDWzJSBUsY2XNAtRdYvLDDt9hGafO
 QGysBWNVcX2aUVp+pLDCCVEYBWyyIzW416CWHx2IUgdAg9S6cvJK6P/81wd+l4Zp
 8YlPstEUANP/JKZxHHjSelMcnY+Bj0JrDzuyyyaQdwmcHo5I7Ht0tNex14yFfFbl
 gd7YvfPr1PtovPX2w9hMt3y3ml4mommB+jtxc0+59D+A7680hBOpAnH5pONms9rv
 OKKKV8KqpxL1m32/TGbd7XSG+llLJwsSM6RtD4oUdiPA4iCiVDVdreP5vac52C4=
 =0QQ5
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-12-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2018-12-19

This series adds some misc updates and the support for tunnels over VLAN
tc offloads.

From Miroslav Lichvar, patches #1,2
1) Update timecounter at least twice per counter overflow
2) Extend PTP gettime function to read system clock

From Gavi Teitz, patch #3
3) Increase VF representors' SQ size to 128

From Eli Britstein and Or Gerlitz, patches #4-10
4) Adds the capability to support tunnels over VLAN device.

Patch 4 avoids crash for TC flow with egress upper devices

Patch 5 refactors tunnel routing devs into a helper function

Patch 6 avoids crash for TC encap flows with vlan on underlay

Patches 7-8 refactor encap tunnel header preparing code.

Patch 9 adds support for building VLAN tagged ETH header.

Patch 10 adds support for tunnel routing to VLAN device.

From Aviv, patches 11,12 to fix earlier VF lag series
5) Fix query_nic_sys_image_guid() error during init
6) Fix LAG requirement when CONFIG_MLX5_ESWITCH is off
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 15:51:55 -08:00
Ido Schimmel
d8a1f7ab2c mlxsw: spectrum: Remove limitation regarding VID 1
VID 1 is not reserved anymore, so remove the check that prevented the
creation of VLAN devices with this VID over mlxsw ports.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 15:48:54 -08:00
Ido Schimmel
0417d25e7d mlxsw: spectrum: Switch to VID 4095 as default VID
There is no need to abuse VID 1 anymore and we can instead use VID 4095
as the default VLAN, which will be configured on the port throughout its
lifetime.

The OVS join / leave functions are changed to enable VIDs 1-4094
(inclusive) instead of 2-4095. This because VID 4095 is now the default
VLAN instead of 1.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 15:48:54 -08:00
Ido Schimmel
16f6aceb72 mlxsw: spectrum: Add an helper function to cleanup VLAN entries
VLAN entries on a port can be associated with either a bridge VLAN or a
router port. Before the VLAN entry is destroyed these associations need
to be cleaned up.

Currently, this is always invoked from the function which destroys the
VLAN entry, but next patch is going to skip the destruction of the
default entry when a port in unlinked from a LAG.

The above does not mean that the associations should not be cleaned up,
so add a helper that will be invoked from both call sites.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 15:48:54 -08:00
Ido Schimmel
346fca3b58 mlxsw: spectrum: Store pointer to default port VLAN in port struct
Subsequent patches will need to access the default port VLAN. Since this
VLAN will exist throughout the lifetime of the port, simply store it in
the port's struct.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 15:48:54 -08:00
Ido Schimmel
ab6c3b79ec mlxsw: spectrum: Allow controlling destruction of default port VLAN
The function allows flushing all the existing VLAN entries on a port. It
is invoked when a port is destroyed and when it is unlinked from a LAG.
In the latter case, when moving to the new default VLAN, there will not
be a need to destroy the default VLAN entry.

Therefore, add an argument that allows to control whether the default
port VLAN should be destroyed or not. Currently it is always set to
'true'.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 15:48:54 -08:00
Ido Schimmel
262e1ff91c mlxsw: spectrum: Set PVID during port initialization
Currently, the driver does not set the port's PVID when initializing a
new port. This is because the driver is using VID 1 as PVID which is the
firmware default.

Subsequent patches are going to change the PVID the driver is setting
when initializing a new port.

Prepare for that by explicitly setting the port's PVID.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 15:48:54 -08:00
Ido Schimmel
a2d2a20553 mlxsw: spectrum: Replace hard-coded default VID with a define
Subsequent patches are going to replace the current default VID (1) with
VLAN_N_VID - 1 (4095).

Prepare for this conversion by replacing the hard-coded '1' with a
define.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 15:48:54 -08:00
Ido Schimmel
f40be47a3e mlxsw: spectrum_router: Do not force specific configuration order
In symmetric routing, the only two members in the VLAN corresponding to
the L3 VNI are the router port and the VXLAN tunnel.

In case the VXLAN device is already enslaved to the bridge and only
later the VLAN interface is configured, the tunnel will not be
offloaded.

The reason for this is that when the router interface (RIF)
corresponding to the VLAN interface is configured, it calls the core
fid_get() API which does not check if NVE should be enabled on the FID.

Instead, call into the bridge code which will check if NVE should be
enabled on the FID.

This effectively means that the same code path is used to retrieve a FID
when either a local port or a router port joins the FID.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 15:48:54 -08:00
Jason Gunthorpe
ed50edfb72 Merge branch 'mlx5-next' into rdma.git
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

mlx5 updates taken for dependencies on following patches.

* branche 'mlx5-next': (23 commits)
  IB/mlx5: Introduce uid as part of alloc/dealloc transport domain
  net/mlx5: Add shared Q counter bits
  net/mlx5: Continue driver initialization despite debugfs failure
  net/mlx5: Fold the modify lag code into function
  net/mlx5: Add lag affinity info to log
  net/mlx5: Split the activate lag function into two routines
  net/mlx5: E-Switch, Introduce flow counter affinity
  IB/mlx5: Unify e-switch representors load approach between uplink and VFs
  net/mlx5: Use lowercase 'X' for hex values
  net/mlx5: Remove duplicated include from eswitch.c
  net/mlx5: Remove the get protocol device interface entry
  net/mlx5: Support extended destination format in flow steering command
  net/mlx5: E-Switch, Change vhca id valid bool field to bit flag
  net/mlx5: Introduce extended destination fields
  net/mlx5: Revise gre and nvgre key formats
  net/mlx5: Add monitor commands layout and event data
  net/mlx5: Add support for plugged-disabled cable status in PME
  net/mlx5: Add support for PCIe power slot exceeded error in PME
  net/mlx5: Rework handling of port module events
  net/mlx5: Move flow counters data structures from flow steering header
  ...
2018-12-20 13:24:50 -07:00
David S. Miller
2be09de7d6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Lots of conflicts, by happily all cases of overlapping
changes, parallel adds, things of that nature.

Thanks to Stephen Rothwell, Saeed Mahameed, and others
for their guidance in these resolutions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-20 11:53:36 -08:00
Aviv Heller
a64917446e net/mlx5: Fix LAG requirement when CONFIG_MLX5_ESWITCH is off
If CONFIG_MLX5_ESWITCH is not defined, test for SR-IOV being disabled,
instead of calling e-switch LAG prereq routine.

Since LAG with SRIOV is allowed only when switchdev mode is on.

Fixes: eff849b2c6 ("net/mlx5: Allow/disallow LAG according to pre-req only")
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 05:06:03 -08:00
Aviv Heller
0a5b589111 net/mlx5: Fix query_nic_sys_image_guid() error during init
vport system image guid should be queried using vport nic API for
Ethernet ports, and vport hca API for Infiniband ports.

Fixes: fadd59fc50 ("net/mlx5: Introduce inter-device communication mechanism")
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-12-20 05:06:03 -08:00