Commit Graph

2655 Commits

Author SHA1 Message Date
Ido Schimmel
45bfe6b433 mlxsw: spectrum_switchdev: Don't batch STP operations
Simplify the code by using the common function that sets an STP state
for a Port-VLAN and remove the existing one that tries to batch it for
several VLANs.

This will help us in a follow-up patchset to introduce a unified
infrastructure for bridge ports, regardless if the bridge is VLAN-aware
or not.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Ido Schimmel
fe9ccc785d mlxsw: spectrum_switchdev: Don't batch VLAN operations
switchdev's VLAN object has the ability to describe a range of VLAN IDs,
but this is only used when VLAN operations are done using the SELF flag,
which is something we would like to remove as it allows one to bypass
the bridge driver.

Do VLAN operations on a per-VLAN basis, thereby simplifying the code and
preparing it for refactoring in a follow-up patchset.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Ido Schimmel
d341e2ce6b mlxsw: spectrum_switchdev: Remove redundant check
Since commit 97c242902c ("switchdev: Execute bridge ndos only for
bridge ports") switchdev code checks that port is bridged, so no need to
perform the same check in the driver.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Ido Schimmel
348b8fc3cf mlxsw: spectrum_router: Initialize RIFs in a separate function
The router interfaces (RIFs) array is currently initialized together
with the general router configuration. However, in a follow-up patchset
we're going to introduce a common RIF core that will require us to
initialize more RIF constructs, so move the RIF initialization to its
own function.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Ido Schimmel
7e39d1153d mlxsw: spectrum_router: Move FIB notification block to router struct
The FIB notification block logically belongs inside the router specific
struct, so move it there.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Ido Schimmel
5f9efffbdb mlxsw: spectrum_router: Move RIFs array to its rightful place
The router interfaces (RIFs) array is of no interest to code outside the
routing realm, so declare it inside the router specific struct instead
of the chip-wide one.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Ido Schimmel
5f6935c6a4 mlxsw: spectrum_switchdev: Reduce scope of bridge struct
Some attributes in the global chip struct are only relevant for bridge
operation, so encapsulate them in their own struct that isn't exposed to
non-bridge code.

This will also help us later, when we add more bridge-specific
attributes.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Ido Schimmel
9011b677e7 mlxsw: spectrum_router: Reduce scope of router struct
In a similar fashion to previous patch, the router structure
('mlxsw_sp_router') doesn't need to be accessible to anyone, but the
router code located at spectrum_router.c

Make this apparent and reduce its scope by defining it there.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Ido Schimmel
33cbd87cc0 mlxsw: spectrum_buffer: Reduce scope of shared buffer struct
The shared buffer structure ('mlxsw_sp_sb') doesn't need to be
accessible to anyone, but the shared buffer code located at
spectrum_buffers.c

Make this apparent and reduce its scope by defining it there.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-17 14:06:54 -04:00
Arnd Bergmann
2432a3fb5c mlx5e: add CONFIG_INET dependency
We now reference the arp_tbl, which requires IPv4 support to be
enabled in the kernel, otherwise we get a link error:

drivers/net/built-in.o: In function `mlx5e_tc_update_neigh_used_value':
(.text+0x16afec): undefined reference to `arp_tbl'
drivers/net/built-in.o: In function `mlx5e_rep_neigh_init':
en_rep.c:(.text+0x16c16d): undefined reference to `arp_tbl'
drivers/net/built-in.o: In function `mlx5e_rep_netevent_event':
en_rep.c:(.text+0x16cbb5): undefined reference to `arp_tbl'

This adds a Kconfig dependency for it.

Fixes: 232c001398 ("net/mlx5e: Add support to neighbour update flow")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-16 15:48:01 -04:00
David S. Miller
42a928ced3 mlx5-fixes-2017-05-12
Misc fixes for mlx5 driver
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZGDM8AAoJEEg/ir3gV/o+svMH/1lAl+FIGCWgZ82/UbFHCZCW
 SIXEnP2id+Nic7JxQSa/RVQ43I75wkM8pN929hFoyz8p/pPLhZkpo7vX6yLWG2SL
 fTx/4Qn5jR/eow/D+fdrlyKMPg0A7fijY+ZnvDPsQkjtakIedCgc/A1xDufpKi7+
 I7nOJa4yACuZK0gzy32VGgpJw02q32eRTJjKHRiEYdmNQSIJpmRbG2m4e0z/me2s
 hrMt358/llPOZNwkAPD2SHZxH68oSq5EbSRmz5jDwXfTVFkjWNQVowwm3pCFjsQn
 3xDtkjakVPVBUagR3hngtLPcsDpUR4GU3tHr4l/UPp0wFBXVKgdupC7B+mCkSp4=
 =ubJG
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2017-05-12-V2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2017-05-12

This series contains some mlx5 fixes for net.
Please pull and let me know if there's any problem.

For -stable:
("net/mlx5e: Fix ethtool pause support and advertise reporting") kernels >= 4.8
("net/mlx5e: Use the correct pause values for ethtool advertising") kernels >= 4.8

v1->v2:
 Dropped statistics spinlock patch, it needs some extra work.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15 14:38:04 -04:00
yuval.shaia@oracle.com
4762010f09 net/mlx4_core: Use min3 to select number of MSI-X vectors
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-15 14:20:26 -04:00
Tariq Toukan
7913d20596 net/mlx5: Bump driver version
Remove date and bump version for mlx5_core driver.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 14:24:18 +03:00
Ilan Tayari
e29341fb3a net/mlx5: FPGA, Add basic support for Innova
Mellanox Innova is a NIC with ConnectX and an FPGA on the same
board. The FPGA is a bump-on-the-wire and thus affects operation of
the mlx5_core driver on the ConnectX ASIC.

Add basic support for Innova in mlx5_core.

This allows using the Innova card as a regular NIC, by detecting
the FPGA capability bit, and verifying its load state before
initializing ConnectX interfaces.

Also detect FPGA fatal runtime failures and enter error state if
they ever happen.

All new FPGA-related logic is placed in its own subdirectory 'fpga',
which may be built by selecting CONFIG_MLX5_FPGA.
This prepares for further support of various Innova features in later
patchsets.
Additional details about hardware architecture will be provided as
more features get submitted.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 14:24:17 +03:00
Ilan Tayari
0179720d6b net/mlx5: Introduce trigger_health_work function
Introduce new function for entering bad-health state.

This function will be called from FPGA-related logic in a later patch from
asynchronous event (IRQ) context, for that we change the spin lock to an
IRQ-safe one.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 14:22:10 +03:00
Noa Osherovich
2e9d3e83ab net/mlx5: Update the list of the PCI supported devices
Add the BlueField device and VF IDs to the supported devices list.

Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 13:53:26 +03:00
Leon Romanovsky
1b9a07ee25 {net, IB}/mlx5: Replace mlx5_vzalloc with kvzalloc
Commit a7c3e901a4 ("mm: introduce kv[mz]alloc helpers") added
proper implementation of mlx5_vzalloc function to the MM core.

This made the mlx5_vzalloc function useless, so let's remove it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 13:53:18 +03:00
Yishai Hadas
508541146a net/mlx5: Use underlay QPN from the root name space
Root flow table is dynamically changed by the underlying flow steering
layer, and IPoIB/ULPs have no idea what will be the root flow table in
the future, hence we need a dynamic infrastructure to move Underlay QPs
with the root flow table.

Fixes: b3ba51498b ("net/mlx5: Refactor create flow table method to accept underlay QP")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 13:33:45 +03:00
Saeed Mahameed
5360fd473c net/mlx5e: IPoIB, Only support regular RQ for now
IPoIB doesn't support striding RQ at the moment, for this
we need to explicitly choose non striding RQ in IPoIB init,
even if the HW supports it.

Fixes: 8f493ffd88 ("net/mlx5e: IPoIB, RX steering RSS RQTs and TIRs")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 13:33:45 +03:00
Saeed Mahameed
20b6a1c782 net/mlx5e: Fix setup TC ndo
Fail-safe support patches introduced a trivial bug,
setup tc callback is doing a wrong check of the netdevice state,
the fix is simply to invert the condition.

Fixes: 6f9485af40 ("net/mlx5e: Fail safe tc setup")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 13:33:45 +03:00
Gal Pressman
e3c1950371 net/mlx5e: Fix ethtool pause support and advertise reporting
Pause bit should set when RX pause is on, not TX pause.
Also, setting Asym_Pause is incorrect, and should be turned off.

Fixes: 665bc53969 ("net/mlx5e: Use new ethtool get/set link ksettings API")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 13:33:45 +03:00
Gal Pressman
b383b544f2 net/mlx5e: Use the correct pause values for ethtool advertising
Query the operational pause from firmware (PFCC register) instead of
always passing zeros.

Fixes: 665bc53969 ("net/mlx5e: Use new ethtool get/set link ksettings API")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-05-14 13:33:45 +03:00
Linus Torvalds
50fb55d88c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix multiqueue in stmmac driver on PCI, from Andy Shevchenko.

 2) cdc_ncm doesn't actually fully zero out the padding area is
    allocates on TX, from Jim Baxter.

 3) Don't leak map addresses in BPF verifier, from Daniel Borkmann.

 4) If we randomize TCP timestamps, we have to do it everywhere
    including SYN cookies. From Eric Dumazet.

 5) Fix "ethtool -S" crash in aquantia driver, from Pavel Belous.

 6) Fix allocation size for ntp filter bitmap in bnxt_en driver, from
    Dan Carpenter.

 7) Add missing memory allocation return value check to DSA loop driver,
    from Christophe Jaillet.

 8) Fix XDP leak on driver unload in qed driver, from Suddarsana Reddy
    Kalluru.

 9) Don't inherit MC list from parent inet connection sockets, another
    syzkaller spotted gem. Fix from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
  dccp/tcp: do not inherit mc_list from parent
  qede: Split PF/VF ndos.
  qed: Correct doorbell configuration for !4Kb pages
  qed: Tell QM the number of tasks
  qed: Fix VF removal sequence
  qede: Fix XDP memory leak on unload
  net/mlx4_core: Reduce harmless SRIOV error message to debug level
  net/mlx4_en: Avoid adding steering rules with invalid ring
  net/mlx4_en: Change the error print to debug print
  drivers: net: wimax: i2400m: i2400m-usb: Use time_after for time comparison
  DECnet: Use container_of() for embedded struct
  Revert "ipv4: restore rt->fi for reference counting"
  net: mdio-mux: bcm-iproc: call mdiobus_free() in error path
  net: ethernet: ti: cpsw: adjust cpsw fifos depth for fullduplex flow control
  ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf
  net: cdc_ncm: Fix TX zero padding
  stmmac: pci: split out common_default_data() helper
  stmmac: pci: RX queue routing configuration
  stmmac: pci: TX and RX queue priority configuration
  stmmac: pci: set default number of rx and tx queues
  ...
2017-05-09 15:42:31 -07:00
Jack Morgenstein
83bd5118a1 net/mlx4_core: Reduce harmless SRIOV error message to debug level
Under SRIOV resource management, extra counters are allocated to VFs
from a free pool. If that pool is empty, the ALLOC_RES command for
a counter resource fails -- and this generates a misleading error
message in the message log.

Under SRIOV, each VF is allocated (i.e., guaranteed) 2 counters --
one counter per port. For ETH ports, the RoCE driver requests an
additional counter (above the guaranteed counters). If that request
fails, the VF RoCE driver simply uses the default (i.e., guaranteed)
counter for that port.

Thus, failing to allocate an additional counter does not constitute
a  problem, and the error message on the PF when this occurs should
be reduced to debug level.

Finally, to identify the situation that the reason for the failure is
that no resources are available to grant to the VF, we modified the
error returned by mlx4_grant_resource to -EDQUOT (Quota exceeded),
which more accurately describes the error.

Fixes: c3abb51bdb ("IB/mlx4: Add RoCE/IB dedicated counters")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09 11:22:46 -04:00
Talat Batheesh
89c557687a net/mlx4_en: Avoid adding steering rules with invalid ring
Inserting steering rules with illegal ring is an invalid operation,
block it.

Fixes: 820672812f ('net/mlx4_en: Manage flow steering rules with ethtool')
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09 11:22:46 -04:00
Kamal Heib
505a9249c2 net/mlx4_en: Change the error print to debug print
The error print within mlx4_en_calc_rx_buf() should be a debug print.

Fixes: 51151a16a6 ('mlx4: allow order-0 memory allocations in RX path')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-09 11:22:46 -04:00
Linus Torvalds
3341713c67 Updates #2 for 4.12 kernel merge window
- mlx5/IPoIB fixup patch
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZC7idAAoJELgmozMOVy/d8bEP/31hq3Vr1unNJYjU8yEaWI0l
 D/k3ZSJcnLaeoUE5Ts69pYVkncAM/g55bmS80bKyNONRJp0BMU0MxmmzQC/iMqMM
 LAvDf3Qyv8rTrEgjjmjWVQrP/v7K0+teD6YMmTkfB8fWqhi/vR+84Sv3c50GUXpL
 OZMrn6p/IXptWa24kvi5/mKNSmeOBQkp/1lRxyi6bhAcFhjwQZoO4tDkOL6Aj1oA
 FFeUcBDwMsOjbqhH59sM9ryOnGXPEwJUnLOPfq36pIrFzyUtbbcmFfMmefjJWMLK
 XW4VBXywYLzkCvr2tGeXMukJzroUPQhD6G/4uwnFlwad0tovlBCwPmOmXHbnhRbh
 JmriZaQEGj43o3yH/HHSeRvbwJ7e71xbH0tW21K9cB/CuYTixwlvYAMd218qXQ+k
 Q6qBRueWlgsaDatlkuRV0ezrILvGQhn6QgEbCUIluDjKp/d6aRtI4IM60Y5G02U/
 WLvnMQJ/MaGQXPVR1gWqWpy1ojvj/mAeTQf6aKw1Icjd3oSPgAI06B/uThXoWxh/
 FyvDl/+HNW2StxPudQPKWkMjamaD/cToSa4HjLjdPER9HiCHmUru6nRZOufhwtwh
 q2NEPx5kraRQNEMqjpopzNIWPpGBNBm+htkqXmTRpDdfvKgQ29vL49GCcV1VHtj9
 TpYkqu/6ZVPItAAfyklh
 =Ihe9
 -----END PGP SIGNATURE-----
mergetag object 67cf3623e0
 type commit
 tag for-next
 tagger Doug Ledford <dledford@redhat.com> 1493940800 -0400
 
 Updates #3 for 4.12 kernel merge window
 
 - The hfi1 15 patch set that landed late
 - IPoIB get_link_ksettings which landed late because I asked for a
   respin
 - One late rxe change
 - One -rc worthy fix that's in early
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZC7q5AAoJELgmozMOVy/dungQAKpWG7sq+23PhsXDTOGtKSRg
 6jrgt5lG+V4twAJogXazxYsanPLQOE6ItKUFaBqEh2fULGyzXyPv1PsLNAVi4MYR
 6vbsMh2qCW8+dy+uEBveirQzdwdJr8U7g1mvJYPDr1Rmaw852Sq+PbId3nEy/cFK
 boMn/zhSSJuF8VsTfFCHx1gVdRN7n7gclqBP7Phq4bzdZ8E0u5GIPjmPDePUTpwH
 r3yFSC3VV5JGGgOSUc66oPPoIgraAG9EQezn01Llk5n+HPJ3eM91spV1Y9tZ7EgA
 toNW+bxEmanFYtuR+qzd0ctozUt2Ke6AUdTgcZcSCFkIan2ZOssJ0y2UN3CsNxlQ
 rYcOBe5KBNcFVb5E3NoeG+iHo9jT86fBqzYheHKmQJy38xalAlOoX6mLOvn4pPzo
 DFYDd6PpEU9SQSmhpPj1UbhvCOzecBaV3cRF/TihnqlOrRiKtd/YCccqT5GguPCB
 RVW6h1qQ94rmr/k6gobT7lNDWn2bZAgUupUD19Nw+7YwauxjGhSFsXW357NVcs4B
 3a/uLNVvnoyIjm62Sm8YFz9Bz3pRutfP9b6pSJ5V4P7dBgXYFZU35buViwcAj5fL
 DNGx4a8Poi3VQR8lQjpK6MRCWAibUY27zW1nbWQrQ9TdcHhK4sJC9Mqe41W4FXpU
 qQLkYBChz+mkCzE44bN/
 =H9f5
 -----END PGP SIGNATURE-----

Merge tags 'for-linus' and 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull more rdma updates from Doug Ledford:
 "As mentioned in my first pull request, this is the subsequent pull
  requests I had. This is all I have, and in fact this cleans out the
  RDMA subsystem's entire patchworks queue of kernel changes that are
  ready to go (well, it did for the weekend anyway, a few new patches
  are in, but they'll be coming during the -rc cycle).

  The first tag contains a single patch that would have conflicted if
  taken from my tree or DaveM's tree as it needed our trees merged to
  come cleanly.

  The second tag contains the patch series from Intel plus three other
  stragllers that came in late last week. I took them because it allowed
  me to legitimately claim that the RDMA patchworks queue was, for a
  short time, 100% cleared of all waiting kernel patches, woohoo! :-).

  I have it under my for-next tag, so it did get 0day and linux- next
  over the end of last week, and linux-next did show one minor conflict.

  Summary:

  'for-linus' tag:
   - mlx5/IPoIB fixup patch

  'for-next' tag:
   - the hfi1 15 patch set that landed late
   - IPoIB get_link_ksettings which landed late because I asked for a
     respin
   - one late rxe change
   - one -rc worthy fix that's in early"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/mlx5: Enable IPoIB acceleration

* tag 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  rxe: expose num_possible_cpus() cnum_comp_vectors
  IB/rxe: Update caller's CRC for RXE_MEM_TYPE_DMA memory type
  IB/hfi1: Clean up on context initialization failure
  IB/hfi1: Fix an assign/ordering issue with shared context IDs
  IB/hfi1: Clean up context initialization
  IB/hfi1: Correctly clear the pkey
  IB/hfi1: Search shared contexts on the opened device, not all devices
  IB/hfi1: Remove atomic operations for SDMA_REQ_HAVE_AHG bit
  IB/hfi1: Use filedata rather than filepointer
  IB/hfi1: Name function prototype parameters
  IB/hfi1: Fix a subcontext memory leak
  IB/hfi1: Return an error on memory allocation failure
  IB/hfi1: Adjust default eager_buffer_size to 8MB
  IB/hfi1: Get rid of divide when setting the tx request header
  IB/hfi1: Fix yield logic in send engine
  IB/hfi1, IB/rdmavt: Move r_adefered to r_lock cache line
  IB/hfi1: Fix checks for Offline transient state
  IB/ipoib: add get_link_ksettings in ethtool
2017-05-08 20:07:29 -07:00
Michal Hocko
752ade68cb treewide: use kv[mz]alloc* rather than opencoded variants
There are many code paths opencoding kvmalloc.  Let's use the helper
instead.  The main difference to kvmalloc is that those users are
usually not considering all the aspects of the memory allocator.  E.g.
allocation requests <= 32kB (with 4kB pages) are basically never failing
and invoke OOM killer to satisfy the allocation.  This sounds too
disruptive for something that has a reasonable fallback - the vmalloc.
On the other hand those requests might fallback to vmalloc even when the
memory allocator would succeed after several more reclaim/compaction
attempts previously.  There is no guarantee something like that happens
though.

This patch converts many of those places to kv[mz]alloc* helpers because
they are more conservative.

Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390
Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim
Acked-by: David Sterba <dsterba@suse.com> # btrfs
Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph
Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4
Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Santosh Raspatur <santosh@chelsio.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: "Yan, Zheng" <zyan@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:13 -07:00
Erez Shitrit
693dfd5a3f IB/mlx5: Enable IPoIB acceleration
Enable mlx5 IPoIB acceleration by declaring
mlx5_ib_{alloc,free}_rdma_netdev and assigning the mlx5
IPoIB rdma_netdev callbacks.

In addition, this patch brings in sync mlx5's IPoIB parts for net and IB
trees. As a precaution, we disabled IPoIB acceleration by default (in
the mlx5_core Kconfig file).

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-05-04 16:22:08 -04:00
Ido Schimmel
b1e455260c mlxsw: spectrum_router: Simplify VRF enslavement
When a netdev is enslaved to a VRF master, its router interface (RIF)
needs to be destroyed (if exists) and a new one created using the
corresponding virtual router (VR).

>From the driver's perspective, the above is equivalent to an inetaddr
event sent for this netdev. Therefore, when a port netdev (or its
uppers) are enslaved to a VRF master, call the same function that
would've been called had a NETDEV_UP was sent for this netdev in the
inetaddr notification chain.

This patch also fixes a bug when a LAG netdev with an existing RIF is
enslaved to a VRF. Before this patch, each LAG port would drop the
reference on the RIF, but would re-join the same one (in the wrong VR)
soon after. With this patch, the corresponding RIF is first destroyed
and a new one is created using the correct VR.

Fixes: 7179eb5acd ("mlxsw: spectrum_router: Add support for VRFs")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 11:47:58 -04:00
Eli Cohen
0a0ab1d2cc net/mlx5: E-Switch, Avoid redundant memory allocation
struct esw_mc_addr is a small struct that can be part of struct
mlx5_eswitch. Define it as a field and not as a pointer and save the
kzalloc call and then error flow handling.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:21 +03:00
Eran Ben Elisha
0f6e4cf674 net/mlx5e: Disable HW LRO when PCI is slower than link on striding RQ
We will activate the HW LRO only on servers with PCI BW > MAX LINK BW,
or when PCI BW > 16Gbps. On other cases we do not want LRO by default as
LRO sessions might get timeout and add redundant software overhead.

Tested:
	ethtool -k <ifs-name> | grep large-receive-offload
	On systems with and without the limitations.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:20 +03:00
Tariq Toukan
b1b03bded1 net/mlx5e: Use u8 as ownership type in mlx5e_get_cqe()
CQE ownership indication is as small as a single bit.
Use u8 to speedup the comparison.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:19 +03:00
Tariq Toukan
ad78af9b88 net/mlx5e: Use prefetchw when a write is to follow
"prefetchw()" prefetches the cacheline for write. Use it for
skb->data, as soon we'll be copying the packet header there.

Performance:
Single-stream packet-rate tested with pktgen.
Packets are dropped in tc level to zoom into driver data-path.
Larger gain is expected for smaller packets, as less time
is spent on handling SKB fragments, making the path shorter
and the improvement more significant.

---------------------------------------------
packet size | before    | after     | gain  |
64B         | 4,113,306 | 4,778,720 |  16%  |
1024B       | 3,633,819 | 3,950,593 | 8.7%  |

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:17 +03:00
Tariq Toukan
1f5b1e47ee net/mlx5e: Optimize poll ICOSQ completion queue
UMR operations are more frequent and important.
Check them first, and add a compiler branch predictor hint.

According to current design, ICOSQ CQ can contain at most one
pending CQE per napi. Poll function is optimized accordingly.

Performance:
Single-stream packet-rate tested with pktgen.
Packets are dropped in tc level to zoom into driver data-path.
Larger gain is expected for larger packet sizes, as BW is higher
and UMR posts are more frequent.

---------------------------------------------
packet size | before    | after     | gain  |
64B         | 4,092,370 | 4,113,306 |  0.5% |
1024B       | 3,421,435 | 3,633,819 |  6.2% |

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:16 +03:00
Hadar Hen Zion
a2fa1fe5ad net/mlx5e: Act on delay probe time updates
The user can change delay_first_probe_time parameter through sysctl.
Listen to NETEVENT_DELAY_PROBE_TIME_UPDATE notifications and update the
intervals for updating the neighbours 'used' value periodic task and
for flow HW counters query periodic task.
Both of the intervals will be update only in case the new delay prob
time value is lower the current interval.

Since the driver saves only one min interval value and not per device,
the users will be able to set lower interval value for updating
neighbour 'used' value periodic task but they won't be able to schedule
a higher interval for this periodic task.
The used interval for scheduling neighbour 'used' value periodic task is
the minimal delay prob time parameter ever seen by the driver.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:15 +03:00
Hadar Hen Zion
f6dfb4c3f2 net/mlx5e: Update neighbour 'used' state using HW flow rules counters
When IP tunnel encapsulation rules are offloaded, the kernel can't see
the traffic of the offloaded flow. The neighbour for the IP tunnel
destination of the offloaded flow can mistakenly become STALE and
deleted by the kernel since its 'used' value wasn't changed.

To make sure that a neighbour which is used by the HW won't become
STALE, we proactively update the neighbour 'used' value every
DELAY_PROBE_TIME period, when packets were matched and counted by the HW
for one of the tunnel encap flows related to this neighbour.

The periodic task that updates the used neighbours is scheduled when a
tunnel encap rule is successfully offloaded into HW and keeps re-scheduling
itself as long as the representor's neighbours list isn't empty.

Add, remove, lookup and status change operations done over the
representor's neighbours list or the neighbour hash entry encaps list
are all serialized by RTNL lock.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:14 +03:00
Hadar Hen Zion
232c001398 net/mlx5e: Add support to neighbour update flow
In order to offload TC encap rules, the driver does a lookup for the IP
tunnel neighbour according to the output device and the destination IP
given by the user.

To keep tracking after the validity state of such neighbours, we keep
the neighbours information (pair of device pointer and destination IP)
in a hash table maintained at the relevant egress representor and
register to get NETEVENT_NEIGH_UPDATE events. When getting neighbour update
netevent, we search for a match among the cached neighbours entries used for
encapsulation.

In case the neighbour isn't valid, we can't offload the flow into the
HW. We cache the flow (requested matching and actions) in the driver and
offload the rule later, when the neighbour is resolved and becomes
valid.

When a flow is only cached in the driver and not offloaded into HW
yet, we use EAGAIN return value to mark it internally, the TC ndo still
returns success.

Listen to kernel neighbour update netevents to trace relevant neighbours
validity state:

1. If a neighbour becomes valid, offload the related rules to HW.

2. If the neighbour becomes invalid, remove the related rules from HW.

3. If the neighbour mac address was changed, update the encap header.
   Remove all the offloaded rules using the old encap header from the HW
   and insert new rules to HW with updated encap header.

Access to the neighbors hash table is protected by RTNL lock of its
caller or by the table's spinlock.

Details of the locking/synchronization among the different actions
applied on the neighbour table:

Add/remove operations - protected by RTNL lock of its caller (all TC
commands are protected by RTNL lock). Add and remove operations are
initiated only when the user inserts/removes a TC rule into/from the driver.

Lookup/remove operations - since the lookup operation is done from
netevent notifier block, RTNL lock can't be used (atomic context).
Use the table's spin lock to protect lookups from TC user removal operation.
bh is used since netevent can be called from a softirq context.

Lookup/add operations - The hash table access functions are taking
care of the protection between lookup and add operations.

When adding/removing encap headers and rules to/from the HW, RTNL lock
is used. It can happen when:

1. The user inserts/removes a TC rule into/from the driver (TC commands
are protected by RTNL lock of it's caller).

2. The driver gets neighbour notification event, which reports about
neighbour validity status change. Before adding/removing encap headers
and rules to/from the HW, RTNL lock is taken.

A neighbour hash table entry should be freed when its encap list is empty.
Since The neighbour update netevent notification schedules a neighbour
update work that uses the neighbour hash entry, it can't be freed
unconditionally when the encap list becomes empty during TC delete rule flow.
Use reference count to protect from freeing neighbour hash table entry
while it's still in use.

When the user asks to unregister a netdvice used by one of the neigbours,
neighbour removal notification is received. Then we take a reference on the
neighbour and don't free it until the relevant encap entries (and flows) are
marked as invalid (not offloaded) and removed from HW.
As long as the encap entry is still valid (checked under RTNL lock) we
can safely access the neighbour device saved on mlx5e_neigh struct.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:13 +03:00
Hadar Hen Zion
37b498ff23 net/mlx5e: Add neighbour hash table to the representors
Add hash table to the representors which is to be used by the next patch
to save neighbours information in the driver.

In order to offload IP tunnel encapsulation rules, the driver must find
the tunnel dst neighbour according to the output device and the
destination address given by the user. The next patch will cache the
neighbors information in the driver to allow support in neigh update
flow for tunnel encap rules.

The neighbour entries are also saved in a list so we easily iterate over
them when querying statistics in order to provide 'used' feedback to the
kernel neighbour NUD core.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:12 +03:00
Hadar Hen Zion
033354d501 net/mlx5e: Read neigh parameters with proper locking
The nud_state and hardware address fields are protected by the neighbour
lock, we should acquire it before accessing those parameters.

Use this lock to avoid inconsistency between the neighbour validity state
and it's hardware address.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:10 +03:00
Hadar Hen Zion
0b67a38fe6 net/mlx5e: Use flag to properly monitor a flow rule offloading state
Instead of relaying on the 'flow->rule' pointer value which can be
valid or invalid (in case the FW returns an error while trying to offload
the rule), monitor the rule state using a flag.

In downstream patch which adds support to IP tunneling neigh update
flow, a TC rule could be cached in the driver and not offloaded into the
HW. In this case, the flow handle pointer stays NULL.

Check the offloaded flag to properly deal with rules which are currently
not offloaded when querying rule statistics.

This patch doesn't add any new functionality.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:09 +03:00
Hadar Hen Zion
1a8552bd81 net/mlx5e: Remove output device parameter from create encap header helpers definition
Passing output device parameter to the helper functions that deal with
creation of encapsulation headers is redundant. Output device parameter
can be defined inside those helpers, no need to pass it. Refactor the code by
removing the parameter from the function signature.

This patch doesn't change any functionality.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:08 +03:00
Or Gerlitz
c1ae11521b net/mlx5e: Move the encap entry structure from the eswitch header
The encap entry structure isn't manipulated by the eswitch code,
hence it can/needs to be removed from the eswitch header.

Do that, and change it to have mlx5e_ prefix.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:07 +03:00
Or Gerlitz
45247bf298 net/mlx5: Remove encap entry pointer from the eswitch flow attributes
Encap wise, the tc eswitch flow attribute struct needs to have
only the encap ID which is programmed later to the HW and none
of the higher level encap params, fix that.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-30 16:03:06 +03:00
Saeed Mahameed
1d447a3914 net/mlx5e: Extendable vport representor netdev private data
Make representor netdev private data extendable by adding new struct
"mlx5e_rep_priv" and use it as the rep netdev private data struct
instead of directly pointing to mlx5_eswitch_rep.

Added new en_rep.h header file to contain all representor related
definitions and prototypes, and moved all representor specific logic
into en_rep.c.

Needed for downstream patches to extend representor functionality to
support neighbour update.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
2017-04-30 16:02:48 +03:00
David S. Miller
b1513c3531 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26 22:39:08 -04:00
David S. Miller
38baf3a68b mlx5-fixes-2017-04-22
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJY+9NgAAoJEEg/ir3gV/o+/RIH/2Ua/FvxWtnaXhLj9GPELdGx
 4Q2+ub43Q/F2cU2rIP0S/Ki3fEeOfk+IR87bvKBc+KTcLwUcBQloLjiLTxVOXSNY
 +NmE7T1gl7Sb4NzJ9lDVYbmUlDzWZixbFkQdZ6nZJTKecXuN+xooL7EWosyZKuFd
 FlDpIMacWlH2bMb/1U4lClg9MMPz8e37B9kJ0Vy/lert7NkVdXgYbPI2pKxweF9i
 7yH0pNLKYvIQOubZZ9A7gPhk+OGp6xLAo9pJF6xG8tQuXI59Fz6tcKGbNb8GdzZu
 g12EY2c75BxWJofPtvsDDM5i8ypwF3tfCqxDjw4h9F0wHGJv6tlh51vyuYA8ceg=
 =KnUF
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2017-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2017-04-22

This series contains some mlx5 fixes for net.

For your convenience, the series doesn't introduce any conflict with
the ongoing net-next pull request.

Please pull and let me know if there's any problem.

For -stable:
("net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5") kernels >= 4.10
("net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling") kernels >= 4.8
("net/mlx5e: Fix small packet threshold")       kernels >= 4.7
("net/mlx5: Fix driver load bad flow when having fw initializing timeout") kernels >= 4.4
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24 15:58:03 -04:00
Martin KaFai Lau
1510d72863 net/mlx5e: Fix race in mlx5e_sw_stats and mlx5e_vport_stats
We have observed a sudden spike in rx/tx_packets and rx/tx_bytes
reported under /proc/net/dev.  There is a race in mlx5e_update_stats()
and some of the get-stats functions (the one that we hit is the
mlx5e_get_stats() which is called by ndo_get_stats64()).

In particular, the very first thing mlx5e_update_sw_counters()
does is 'memset(s, 0, sizeof(*s))'.  For example, if mlx5e_get_stats()
is unlucky at one point, rx_bytes and rx_packets could be 0.  One second
later, a normal (and much bigger than 0) value will be reported.

This patch is to use a 'struct mlx5e_sw_stats temp' to avoid
a direct memset zero on priv->stats.sw.

mlx5e_update_vport_counters() has a similar race.  Hence, addressed
together.  However, memset zero is removed instead because
it is not needed.

I am lucky enough to catch this 0-reset in rx multicast:
eth0: 41457665   76804   70    0    0    70          0     47085 15586634   87502    3    0    0     0       3          0
eth0: 41459860   76815   70    0    0    70          0     47094 15588376   87516    3    0    0     0       3          0
eth0: 41460577   76822   70    0    0    70          0         0 15589083   87521    3    0    0     0       3          0
eth0: 41463293   76838   70    0    0    70          0     47108 15595872   87538    3    0    0     0       3          0
eth0: 41463379   76839   70    0    0    70          0     47116 15596138   87539    3    0    0     0       3          0

v2: Remove memset zero from mlx5e_update_vport_counters()
v1: Use temp and memcpy

Fixes: 9218b44dcc ("net/mlx5e: Statistics handling refactoring")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24 12:15:08 -04:00
Ilan Tayari
5e82c9e4ed net/mlx5e: Fix ETHTOOL_GRXCLSRLALL handling
Handler for ETHTOOL_GRXCLSRLALL must set info->data to the size
of the table, regardless of the amount of entries in it.
Existing code does not do that, and this breaks all usage of ethtool -N
or -n without explicit location, with this error:
rmgr: Invalid RX class rules table size: Success

Set info->data to the table size.

Tested:
ethtool -n ens8
ethtool -N ens8 flow-type ip4 src-ip 1.1.1.1 dst-ip 2.2.2.2 action 1
ethtool -N ens8 flow-type ip4 src-ip 1.1.1.1 dst-ip 2.2.2.2 action 1 loc 55
ethtool -n ens8
ethtool -N ens8 delete 1023
ethtool -N ens8 delete 55

Fixes: f913a72aa0 ("net/mlx5e: Add support to get ethtool flow rules")
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22 21:52:37 +03:00
Eugenia Emantayev
cbad8cddb6 net/mlx5e: Fix small packet threshold
RX packet headers are meant to be contained in SKB linear part,
and chose a threshold of 128.
It turns out this is not enough, i.e. for IPv6 packet over VxLAN.
In this case, UDP/IPv4 needs 42 bytes, GENEVE header is 8 bytes,
and 86 bytes for TCP/IPv6. In total 136 bytes that is more than
current 128 bytes. In this case expand header flow is reached.
The warning in skb_try_coalesce() caused by a wrong truesize
was already fixed here:
commit 158f323b98 ("net: adjust skb->truesize in pskb_expand_head()").
Still, we prefer to totally avoid the expand header flow for performance reasons.
Tested regular TCP_STREAM with iperf for 1 and 8 streams, no degradation was found.

Fixes: 461017cb00 ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22 21:52:37 +03:00
Maor Gottlieb
5ae85b0eda net/mlx5: Fix UAR memory leak
When UAR is released, we deallocate the device resource, but
don't unmmap the UAR mapping memory.
Fix the leak by unmapping this memory.

Fixes: a6d51b6861 ('net/mlx5: Introduce blue flame register allocator)
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22 21:52:37 +03:00
Or Gerlitz
225aabaffe net/mlx5e: Make sure the FW max encap size is enough for ipv6 tunnels
Otherwise the code that fills the ipv6 encapsulation headers could be writing
beyond the allocated headers buffer.

Fixes: ce99f6b97f ('net/mlx5e: Support SRIOV TC encapsulation offloads for IPv6 tunnels')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22 21:52:37 +03:00
Or Gerlitz
32f3671f69 net/mlx5e: Make sure the FW max encap size is enough for ipv4 tunnels
Otherwise the code that fills the ipv4 encapsulation headers could be writing
beyond the allocated headers buffer.

Fixes: a54e20b4fc ('net/mlx5e: Add basic TC tunnel set action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22 21:52:37 +03:00
Or Gerlitz
c415f704c8 net/mlx5: E-Switch, Correctly deal with inline mode on ConnectX-5
On ConnectX5 the wqe inline mode is "none" and hence the FW
reports MLX5_CAP_INLINE_MODE_NOT_REQUIRED.

Fix our devlink callbacks to deal with that on get and set.

Also fix the tc flow parsing code not to fail anything when
inline isn't required.

Fixes: bffaa91658 ('net/mlx5: E-Switch, Add control for inline mode')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22 21:52:37 +03:00
Mohamad Haj Yahia
55378a238e net/mlx5: Fix driver load bad flow when having fw initializing timeout
If FW is stuck in initializing state we will skip the driver load, but
current error handling flow doesn't clean previously allocated command
interface resources.

Fixes: e3297246c2 ('net/mlx5_core: Wait for FW readiness on startup')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22 21:52:37 +03:00
Stephen Hemminger
8bf3198a5e mlx5: fix warning about missing prototype
Fix sparse warning about missing prototypes. The rx/tx code path
defines functions with prototypes in ipoib.h.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22 20:26:42 +03:00
Stephen Hemminger
a7082ef066 mlx5: hide unused functions
Fix sparse warnings in recent ipoib support.
The RDMA functions are not used yet, hide behind #ifdef.
Based on comment, they will eventually be local so make static.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22 20:26:41 +03:00
Roi Dayan
7768d1971d net/mlx5: E-Switch, Add control for encapsulation
Implement the devlink e-switch encapsulation control set and get
callbacks. Apply the value set by the user on the switchdev offloads
mode when creating the fast FDB table where offloaded rules will be set.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22 20:26:40 +03:00
Or Gerlitz
1967ce6ea5 net/mlx5: E-Switch, Refactor fast path FDB table creation in switchdev mode
Refactor the creation of the fast path FDB table that holds the
offloaded rules in SRIOV switchdev mode into it's own function.

This will be used in the next patch to be able and re-create the
table under different settings without going through legacy mode.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-22 20:26:38 +03:00
Dan Carpenter
6905e5a5c8 net/mlx5e: IPoIB, Fix error handling in mlx5_rdma_netdev_alloc()
The labels were out of order, so it either could result in an Oops or a
leak.

Fixes: 48935bbb7a ("net/mlx5e: IPoIB, Add netdevice profile skeleton")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20 16:26:10 -04:00
Jiri Pirko
9d41accc1e mlxsw: spectrum: Add FID miss trap
When there is no FID set for a specific packet, the HW will drop it.
However, by default these packets are useful to be delivered to CPU as
it can inspect them and program HW accordingly. So add this trap.

This would only ever happen when port is enslaved to an OVS master.
Otherwise, packets would be dropped during VLAN / STP filtering,
before FID classification.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20 15:32:31 -04:00
Jiri Pirko
2b94e58df5 mlxsw: spectrum: Allow ports to work under OVS master
>From now on, a port can become a slave of OVS master. All vlans
are enabled, STP state is set to "forwarding". It is up to the OVS
userspace daemon to setup the flows either in kernel or in HW using TC
flower offload.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20 15:32:31 -04:00
Jiri Pirko
93cd081305 mlxsw: spectrum: Teach mlxsw_sp_port_vlan_set to accept any vlan range
So far, mlxsw_sp_port_vlan_set range is limited by
MLXSW_REG_SPVM_REC_MAX_COUNT. In spectrum_switchdev code this is
wrapped up by a helper function which actually does multiple calls
to FW for bigger ranges. Move the code into mlxsw_sp_port_vlan_set
and use it always. That allows caller not to care about the range.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20 15:32:30 -04:00
Jiri Pirko
cedbb8b259 mlxsw: spectrum_flower: Set dummy FID before forward action
HW requires the FID to be valid in order for the forward action to work.
So regardless of the current FID validity, just set the dummy FID which
would do the trick.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20 15:32:30 -04:00
Jiri Pirko
202d6f423c mlxsw: spectrum: Add dummy FID initialization
For forwarding using ACL action, HW needs a valid FID to be setup. It
does not actually use it, so it can be any valid FID. So create a dummy
FID only for this purpose.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20 15:32:30 -04:00
Jiri Pirko
ac44dd43d8 mlxsw: spectrum: Implement action to set FID
Implement part of multipurpose Virtual Router and Forwarding Domain
Action that takes care of setting up FID. We need to use it to be able
to forward packets using ACL action when no FID is associated on RX.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20 15:32:30 -04:00
Jiri Pirko
b51df799f6 mlxsw: spectrum: Fix indent in mlxsw_sp_netdevice_port_upper_event
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20 15:32:29 -04:00
Greg Thelen
8dc7d11f1d net/mlx4: suppress 'may be used uninitialized' warning
gcc 4.8.4 complains that mlx4_SW2HW_MPT_wrapper() uses an uninitialized
'mpt' variable:
  drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'mlx4_SW2HW_MPT_wrapper':
  drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:2802:12: warning: 'mpt' may be used uninitialized in this function [-Wmaybe-uninitialized]
     mpt->mtt = mtt;

I think this warning is a false complaint.  mpt is only used when
mr_res_start_move_to() return zero, and in all such cases it initializes
mpt.  But apparently gcc cannot see that.

Initialize mpt to avoid the warning.

Signed-off-by: Greg Thelen <gthelen@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-20 13:19:21 -04:00
Saeed Mahameed
955bc48081 net/mlx5e: E-switch vport manager is valid for ethernet only
Currently the driver support only ethernet eswitch, and we want to
protect downstream IPoIB netdev from trying to access it in IB link.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:32 -04:00
Saeed Mahameed
9d6bd752c6 net/mlx5e: IPoIB, RX handler
Implement IPoIB RX SKB handler.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:32 -04:00
Saeed Mahameed
20fd0c193f net/mlx5e: RX handlers per netdev profile
In order to have different RX handler per profile, fix and refactor the
current code to take the rx handler directly from the netdevice profile
rather than computing it on runtime as it was done with the switchdev
mode representor rx handler.

This will also remove the current wrong assumption in mlx5e_alloc_rq
code that mlx5e_priv->ppriv is of the type vport_rep.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:31 -04:00
Saeed Mahameed
258545449b net/mlx5e: IPoIB, Xmit flow
Implement mlx5e's IPoIB SKB transmit using the helper functions provided
by mlx5e ethernet tx flow, the only difference in the code between
mlx5e_xmit and mlx5i_xmit is that IPoIB has some extra fields to fill
(UD datagram segment) in the TX descriptor (WQE) and it doesn't need to
have any vlan handling.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:31 -04:00
Saeed Mahameed
77bdf8950b net/mlx5e: Xmit flow break down
Break current mlx5e xmit flow into smaller blocks (helper functions)
in order to reuse them for IPoIB SKB transmission.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:31 -04:00
Saeed Mahameed
ec8fd927b7 net/mlx5e: IPoIB, Underlay QP
Create IPoIB underlay QP needed by the IPoIB netdevice profile for RSS
and TX HW context to perform on IPoIB traffic.

Reset the underlay QP on dev_uninit ndo to stop IPoIB traffic going
through this QP when the ULP IPoIB decides to cleanup.

Implement attach/detach mcast RDMA netdev callbacks for later RDMA
netdev use.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:31 -04:00
Saeed Mahameed
603f4a4521 net/mlx5e: IPoIB, Basic netdev ndos open/close
Implement open/close of IPoIB netdevice ndos using mlx5e's
channels API to manage data path resources (RQs/SQs/CQs).

Set IPoIB netdev address on dev_init ndo.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:30 -04:00
Saeed Mahameed
5426a0b274 net/mlx5e: IPoIB, TX TIS creation
Modify mlx5e tis creation function to accept underlay qp number, which
will be needed by IPoIB.

Implement mlx5i (IPoIB) tx init/cleanup netdevice profile flows to
create one TIS with the IPoIB underlay qp, for IPoIB TX SQs.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:30 -04:00
Saeed Mahameed
bc81b9d326 net/mlx5e: IPoIB, RSS flow steering tables
Like the mlx5e ethernet mode, on IPoIB mode we need to create RX steering
tables, but IPoIB do not require MAC and VLAN steering tables so the
only tables we create in here are:
1. TTC Table (Traffic Type Classifier table for RSS steering)
2. ARFS Table (for accelerated RFS support)

Creation of those tables is identical to mlx5e ethernet mode, hence the
use of mlx5e_create_ttc_table and mlx5e_arfs_create_tables.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:30 -04:00
Saeed Mahameed
8f493ffd88 net/mlx5e: IPoIB, RX steering RSS RQTs and TIRs
Implement IPoIB RX RSS (RQTs and TIRs) HW objects creation,
All we do here is simply reuse the mlx5e implementation to create
direct and indirect (RSS) steering HW objects.

For that we just expose
mlx5e_{create,destroy}_{direct,indirect}_{rqt,tir} functions into en.h
and call them from ipoib.c in init/cleanup_rx IPoIB netdevice profile
callbacks.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:30 -04:00
Saeed Mahameed
48935bbb7a net/mlx5e: IPoIB, Add netdevice profile skeleton
Create mlx5e IPoIB netdevice profile skeleton in the new ipoib.c
file with empty implementation.

Downstream patches will provide the full mlx5 rdma netdevice acceleration
support for IPoIB into this new file, by using the mlx5e netdevice
profile and new mlx5_channels APIs and infrastructures.
Same as already done in mlx5e NIC netdevice and switchdev mode VF
representors.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:30 -04:00
Saeed Mahameed
2c3b5beec4 net/mlx5e: More generic netdev management API
In preparation for mlx5e RDMA net_device support, here we generalize
mlx5e_attach/detach in a way that those functions will be agnostic
to link type.  For that we move ethernet specific NIC net device logic out
of those functions into {nic,rep}_{enable/disable} mlx5e NIC and
representor profiles callbacks.

Also some of the logic was moved only to NIC profile since it is not right
to have this logic for representor net device (e.g. set port MTU).

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:29 -04:00
Erez Shitrit
ffdb8827ec net/mlx5: Enable flow-steering for IB link
Get the relevant capabilities if supports ipoib_enhanced_offloads and
init the flow steering table accordingly.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:29 -04:00
Erez Shitrit
b3ba51498b net/mlx5: Refactor create flow table method to accept underlay QP
IB flow tables need the underlay qp to perform flow steering.
Here we change the API of the flow tables creation to accept the
underlay QP number as a parameter in order to support IB (IPoIB) flow
steering.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 11:08:29 -04:00
Christoph Hellwig
3680b1f655 mlxsw: convert to pci_alloc_irq_vectors
Trivial conversion as only one vector is supported, but at least we
lose the useless msix_entry member in the per-device structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-11 11:16:03 -04:00
Saeed Mahameed
457fcd8a11 net/mlx5e: Set default RX moderation parameters on driver load
RX moderation default parameters shouldn't be set in
mlx5e_build_rx_cq_param since it would reset the values every time on
netdev open/close.  Instead, it should be set in
mlx5e_set_rx_cq_mode_params which is called on driver load only.

Fixes: 6a9764efb2 ("net/mlx5e: Isolate open_channels from priv->params")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-07 01:21:28 +03:00
Eran Ben Elisha
95b6c6a519 net/mlx5e: Reuse alloc cq code for all CQs allocation
Reuse the code for mlx5e_alloc_cq and mlx5e_alloc_drop_cq, as they
have a similar flow.

Prior to this patch, the CQEs in the "drop CQ" were not initialized,
fixed
it with the shared flow of alloc CQ.  This is not a critical bug as the
RQ connected to this CQ never moved to RTS, but still better to have
this right.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-07 01:21:27 +03:00
Inbar Karmy
84e11edb71 net/mlx5e: Show board id in ethtool driver information
Add the board id (PSID) to the firmware-version field
in the ethtool -i (driver information).
The PSID is shown in parentheses, next to the fw-version.

$ ethtool -i ens6
firmware-version: 12.14.1101 (MT_2190110032)

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-07 01:21:26 +03:00
Eran Ben Elisha
6543b78ea1 net/mlx5e: Change FW sub_minor display to 4 zeros padding
FW version should be reported as X.Y.ZZZZ, add leading zeroes to sub
minor in order to fix it.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-07 01:21:25 +03:00
Guy Ergas
f6d96a2092 net/mlx5e: Make mlx5e_modify_rqs_vsd a static function
Make mlx5e_modify_rqs_vsd a static function and remove from en.h in
order to reduce redundant exposure of functions.

Signed-off-by: Guy Ergas <guye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-07 01:21:00 +03:00
Guy Ergas
102722fc68 net/mlx5e: Add support for RXFCS feature flag
Add support for rx-fcs flag from ethtool.
In case this flag is set, update all RQs to scatter the FCS data into
the packet.

Signed-off-by: Guy Ergas <guye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-07 01:20:59 +03:00
Majd Dibbiny
d0dd989f97 net/mlx5: Update the list of the PCI supported devices
Rename the ConnectX-5 PCIe 4.0 to be ConnectX-5 Ex.
Also add the upcoming ConnectX-6 and it's VF IDs to the list.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-04-07 01:20:58 +03:00
Eric Dumazet
75d04aa368 mlx4: trust shinfo->gso_segs
mlx4 is the only driver in the tree making a point to recompute
shinfo->gso_segs.

Lets remove superfluous code.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-06 13:27:49 -07:00
Tobias Regnery
053ee0a703 net/mlx5e: fix build error without CONFIG_SYSFS
Commit 9008ae0748 ("net/mlx5e: Minimize mlx5e_{open/close}_locked")
copied the calls to netif_set_real_num_{tx,rx}_queues from
mlx5e_open_locked to mlx5e_activate_priv_channels and wraps them in an
if condition to test for netdev->real_num_{tx,rx}_queues.

But netdev->real_num_rx_queues is conditionally compiled in if CONFIG_SYSFS
is set. Without CONFIG_SYSFS the build fails:

drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 'mlx5e_activate_priv_channels':
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2515:12: error: 'struct net_device' has no member named 'real_num_rx_queues'; did you mean 'real_num_tx_queues'?

Fix this by unconditionally call netif_set_real_num{tx,rx}_queues like before
commit 9008ae0748.

Fixes: 9008ae0748 ("net/mlx5e: Minimize mlx5e_{open/close}_locked")
Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-06 12:55:36 -07:00
David S. Miller
6f14f443d3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Mostly simple cases of overlapping changes (adding code nearby,
a function whose name changes, for example).

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-06 08:24:51 -07:00
Andrew Morton
e270e96686 drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c: fix build with gcc-4.4.4
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c: In function 'mlx5e_set_rxfh':
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1067: error: unknown field 'rss' specified in initializer
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1067: warning: missing braces around initializer
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1067: warning: (near initialization for 'rrp.<anonymous>')
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1068: error: unknown field 'rss' specified in initializer
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1069: warning: excess elements in struct initializer
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:1069: warning: (near initialization for 'rrp')

gcc-4.4.4 has issues with anonymous union initializers.  Work around this.

Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-02 19:32:57 -07:00
Andrew Morton
956327913c drivers/net/ethernet/mellanox/mlx5/core/en_main.c: fix build with gcc-4.4.4
drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 'mlx5e_redirect_rqts':
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2210: error: unknown field 'rqn' specified in initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2211: warning: missing braces around initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2211: warning: (near initialization for 'direct_rrp.<anonymous>')
drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 'mlx5e_redirect_rqts_to_channels':
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2227: error: unknown field 'rss' specified in initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2227: warning: missing braces around initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2227: warning: (near initialization for 'rrp.<anonymous>')
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2227: warning: initialization makes integer from pointer without a cast
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2228: error: unknown field 'rss' specified in initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2229: warning: excess elements in struct initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2229: warning: (near initialization for 'rrp')
drivers/net/ethernet/mellanox/mlx5/core/en_main.c: In function 'mlx5e_redirect_rqts_to_drop':
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2238: error: unknown field 'rqn' specified in initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2239: warning: missing braces around initializer
drivers/net/ethernet/mellanox/mlx5/core/en_main.c:2239: warning: (near initialization for 'drop_rrp.<anonymous>')

gcc-4.4.4 has issues with anonymous union initializers.  Work around this.

Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-02 19:32:57 -07:00
David S. Miller
6c2257062e mlx5e-pedit 2017-03-28
Or Gerlitz says:
 
 This series adds support for offloading modifications of packet headers using
 ConnectX-5 HW header re-write as an action applied during packet steering.
 
 The offloaded SW mechanism is TC's pedit action. The offloading is
 supported for E-Switch steering of VF traffic in the SRIOV
 switchdev mode and for NIC (non eswitch) RX.
 
 One use-case for this offload on virtual networks, is when the hypervisor
 implements flow based router such as Open-Stack's DVR, where L2 headers
 of guest packets re-written with routers' MAC addresses and the IP TTL
 is decremented.
 
 Another use case (which can be applied in parallel with routing) is
 stateless NAT where guest L3/L4 headers are re-written.
 
 The series is built as follows: the 1st six patches are preperations which
 don't yet add new functionality, patches 7-8 add the FW APIs (data-structures
 and commands) for header re-write, and patch nine allows offloading driver
 to access pedit keys.
 
 The 10th patch is somehow the core of the series, where we translate from
 the pedit way to represent set of header modification elements to the FW
 API for that same matter.
 
 Once a set of HW modification is established, we register it with the FW
 and get a modify header ID. When this ID is used with an action during
 packet steering, the HW applies the header modification on the packet.
 
 Patches 11 and 12 implement the above logic as an offload for pedit action
 for the NIC and E-Switch use-cases.
 
 I'd like to thanks Elijah Shakkour <elijahs@mellanox.com> for implementing
 and helping me testing this functionality on HW simulator, before it could
 be done with FW.
 
 - Or.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJY2lwEAAoJEEg/ir3gV/o+rFIH+wdwGawEjoDhpihLqJHoRtwo
 Wvy88Lczj++Pfzt9E0kgwgmOdnj7j+GVOh6ALjneE3PDBJEFWG/GWY5aRYonlhhf
 zibafMTYf+8Dmm9qHW/C4OvhQowSrkG1RDucM2eyjXJfnAShZCh7dV4CDD7paxhu
 N2rlDdSEl0Im4aPCNHzyrdGg06Fy3A0DQkDvVLIQhKV0cLPIoC0U/i+ymVtsCUY/
 sSEEuSohvwdD5Ga5ZZdKicCo61lIRSi2rX5v4sK0exhAO3S8xyrKnwbiN7nVAQqg
 eVZ/ekbBiksD8MRMKctt/zGxd0X4PDaQ8J9XyF9CL6pRC5VipsDy+P/GEhj/x8U=
 =l2Qo
 -----END PGP SIGNATURE-----

Merge tag 'mlx5e-pedit' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Or Gerlitz says:

====================
mlx5e-pedit 2017-03-28

This series adds support for offloading modifications of packet headers using
ConnectX-5 HW header re-write as an action applied during packet steering.

The offloaded SW mechanism is TC's pedit action. The offloading is
supported for E-Switch steering of VF traffic in the SRIOV
switchdev mode and for NIC (non eswitch) RX.

One use-case for this offload on virtual networks, is when the hypervisor
implements flow based router such as Open-Stack's DVR, where L2 headers
of guest packets re-written with routers' MAC addresses and the IP TTL
is decremented.

Another use case (which can be applied in parallel with routing) is
stateless NAT where guest L3/L4 headers are re-written.

The series is built as follows: the 1st six patches are preperations which
don't yet add new functionality, patches 7-8 add the FW APIs (data-structures
and commands) for header re-write, and patch nine allows offloading driver
to access pedit keys.

The 10th patch is somehow the core of the series, where we translate from
the pedit way to represent set of header modification elements to the FW
API for that same matter.

Once a set of HW modification is established, we register it with the FW
and get a modify header ID. When this ID is used with an action during
packet steering, the HW applies the header modification on the packet.

Patches 11 and 12 implement the above logic as an offload for pedit action
for the NIC and E-Switch use-cases.

I'd like to thanks Elijah Shakkour <elijahs@mellanox.com> for implementing
and helping me testing this functionality on HW simulator, before it could
be done with FW.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-29 11:24:21 -07:00
Talat Batheesh
e497ec680c net/mlx5: Avoid dereferencing uninitialized pointer
In NETDEV_CHANGEUPPER event the upper_info field is valid
only when linking is true. Otherwise it should be ignored.

Fixes: 7907f23adc (net/mlx5: Implement RoCE LAG feature)
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 18:07:15 -07:00
Arkadi Sharshevsky
2ba5999f00 mlxsw: spectrum: Add Support for erif table entries access
Implement dpipe's table ops for erif table which provide:
1. Getting the entries in the table with the associate values.
	- match on "mlxsw_meta:erif_index"
	- action on "mlxsw_meta:forwared_out"
2. Synchronize the hardware in case of enabling/disabling counters which
   mean removing erif counters from all interfaces.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 17:11:55 -07:00
Arkadi Sharshevsky
fd1b9d4192 mlxsw: spectrum_router: Add rif helper functions
Add rif helper function to access the rif index and rif devices ifindex.
This functions will be used by dpipe in order to dump the rif table.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 17:11:55 -07:00
Arkadi Sharshevsky
e0c0afd8aa mlxsw: spectrum: Support for counters on router interfaces
Add support for counter allocation on router interfaces. The allocation
depends on the counter state of relevant table. In case the counting is
disabled or no counters left the counter index will be set as invalid.

Also a counter pool for router allocation is added.

Signed-off-by: Arakdi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 17:11:55 -07:00
Arkadi Sharshevsky
ba73e97a63 mlxsw: reg: Add Router Interface Counter Register
The RICNT register retrieves per port performance counter. It will be
used to query the router interfaces statistics.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 17:11:55 -07:00
Arkadi Sharshevsky
d54b70feb6 mlxsw: spectrum: Add definition for egress rif table
Add definition for egress router interface table. This table describes
the final part in the routing pipeline. This table matches the egress
interface index (rif index, which is set by the previous stages and
determine the out port) and makes the decision of forwarding the packet
towards the L2 logic or dropping it.

The metadata header is added to represent this internal information.
The rif index field is mapped logically to netdevice ifindex.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 17:11:54 -07:00
Arkadi Sharshevsky
230ead0141 mlxsw: spectrum: Add placeholder for dpipe
Add placeholder for dpipe. Support for specific tables and headers will
be introduced in following patches. The headers are shared between all
mlxsw_sp instances.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 17:11:54 -07:00
Arkadi Sharshevsky
0f630fcbe5 mlxsw: reg: Add counter fields to RITR register
Update RITR for counter support. This allows adding counters for
ASIC's router ports.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-28 17:11:54 -07:00
Or Gerlitz
d7e75a325c net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions
This includes calling the parsing code that translates from pedit
speak to the HW API, allocation (deallocation) of a modify header
context and setting the modify header id associated with this
context to the FTE of that flow.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 15:34:10 +03:00
Or Gerlitz
2f4fe4cab0 net/mlx5e: Add offloading of NIC TC pedit (header re-write) actions
This includes calling the parsing code that translates from pedit
speak to the HW API, allocation (deallocation) of a modify header
context and setting the modify header id associated with this
context to the FTE of that flow.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 15:34:08 +03:00
Or Gerlitz
d79b6df6b1 net/mlx5e: Add parsing of TC pedit actions to HW format
Parse/translate a set of TC pedit actions to be formed in the HW API format.

User-space provides set of keys where each one of them is made of: command (add or
set), header-type, byte offset within that header along with a 32 bit mask and value.

The mask dictates what bits in the 32 bit word that starts on the offset we should
be dealing with, but under negative polarity (unset bits are to be modified).

We do a 1st pass over the set of keys while using the header-type and offset to
fill the masks and the values into a data-structure containting all the
supported network headers.

We then do a 2nd pass over the set of fields to re-write supported by the HW,
where for each such candidate field, we use the masks filled on the 1st pass to
realize if we should offloading re-write it.

In case offloading is required, we fill a HW descriptor with the following:

(1) the header field to modify
(2) the bit offset within the field from where to modify (set command only)
(3) the value to set/add
(4) the length in bits 1...32 to modify (set command only)

Note that it's possible for a given pedit mask to dictate modifying the
same header field multiple times or to modify multiple header fields.
Currently such combinations are not supported for offloading, hence, for set
commands, the offset within the field is always zero, and the length to modify
is the field size.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 15:34:07 +03:00
Or Gerlitz
2de24fedb9 net/mlx5: Introduce alloc/dealloc modify header context commands
Implement the low-level commands to support packet header re-write.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 15:34:05 +03:00
Or Gerlitz
2a69cb9ff7 net/mlx5: Introduce modify header structures, commands and steering action definitions
Add the definitions related to creation/deletion of a modify header
context and the modify header steering action which are used for HW
packet header modify (re-write) as part of steering. Add as well the
modify header id into two intermediate structs and set it to the FTE.

Note that as the push/pop vlan steering actions are emulated by the
ewitch management code, we're not breaking any compatibility while
changing their values to make room for the modify header action which
is not emulated and whose value is part of the FW API. The new bit
values for the emulated actions are at the end of the possible range.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 15:34:04 +03:00
Or Gerlitz
a750276f81 net/mlx5: Reorder few command cases to reflect their natural order
Move the commands related to scheduling elements and vport qos to
a suitable location (according to the MLX5_CMD_OP enum values) in
the command string and internal error helpers.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 15:34:03 +03:00
Or Gerlitz
e753b2b50d net/mlx5: Add helper to initialize a flow steering actions struct instance
There are bunch of places in the code where the intermediate struct
that keeps the elements related to flow actions is initialized with
the same default values. Put that into a small DECLARE type helper.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 15:34:01 +03:00
Or Gerlitz
aa0cbbae5d net/mlx5e: Properly deal with resource cleanup when adding TC flow fails
The code for adding tc fdb flows leaves things half set when it fails
in the middle. Currently we are not leaking things (e.g eswitch
vlan reference, encap reference and HW resources) since the main
code to add flower rules does a cleanup by calling mlx5e_tc_del_flow().

This cleanup further works just b/c we're checking there if the HW rule
for the flow we are attempting to delete is valid before touching it, and
since under the current possible combinations of supported actions it's okay
to go and blidnly deref or delete all the action related resources (encap, vlan).

Instead, do things properly, namely make sure that if add flow fails we
clean all what was allocated or referenced. Now, the flow delete code can
blindly deref/deallocate both the rule and the actions related resources and
when more action combinations are introduced (such as the upcoming header
re-write) we are fine with clear and robust code.

While here, align all of nic/fdb parse actions/add flow functions to get
mlx5e_tc_flow struct param and pick the attributes or whatever else needed
from there.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 15:34:00 +03:00
Or Gerlitz
17091853fc net/mlx5e: Add intermediate struct for TC flow parsing attributes
Add intermediate structure to store attributes parsed from TC filter
matching/actions parts which are soon to be configured into the HW.

Currently put there the flow matching spec after being parsed. More
content to be added in down-stream patch.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 15:33:59 +03:00
Or Gerlitz
3bc4b7bfa0 net/mlx5e: Add NIC attributes for offloaded TC flows
Add structure that contains the attributes related to offloaded
NIC flows. Currently it has the actions and flow tag.

While here, do xmas tree cleanup of the TC configure function.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 15:33:58 +03:00
Or Gerlitz
ecf5bb796b net/mlx5e: Add prefix for e-switch offloaded TC flow attributes
Add esw_ prefix to the flow attributes attached to offloaded e-switch
TC flows. This is a pre-step to add attributes to offloaded NIC TC flows.

Also, save one pointer space by using gcc's zero size array, this would
be beneficial for environments where 100Ks (or Ms) of flows are offloaded.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-03-28 15:33:57 +03:00
Saeed Mahameed
2e20a15120 net/mlx5e: Fail safe mtu and lro setting
Use the new fail-safe channels switch mechanism to set new
netdev mtu and lro settings.

MTU and lro settings demand some HW configuration changes after new
channels are created and ready for action. In order to unify switch
channels routine for LRO and MTU changes, and maybe future configuration
features, we now pass to it a modify HW function pointer to be
invoked directly after old channels are de-activated and before new
channels are activated.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2017-03-27 15:08:24 +03:00
Saeed Mahameed
6f9485af40 net/mlx5e: Fail safe tc setup
Use the new fail-safe channels switch mechanism to set up new
tc parameters.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2017-03-27 15:08:23 +03:00
Saeed Mahameed
be7e87f92b net/mlx5e: Fail safe cqe compressing/moderation mode setting
Use the new fail-safe channels switch mechanism to set new
CQE compressing and CQE moderation mode settings.

We also move RX CQE compression modify function out of en_rx file  to
a more appropriate place.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2017-03-27 15:08:22 +03:00
Saeed Mahameed
546f18ed3f net/mlx5e: Fail safe ethtool settings
Use the new fail-safe channels switch mechanism to set new ethtool
settings:
 - ring parameters
 - coalesce parameters
 - tx copy break parameters

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2017-03-27 15:08:21 +03:00
Saeed Mahameed
55c2503dae net/mlx5e: Introduce switch channels
A fail safe helper functions that allows switching to new channels on the
fly,  In simple words:

make_new_config(new_params)
{
    new_channels = open_channels(new_params);
    if (!new_channels)
         return "Failed, but current channels are still active :)"

    switch_channels(new_channels);

    return "SUCCESS";
}

Demonstrate mlx5e_switch_priv_channels usage in set channels ethtool
callback and make it fail-safe using the new switch channels mechanism.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2017-03-27 15:08:20 +03:00
Saeed Mahameed
9008ae0748 net/mlx5e: Minimize mlx5e_{open/close}_locked
mlx5e_redirect_rqts_to_{channels,drop} and mlx5e_{add,del}_sqs_fwd_rules
and Set real num tx/rx queues belong to
mlx5e_{activate,deactivate}_priv_channels, for that we move those functions
and minimize mlx5e_open/close flows.

This will be needed in downstream patches to replace old channels with new
ones without the need to call mlx5e_close/open.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2017-03-27 15:08:19 +03:00
Saeed Mahameed
a43b25daef net/mlx5e: CQ and RQ don't need priv pointer
Remove mlx5e_priv pointer from CQ and RQ structs,
it was needed only to access mdev pointer from priv pointer.

Instead we now pass mdev where needed.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2017-03-27 15:08:18 +03:00
Saeed Mahameed
6a9764efb2 net/mlx5e: Isolate open_channels from priv->params
In order to have a clean separation between channels resources creation
flows and current active mlx5e netdev parameters, make sure each
resource creation function do not access priv->params, and only works
with on a new fresh set of parameters.

For this we add "new" mlx5e_params field to mlx5e_channels structure
and use it down the road to mlx5e_open_{cq,rq,sq} and so on.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2017-03-27 15:08:18 +03:00
Saeed Mahameed
acc6c5953a net/mlx5e: Split open/close channels to stages
As a foundation for safe config flow, a simple clear API such as
(Open then Activate) where the "Open" handles the heavy unsafe
creation operation and the "activate" will be fast and fail safe,
to enable the newly created channels.

For this we split the RQs/TXQ SQs and channels open/close flows to
open => activate, deactivate => close.

This will simplify the ability to have fail safe configuration changes
in downstream patches as follows:

make_new_config(new_params)
{
     old_channels = current_active_channels;
     new_channels = create_channels(new_params);
     if (!new_channels)
              return "Failed, but current channels still active :)"
     deactivate_channels(old_channels); /* Can't fail */
     activate_channels(new_channels); /* Can't fail */
     close_channels(old_channels);
     current_active_channels = new_channels;

     return "SUCCESS";
}

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2017-03-27 15:08:17 +03:00
Saeed Mahameed
b676f65389 net/mlx5e: Refactor refresh TIRs
Rename mlx5e_refresh_tirs_self_loopback to mlx5e_refresh_tirs,
as it will be used in downstream (Safe config flow) patches, and make it
fail safe on mlx5e_open.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2017-03-27 15:08:16 +03:00
Saeed Mahameed
a5f97fee74 net/mlx5e: Redirect RQT refactoring
RQ Tables are always created once (on netdev creation) pointing to drop RQ
and at that stage, RQ tables (indirection tables) are always directed to
drop RQ.

We don't need to use mlx5e_fill_{direct,indir}_rqt_rqns to fill the drop
RQ in create RQT procedure.

Instead of having separate flows to redirect direct and indirect RQ Tables
to the current active channels Receive Queues (RQs), we unify the two
flows by introducing mlx5e_redirect_rqt function and redirect_rqt_param
struct. Combined, they provide one generic logic to fill the RQ table RQ
numbers regardless of the RQ table purpose (direct/indirect).

Demonstrated the usage with mlx5e_redirect_rqts_to_channels which will
be called on mlx5e_open and with mlx5e_redirect_rqts_to_drop which will
be called on mlx5e_close.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2017-03-27 15:08:15 +03:00
Saeed Mahameed
ff9c852f91 net/mlx5e: Introduce mlx5e_channels
Have a dedicated "channels" handler that will serve as channels
(RQs/SQs/etc..) holder to help with separating channels/parameters
operations, for the downstream fail-safe configuration flow, where we will
create a new instance of mlx5e_channels with the new requested parameters
and switch to the new channels on the fly.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2017-03-27 15:08:14 +03:00
Saeed Mahameed
be4891af77 net/mlx5e: Set netdev->rx_cpu_rmap on netdev creation
To simplify mlx5e_open_locked flow we set netdev->rx_cpu_rmap on netdev
creation rather on netdev open, it is redundant to set it every time on
mlx5e_open_locked.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2017-03-27 15:08:13 +03:00
Saeed Mahameed
7f859ecfa8 net/mlx5e: Set SQ max rate on mlx5e_open_txqsq rather on open_channel
Instead of iterating over the channel SQs to set their max rate, do it
on SQ creation per TXQ SQ.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
2017-03-27 15:08:12 +03:00
Arkadi Sharshevsky
1312444374 mlxsw: spectrum_kvdl: Cosmetic kvdl allocator API change
Currently the return allocated index and err value are multiplexed.
This patch changes the API to decouple the ret value from the allocated
index.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-25 19:56:15 -07:00
Saeed Mahameed
3139104861 net/mlx5e: Different SQ types
Different SQ types (tx, xdp, ico) are growing apart, we separate them
and remove unwanted parts in each one of them, to simplify data path and
utilize data cache.

Remove DB union from SQ structures since it is not needed anymore as we
now have different SQ data type for each SQ.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 19:11:46 -07:00
Saeed Mahameed
33ad971186 net/mlx5e: Generalize SQ create/modify/destroy functions
In the next patches we will introduce different SQ types,
and we would want to reuse those functions, in this patch we make them
agnostic to SQ type (txq, xdp, ico).

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 19:11:46 -07:00
Saeed Mahameed
3b77235b94 net/mlx5e: Proper names for SQ/RQ/CQ functions
Rename mlx5e_{create,destroy}_{sq,rq,cq} to
mlx5e_{alloc,free}_{sq,rq,cq}.

Rename mlx5e_{enable,disable}_{sq,rq,cq} to
mlx5e_{create,destroy}_{sq,rq,cq}.

mlx5e_{enable,disable}_{sq,rq,cq} used to actually create/destroy the SQ
in FW, so we rename them to align the functions names with FW semantics.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 19:11:46 -07:00
Saeed Mahameed
864b2d7153 net/mlx5e: Generalize tx helper functions for different SQ types
In the next patches we will introduce different SQ types, for that we here
generalize some TX helper functions to work with more basic SQ parameters,
in order to re-use them for the different SQ types.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 19:11:46 -07:00
Saeed Mahameed
2239185ccd net/mlx5e: Optimize XDP frame xmit
XDP SQ has a fixed size WQE (MLX5E_XDP_TX_WQEBBS = 1) and only posts
one kind of WQE (MLX5_OPCODE_SEND),

Also we initialize SQ descriptors static fields once on open_xdpsq,
rather than every time on critical path.

Optimize the code in light of those facts and add a prefetch of the TX
descriptor first thing in the xdp xmit function.

Performance improvement:
System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz

Test case              Before     Now        improvement
---------------------------------------------------------------
XDP TX   (1 core)      13Mpps    13.7Mpps       5%

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 19:11:46 -07:00
Saeed Mahameed
39e12351a3 net/mlx5e: Poll XDP TX CQ before RX CQ
Handle XDP TX completions before handling RX packets, to make sure more
free space is available for XDP TX packets a moment before handling
RX packets.

Performance improvement:
System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz

Test case              Before     Now      improvement
---------------------------------------------------------------
XDP Drop (1 core)      16.9Mpps  16.9Mpps    No change
XDP TX   (1 core)      12Mpps    13Mpps      8%

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 19:11:46 -07:00
Saeed Mahameed
31871f87bb net/mlx5e: Move XDP SQ instance into RQ
To save many rq->channel->sq dereferences in fast-path.
And rename it to xdpsq.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 19:11:45 -07:00
Saeed Mahameed
eba2db2bd2 net/mlx5e: Move mlx5e_rq struct declaration
Move struct mlx5e_rq and friends to appear after mlx5e_sq declaration in
en.h.

We will need this for next patch to move the mlx5e_sq instance into
mlx5e_rq struct for XDP SQs.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 19:11:45 -07:00
Saeed Mahameed
1c4bf94045 net/mlx5e: Move XDP completion functions to rx file
XDP code belongs to RX path, move mlx5e_poll_xdp_tx_cq and
mlx5e_free_xdp_tx_descs to en_rx.c.

Rename them to mlx5e_poll_xdpsq_cq and mlx5e_free_xdpsq_descs.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 19:11:45 -07:00
Saeed Mahameed
aff2615763 net/mlx5e: Single bfreg (UAR) for all mlx5e SQs and netdevs
One is sufficient since Blue Flame is not supported anymore.
This will also come in handy for switchdev mode to save resources, since
VF representors will use same single UAR as well for their own SQs.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 19:11:45 -07:00
Saeed Mahameed
6982ab6097 net/mlx5e: Xmit, no write combining
mlx5e netdev Blue Flame (write combining) support demands a lot of
overhead for a little latency gain for some special cases, this overhead
is hurting the common case.

Here we remove xmit Blue Flame support by creating all bfregs with no
write combining for all SQs, and we remove a lot of BF logic and
conditions from xmit data path.

Simplify mlx5e_tx_notify_hw (doorbell function) by removing BF related
code and by removing one memory barrier needed for WC mapped SQ doorbell
buffers, which no longer exist.

Performance improvement:
System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz

Test case                   Before      Now      improvement
---------------------------------------------------------------
TX packets (24 threads)     50Mpps      54Mpps    8%

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 19:11:45 -07:00
Saeed Mahameed
80fe326ab8 net/mlx5e: Use dma_rmb rather than rmb in CQE fetch routine
Use dma_rmb in mlx5e_get_cqe rather than aggressive rmb (at least on
some architectures), this should help improve the performance on such
CPU archs where dma_rmb is optimized.

Performance improvement:
System: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz

Test case                   Baseline      Now      improvement
---------------------------------------------------------------
TX packets (24 threads)     45Mpps        50Mpps      11%
TC stack Drop (1 core)      3.45Mpps      3.6Mpps     5%
XDP Drop      (1 core)      14Mpps        16.9Mpps    20%
XDP TX        (1 core)      10.4Mpps      12Mpps      15%

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 19:11:44 -07:00
Ido Schimmel
18281f2dab mlxsw: spectrum: Query cell size from firmware
As explained in the previous patch, the cell size may change in future
devices, so query it from the firmware instead of hard coding it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 13:53:29 -07:00
Ido Schimmel
f417f04da5 mlxsw: spectrum: Refactor port buffer configuration
The sizes and thresholds of the priority group (PG) buffers are
configured in cells, which represent a specific amount of bytes.

The cell size can vary in different devices, so it's better to query it
from the firmware than hard coding it.

Refactor the code dealing with this value into different functions, so
that it will be easier to make the conversion in the next patch.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 13:53:29 -07:00
Ido Schimmel
d3daae1b08 mlxsw: spectrum_buffers: Query shared buffer size from firmware
Instead of hard coding the size of the shared buffer in the driver,
query it from the firmware, as it may change in future devices.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 13:53:28 -07:00
Ido Schimmel
5ec2ee7dd2 mlxsw: Query maximum number of ports from firmware
We currently hard code the maximum number of ports in the driver, but
this may change in future devices, so query it from the firmware
instead.

Fallback to a maximum of 64 ports in case this number can't be queried.
This should only happen in SwitchX-2 for which this number is correct.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 13:53:28 -07:00
Ido Schimmel
8494ab06e0 mlxsw: spectrum_router: Query number of LPM trees from firmware
Instead of hard coding the number of LPM trees in the driver, query it
from the firmware, as it may change in future devices.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 13:53:28 -07:00
Ido Schimmel
9a32562bec mlxsw: Remove debugfs interface
We don't use it during development and we can't extend it either, so
remove it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-23 21:29:32 -07:00
David S. Miller
16ae1f2236 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/genet/bcmmii.c
	drivers/net/hyperv/netvsc.c
	kernel/bpf/hashtab.c

Almost entirely overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-23 16:41:27 -07:00
Gal Pressman
8ab7e2ae15 net/mlx5e: Count LRO packets correctly
RX packets statistics ('rx_packets' counter) used to count LRO packets
as one, even though it contains multiple segments.
This patch will increment the counter by the number of segments, and
align the driver with the behavior of other drivers in the stack.

Note that no information is lost in this patch due to 'rx_lro_packets'
counter existence.

Before, ethtool showed:
$ ethtool -S ens6 | egrep "rx_packets|rx_lro_packets"
     rx_packets: 435277
     rx_lro_packets: 35847
     rx_packets_phy: 1935066

Now, we will see the more logical statistics:
$ ethtool -S ens6 | egrep "rx_packets|rx_lro_packets"
     rx_packets: 1935066
     rx_lro_packets: 35847
     rx_packets_phy: 1935066

Fixes: e586b3b0ba ("net/mlx5: Ethernet Datapath files")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 12:11:13 -07:00