Commit Graph

3713 Commits

Author SHA1 Message Date
Aviad Yehezkel
5f4183781a net/mlx5: Add empty egress namespace to flow steering core
Currently, we don't support egress flow steering namespace in mlx5
flow steering core implementation. However, when we want to encrypt
a packet, we model it as a flow steering rule in the egress path.
To overcome this, we add an empty egress namespace to flow steering.
This namespace is initialized only when ipsec support exists.
In the future, this will grow to a full blown full steering
implementation, resembling the ingress path.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:20:13 -08:00
Matan Barak
af76c50198 net/mlx5: Add shim layer between fs and cmd
The shim layer allows each namespace to define possibly different
functionality for add/delete/update commands. The shim layer
introduced here, will be used to support flow steering with the FPGA.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:19:56 -08:00
Matan Barak
a9db0ecf15 {net,IB}/mlx5: Add has_tag to mlx5_flow_act
The has_tag member will indicate whether a tag action was specified
in flow specification.

A flow tag 0 = MLX5_FS_DEFAULT_FLOW_TAG is assumed a valid flow tag
that is currently used by mlx5 RDMA driver, whereas in HW flow_tag = 0
means that the user doesn't care about flow_tag.  HW always provide
a flow_tag = 0 if all flow tags requested on a specific flow are 0.

So we need a way (in the driver) to differentiate between a user really
requesting flow_tag = 0 and a user who does not care, in order to be
able to report conflicting flow tags on a specific flow.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:06:33 -08:00
Matan Barak
04e87170b0 net/mlx5: FPGA and IPSec initialization to be before flow steering
Some flow steering namespace initialization (i.e. egress namespace)
might depend on FPGA capabilities. Changing the initialization order
such that the FPGA will be initialized before flow steering.

Flow steering fs cmds initialization might depend on
IPSec capabilities. Changing the initialization order such
that the IPSec will be initialized before flow steering as well.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:06:10 -08:00
Aviad Yehezkel
1c9a10ebc7 net/mlx5e: Removed not need synchronize_rcu
This is already done by xfrm layer between state_dev_del callback
to state_dev_free callback.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:06:09 -08:00
Aviad Yehezkel
dc7debec07 net/mlx5e: Fixed sleeping inside atomic context
We can't allocate with GFP_KERNEL inside spinlock.
Actually ida_simple doesn't require spinlock so remove it.

Fixes: 547eede070 ("net/mlx5e: IPSec, Innova IPSec offload infrastructure")
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:06:09 -08:00
Aviad Yehezkel
ef927a9c16 net/mlx5e: Wait for FPGA command responses with a timeout
Generally, FPGA IPSec commands must always complete.
We want to wait for one minute for them to complete gracefully also
when killing a process.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:06:08 -08:00
Aviad Yehezkel
46f3ee4f3a net/mlx5: Fixed compilation issue when CONFIG_MLX5_ACCEL is disabled
IPSec init and cleanup functions also depends on linux/mlx5/driver.h.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-03-06 22:06:08 -08:00
David S. Miller
0f3e9c97eb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
All of the conflicts were cases of overlapping changes.

In net/core/devlink.c, we have to make care that the
resouce size_params have become a struct member rather
than a pointer to such an object.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-06 01:20:46 -05:00
Shalom Toledo
0a8a1bf17e mlxsw: spectrum_switchdev: Check success of FDB add operation
Until now, we assumed that in case of error when adding FDB entries, the
write operation will fail, but this is not the case. Instead, we need to
check that the number of entries reported in the response is equal to
the number of entries specified in the request.

Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-04 18:12:44 -05:00
David Ahern
5e18b9c550 mlxsw: spectrum_router: Add support for ipv6 hash policy update
Similar to 28678f07f1 ("mlxsw: spectrum_router: Update multipath hash
parameters upon netevents") for IPv4, make sure the kernel and asic are
using the same hash algorithm for path selection.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-04 13:04:23 -05:00
David Ahern
3192dac64c net: Rename NETEVENT_MULTIPATH_HASH_UPDATE
Rename NETEVENT_MULTIPATH_HASH_UPDATE to
NETEVENT_IPV4_MPATH_HASH_UPDATE to denote it relates to a change
in the IPv4 hash policy.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-04 13:04:22 -05:00
Yuval Mintz
494fff5637 ipmr, ip6mr: Make mfc_cache a common structure
mfc_cache and mfc6_cache are almost identical - the main difference is
in the origin/group addresses and comparison-key. Make a common
structure encapsulating most of the multicast routing logic  - mr_mfc
and convert both ipmr and ip6mr into using it.

For easy conversion [casting, in this case] mr_mfc has to be the first
field inside every multicast routing abstraction utilizing it.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-01 13:13:23 -05:00
Daniel Jurgens
aba4621346 {net, IB}/mlx5: Raise fatal IB event when sys error occurs
All other mlx5_events report the port number as 1 based, which is how FW
reports it in the port event EQE. Reporting 0 for this event causes
mlx5_ib to not raise a fatal event notification to registered clients
due to a seemingly invalid port.

All switch cases in mlx5_ib_event that go through the port check are
supposed to set the port now, so just do it once at variable
declaration.

Fixes: 89d44f0a6c73("net/mlx5_core: Add pci error handlers to mlx5_core driver")
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-02-28 12:10:32 -07:00
Ido Schimmel
b3529af6bb spectrum: Reference count VLAN entries
One of the basic construct in the device is a port-VLAN pair, which can
be bound to a FID or a RIF in order to direct packets to the bridge or
the router, respectively.

Since not all the netdevs are configured with a VLAN (e.g., sw1p1 vs.
sw1p1.10), VID 1 is used to represent these and thus this VID can be
used by both upper devices of mlxsw ports and by the driver itself.

However, this VID is not reference counted and therefore might be freed
prematurely, which can result in various WARNINGs. For example:

$ ip link add name br0 type bridge vlan_filtering 1
$ teamd -t team0 -d -c '{"runner": {"name": "lacp"}}'
$ ip link set dev team0 master br0
$ ip link set dev enp1s0np1 master team0
$ ip address add 192.0.2.1/24 dev enp1s0np1

The enslavement to team0 will fail because team0 already has an upper
and thus vlan_vids_del_by_dev() will be executed as part of team's error
path which will delete VID 1 from enp1s0np1 (added by br0 as PVID). The
WARNING will be generated when the driver will realize it can't find VID
1 on the port and bind it to a RIF.

Fix this by adding a reference count to the VLAN entries on the port, in
a similar fashion to the reference counting used by the corresponding
'vlan_vid_info' structure in the 8021q driver.

Fixes: c57529e1d5 ("mlxsw: spectrum: Replace vPorts with Port-VLAN")
Reported-by: Tal Bar <talb@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Tal Bar <talb@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:32:36 -05:00
Ido Schimmel
9d45deb04c mlxsw: spectrum: Treat IPv6 unregistered multicast as broadcast
When multicast snooping is enabled, the Linux bridge resorts to flooding
unregistered multicast packets to all ports only in case it did not
detect a querier in the network.

The above condition is not reflected to underlying drivers, which is
especially problematic in IPv6 environments, as multicast snooping is
enabled by default and since neighbour solicitation packets might be
treated as unregistered multicast packets in case there is no
corresponding MDB entry.

Until the Linux bridge reflects its querier state to underlying drivers,
simply treat unregistered multicast packets as broadcast and allow them
to reach their destination.

Fixes: 9df552ef3e ("mlxsw: spectrum: Improve IPv6 unregistered multicast flooding")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:32:36 -05:00
Jiri Pirko
77d270967c mlxsw: spectrum: Fix handling of resource_size_param
Current code uses global variables, adjusts them and passes pointer down
to devlink. With every other mlxsw_core instance, the previously passed
pointer values are rewritten. Fix this by de-globalize the variables and
also memcpy size_params during devlink resource registration.
Also, introduce a convenient size_param_init helper.

Fixes: ef3116e540 ("mlxsw: spectrum: Register KVD resources with devlink")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:32:36 -05:00
Jiri Pirko
2ddc94c76c mlxsw: core: Fix flex keys scratchpad offset conflict
IP_TTL, IP_ECN and IP_DSCP are using the same offset within the
scratchpad as L4 ports. Fix this by shifting all up.

Fixes: 5f57e09091 ("mlxsw: acl: Add ip ttl acl element")
Fixes: i80d0fe4710c ("mlxsw: acl: Add ip tos acl element")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:32:36 -05:00
Nogah Frankel
32dc5efc6c mlxsw: spectrum: qdiscs: prio: Handle graft command
Handle graft command for an offloaded sch_prio.
Grafting a qdisc to any place other than under its original parent is not
supported by mlxsw and will cause the grafted qdisc to stop being
offloaded.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:06:01 -05:00
Nogah Frankel
98ceb7b6d6 mlxsw: spectrum: qdiscs: prio: Delete child qdiscs when removing bands
When the number the bands of sch_prio is decreased, child qdiscs on the
deleted bands would get deleted as well.
This change and deletions are being done under sch_tree_lock of the
sch_prio qdisc. Part of the destruction of qdisc is unoffloading it, if
it is offloaded. Un-offloading can't be done inside this lock.
Move the offload command to be done before reducing the number of bands,
so unoffloading of the qdiscs that are about to be deleted could be done
outside of the lock.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:06:01 -05:00
Nogah Frankel
23f2b4048c mlxsw: spectrum: Update sch_prio stats to include sch_red related drops
sch_prio as root qdisc should count all the drops its children have. Since
it is possible for it to have sch_red children, it needs to count RED early
drops.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:06:01 -05:00
Nogah Frankel
cc6e5c13af mlxsw: spectrum: qdiscs: Update backlog handling of a child qdiscs
When removing a child qdisc its backlog will be decreased from the parent
backlog. The driver backlog count should do the same.
When the parent changes its configuration, the child might need to clean
its stats. However, the backlog can't be cleaned with the rest of the
stats, because it reflects a momentary value that needs to be synced with
the core, not the history of the qdisc.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:06:00 -05:00
Nogah Frankel
04cc0bf5d6 mlxsw: spectrum: qdiscs: Collect stats for sch_red based on priomap
Priority counters count packets according to their packet priority.
Collect the stats for sch_red based on these counters, so the qdisc bstats
will be the sum of counters matching the priorities marked in the qdisc
priomap.
Changing the mapping of the priorities to bands while traffic is running
can result in losing the stats of the bands qdiscs from their last dump
call to this change, as if the qdisc was unoffloaded and re-offloaded. It
will not affect the traffic behaviour according to sch_red.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:06:00 -05:00
Nogah Frankel
1631ab2e8d mlxsw: spectrum: qdiscs: Add priority map per qdisc
Add priority map per qdisc, to indicate which priorities are being
directed through this qdisc.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:06:00 -05:00
Nogah Frankel
2f88047ec4 mlxsw: spectrum: Add priority counters
Add TX packets and bytes counters per switch priority per port.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:06:00 -05:00
Nogah Frankel
eed4baeb04 mlxsw: spectrum: qdiscs: Support qdisc per tclass
Add the option to set a qdisc per tclass.  Match the qdisc to the tclass by
parent ID. Supported currently for sch_red only.
It allows offloading sch_prio as root qdisc and sch_red as its child.
(However, doing so might corrupt the stats for both parent and child.)

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 12:06:00 -05:00
David S. Miller
fb66cb0775 mlx5-update-2018-02-23 (IB representors)
From: Mark Bloch <markb@mellanox.com>
 =========
 Add IB representor when in switchdev mode
 
 The following series adds support for an IB (RAW Ethernet only) device
 representor which is created when the user switches to switchdev mode.
 
 Today when switching to switchdev mode the only representors which are
 created are net devices. Each netdev is a representor of a virtual
 function and any data sent via the representor is received on the virtual
 function, and any data sent via the virtual function is received by the
 representor.
 
 For the mlx5 driver the main use of this functionality is to be able to
 use Open vSwitch on the hypervisor in order to manage/control traffic
 from/to the virtual functions. Open vSwitch can also work with  DPDK
 devices and not just net devices, this series exposes an IB device, which
 Mellanox PMD driver uses, which then can be used by Open vSwitch DPDK.
 
 An IB device representor exposes only RAW Ethernet QP capabilities and
 the ability to create flow rules to direct traffic to its RX queues. The
 state of the IB device (ACTIVE/DOWN etc..) is based on the state of the
 corresponding net device representor. No other RDMA/RoCE functionality is
 currently supported and no GID table is exposed.
 =========
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJakH7zAAoJEEg/ir3gV/o+c/MIAMGGgNajr49+JP3t9wnrs011
 +cTfAfM88HBzTlfb/COEBz+jurH2oB7ZF4RZC29S+6pR3loKKBuvbiPndE0XKjSg
 Ue4sOkawybmDvfo9ZiMsusOiMfTp5wsLmqJP1HRUvGMAlSBeriMTZfbiKzx5c3Ok
 X8cMnRIvUOtCoQaJTfKarDUn4OF8aFam4tQW8k/RAo77kTPyihb1NlGiblrcCA2E
 PWYAOWW3D8gvE0cr19JVgEqpKIaJ/VRyjwQ7m8XSvfBJtw1ZTO6YMXiXbWMOsRzD
 fx33H+n/qwJT0cnxDmSpZrR7mEk+Wr2HL92O85KDupOSgLOIlywmtIIkEAnCeaw=
 =Fq6m
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:

mlx5-update-2018-02-23 (IB representors)

From: Mark Bloch <markb@mellanox.com>
=========
Add IB representor when in switchdev mode

The following series adds support for an IB (RAW Ethernet only) device
representor which is created when the user switches to switchdev mode.

Today when switching to switchdev mode the only representors which are
created are net devices. Each netdev is a representor of a virtual
function and any data sent via the representor is received on the virtual
function, and any data sent via the virtual function is received by the
representor.

For the mlx5 driver the main use of this functionality is to be able to
use Open vSwitch on the hypervisor in order to manage/control traffic
from/to the virtual functions. Open vSwitch can also work with  DPDK
devices and not just net devices, this series exposes an IB device, which
Mellanox PMD driver uses, which then can be used by Open vSwitch DPDK.

An IB device representor exposes only RAW Ethernet QP capabilities and
the ability to create flow rules to direct traffic to its RX queues. The
state of the IB device (ACTIVE/DOWN etc..) is based on the state of the
corresponding net device representor. No other RDMA/RoCE functionality is
currently supported and no GID table is exposed.
=========

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-28 09:54:54 -05:00
Tariq Toukan
a970d8dba5 net/mlx4_en: RX csum, pre-define enabled protocols for IP status masking
Pre-define a mask for IP status of a completion, that tests the
MLX4_CQE_STATUS_IPV6 only in case CONFIG_IPV6 is enabled.
Use it for IP status testing upon completion, instead of separating
the datapath into two flows.
This takes common code structures (such as closing parenthesis)
back to their original place, and makes code more readable.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:53:26 -05:00
Tariq Toukan
1cb8b1216c net/mlx4_en: Combine checks of end-cases in RX completion function
Combine two end-cases in the same if statement with a single return value.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:53:26 -05:00
Eran Ben Elisha
4f32e1c4a9 net/mlx4_en: Remove unnecessary warn print in reset config
In mlx4_en_reset_config, there was a redundant warn print that was left
from previous versions of this function. No warn is needed anymore.

This warn can be confusing when RX-FCS is changed:
Turn OFF RX-FCS:
  mlx4_en: eth1: Changing device configuration rx filter(0) rx vlan(1)
Turn ON RX-FCS:
  mlx4_en: eth1: Changing device configuration rx filter(0) rx vlan(1)

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:53:26 -05:00
Eran Ben Elisha
f26d0d2543 net/mlx4_en: Add physical RX/TX bytes/packets counters
Add physical RX/TX packets/bytes counters into ethtool output to monitor
all traffic that was received and transmitted on the port. These
counters are available only for none Virtual Function.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:53:26 -05:00
Petr Machata
8f08a528de mlxsw: spectrum_span: Support mirror to ip6gretap
Similarly to mirror-to-gretap, this enables mirroring to IPv6 gretap
netdevice.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:28 -05:00
Petr Machata
27cf76fe60 mlxsw: spectrum_span: Support mirror to gretap
When a user requests mirror from a mlxsw physical port (possibly based
on an ACL match) to a gretap netdevice, the driver needs to resolve the
request to a particular physical port that the mirrored packets will
egress through, and a suite of configuration keys (importantly, IP and
MAC addresses). That means calling into routing and neighbor kernel code
to simulate the decisions made by the system for packets passing through
a gretap netdevice.

Add a new instance of mlxsw_sp_span_entry_ops to support this.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:28 -05:00
Petr Machata
52a6444cda mlxsw: Move a mirroring check to mlxsw_sp_span_entry_create
The check for whether a mirror port (which is a mlxsw front panel port)
belongs to the same mlxsw instance as the mirrored port, is currently
only done in spectrum_acl, even though it's applicable for the matchall
case as well. Thus move it to mlxsw_sp_span_entry_create().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:27 -05:00
Petr Machata
803335acbe mlxsw: Handle config changes pertinent to SPAN
For some netdevices, for which mlxsw offloads mirroring, may have a
complex relationship between the declared intent and low-level
device configuration.

Trying to accurately track which changes might influence offloading
decisions is finicky and error-prone. Instead, this patch introduces a
function mlxsw_sp_span_entry_respin, which re-queries the configuration
anew and, if different, removes the existing offloads and installs new
ones.

Call this function strategically at event handlers that might influence
the mirroring configuration.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:27 -05:00
Petr Machata
169b5d95c1 mlxsw: spectrum_span: Generalize SPAN support
To support mirroring to different device types, the functions that
partake in configuring the port analyzer need to be extended to admit
non-trivial SPAN types.

Create a structure where all details of SPAN configuration are kept,
struct mlxsw_sp_span_parms. Also create struct mlxsw_sp_span_entry_ops
to keep per-SPAN-type operations.

Instantiate the latter once for MLXSW_REG_MPAT_SPAN_TYPE_LOCAL_ETH, and
once for a suite of NOP callbacks used for invalidated SPAN entry. Put
the formet as a sole member of a new array mlxsw_sp_span_entry_types,
where all known SPAN types are kept. Introduce a new function,
mlxsw_sp_span_entry_ops(), to look up the right ops suite given a
netdevice.

Change mlxsw_sp_span_mirror_add() to use both parms and ops structures.
Change mlxsw_sp_span_entry_get() and mlxsw_sp_span_entry_create() to
take these as arguments. Modify mlxsw_sp_span_entry_configure() and
mlxsw_sp_span_entry_deconfigure() to dispatch to ops.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:27 -05:00
Petr Machata
079c9f393b mlxsw: spectrum: Keep mirror netdev in mlxsw_sp_span_entry
Currently the only mirror action supported by mlxsw is mirror to another
mlxsw physical port. Correspondingly, span_entry, which tracks each
mlxsw mirror in the system, currently holds a u8 number of the
destination port.

To extend this system to mirror to gretap and ip6gretap netdevices, have
struct mlxsw_sp_span_entry actually hold the destination netdevice
itself.

This change then trickles down in obvious manner to SPAN module API and
mirror-related interfaces in struct mlxsw_afa_ops.

To prevent use of invalid pointer, NETDEV_UNREGISTER needs to be hooked
and the corresponding SPAN entry invalidated.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:27 -05:00
Petr Machata
7b2ef81fd2 mlxsw: spectrum_span: Extract mlxsw_sp_span_entry_{de, }configure()
Configuring the hardware for encapsulated SPAN involves more code than
the simple mirroring case. Extract the related code to a separate
function to separate it from the rest of SPAN entry creation. Extract
deconfigure as well for symmetry, even though disablement is the same
regardless of SPAN type.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:27 -05:00
Petr Machata
3546b03ffc mlxsw: spectrum_span: Initialize span_entry.id eagerly
It is known statically ahead of time which SPAN entry will have which
ID. Just initialize it eagerly in mlxsw_sp_span_init(), don't wait until
the entry is actually created. This simplifies some code in
mlxsw_sp_span_entry_create()

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:27 -05:00
Petr Machata
98977089d8 mlxsw: span: Remove span_entry by span_id
Instead of removing span_entry by the port number, allow removing by
SPAN id. That simplifies some code right here, and for mirroring to soft
netdevices, avoids problems with netdevice pointer invalidation and
reuse.

Rename mlxsw_sp_span_entry_find() to mlxsw_sp_span_entry_find_by_port()
and keep it--follow-up patches will make use of it.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:26 -05:00
Petr Machata
1da93eb466 mlxsw: reg: Extend mlxsw_reg_mpat_pack()
To support encapsulated SPAN, extend mlxsw_reg_mpat_pack() with a field
to set the SPAN type.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:26 -05:00
Petr Machata
0d6cd3fcbc mlxsw: reg: Add SPAN encapsulation to MPAT register
MPAT Register is used to query and configure the Switch Port Analyzer
Table. To configure Port Analyzer to encapsulate mirrored packets,
additional fields need to be specified for the MPAT register.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:26 -05:00
Petr Machata
8897207c89 mlxsw: spectrum_ipip: Support decoding IPv6 tunnel addresses
To support mirroring to ip6gretap, the SPAN module needs to be able to
decode IPv6 addresses specified at that tunnel.

Extend mlxsw_sp_ipip_netdev_saddr() and mlxsw_sp_ipip_netdev_daddr() to
support IPv6 addresses. To that end, add and publish a support function
mlxsw_sp_ipip_netdev_parms6().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:26 -05:00
Petr Machata
7e58a6c662 mlxsw: spectrum_ipip: Extract mlxsw_sp_l3addr_is_zero
Extract the logic for determining whether a given IPv4/IPv6 address is
all-zeroes from mlxsw_sp_ipip_tunnel_complete to a separate function.
Make that function public within the module.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-27 14:46:25 -05:00
Arnd Bergmann
ed2da6270e mlxsw: spectrum_kvdl: avoid uninitialized variable warning
gcc warns that 'resource_id' is not initialized if we don't come though
any of the three 'case' statements before:

drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c: In function 'mlxsw_sp_kvdl_part_init':
drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c:275:8: error: 'resource_id' may be used uninitialized in this function [-Werror=maybe-uninitialized]

In the current code, that won't happen, but it's more robust to explicitly
handle this by returning a failure from mlxsw_sp_kvdl_part_init.

Fixes: 887839e696 ("mlxsw: spectrum_kvdl: Add support for dynamic partition set")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26 11:33:02 -05:00
Arnd Bergmann
b89c7695b1 mlxsw: spectrum_kvdl: use div_u64() for 64-bit division
Calculating the number of entries now uses 64-bit arithmetic that
causes a link error on 32-bit architectures:

drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.o: In function `mlxsw_sp_kvdl_init':
spectrum_kvdl.c:(.text+0x51c): undefined reference to `__aeabi_uldivmod'

We could probably use a 32-bit division here as before, but since this is
not in a performance critical path, div_u64() seems cleaner here.

Fixes: 887839e696 ("mlxsw: spectrum_kvdl: Add support for dynamic partition set")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26 11:33:02 -05:00
Ido Schimmel
65b53bfd49 mlxsw: spectrum_switchdev: Allow port enslavement to a VLAN-unaware bridge
Up until now we only allowed VLAN devices to be put in a VLAN-unaware
bridge, but some users need the ability to enslave physical ports as
well.

This is achieved by mapping the port and VID 1 to the bridge's vFID,
instead of the port and the VID used by the VLAN device.

The above is valid because as long as the port is not enslaved to a
bridge, VID 1 is guaranteed to be configured as PVID and egress
untagged.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-26 11:12:26 -05:00
David S. Miller
f74290fdb3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-02-24 00:04:20 -05:00
Mark Bloch
c5447c7059 net/mlx5: E-Switch, Reload IB interface when switching devlink modes
Up until this point it wasn't possible to activate IB representors
when switching to switchdev mode, remove this limitation.

We trigger reload of the PF IB interface in order to make sure that
already allocated resources are invalid and new resources will be opened
correctly with all the limitations of switchdev mode applied (only raw
packet capabilities, without RoCE). We also move the remove/add to a
place where the E-Switch mode is set/unset to better control when to
trigger this action, this will allow the IB side to start in the correct
mode.

For better code reuse, create a function which reloads an interface and
export it.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:39 -08:00
Mark Bloch
f80be5436d net/mlx5: E-Switch, Optimize HW steering tables in switchdev mode
Under switchdev mode we insert an eswitch miss rule causing any
unmatched traffic to be sent towards the PF vport. This miss rule can
be optimized if we break it to two, one case is for multicast traffic and
the other for unicast.

Breaking the miss rule into two (unicast and multicast) allows the firmware
to program the hardware in a more efficient way.

Using ConncetX-5 Ex with IXIA and testpmd (which use IB representors):

IXIA -> NIC -> PF -> IB representor -> NIC -> VF:
    - Without this optimization: 9.2 MPPS.
    - With this optimization: 18 MPPS.

VF -> NIC -> IB representor-> PF -> NIC -> IXIA:
    - Without this optimization: 17 MPPS.
    - With this optimization: 23.4 MPPS.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:38 -08:00
Mark Bloch
cd3d07e7db net/mlx5: E-Switch, Increase number of FTEs in FDB in switchdev mode
The max FTE number should be the max number of SQs that can be opened.
Ethernet representors open one SQ each. Once we add IB representor this
will increase (depends on the user). For now lets start with 31
per IB representor and if needed increase in the future.

This increase only affects the number of FTEs in the slow path FDB,
offloaded rules (done via TC on the fast path portion of the FDB)
aren't affected.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:38 -08:00
Mark Bloch
57cbd893c4 net/mlx5: E-Switch, Move representors definition to a global scope
In preparation for IB representors, move representors structs to a global
scope, also expose functions needed for registration, unregistration,
eswitch mode and creating a flow rule to direct traffic from SQs to the
right VF.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:38 -08:00
Mark Bloch
22215908d8 net/mlx5: E-Switch, Add callback to get representor device
Add a callback interface to get a protocol device (per representor type).
The Ethernet representors will expose their netdev via this interface.

This functionality can be later used by IB representor in order to find the
corresponding net device representor.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-23 12:36:38 -08:00
David S. Miller
f4af1db48b mlx5-updates-2018-02-21
This series includes shared code updates for mlx5 core driver for both
 netdev and rdma subsystems.
 
 By Saeed,
 First six patches of the series are meant to address a performance issue
 and should provide a performance boost for multi core IRQ interrupt hungry
 workloads.  The issue is fixed in the first patch, all other patches are
 meant to refactor the code in light of this fix.
 
 The problem it comes to fix, is a shared spinlock accessed across all HCA
 IRQs which protects the CQ database.  To solve this we simply move the CQ
 database and its spinlock to be per EQ (IRQ), thus per core.
 
 By Yonatan,
 Fragmented completion queue (CQ) for RDMA,
 core driver implementation to create fragmented CQ buffers rather than
 one large contiguous memory buffer, the implementation scheme already
 exist and used by the netdev CQs, the patch shares that code with the
 rdma CQ creation flow and makes use of the new API in mlx5_ib driver.
 
 Thanks,
 Saeed.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJajcwxAAoJEEg/ir3gV/o+uJAIAICP2LBu5ANBgX7Z7/B0IZX4
 Abkfk+IKR44v+3t0BBgEAZI2A9LcTQ8pwIOeLCDpcKd78665Y7rYQL1p31WuPKc5
 1WwwBesjQLAmdiNLIjUnjW1fd3YyOmqLBcaMMYIeF0m4m4pCQue1kwtxesQrhdZV
 I7FBxxnQt/8o6hoh/5nmU4DzGyZaY+uR/4FxUCEreklfhNeI5M2DZKe1xQCmhsJ4
 Tosh/cJ8kSBJfclH6hk1lh5eWGDYDltWCpY86KBWsYktl10VAE3P7LPmekn487ta
 e6cL1NPoEHbp+/UBThkNZMzUnA7wpoNeDbTTtVZrdcL1KivwColt1r3pmS1bqEQ=
 =DlfW
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:

====================
mlx5-updates-2018-02-21

This series includes shared code updates for mlx5 core driver for both
netdev and rdma subsystems.

By Saeed,
First six patches of the series are meant to address a performance issue
and should provide a performance boost for multi core IRQ interrupt hungry
workloads.  The issue is fixed in the first patch, all other patches are
meant to refactor the code in light of this fix.

The problem it comes to fix, is a shared spinlock accessed across all HCA
IRQs which protects the CQ database.  To solve this we simply move the CQ
database and its spinlock to be per EQ (IRQ), thus per core.

By Yonatan,
Fragmented completion queue (CQ) for RDMA,
core driver implementation to create fragmented CQ buffers rather than
one large contiguous memory buffer, the implementation scheme already
exist and used by the netdev CQs, the patch shares that code with the
rdma CQ creation flow and makes use of the new API in mlx5_ib driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-22 14:38:38 -05:00
David S. Miller
9c4ff2a9ec mlx5-fixes-2018-02-20
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJajJahAAoJEEg/ir3gV/o+PTQIALd4/l5VEU1vWqXp6PPaIaRP
 fCt4CKXkeIozyOhkiaYLcNAvPnsJRi54a/PGyUbHwxadkjuOhS8WETyQwMf8gbKF
 g+LTDseqKjWcu6fUMqe7TmjrNHSo2VErjOjHw1+7emKSkBZuc8zBBND9PqHtOQY6
 f1KoUY6IBs22dwqzynxkC5A62VQbTYd0FPnf1n9wN8ubFXSb37KUgKoW4kmMFmLe
 JsYEcsAKxpfepnP+EYLyDrVcT1LA9edi5DI5uxSfNgas6bTtjRPhtyZCznDF8rly
 jEWv3O2c+HhchuysoACTLiTllviHGj0Myv+rGb0ouhjZWMtulgPPQ04lbYjjr84=
 =uF0v
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2018-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2018-02-20

The following pull request includes some fixes for the mlx5 core and
netdevice driver.

Please pull and let me know if there's any issue.

-stable 4.10.y:
('net/mlx5e: Fix loopback self test when GRO is off')

-stable 4.12.y:
('net/mlx5e: Specify numa node when allocating drop rq')

-stable 4.13.y:
('net/mlx5e: Verify inline header size do not exceed SKB linear size')

-stable 4.15.y:
('net/mlx5e: Fix TCP checksum in LRO buffers')
('net/mlx5: Fix error handling when adding flow rules')
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-21 14:57:35 -05:00
Vlad Buslov
9238e380e8 net/mlx5: Fix error handling when adding flow rules
If building match list or adding existing fg fails when
node is locked, function returned without unlocking it.
This happened if node version changed or adding existing fg
returned with EAGAIN after jumping to search_again_locked label.

Fixes: bd71b08ec2 ("net/mlx5: Support multiple updates of steering rules in parallel")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20 12:53:00 -08:00
Eugenia Emantayev
26a0f6e829 net/mlx5: E-Switch, Fix drop counters use before creation
First use of drop counters happens in esw_apply_vport_conf function,
while they are allocated later in the flow. Fix that by moving
esw_vport_create_drop_counters function to be called before the first use.

Fixes: b8a0dbe3a9 ("net/mlx5e: E-switch, Add steering drop counters")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20 12:53:00 -08:00
Or Gerlitz
96de67a772 net/mlx5: Add header re-write to the checks for conflicting actions
We can't allow only some of the rules sharing an FTE to ask for
header re-write, add it to the conflicting action checks.

Fixes: 0d235c3fab ('net/mlx5: Add hash table to search FTEs in a flow-group')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20 12:52:59 -08:00
Daniel Jurgens
c67f100eda net/mlx5: Use 128B cacheline size for 128B or larger cachelines
The adapter uses the cache_line_128byte setting to set the bounds for
end padding. On systems where the cacheline size is greater than 128B
use 128B instead of the default of 64B. This results in fewer partial
cacheline writes. There's a 50% chance it will pad to the end of a 256B
cache line vs only 25% when using 64B.

Fixes: f32f5bd2eb ("net/mlx5: Configure cache line size for start and end padding")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20 12:52:58 -08:00
Gal Pressman
2f0db87901 net/mlx5e: Specify numa node when allocating drop rq
When allocating a drop rq, no numa node is explicitly set which means
allocations are done on node zero. This is not necessarily the nearest
numa node to the HCA, and even worse, might even be a memoryless numa
node.

Choose the numa_node given to us by the pci device in order to properly
allocate the coherent dma memory instead of assuming zero is valid.

Fixes: 556dd1b9c3 ("net/mlx5e: Set drop RQ's necessary parameters only")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20 12:52:58 -08:00
Or Gerlitz
001a2fc0c8 net/mlx5e: Return error if prio is specified when offloading eswitch vlan push
This isn't supported when we emulate eswitch vlan push action which
is the current state of things.

Fixes: 8b32580df1 ('net/mlx5e: Add TC vlan action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20 12:52:57 -08:00
Or Gerlitz
4f5c02f949 net/mlx5: Address static checker warnings on non-constant initializers
Address these sparse warnings on drivers/net/ethernet/mellanox/mlx5

[..]/core/diag/fs_tracepoint.c:99:53: warning: non-constant initializer for static object
[..]/core/diag/fs_tracepoint.c:102:53: warning: non-constant initializer for static object

etc

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20 12:52:56 -08:00
Or Gerlitz
9afe9a5353 net/mlx5e: Eliminate build warnings on no previous prototype
Fix these gcc warnings on drivers/net/ethernet/mellanox/mlx5:

[..]/core/lib/clock.c:454:6: warning: no previous prototype for 'mlx5_init_clock' [-Wmissing-prototypes]
[..]/core/lib/clock.c:510:6: warning: no previous prototype for 'mlx5_cleanup_clock' [-Wmissing-prototypes]
[..]/core/en_main.c:3141:5: warning: no previous prototype for 'mlx5e_setup_tc' [-Wmissing-prototypes]

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20 12:52:56 -08:00
Eran Ben Elisha
f600c60880 net/mlx5e: Verify inline header size do not exceed SKB linear size
Driver tries to copy at least MLX5E_MIN_INLINE bytes into the control
segment of the WQE. It assumes that the linear part contains at least
MLX5E_MIN_INLINE bytes, which can be wrong.

Cited commit verified that driver will not copy more bytes into the
inline header part that the actual size of the packet. Re-factor this
check to make sure we do not exceed the linear part as well.

This fix is aligned with the current driver's assumption that the entire
L2 will be present in the linear part of the SKB.

Fixes: 6aace17e64 ("net/mlx5e: Fix inline header size for small packets")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20 12:52:55 -08:00
Inbar Karmy
ef7a3518f7 net/mlx5e: Fix loopback self test when GRO is off
When GRO is off, the transport header pointer in sk_buff is
initialized to network's header.

To find the udp header, instead of using udp_hdr() which assumes
skb_network_header was set, manually calculate the udp header offset.

Fixes: 0952da791c ("net/mlx5e: Add support for loopback selftest")
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20 12:52:54 -08:00
Gal Pressman
8babd44d20 net/mlx5e: Fix TCP checksum in LRO buffers
When receiving an LRO packet, the checksum field is set by the hardware
to the checksum of the first coalesced packet. Obviously, this checksum
is not valid for the merged LRO packet and should be fixed.  We can use
the CQE checksum which covers the checksum of the entire merged packet
TCP payload to help us calculate the checksum incrementally.

Tested by sending IPv4/6 traffic with LRO enabled, RX checksum disabled
and watching nstat checksum error counters (in addition to the obvious
bandwidth drop caused by checksum errors).

This bug is usually "hidden" since LRO packets would go through the
CHECKSUM_UNNECESSARY flow which does not validate the packet checksum.

It's important to note that previous to this patch, LRO packets provided
with CHECKSUM_UNNECESSARY are indeed packets with a correct validated
checksum (even though the checksum inside the TCP header is incorrect),
since the hardware LRO aggregation is terminated upon receiving a packet
with bad checksum.

Fixes: e586b3b0ba ("net/mlx5: Ethernet Datapath files")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-20 12:52:54 -08:00
Arkadi Sharshevsky
7f47b19bd7 mlxsw: spectrum_kvdl: Add support for per part occupancy
Add support for calculating occupancy for separate kvdl parts.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20 13:38:56 -05:00
Arkadi Sharshevsky
887839e696 mlxsw: spectrum_kvdl: Add support for dynamic partition set
Add support for dynamic partition set via the resource interface.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20 13:38:55 -05:00
Arkadi Sharshevsky
51d3c08e33 mlxsw: spectrum_kvdl: Add support for linear division resources
The linear part of the KVD memory is sub-divided into multiple parts. This
patch exposes this internal partitions via the resource interface.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20 13:38:55 -05:00
Arkadi Sharshevsky
4f4bbf7c4e devlink: Perform cleanup of resource_set cb
After adding size validation logic into core cleanup is required.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-20 13:38:54 -05:00
David S. Miller
f5c0c6f429 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-02-19 18:46:11 -05:00
Ido Schimmel
d1c95af366 mlxsw: spectrum_router: Do not unconditionally clear route offload indication
When mlxsw replaces (or deletes) a route it removes the offload
indication from the replaced route. This is problematic for IPv4 routes,
as the offload indication is stored in the fib_info which is usually
shared between multiple routes.

Instead of unconditionally clearing the offload indication, only clear
it if no other route is using the fib_info.

Fixes: 3984d1a89f ("mlxsw: spectrum_router: Provide offload indication using nexthop flags")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-19 11:21:08 -05:00
Yonatan Cohen
388ca8be00 IB/mlx5: Implement fragmented completion queue (CQ)
The current implementation of create CQ requires contiguous
memory, such requirement is problematic once the memory is
fragmented or the system is low in memory, it causes for
failures in dma_zalloc_coherent().

This patch implements new scheme of fragmented CQ to overcome
this issue by introducing new type: 'struct mlx5_frag_buf_ctrl'
to allocate fragmented buffers, rather than contiguous ones.

Base the Completion Queues (CQs) on this new fragmented buffer.

It fixes following crashes:
kworker/29:0: page allocation failure: order:6, mode:0x80d0
CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
[<>] dump_stack+0x19/0x1b
[<>] warn_alloc_failed+0x110/0x180
[<>] __alloc_pages_slowpath+0x6b7/0x725
[<>] __alloc_pages_nodemask+0x405/0x420
[<>] dma_generic_alloc_coherent+0x8f/0x140
[<>] x86_swiotlb_alloc_coherent+0x21/0x50
[<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
[<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
[<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
[<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
[<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
[<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-02-15 00:30:03 -08:00
Saeed Mahameed
3ec5693b17 net/mlx5: Remove redundant EQ API exports
EQ structure and API is private to mlx5_core driver only, external
drivers should not have access or the means to manipulate EQ objects.

Remove redundant exports and move API functions out of the linux/mlx5
include directory into the driver's mlx5_core.h private include file.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
2018-02-15 00:30:02 -08:00
Saeed Mahameed
3ac7afdbcf net/mlx5: Move CQ completion and event forwarding logic to eq.c
Since CQ tree is now per EQ, CQ completion and event forwarding became
specific implementation of EQ logic, this patch moves that logic to eq.c
and makes those functions static.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
2018-02-15 00:30:02 -08:00
Saeed Mahameed
f105b45bf7 net/mlx5: CQ hold/put API
Now as the CQ table is per EQ, add an API to hold/put CQ to be used from
eq.c in downstream patch.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
2018-02-15 00:30:01 -08:00
Saeed Mahameed
d5c07157dd net/mlx5: EQ add/del CQ API
Add API to add/del CQ to/from EQs CQ table to be used in cq.c upon CQ
creation/destruction, as CQ table is now private to eq.c.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
2018-02-15 00:30:00 -08:00
Saeed Mahameed
d2ff4fa575 net/mlx5: Add missing likely/unlikely hints to cq events
If a hardware event is targeting a CQ, that CQ should exist.
Add unlikely to error handling flows.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
2018-02-15 00:30:00 -08:00
Saeed Mahameed
02d92f7903 net/mlx5: CQ Database per EQ
Before this patch the driver had one CQ database protected via one
spinlock, this spinlock is meant to synchronize between CQ
adding/removing and CQ IRQ interrupt handling.

On a system with large number of CPUs and on a work load that requires
lots of interrupts, this global spinlock becomes a very nasty hotspot
and introduces a contention between the active cores, which will
significantly hurt performance and becomes a bottleneck that prevents
seamless cpu scaling.

To solve this we simply move the CQ database and its spinlock to be per
EQ (IRQ), thus per core.

Tested with:
system: 2 sockets, 14 cores per socket, hyperthreading, 2x14x2=56 cores
netperf command: ./super_netperf 200 -P 0 -t TCP_RR  -H <server> -l 30 -- -r 300,300 -o -s 1M,1M -S 1M,1M

WITHOUT THIS PATCH:
Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft %steal  %guest  %gnice   %idle
Average:     all    4.32    0.00   36.15    0.09    0.00   34.02   0.00    0.00    0.00   25.41

Samples: 2M of event 'cycles:pp', Event count (approx.): 1554616897271
Overhead  Command          Shared Object                 Symbol
+   14.28%  swapper          [kernel.vmlinux]              [k] intel_idle
+   12.25%  swapper          [kernel.vmlinux]              [k] queued_spin_lock_slowpath
+   10.29%  netserver        [kernel.vmlinux]              [k] queued_spin_lock_slowpath
+    1.32%  netserver        [kernel.vmlinux]              [k] mlx5e_xmit

WITH THIS PATCH:
Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all    4.27    0.00   34.31    0.01    0.00   18.71    0.00    0.00    0.00   42.69

Samples: 2M of event 'cycles:pp', Event count (approx.): 1498132937483
Overhead  Command          Shared Object             Symbol
+   23.33%  swapper          [kernel.vmlinux]          [k] intel_idle
+    1.69%  netserver        [kernel.vmlinux]          [k] mlx5e_xmit

Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
2018-02-15 00:29:54 -08:00
Arkadi Sharshevsky
6c677750f2 mlxsw: spectrum: Use NL_SET_ERR_MSG_MOD
Use NL_SET_ERR_MSG_MOD helper which adds the module name instead
of specifying the prefix each time.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13 12:27:20 -05:00
Petr Machata
a629ef210d mlxsw: spectrum: Move SPAN code to separate module
For the upcoming work on SPAN, it makes sense to move the current code
to a module of its own. It already has a well-defined API boundary to
the mirror management (which is used from matchall and ACL code). A
couple more functions need to be exported for the functions that
spectrum.c needs to use for MTU handling and subsystem init/fini.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13 12:26:25 -05:00
Petr Machata
ce470b44e2 mlxsw: spectrum: Drop struct span_entry.used
The member ref_count already determines whether a given SPAN entry is
used, and is as easy to use as a dedicated boolean.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13 12:26:25 -05:00
Petr Machata
306a934e5b mlxsw: spectrum: Fix a coding style nit
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13 12:26:25 -05:00
Petr Machata
e437f3b62d mlxsw: spectrum: Distinguish between IPv4/6 tunnels
struct ip_tunnel_parm, where GRE and several other tunnel types hold
information, is IPv4-specific. The current router / ipip code in mlxsw
however uses it as if it were generic.

Make it clear that it's not. Rename many functions from _params_ to
_params4_. mlxsw_sp_ipip_parms_saddr() and _daddr() take a proto
argument to dispatch on it. Move the dispatch logic to
mlxsw_sp_ipip_netdev_saddr() and _daddr(), and replace with
single-protocol functions.

In struct mlxsw_sp_ipip_entry, move the "parms" field to a (for the time
being, singleton) union. Update users throughout.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13 12:24:28 -05:00
Petr Machata
fe735a3d2c mlxsw: spectrum_ipip: Add a forgotten include
struct ip_tunnel_parm, which is used in spectrum_ipip.h, is defined in
if_tunnel.h. However, the former neglects to include the latter.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13 12:24:28 -05:00
Jiri Pirko
0f2d2b2736 mlxsw: spectrum_router: Fix error path in mlxsw_sp_vr_create
Since mlxsw_sp_fib_create() and mlxsw_sp_mr_table_create()
use ERR_PTR macro to propagate int err through return of a pointer,
the return value is not NULL in case of failure. So if one
of the calls fails, one of vr->fib4, vr->fib6 or vr->mr4_table
is not NULL and mlxsw_sp_vr_is_used wrongly assumes
that vr is in use which leads to crash like following one:

[ 1293.949291] BUG: unable to handle kernel NULL pointer dereference at 00000000000006c9
[ 1293.952729] IP: mlxsw_sp_mr_table_flush+0x15/0x70 [mlxsw_spectrum]

Fix this by using local variables to hold the pointers and set vr->*
only in case everything went fine.

Fixes: 76610ebbde ("mlxsw: spectrum_router: Refactor virtual router handling")
Fixes: a3d9bc506d ("mlxsw: spectrum_router: Extend virtual routers with IPv6 support")
Fixes: d42b0965b1 ("mlxsw: spectrum_router: Add multicast routes notification handling functionality")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13 12:22:29 -05:00
Linus Torvalds
2246edfaf8 Second pull request for 4.16 merge window
- Clean up some function signatures in rxe for clarity
 - Tidy the RDMA netlink header to remove unimplemented constants
 - bnxt_re driver fixes, one is a regression this window.
 - Minor hns driver fixes
 - Various fixes from Dan Carpenter and his tool
 - Fix IRQ cleanup race in HFI1
 - HF1 performance optimizations and a fix to report counters in the right units
 - Fix for an IPoIB startup sequence race with the external manager
 - Oops fix for the new kabi path
 - Endian cleanups for hns
 - Fix for mlx5 related to the new automatic affinity support
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJaePL/AAoJELgmozMOVy/dqhsQALUhzDuuJ+/F6supjmyqZG53
 Ak/PoFjTmHToGQfDq/1TRzyKwMx12aB2l6WGZc31FzhvCw4daPWkoEVKReNWUUJ+
 fmESxjLgo8ZRGSqpNxn9Q8agE/I/5JZQoA8bCFCYgdZPKTPNKdtAVBphpdhmrOX4
 ygjABikWf/wBsNF1A8lnX9xkfPO21cPHrFQLTnuOzOT/hc6U+PPklHSQCnS91svh
 1+Pqjtssg54rxYkJqiFq3giSnfwvmAXO8WyVGmRRPFGLpB0nIjq0Sl6ZgLLClz7w
 YJdiBGr7rlnNMgGCjlPU2ZO3lO6J0ytXQzFNqRqvKryXQOv+uVeJgep7WqHTcdQU
 UN30FCKQMgLL/F6NF8wKaKcK4X0VgXQa7gpuH2fVSXF0c3LO3/mmWNjixbGSzT2c
 Wj+EW3eOKlTddhRLhgbMOdwc32tIGhaD85z2F4+FZO+XI9ZQtJaDewWVDjYoumP/
 RlDIFw+KCgSq7+UZL8CoXuh0BuS1nu9TGfkx1HW0DLMF1+yigNiswpUfksV4cISP
 JqE2I3yH0A4UobD/a+f9IhIfk2MjxO0tJWNjU8IA9LXgUFlskQ6MpH/AcE9G8JNv
 tlfLGR3s4PJa/7j/Iy2F84og/b/KH8v7vyj4Eknq/hLq63/BiM5wj0AUBRrGulN6
 HhAMOegxGZ7IKP/y0L7I
 =xwZz
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull more rdma updates from Doug Ledford:
 "Items of note:

   - two patches fix a regression in the 4.15 kernel. The 4.14 kernel
     worked fine with NVMe over Fabrics and mlx5 adapters. That broke in
     4.15. The fix is here.

   - one of the patches (the endian notation patch from Lijun) looks
     like a lot of lines of change, but it's mostly mechanical in
     nature. It amounts to the biggest chunk of change in it (it's about
     2/3rds of the overall pull request).

  Summary:

   - Clean up some function signatures in rxe for clarity

   - Tidy the RDMA netlink header to remove unimplemented constants

   - bnxt_re driver fixes, one is a regression this window.

   - Minor hns driver fixes

   - Various fixes from Dan Carpenter and his tool

   - Fix IRQ cleanup race in HFI1

   - HF1 performance optimizations and a fix to report counters in the right units

   - Fix for an IPoIB startup sequence race with the external manager

   - Oops fix for the new kabi path

   - Endian cleanups for hns

   - Fix for mlx5 related to the new automatic affinity support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (38 commits)
  net/mlx5: increase async EQ to avoid EQ overrun
  mlx5: fix mlx5_get_vector_affinity to start from completion vector 0
  RDMA/hns: Fix the endian problem for hns
  IB/uverbs: Use the standard kConfig format for experimental
  IB: Update references to libibverbs
  IB/hfi1: Add 16B rcvhdr trace support
  IB/hfi1: Convert kzalloc_node and kcalloc to use kcalloc_node
  IB/core: Avoid a potential OOPs for an unused optional parameter
  IB/core: Map iWarp AH type to undefined in rdma_ah_find_type
  IB/ipoib: Fix for potential no-carrier state
  IB/hfi1: Show fault stats in both TX and RX directions
  IB/hfi1: Remove blind constants from 16B update
  IB/hfi1: Convert PortXmitWait/PortVLXmitWait counters to flit times
  IB/hfi1: Do not override given pcie_pset value
  IB/hfi1: Optimize process_receive_ib()
  IB/hfi1: Remove unnecessary fecn and becn fields
  IB/hfi1: Look up ibport using a pointer in receive path
  IB/hfi1: Optimize packet type comparison using 9B and bypass code paths
  IB/hfi1: Compute BTH only for RDMA_WRITE_LAST/SEND_LAST packet
  IB/hfi1: Remove dependence on qp->s_hdrwords
  ...
2018-02-06 11:09:45 -08:00
Max Gurtovoy
03ecdd2dcf net/mlx5: increase async EQ to avoid EQ overrun
Currently the async EQ has 256 entries only. It might not be big enough
for the SW to handle all the needed pending events. For example, in case
of many QPs (let's say 1024) connected to a SRQ created using NVMeOF target
and the target goes down, the FW will raise 1024 "last WQE reached" events
and may cause EQ overrun. Increase the EQ to more reasonable size, that beyond
it the FW should be able to delay the event and raise it later on using internal
backpressure mechanism.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-02-05 10:58:25 -05:00
Linus Torvalds
b2fe5fa686 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Significantly shrink the core networking routing structures. Result
    of http://vger.kernel.org/~davem/seoul2017_netdev_keynote.pdf

 2) Add netdevsim driver for testing various offloads, from Jakub
    Kicinski.

 3) Support cross-chip FDB operations in DSA, from Vivien Didelot.

 4) Add a 2nd listener hash table for TCP, similar to what was done for
    UDP. From Martin KaFai Lau.

 5) Add eBPF based queue selection to tun, from Jason Wang.

 6) Lockless qdisc support, from John Fastabend.

 7) SCTP stream interleave support, from Xin Long.

 8) Smoother TCP receive autotuning, from Eric Dumazet.

 9) Lots of erspan tunneling enhancements, from William Tu.

10) Add true function call support to BPF, from Alexei Starovoitov.

11) Add explicit support for GRO HW offloading, from Michael Chan.

12) Support extack generation in more netlink subsystems. From Alexander
    Aring, Quentin Monnet, and Jakub Kicinski.

13) Add 1000BaseX, flow control, and EEE support to mvneta driver. From
    Russell King.

14) Add flow table abstraction to netfilter, from Pablo Neira Ayuso.

15) Many improvements and simplifications to the NFP driver bpf JIT,
    from Jakub Kicinski.

16) Support for ipv6 non-equal cost multipath routing, from Ido
    Schimmel.

17) Add resource abstration to devlink, from Arkadi Sharshevsky.

18) Packet scheduler classifier shared filter block support, from Jiri
    Pirko.

19) Avoid locking in act_csum, from Davide Caratti.

20) devinet_ioctl() simplifications from Al viro.

21) More TCP bpf improvements from Lawrence Brakmo.

22) Add support for onlink ipv6 route flag, similar to ipv4, from David
    Ahern.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1925 commits)
  tls: Add support for encryption using async offload accelerator
  ip6mr: fix stale iterator
  net/sched: kconfig: Remove blank help texts
  openvswitch: meter: Use 64-bit arithmetic instead of 32-bit
  tcp_nv: fix potential integer overflow in tcpnv_acked
  r8169: fix RTL8168EP take too long to complete driver initialization.
  qmi_wwan: Add support for Quectel EP06
  rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK
  ipmr: Fix ptrdiff_t print formatting
  ibmvnic: Wait for device response when changing MAC
  qlcnic: fix deadlock bug
  tcp: release sk_frag.page in tcp_disconnect
  ipv4: Get the address of interface correctly.
  net_sched: gen_estimator: fix lockdep splat
  net: macb: Handle HRESP error
  net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoring
  ipv6: addrconf: break critical section in addrconf_verify_rtnl()
  ipv6: change route cache aging logic
  i40e/i40evf: Update DESC_NEEDED value to reflect larger value
  bnxt_en: cleanup DIM work on device shutdown
  ...
2018-01-31 14:31:10 -08:00
Jason Gunthorpe
e7996a9a77 Linux 4.15
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJabj6pAAoJEHm+PkMAQRiGs8cIAJQFkCWnbz86e3vG4DuWhyA8
 CMGHCQdUOxxFGa/ixhIiuetbC0x+JVHAjV2FwVYbAQfaZB3pfw2iR1ncQxpAP1AI
 oLU9vBEqTmwKMPc9CM5rRfnLFWpGcGwUNzgPdxD5yYqGDtcM8K840mF6NdkYe5AN
 xU8rv1wlcFPF4A5pvHCH0pvVmK4VxlVFk/2H67TFdxBs4PyJOnSBnf+bcGWgsKO6
 hC8XIVtcKCH2GfFxt5d0Vgc5QXJEpX1zn2mtCa1MwYRjN2plgYfD84ha0xE7J0B0
 oqV/wnjKXDsmrgVpncr3txd4+zKJFNkdNRE4eLAIupHo2XHTG4HvDJ5dBY2NhGU=
 =sOml
 -----END PGP SIGNATURE-----

Merge tag v4.15 of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git

To resolve conflicts in:
 drivers/infiniband/hw/mlx5/main.c
 drivers/infiniband/hw/mlx5/qp.c

From patches merged into the -rc cycle. The conflict resolution matches
what linux-next has been carrying.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-30 09:30:00 -07:00
Gal Pressman
468330e886 net/mlx5e: IPoIB, Fix copy-paste bug in flow steering refactoring
On TTC table creation, the indirection TIRs should be used instead of
the inner indirection TIRs.

Fixes: 1ae1df3a11 ("net/mlx5e: Refactor RSS related objects and code")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Shalom Lagziel <shaloml@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-29 14:24:48 -05:00
Jakub Kicinski
15f4edb3d9 mlxsw: use tc_cls_can_offload_and_chain0()
Make use of tc_cls_can_offload_and_chain0() to set extack msg in case
ethtool tc offload flag is not set or chain unsupported.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 21:23:09 -05:00
Jakub Kicinski
9ab88e83fd mlx5: use tc_cls_can_offload_and_chain0()
Make use of tc_cls_can_offload_and_chain0() to set extack msg in case
ethtool tc offload flag is not set or chain unsupported.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25 21:23:08 -05:00
David S. Miller
955bd1d216 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24 23:44:15 -05:00
Yuval Mintz
1ecdaea02c mlxsw: spectrum_router: Don't log an error on missing neighbor
Driver periodically samples all neighbors configured in device
in order to update the kernel regarding their state. When finding
an entry configured in HW that doesn't show in neigh_lookup()
driver logs an error message.
This introduces a race when removing multiple neighbors -
it's possible that a given entry would still be configured in HW
as its removal is still being processed but is already removed
from the kernel's neighbor tables.

Simply remove the error message and gracefully accept such events.

Fixes: c723c735fa ("mlxsw: spectrum_router: Periodically update the kernel's neigh table")
Fixes: 60f040ca11 ("mlxsw: spectrum_router: Periodically dump active IPv6 neighbours")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24 10:58:22 -05:00
Ido Schimmel
2b52ce02e1 mlxsw: spectrum_router: Remove unnecessary prefix lengths from LPM tree
In commit fc922bb0dd ("mlxsw: spectrum_router: Use one LPM tree for
all virtual routers") I tried to make sure only used prefix lengths are
present in the LPM tree shared between all virtual routers.

However, this optimization had to be removed in commit a69518cf0b
("mlxsw: spectrum_router: Avoid expensive lookup during route removal"),
since determining the used prefix lengths required us to traverse all
the active virtual routers, which could result in a hung task depending
on the number of VRFs and whether routes were removed due to abort or
not.

Re-introduce the optimization by moving the prefix usage accounting from
the virtual routers to the LPM tree, as this accounting is only used in
order to determine the tree's structure.

To make the sharing of the trees more explicit, the two trees (for IPv4
and IPv6) are stored in the shared router struct and upon the creation
of a virtual router it is immediately bound to both.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-22 09:22:11 -05:00
Ido Schimmel
3aad95df92 mlxsw: spectrum_router: Pass FIB node to LPM tree unlink function
Next patch will try to optimize the LPM tree and make sure only used
prefix lengths are present, to avoid unnecessary look-ups.

Pass the currently removed FIB node to the unlinking function as its
associated prefix length is a potential candidate for removal from the
tree.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-22 09:22:11 -05:00
Ido Schimmel
4fd003125f mlxsw: spectrum_router: Use the nodes list as indication for empty FIB
Currently, each FIB (IPv4 / IPv6) in a virtual router holds a prefix
usage that is used to choose a matching LPM tree, but also to check if
the FIB is empty, so that the LPM tree could be unbound.

Next patches will remove the reliance on the per-FIB prefix usage for
LPM tree matching. Keeping it only to check if the FIB is empty is a
waste, since we can use the nodes ({Prefix, Length}) list instead.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-22 09:22:10 -05:00
Arkadi Sharshevsky
d0d13c1858 mlxsw: spectrum_acl: Add support for mirror action
Add support for mirror action. Only one mirror action can be set per rule.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-21 18:21:30 -05:00
Arkadi Sharshevsky
7928756cd0 mlxsw: spectrum: Extend mlxsw_afa_ops for counter index and implement for Spectrum
Introduce extension of mlxsw_afa_ops in order to add/del mirroring and
implement the ops for Spectrum.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-21 18:21:30 -05:00
Arkadi Sharshevsky
5c8d39c99a mlxsw: spectrum: Extend and export SPAN API
Extend SPAN API for ACL case. In case of ACL triggering the MPAR register
shouldn't be configured. This patch also export those helpers for
ACL usage.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-21 18:21:30 -05:00
Arkadi Sharshevsky
db0553b261 mlxsw: spectrum_acl: Add support for mirroring action
The patch extends the trap action for mirroring.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-21 18:21:30 -05:00
Jiri Pirko
c18c1e186b mlxsw: core: Make counter index allocated inside the action append
So far, the caller of mlxsw_afa_block_append_counter needed to allocate
counter index by hand. Benefit from the previously introduced resource
infra and counter_index_get/put callbacks, and allocate the counter
index in place where it is needed, inside the action append function.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-21 18:21:30 -05:00
Jiri Pirko
140ce42121 mlxsw: core: Convert fwd_entry_ref list to be generic per-block resource list
Since the resource list needs to be used also for other entries different
to fwd_entry_ref, make the list generic. For that purpose, introduce a
resource structure with couple of helpers that the code which need to
store a per-block resource should use.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-21 18:21:30 -05:00
Jiri Pirko
4c6b7f6307 mlxsw: spectrum: Extend mlxsw_afa_ops for counter index and implement for Spectrum
Introduce extension of mlxsw_afa_ops in order to get/put counter indexes
and implement the ops for Spectrum.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-21 18:21:30 -05:00
David S. Miller
f9b6ae29ae mlx5-updates-2018-01-19
From: Or Gerlitz <ogerlitz@mellanox.com>
 =======
 First six patches of this series further enhances the mlx5 hairpin support.
 The first two patches deal with using different hairpin instances
 for flows whose packets have different priorities to align with the port
 TX QoS model. The next four patches allow us to do HW spreading
 of flows over a set of hairpin pairs using RSS. The last two patches
 change the driver to also set the size of the HW hairpin queues.
 ========
 
 Next four patches from Eran Ben Elisha <eranbe@mellanox.com>:
 Add more debug data for TX timeout handling, and further enhance and optimize
 TX timeout handling upon lost interrupts, which adds a mechanism for explicitly
 polling EQ in case of a TX timeout in order to recover from a lost interrupt.
 If this is not the case (no pending EQEs), perform a channels full recovery as
 usual.
 
 From Kamal Heib <kamalh@mellanox.com>, Two patches to extend the stats group API
 to have an update_stats() callback which will be used to fetch the hardware or
 software counters data, this will improve the current API and reduce code
 duplication.
 
 From Gal Pressman <galp@mellanox.com>, Last patch, Add likely to the common RX checksum
 flow.
 
 Thanks,
 Saeed.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJaYmJ7AAoJEEg/ir3gV/o+SBIH/2VUS1RPNQfgv1pU2WI78Me3
 2Iy8zA2fyx5Ko28Kzu/QljlIUs5/4K0rIjDoT5NpmkLf22lWB3QSqyCkqducdl6J
 6kAwZ5kPfA8r1jnhlQfhy5VLhaoNhcHeMafXwP9jSy3BvCYRWTQOsp8fN6fBcSX5
 s75DcmqE5ljm5b2Y9pIVkYdzz3usQ9/kq+5MPcZodw3XISuXgv1UPlkon4DdVsXQ
 7zf3frQnctfTAepkflWMm8BGUO15a4uYfhrMUyvOFHihtIWA9ILEpUC7qdn3uQ6X
 phNA6BRoo9bdo7ZSl8+ZsCi+K4LtyevQTHs7Hn/OMSMvxvDLWZswldu2mbg9xVQ=
 =YP/D
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-01-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2018-01-19

From: Or Gerlitz <ogerlitz@mellanox.com>
=======
First six patches of this series further enhances the mlx5 hairpin support.
The first two patches deal with using different hairpin instances
for flows whose packets have different priorities to align with the port
TX QoS model. The next four patches allow us to do HW spreading
of flows over a set of hairpin pairs using RSS. The last two patches
change the driver to also set the size of the HW hairpin queues.
========

Next four patches from Eran Ben Elisha <eranbe@mellanox.com>:
Add more debug data for TX timeout handling, and further enhance and optimize
TX timeout handling upon lost interrupts, which adds a mechanism for explicitly
polling EQ in case of a TX timeout in order to recover from a lost interrupt.
If this is not the case (no pending EQEs), perform a channels full recovery as
usual.

From Kamal Heib <kamalh@mellanox.com>, Two patches to extend the stats group API
to have an update_stats() callback which will be used to fetch the hardware or
software counters data, this will improve the current API and reduce code
duplication.

From Gal Pressman <galp@mellanox.com>, Last patch, Add likely to the common RX checksum
flow.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-21 18:13:23 -05:00
Talat Batheesh
e58edaa486 net/mlx5e: Fix fixpoint divide exception in mlx5e_am_stats_compare
Helmut reported a bug about division by zero while
running traffic and doing physical cable pull test.

When the cable unplugged the ppms become zero, so when
dividing the current ppms by the previous ppms in the
next dim iteration there is division by zero.

This patch prevent this division for both ppms and epms.

Fixes: c3164d2fc4 ("net/mlx5e: Added BW check for DIM decision mechanism")
Reported-by: Helmut Grauer <helmut.grauer@de.ibm.com>
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-21 18:06:33 -05:00
David S. Miller
8565d26bcb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The BPF verifier conflict was some minor contextual issue.

The TUN conflict was less trivial.  Cong Wang fixed a memory leak of
tfile->tx_array in 'net'.  This is an skb_array.  But meanwhile in
net-next tun changed tfile->tx_arry into tfile->tx_ring which is a
ptr_ring.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19 22:59:33 -05:00
Yuval Mintz
fd5204cdfb mlxsw: spectrum: Upper-bound supported FW version
During initialization the driver checks whether the flashed FW image
suits its requirements by checking that it's sufficiently new.
However, there's only a weak backward compatibility scheme that is
actually guaranteed by the FW, so driver must also upper bound the
version to prevent compatibility issues between current driver and some
possible future fw.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19 15:45:56 -05:00
Gal Pressman
63a612f984 net/mlx5e: Add likely to the common RX checksum flow
Most of the packets return true for is_last_ethertype_ip, surround it
with likely compiler hint.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19 22:41:33 +02:00
Kamal Heib
1938617735 net/mlx5e: Extend the stats group API to have update_stats()
Extend the stats group API to have an update_stats() callback which
will be used to fetch the hardware or software counters data.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19 22:41:33 +02:00
Kamal Heib
a89842811e net/mlx5e: Merge per priority stats groups
Merge the per priority traffic and pfc groups into one group, because
both groups share the same update_stats() callback which will be
introduced in the upcoming patch.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19 22:41:33 +02:00
Eran Ben Elisha
57d689a8ca net/mlx5e: Add per-channel counters infrastructure, use it upon TX timeout
Add per-channel counter ch#_eq_rearm to monitor how many lost interrupt
recovery actions happened upon TX timeouts.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19 22:41:33 +02:00
Eran Ben Elisha
7ca560b5af net/mlx5e: Poll event queue upon TX timeout before performing full channels recovery
Up until this patch, on every TX timeout we would try to do channels
recovery.  However, in case of a lost interrupt for an EQ, the channel
associated to it cannot be recovered if reopened as it would never get
another interrupt on sent/received traffic, and eventually ends up with
another TX timeout (Restarting the EQ is not part of channel recovery).

This patch adds a mechanism for explicitly polling EQ in case of a TX
timeout in order to recover from a lost interrupt. If this is not the
case (no pending EQEs), perform a channels full recovery as usual.

Once a lost EQE is recovered, it triggers the NAPI to run and handle all
pending completions. This will free some budget in the bql (via calling
netdev_tx_completed_queue) or by clearing pending TXWQEs and waking up
the queue.  One of the above actions will move the queue to be ready for
transmit again.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19 22:41:33 +02:00
Eran Ben Elisha
3a32b26a4e net/mlx5e: Add Event Queue meta data info for TX timeout logs
When TX timeout occurs, EQ consumer index and irqn can help in debug for
understanding the SW state of EQ. Add them to the logger prints for the
relevant EQ only.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19 22:41:33 +02:00
Eran Ben Elisha
8499094526 net/mlx5e: Print delta since last transmit per SQ upon TX timeout
When driver callback for TX timeout is being called, it handles all
stopped xmit queues (not only the ones which their timeout expired).
Add usecs since last transmit to TX timeout logs per send queue in order
to monitor if the queue timeout expired.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19 22:41:33 +02:00
Or Gerlitz
eb9180f792 net/mlx5e: Set hairpin queue size
For a given hairpin packet buffer size, different queue sizes
(values of log_hairpin_num_packets) determine how the data is broken
to strides on the RQ. Currently the chosen value is set to 64B strides.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19 22:41:32 +02:00
Or Gerlitz
4d533e0f86 net/mlx5: Enable setting hairpin queue size
Allow to specify the size of the hairpin queues along with the
packet buffer data size from the core setup code.

If the driver doesn't provide this, the FW applies proper value that
matches the provided data size and a FW chosen RQ stride size.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19 22:41:32 +02:00
Or Gerlitz
3f6d08d196 net/mlx5e: Add RSS support for hairpin
Support RSS for hairpin traffic. We create multiple hairpin RQ/SQ pairs
and RSS TTC table per hairpin instance and steer the related flows
through that table so they are spread between the pairs.

We open one pair per 50Gbs link speed, for all speeds <= 50Gbs, there
is one pair and no RSS while for 100Gbs ports two RSSed pairs.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19 22:41:32 +02:00
Or Gerlitz
ddae74ac10 net/mlx5: Vectorize the low level core hairpin object
Enhance the hairpin setup code at the core to support a set of N
(RQ,SQ) pairs. This will be later used by the caller to set RSS
spreading among the different RQs.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19 22:41:32 +02:00
Or Gerlitz
479f074c5b net/mlx5e: Enlarge the NIC TC offload steering prio to support two levels
This will allow to be able and set TC rule whose steering dest is
RSS TTC steering table.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19 22:41:32 +02:00
Or Gerlitz
1ae1df3a11 net/mlx5e: Refactor RSS related objects and code
In order to use RSS for hairpin, we refactor the code that deals with
setup of the TTC steering tables. This is done using an interim ttc
params object that has the flow table attributes, TIR numbers, etc.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19 22:41:32 +02:00
Or Gerlitz
106be53b6b net/mlx5e: Set per priority hairpin pairs
As part of the QoS model, on xmit, the HW mandates that all packets going
through a given SQ have the same priority. To align hairpin SQs with that,
we use the priority given as part of the matching for the hairpin hash key.

This ensures that flows/packets mapped to different HW priorities will
go through different hairpin instances. If no priority is given for
matching, we treat that as an 8th priority, this is in order not to
harm cases where priority is specified.

Only the PCP priority trust model is supported.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19 22:41:32 +02:00
Or Gerlitz
d882286853 net/mlx5e: Use vhca id as the hairpin peer identifier
The peer vhca id spans less bits vs the ifindex and can
well serve for the hairpin hash key, move to use that.

This is a pre-step to put more info into the hairpin hash
key in downstream patch while keeping it at 32 bits.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-19 22:41:32 +02:00
Wei Yongjun
8df1d08bf2 mlxsw: spectrum: Make function mlxsw_sp_kvdl_part_occ() static
Fixes the following sparse warning:

drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c:289:5: warning:
 symbol 'mlxsw_sp_kvdl_part_occ' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19 14:35:32 -05:00
Ido Schimmel
ed604c5da3 mlxsw: spectrum_router: Free LPM tree upon failure
When a new LPM tree is created, we try to replace the trees in the
existing virtual routers with it. If we fail, the tree needs to be
freed.

Currently, this does not happen in the unlikely case where we fail to
bind the tree to the first virtual router, since its reference count
never transitions from 1 to 0.

Fix that by taking a reference before binding the tree.

Fixes: fc922bb0dd ("mlxsw: spectrum_router: Use one LPM tree for all virtual routers")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-18 20:54:58 -05:00
Luis de Bethencourt
4c5386bae9 net/mlx5e: Fix trailing semicolon
The trailing semicolon is an empty statement that does no operation.
Removing it since it doesn't do anything.

Signed-off-by: Luis de Bethencourt <luisbg@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-18 16:22:11 -05:00
Feras Daoud
24d33d2c8e net/mlx5e: Add clock info page to mlx5 core devices
Adds a new page to mlx5 core containing clock info data that allows
user level applications to translate between cqe timestamp to
nanoseconds. The information stored into this page is represented
through mlx5_ib_clock_info.

In order to synchronize between kernel and user space a sequence
number is incremented at the beginning and end of each update.
An odd number means the data is being updated while an even means
the access was already done. To guarantee that the data structure
was accessed atomically user will:

repeat:
        seq1 = <read sequence>
        goto <repeate> while odd
        <read data structure>
        seq2 = <read sequence>
        if seq1 != seq2 goto repeat

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Eitan Rabin <rabin@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-01-18 14:49:21 -05:00
Jiri Pirko
4b23258d6a mlxsw: spectrum_acl: Pass mlxsw_sp_port down to ruleset bind/unbind ops
No need to convert from mlxsw_sp_port to net_device and back again.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17 14:53:58 -05:00
Jiri Pirko
3aaff32304 mlxsw: spectrum_acl: Implement TC block sharing
Benefit from the prepared TC and in-driver ACL infrastructure and
introduce block sharing offload. For that, a new struct "block" is
introduced in spectrum_acl in order to hold a list of specific
block-port bindings.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17 14:53:58 -05:00
Jiri Pirko
02caf4995a mlxsw: spectrum_acl: Don't store netdev and ingress for ruleset unbind
Instead, pass netdev and ingress flag to ruleset unbind op.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17 14:53:57 -05:00
Jiri Pirko
9fe5fdf27e mlxsw: spectrum_acl: Reshuffle code around mlxsw_sp_acl_ruleset_create/destroy
In order to prepare for follow-up changes, make the bind/unbind helpers
very simple. That required move of ht insertion/removal and bind/unbind
calls into mlxsw_sp_acl_ruleset_create/destroy.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17 14:53:57 -05:00
Jakub Kicinski
416ef9b15c net: sched: red: don't reset the backlog on every stat dump
Commit 0dfb33a0d7 ("sch_red: report backlog information") copied
child's backlog into RED's backlog.  Back then RED did not maintain
its own backlog counts.  This has changed after commit 2ccccf5fb4
("net_sched: update hierarchical backlog too") and commit d7f4f332f0
("sch_red: update backlog as well").  Copying is no longer necessary.

Tested:

$ tc -s qdisc show dev veth0
qdisc red 1: root refcnt 2 limit 400000b min 30000b max 30000b ecn
 Sent 20942 bytes 221 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 1260b 14p requeues 14
  marked 0 early 0 pdrop 0 other 0
qdisc tbf 2: parent 1: rate 1Kbit burst 15000b lat 3585.0s
 Sent 20942 bytes 221 pkt (dropped 0, overlimits 138 requeues 0)
 backlog 1260b 14p requeues 14

Recently RED offload was added.  We need to make sure drivers don't
depend on resetting the stats.  This means backlog should be treated
like any other statistic:

  total_stat = new_hw_stat - prev_hw_stat;

Adjust mlxsw.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Nogah Frankel <nogahf@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17 14:29:32 -05:00
David S. Miller
c02b3741eb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Overlapping changes all over.

The mini-qdisc bits were a little bit tricky, however.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17 00:10:42 -05:00
Wei Yongjun
e02f08a070 mlxsw: spectrum: qdiscs: Make function mlxsw_sp_qdisc_prio_unoffload static
Fixes the following sparse warning:

drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c:464:1: warning:
 symbol 'mlxsw_sp_qdisc_prio_unoffload' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16 14:25:42 -05:00
Arkadi Sharshevsky
24cc68ad6c mlxsw: core: Add support for reload
Add support for hot reload. First, all the driver/core resources are
released but the PCI and devlink instances, then reset is performed
through the PCI interface. Finally the driver performs initialization.

In case of reload failure the driver is left in a partially initialized
state. Special care is taken during the driver removal in order to
properly handle this state.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16 14:15:35 -05:00
Arkadi Sharshevsky
e21d21ca31 mlxsw: pci: Add support for getting resource through devlink
Up until now the KVD partition was static. This patch introduces the
ability to get the resource sizes via devlink. In case the resource is not
available the default configuration is used.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16 14:15:35 -05:00
Arkadi Sharshevsky
afadc26b3a mlxsw: spectrum: Add support for getting kvdl occupancy
Add support for getting the kvdl occupancy through the resource interface.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16 14:15:35 -05:00
Arkadi Sharshevsky
c0253a45fd mlxsw: spectrum_dpipe: Connect dpipe tables to resources
Connect current dpipe tables to resources. The tables are connected
in the following fashion:
1. IPv4 host -> KVD hash single
2. IPv6 host -> KVD hash double
3. Adjacency -> KVD linear

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16 14:15:35 -05:00
Arkadi Sharshevsky
ef3116e540 mlxsw: spectrum: Register KVD resources with devlink
Register the KVD resources with devlink. The KVD is a memory resource
which is subdivided into three partitions which are the linear, hash
single and hash double.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16 14:15:35 -05:00
Arkadi Sharshevsky
54a2e8d456 mlxsw: pci: Add support for performing bus reset
This is a preparation stage before introducing hot reload. During the
reload process the ASIC should be resetted by accessing the PCI BAR due
to unavailability of the mailbox/emad interfaces.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16 14:15:35 -05:00
Jack Morgenstein
852f692759 IB/mlx4: Fix incorrectly releasing steerable UD QPs when have only ETH ports
Allocating steerable UD QPs depends on having at least one IB port,
while releasing those QPs does not.

As a result, when there are only ETH ports, the IB (RoCE) driver
requests releasing a qp range whose base qp is zero, with
qp count zero.

When SR-IOV is enabled, and the VF driver is running on a VM over
a hypervisor which treats such qp release calls as errors
(rather than NOPs), we see lines in the VM message log like:

 mlx4_core 0002:00:02.0: Failed to release qp range base:0 cnt:0

Fix this by adding a check for a zero count in mlx4_release_qp_range()
(which thus treats releasing 0 qps as a nop), and eliminating the
check for device managed flow steering when releasing steerable UD QPs.
(Freeing ib_uc_qpns_bitmap unconditionally is also OK, since it
remains NULL when steerable UD QPs are not allocated).

Cc: <stable@vger.kernel.org>
Fixes: 4196670be7 ("IB/mlx4: Don't allocate range of steerable UD QPs for Ethernet-only device")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-15 15:33:21 -07:00
Nogah Frankel
93d8a4c1b5 mlxsw: spectrum: qdiscs: Support stats for PRIO qdisc
Support basic stats for PRIO qdisc, which includes tx packets and bytes
count, drops count and backlog size. The rest of the stats are irrelevant
for this qdisc offload.
Since backlog is not only incremental but reflecting momentary value, in
case of a qdisc that stops being offloaded but is not destroyed, backlog
value needs to be updated about the un-offloading.
For that reason an unoffload function is being added to the ops struct.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-14 12:21:12 -05:00
Nogah Frankel
46a3615be4 mlxsw: spectrum: qdiscs: Support PRIO qdisc offload
Add support for offloading PRIO qdisc as root qdisc.
The support is for up to 8 bands.
Routed packets priority is determined by the DSCP field with the default
translations. Bridged packets priority is determined by the PCP field, if
exist, otherwise it is set to 0.
Since both options have only priorities 0-7, higher priorities mapping are
being ignored.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-14 12:21:12 -05:00
Yuval Mintz
48276a296a mlxsw: spectrum_router: Configure default routing priority
When routing ip packets, the kernel is setting the SKB's priority
based on the tos field of the packet.
Imitate this behavior in the mlxsw router, having the internal
switch priority of a routed packet determined according to its DS
field.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-14 12:21:11 -05:00
Yuval Mintz
ddb362ced1 mlxsw: reg: add rdpm register
Add rdpm definition - router DSCP to priority mapping register.
This register will be utilized later to align the default mapping between
packet DSCP and switch-priority to the kernel's mapping between
packet priority and skb priority.

This is the first non-bit indexed register where the entries are arranged
in descending order, i.e., entry at offset 0 matches configuration for
dscp[63]. As a result, the item's step is converted into a signed variable
to support descending arrays [where step would be negative].

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-14 12:21:11 -05:00
Ido Schimmel
3743d88ab4 mlxsw: spectrum_router: Add support for IPv6 non-equal-cost multipath
Since commit eb789980d0 ("mlxsw: spectrum_router: Populate adjacency
entries according to weights") the driver includes support for
non-equal-cost multipath, but IPv4 nexthops were the only user.

Now that the kernel supports weighted IPv6 nexthops, we can extend the
driver to support it as well.

This is done by assigning each nexthop its configured weight, so that it
will be populated accordingly in the device's adjacency table. The
`weight` parameter is also taken into account when comparing nexthop
groups in order not to consolidate non-identical groups.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-14 12:06:15 -05:00
David S. Miller
19d28fbd30 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
BPF alignment tests got a conflict because the registers
are output as Rn_w instead of just Rn in net-next, and
in net a fixup for a testcase prohibits logical operations
on pointers before using them.

Also, we should attempt to patch BPF call args if JIT always on is
enabled.  Instead, if we fail to JIT the subprogs we should pass
an error back up and fail immediately.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-11 22:13:42 -05:00
Feras Daoud
237f258c42 net/mlx5e: Remove timestamp set from netdevice open flow
To avoid configuration override, timestamp set call will
be moved from the netdevice open flow to the init flow.
By this, a close-open procedure will not override the timestamp
configuration.
In addition, the change will rename mlx5e_timestamp_set function
to be mlx5e_timestamp_init.

Fixes: ef9814deaf ("net/mlx5e: Add HW timestamping (TS) support")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:50 +02:00
Feras Daoud
afc98a0b46 net/mlx5: Update ptp_clock_event foreach PPS event
PPS event did not update ptp_clock_event fields, therefore,
timestamp value was not updated correctly. This fix updates the
event source and the timestamp value for each PPS event.

Fixes: 7c39afb394 ("net/mlx5: PTP code migration to driver core section")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:50 +02:00
Gal Pressman
75b81ce719 net/mlx5e: Don't override netdev features field unless in error flow
Set features function sets dev->features in order to keep track of which
features were successfully changed and which weren't (in case the user
asks for more than one change in a single command).

This breaks the logic in __netdev_update_features which assumes that
dev->features is not changed on success and checks for diffs between
features and dev->features (diffs that might not exist at this point
because of the driver override).

The solution is to keep track of successful/failed feature changes and
assign them to dev->features in case of failure only.

Fixes: 0e405443e8 ("net/mlx5e: Improve set features ndo resiliency")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:49 +02:00
Tariq Toukan
4b7d4363f1 net/mlx5e: Check support before TC swap in ETS init
Should not do the following swap between TCs 0 and 1
when max num of TCs is 1:
tclass[prio=0]=1, tclass[prio=1]=0, tclass[prio=i]=i (for i>1)

Fixes: 08fb1dacdd ("net/mlx5e: Support DCBNL IEEE ETS")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:49 +02:00
Tariq Toukan
97c8c3aa48 net/mlx5e: Add error print in ETS init
ETS initialization might fail, add a print to indicate
such failures.

Fixes: 08fb1dacdd ("net/mlx5e: Support DCBNL IEEE ETS")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:48 +02:00
Gal Pressman
e556f6dd47 net/mlx5e: Keep updating ethtool statistics when the interface is down
ethtool statistics should be updated even when the interface is down
since it shows more than just netdev counters, which might change while
the logical link is down.
One useful use case, for example, is when running RoCE traffic over the
interface (while the logical link is down, but physical link is up) and
examining rx_prioX_bytes.

Fixes: f62b8bb8f2 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:48 +02:00
Maor Gottlieb
259bbc575c net/mlx5: Fix error handling in load one
We didn't store the result of mlx5_init_once, due to that
mlx5_load_one returned success on error.  Fix that.

Fixes: 59211bd3b6 ("net/mlx5: Split the load/unload flow into hardware and software flows")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:47 +02:00
Eran Ben Elisha
72f36be061 net/mlx5: Fix mlx5_get_uars_page to return error code
Change mlx5_get_uars_page to return ERR_PTR in case of
allocation failure. Change all callers accordingly to
check the IS_ERR(ptr) instead of NULL.

Fixes: 59211bd3b6 ("net/mlx5: Split the load/unload flow into hardware and software flows")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:47 +02:00
Alaa Hleihel
b6908c2960 net/mlx5: Fix memory leak in bad flow of mlx5_alloc_irq_vectors
Fix a memory leak where in case that pci_alloc_irq_vectors failed,
priv->irq_info was not released.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:46 +02:00
Eran Ben Elisha
8978cc921f {net,ib}/mlx5: Don't disable local loopback multicast traffic when needed
There are systems platform information management interfaces (such as
HOST2BMC) for which we cannot disable local loopback multicast traffic.

Separate disable_local_lb_mc and disable_local_lb_uc capability bits so
driver will not disable multicast loopback traffic if not supported.
(It is expected that Firmware will not set disable_local_lb_mc if
HOST2BMC is running for example.)

Function mlx5_nic_vport_update_local_lb will do best effort to
disable/enable UC/MC loopback traffic and return success only in case it
succeeded to changed all allowed by Firmware.

Adapt mlx5_ib and mlx5e to support the new cap bits.

Fixes: 2c43c5a036 ("net/mlx5e: Enable local loopback in loopback selftest")
Fixes: c85023e153 ("IB/mlx5: Add raw ethernet local loopback support")
Fixes: bded747bb4 ("net/mlx5: Add raw ethernet local loopback firmware command")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 00:52:42 +02:00
Nogah Frankel
56202ca4ed mlxsw: spectrum: qdiscs: Remove qdisc before setting a new one
If a qdisc is being replaced by another qdisc of the same type, it can
simply override over its configuration.
However, if it replaces a qdisc of another type, it needs to be removed
before setting the new qdisc.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:41 -05:00
Nogah Frankel
9cf6c9c758 mlxsw: spectrum: qdiscs: Create a generic replace function
Create a generic qdisc replace function.
For that goal, add three functions to the qdisc ops struct:
* check_params: Checks if the given parameters are offloadable.
* replace: Offload the given parameters.
* clean_stats: clean the qdisc stats for the offloaded qdisc.
integrate RED offloading into using the new internal replace API.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:41 -05:00
Nogah Frankel
9a37a59f71 mlxsw: spectrum: qdiscs: Create a generic destroy function
Add a destroy function to the qdiscs ops struct.
Create a generic qdisc destroy function, that clears the qdisc metadata as
well as calling the specific qdisc destroy function.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:41 -05:00
Nogah Frankel
562ffbc4b3 mlxsw: spectrum: qdiscs: Add an ops struct
Qdisc struct have the Qdisc_class_ops struct.
This patch introduces the similar ops struct for the mlxsw_sp_qdisc_ops
struct. It allows better readability as well as code reusability for the
common parts of some functions like destroy.
The first operations to be added are the statistics getters.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:41 -05:00
Nogah Frankel
cba7158ff1 mlxsw: spectrum: qdiscs: Unite all handle checks
Every qdisc op gets the qdisc handle ID as well as its location.  Each one
of them, beside replace, checks if the handle doesn't match the qdisc in
the given location, and if so, it returns without running the actual op.
Unite these checks to one comparison function and avoid sending the handle
id to these ops.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:41 -05:00
Nogah Frankel
d56c89550b mlxsw: spectrum: qdiscs: Add tclass number to the mlxsw_sp_qdisc
Tclass number is needed for most of the operations related to the qdisc in
the driver. Create a field for it in the mlxsw_sp_qdisc instead of passing
it to every function as parameter.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:40 -05:00
Nogah Frankel
c2ed6db765 mlxsw: spectrum: qdiscs: Make the clean stats function to be for RED only
Improve readability by changing the clean stats function to handle only
RED. Qdiscs that will be offloaded in the future will have a clean stats
function of their own.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:40 -05:00
Nogah Frankel
4d1a4b8473 mlxsw: spectrum: qdiscs: Clean qdisc statistics structs
Clean RED offloaded stats and make them more generic by breaking the
generic qdisc stats to a struct of their own.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:40 -05:00
Nogah Frankel
f8253df553 net: sch: red: Change offloaded xstats to be incremental
Change the value of the xstats requested from the driver for offloaded RED
to be incremental, like the normal stats.
It increases consistency - if a qdisc stops being offloaded its xstats
don't change.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:40 -05:00
Nogah Frankel
f34b4aac46 net: sch: red: Change the name of the stats struct to be generic
Change the name of the stats struct to be generic, so it could be used for
other qdisc offload, that will be added in the next patches.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:40 -05:00
Nogah Frankel
371b437a32 mlxsw: spectrum: qdiscs: Move qdisc's declarations to its designated file
Move all the qdisc related data from the spectrum.h to spectrum_qdisc.c.
Create an init and fini functions for the qdiscs.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:40 -05:00
Ido Schimmel
d016e13d80 mlxsw: spectrum: Fix typo in firmware upgrade message
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:05:00 -05:00
Jiri Pirko
db84924c4f mlxsw: spectrum_qdisc: Don't use variable array in mlxsw_sp_tclass_congestion_enable
Resolve the sparse warning:
"sparse: Variable length array is used."
Use 2 arrays for 2 PRM register accesses.

Fixes: 96f17e0776 ("mlxsw: spectrum: Support RED qdisc offload")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:58:23 -05:00
Yuval Mintz
8e033a93b3 mlxsw: pci: Wait after reset before accessing HW
After performing reset driver polls on HW indication until learning
that the reset is done, but immediately after reset the device becomes
unresponsive which might lead to completion timeout on the first read.

Wait for 100ms before starting the polling.

Fixes: 233fa44bd6 ("mlxsw: pci: Implement reset done check")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:58:22 -05:00
Wei Yongjun
e213f5b6dd net/mlx5e: fix error return code in mlx5e_alloc_rq()
Fix to return a negative error code from the xdp_rxq_info_reg() error
handling case instead of 0, as done elsewhere in this function.

Fixes: 0ddf543226 ("xdp/mlx5: setup xdp_rxq_info")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:54:36 -05:00
Andy Gospodarek
8115b750db net/dim: use struct net_dim_sample as arg to net_dim
Simplify the arguments net_dim() by formatting them into a struct
net_dim_sample before calling the function.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Suggested-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:45 -05:00
Andy Gospodarek
4c4dbb4a73 net/mlx5e: Move dynamic interrupt coalescing code to include/linux
This move allows drivers to add private structure elements to track the
number of packets, bytes, and interrupts events per ring.  A driver
also defines a workqueue handler to act on this collected data once per
poll and modify the coalescing parameters per ring.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:44 -05:00
Andy Gospodarek
9a31742531 net/mlx5e: Change Mellanox references in DIM code
Change all appropriate mlx5_am* and MLX5_AM* references to net_dim and
NET_DIM, respectively, in code that handles dynamic interrupt
moderation.  Also change all references from 'am' to 'dim' when used as
local variables and add generic profile references.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:44 -05:00
Andy Gospodarek
b9c872f231 net/mlx5e: Move generic functions to new file
These functions were identified as ones that could be made generic and
used by multiple drivers.  Most of the contents of en_rx_am.c are moved
to net_dim.c.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:35 -05:00
Andy Gospodarek
f5e7f67d9b net/mlx5e: Move AM logic enums
More movement to help make this code more generic.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:35 -05:00
Andy Gospodarek
138968e997 net/mlx5e: Remove rq references in mlx5e_rx_am
This makes mlx5e_am_sample more generic so that it can be called easily
from a driver that does not use the same data structure to store these
values in a single structure.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:35 -05:00
Andy Gospodarek
f58ee099f3 net/mlx5e: Move interrupt moderation forward declarations
Move these to newly created file to prepare to move these functions to a
library.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:35 -05:00
Andy Gospodarek
98dd1edffc net/mlx5e: Move interrupt moderation structs to new file
Create new header file to prepare to move code that handles irq
moderation to a library that lives in a header file.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:35 -05:00
David S. Miller
65d51f2682 mlx5-updates-2018-01-08
Four patches from Or that add Hairpin support to mlx5:
 ===========================================================
 From:  Or Gerlitz <ogerlitz@mellanox.com>
 
 We refer the ability of NIC HW to fwd packet received on one port to
 the other port (also from a port to itself) as hairpin. The application API
 is based
 on ingress tc/flower rules set on the NIC with the mirred redirect
 action. Other actions can apply to packets during the redirect.
 
 Hairpin allows to offload the data-path of various SW DDoS gateways,
 load-balancers, etc to HW. Packets go through all the required
 processing in HW (header re-write, encap/decap, push/pop vlan) and
 then forwarded, CPU stays at practically zero usage. HW Flow counters
 are used by the control plane for monitoring and accounting.
 
 Hairpin is implemented by pairing a receive queue (RQ) to send queue (SQ).
 All the flows that share <recv NIC, mirred NIC> are redirected through
 the same hairpin pair. Currently, only header-rewrite is supported as a
 packet modification action.
 
 I'd like to thanks Elijah Shakkour <elijahs@mellanox.com> for implementing this
 functionality
 on HW simulator, before it was avail in the FW so the driver code could be
 tested early.
 ===========================================================
 
 From Feras three patches that provide very small changes that allow IPoIB
 to support RX timestamping for child interfaces, simply by hooking the mlx5e
 timestamping PTP ioctl to IPoIB child interface netdev profile.
 
 One patch from Gal to fix a spilling mistake.
 
 Two patches from Eugenia adds drop counters to VF statistics
 to be reported as part of VF statistics in netlink (iproute2) and
 implemented them in mlx5 eswitch.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJaVF5WAAoJEEg/ir3gV/o+fRkH/0PxjwJRA3REqhi/H8HOdH9f
 cBLrOzFdqTCYQWQFCLFbMQ/Zgoel3KglpJ0iQMjuVFfjMbybVXOe8FAEVdbWHnfL
 C+2HRMe8dplKrsq5UkxJhbyKhFKhl2XeMFYWonw9dSM7Nz5DyowQ1y1r5SgMlMAv
 t3mYAIa4kZHK18BjDoIsCoAXXwsHiztR2irMp5+DwataTGP7vC7AsrucDxLA/qFf
 I3E15DZk9s1f53PUuY7CYnUnJfMMP3VJdxpyx4k6xt9J2IMuilF4YyD6wpAKsVQU
 /LzRkWI9x/6QindffqlrACeeidimOeY4pC4txIhS5uXgFXulugDHq1/Ih1sgZS8=
 =g5vr
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

mlx5-updates-2018-01-08

Four patches from Or that add Hairpin support to mlx5:
===========================================================
From:  Or Gerlitz <ogerlitz@mellanox.com>

We refer the ability of NIC HW to fwd packet received on one port to
the other port (also from a port to itself) as hairpin. The application API
is based
on ingress tc/flower rules set on the NIC with the mirred redirect
action. Other actions can apply to packets during the redirect.

Hairpin allows to offload the data-path of various SW DDoS gateways,
load-balancers, etc to HW. Packets go through all the required
processing in HW (header re-write, encap/decap, push/pop vlan) and
then forwarded, CPU stays at practically zero usage. HW Flow counters
are used by the control plane for monitoring and accounting.

Hairpin is implemented by pairing a receive queue (RQ) to send queue (SQ).
All the flows that share <recv NIC, mirred NIC> are redirected through
the same hairpin pair. Currently, only header-rewrite is supported as a
packet modification action.

I'd like to thanks Elijah Shakkour <elijahs@mellanox.com> for implementing this
functionality
on HW simulator, before it was avail in the FW so the driver code could be
tested early.
===========================================================

From Feras three patches that provide very small changes that allow IPoIB
to support RX timestamping for child interfaces, simply by hooking the mlx5e
timestamping PTP ioctl to IPoIB child interface netdev profile.

One patch from Gal to fix a spilling mistake.

Two patches from Eugenia adds drop counters to VF statistics
to be reported as part of VF statistics in netlink (iproute2) and
implemented them in mlx5 eswitch.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 14:57:19 -05:00
Eugenia Emantayev
bacc794331 net/mlx5e: Remove redundant checks in set_ringparam
Since the checks are done in upper layer ethtool code,
checks in driver are not needed any more.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09 11:54:50 -05:00
Eugenia Emantayev
7589fd5c8c net/mlx4_en: Align behavior of set ring size flow via ethtool
In current implementation, any requested RX/TX ring size value
that is less than minimum is silently casted to nearest valid value.
Update this behavior to align with mlx5 behavior by printing warning
in dmesg and remaining the size unchanged.
Kernel is responsible for verifying against the maximum.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09 11:54:49 -05:00
David S. Miller
a0ce093180 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-01-09 10:37:00 -05:00
Eugenia Emantayev
b8a0dbe3a9 net/mlx5e: E-switch, Add steering drop counters
Add flow counters to count packets dropped due to drop rules
configured in eswitch egress and ingress ACLs.
These counters will count VFs violations and incoming traffic drops.
Will be presented on hypervisor via standard 'ip -s link show' command.

Example: "ip -s link show dev enp5s0f0"

6: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 24:8a:07:a5:28:f0 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    0          0        0       0       0       2
    TX: bytes  packets  errors  dropped carrier collsns
    1406       17       0       0       0       0
    vf 0 MAC 00:00:ca:fe:ca:fe, vlan 5, spoof checking off, link-state auto, trust off, query_rss off
    RX: bytes  packets  mcast   bcast   dropped
    1666       29       14         32      0
    TX: bytes  packets   dropped
    2880       44       2412

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-09 07:40:48 +02:00
Gal Pressman
4312782479 net/mlx5e: IPoIB, Fix spelling mistake "functionts" -> "functions"
Fix trivial spelling mistake: "functionts" -> "functions".

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-09 07:40:48 +02:00
Feras Daoud
93b66472ce net/mlx5e: IPoIB, Add ethtool support to get child time stamping parameters
Add support to get time stamping capabilities using ethtool for
child interface.
Usage example:
	ethtool -T CHILD-DEVNAME

This change reuses the functionality of parent devices and does not
introduce any new logic.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-09 07:40:48 +02:00
Feras Daoud
08437c572c net/mlx5e: IPoIB, Add PTP ioctl support for child interface
Add support to control precision time protocol on child interfaces
using ioctl.

This commit changes the following:
- Change parent ioctl function to be non static
- Reuse the parent ioctl function in child devices

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-09 07:40:48 +02:00
Feras Daoud
36e564b76f net/mlx5e: IPoIB, Use correct timestamp in child receive flow
The current implementation takes the child timestamp object from
the parent since the rq in mlx5i_complete_rx_cqe belongs to the parent.
This change fixes the issue by taking the correct timestamp.

Fixes: 7e7f4780c3 ("net/mlx5e: IPoIB, Use hash-table to map between QPN to child netdev")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-09 07:40:48 +02:00
Or Gerlitz
5c65c564c9 net/mlx5e: Support offloading TC NIC hairpin flows
We refer to TC NIC rule that involves forwarding as "hairpin".

All hairpin rules from the current NIC device (called "func" in
the code) to a given NIC device ("peer") are steered into the
same hairpin RQ/SQ pair.

The hairpin pair is set on demand and removed when there are no
TC rules that need it.

Here's a TC rule that matches on icmp, does header re-write of the
dst mac and hairpin from RX/enp1s2f1 to TX/enp1s2f2 (enp1s2f1/2 are
two mlx5 devices):

tc filter add dev enp1s2f1 protocol ip parent ffff: prio 2
    flower skip_sw ip_proto icmp
     action pedit ex munge eth dst set 10:22:33:44:55:66 pipe
     action mirred egress redirect dev enp1s2f2

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-09 07:40:48 +02:00
Or Gerlitz
77ab67b7f0 net/mlx5e: Basic setup of hairpin object
Add the code to do basic setup for hairpin object which
will later serve offloading TC flows.

This includes calling the mlx5 core to create/destroy the hairpin
pair object and setting the HW transport objects that will be used
for steering matched flows to go through hairpin.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-09 07:40:48 +02:00
Or Gerlitz
18e568c390 net/mlx5: Hairpin pair core object setup
Low level code to setup hairpin pair core object, deals with:
 - create hairpin RQs/SQs
 - destroy hairpin RQs/SQs
 - modifying hairpin RQs/SQs - pairing (rst2rdy) and unpairing (rdy2rst)

Unlike conventional RQs/SQs, the memory used for the packet and descriptor
buffers is allocated by the firmware and not the driver. The driver sets
the overall data size (log).

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-09 07:40:48 +02:00
Gal Pressman
cd4a87dff7 net/mlx5e: Replace WARN_ONCE with netdev_WARN_ONCE
Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08 20:53:14 -05:00
Daniel Jurgens
c4b76d8d95 net/mlx5: Set num_vhca_ports capability
Set the current capability to the max capability. Doing so enables dual
port RoCE functionality if supported by the firmware.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:48 -07:00
Daniel Jurgens
cfe4e37fdc {net, IB}/mlx5: Change set_roce_gid to take a port number
When in dual port mode setting a RoCE GID for any port flows through the
master ports mlx5_core_dev. Provide an interface to set the port when
sending this command.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:23 -07:00
Daniel Jurgens
32f69e4be2 {net, IB}/mlx5: Manage port association for multiport RoCE
When mlx5_ib_add is called determine if the mlx5 core device being
added is capable of dual port RoCE operation. If it is, determine
whether it is a master device or a slave device using the
num_vhca_ports and affiliate_nic_vport_criteria capabilities.

If the device is a slave, attempt to find a master device to affiliate it
with. Devices that can be affiliated will share a system image guid. If
none are found place it on a list of unaffiliated ports. If a master is
found bind the port to it by configuring the port affiliation in the NIC
vport context.

Similarly when mlx5_ib_remove is called determine the port type. If it's
a slave port, unaffiliate it from the master device, otherwise just
remove it from the unaffiliated port list.

The IB device is registered as a multiport device, even if a 2nd port is
not available for affiliation. When the 2nd port is affiliated later the
GID cache must be refreshed in order to get the default GIDs for the 2nd
port in the cache. Export roce_rescan_device to provide a mechanism to
refresh the cache after a new port is bound.

In a multiport configuration all IB object (QP, MR, PD, etc) related
commands should flow through the master mlx5_core_dev, other commands
must be sent to the slave port mlx5_core_mdev, an interface is provide
to get the correct mdev for non IB object commands.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:22 -07:00
Daniel Jurgens
8737f818ca net/mlx5: Set software owner ID during init HCA
Generate a unique 128bit identifier for each host and pass that value to
firmware in the INIT_HCA command if it reports the sw_owner_id
capability. Each device bound to the mlx5_core driver will have the same
software owner ID.

In subsequent patches mlx5_core devices will be bound via a new VPort
command so that they can operate together under a single InfiniBand
device. Only devices that have the same software owner ID can be bound,
to prevent traffic intended for one host arriving at another.

The INIT_HCA command length was expanded by 128 bits. The command
length is provided as an input FW commands. Older FW does not have a
problem receiving this command in the new longer form.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:20 -07:00
Daniel Jurgens
734dc065fc net/mlx5: Fix race for multiple RoCE enable
There are two potential problems with the existing implementation.

1. Enable and disable can race after the atomic operations.
2. If a command fails the refcount is left in an inconsistent state.

Introduce a lock and perform error checking.

Fixes: a6f7d2aff6 ("net/mlx5: Add support for multiple RoCE enable")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:20 -07:00
Moni Shoua
dd44572aeb net/mlx5: Enable DC transport
Enable DC transport in the firmware to provide its functionality.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:38:50 -07:00
Moni Shoua
57cda166bb net/mlx5: Add DCT command interface
Add a missing command interface to work with a DCT. It includes: creating,
destroying and get events for.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:38:49 -07:00
David S. Miller
7f0b800048 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-01-07

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add a start of a framework for extending struct xdp_buff without
   having the overhead of populating every data at runtime. Idea
   is to have a new per-queue struct xdp_rxq_info that holds read
   mostly data (currently that is, queue number and a pointer to
   the corresponding netdev) which is set up during rxqueue config
   time. When a XDP program is invoked, struct xdp_buff holds a
   pointer to struct xdp_rxq_info that the BPF program can then
   walk. The user facing BPF program that uses struct xdp_md for
   context can use these members directly, and the verifier rewrites
   context access transparently by walking the xdp_rxq_info and
   net_device pointers to load the data, from Jesper.

2) Redo the reporting of offload device information to user space
   such that it works in combination with network namespaces. The
   latter is reported through a device/inode tuple as similarly
   done in other subsystems as well (e.g. perf) in order to identify
   the namespace. For this to work, ns_get_path() has been generalized
   such that the namespace can be retrieved not only from a specific
   task (perf case), but also from a callback where we deduce the
   netns (ns_common) from a netdevice. bpftool support using the new
   uapi info and extensive test cases for test_offload.py in BPF
   selftests have been added as well, from Jakub.

3) Add two bpftool improvements: i) properly report the bpftool
   version such that it corresponds to the version from the kernel
   source tree. So pick the right linux/version.h from the source
   tree instead of the installed one. ii) fix bpftool and also
   bpf_jit_disasm build with bintutils >= 2.9. The reason for the
   build breakage is that binutils library changed the function
   signature to select the disassembler. Given this is needed in
   multiple tools, add a proper feature detection to the
   tools/build/features infrastructure, from Roman.

4) Implement the BPF syscall command BPF_MAP_GET_NEXT_KEY for the
   stacktrace map. It is currently unimplemented, but there are
   use cases where user space needs to walk all stacktrace map
   entries e.g. for dumping or deleting map entries w/o having to
   close and recreate the map. Add BPF selftests along with it,
   from Yonghong.

5) Few follow-up cleanups for the bpftool cgroup code: i) rename
   the cgroup 'list' command into 'show' as we have it for other
   subcommands as well, ii) then alias the 'show' command such that
   'list' is accepted which is also common practice in iproute2,
   and iii) remove couple of newlines from error messages using
   p_err(), from Jakub.

6) Two follow-up cleanups to sockmap code: i) remove the unused
   bpf_compute_data_end_sk_skb() function and ii) only build the
   sockmap infrastructure when CONFIG_INET is enabled since it's
   only aware of TCP sockets at this time, from John.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07 21:26:31 -05:00
Jesper Dangaard Brouer
ae75415de1 mlx4: setup xdp_rxq_info
Driver hook points for xdp_rxq_info:
 * reg  : mlx4_en_create_rx_ring
 * unreg: mlx4_en_destroy_rx_ring

Tested on actual hardware.

Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05 15:21:21 -08:00
Jesper Dangaard Brouer
0ddf543226 xdp/mlx5: setup xdp_rxq_info
The mlx5 driver have a special drop-RQ queue (one per interface) that
simply drops all incoming traffic. It helps driver keep other HW
objects (flow steering) alive upon down/up operations.  It is
temporarily pointed by flow steering objects during the interface
setup, and when interface is down. It lacks many fields that are set
in a regular RQ (for example its state is never switched to
MLX5_RQC_STATE_RDY). (Thanks to Tariq Toukan for explanation).

The XDP RX-queue info for this drop-RQ marked as unused, which
allow us to use the same takedown/free code path as other RX-queues.

Driver hook points for xdp_rxq_info:
 * reg   : mlx5e_alloc_rq()
 * unused: mlx5e_alloc_drop_rq()
 * unreg : mlx5e_free_rq()

Tested on actual hardware with samples/bpf program

Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Matan Barak <matanb@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05 15:21:20 -08:00
Arnd Bergmann
74bd5d56bf net/mlx5e: hide an unused variable
The uplink_rpriv variable was added at the start of the function but
only used inside of an #ifdef:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c: In function 'mlx5e_route_lookup_ipv6':
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:1549:25: error: unused variable 'uplink_rpriv' [-Werror=unused-variable]

This moves the declaration into that #ifdef as well.

Fixes: 5ed99fb421 ("net/mlx5e: Move ethernet representors data into separate struct")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05 10:55:34 -05:00
Ido Schimmel
90045fc9c7 mlxsw: spectrum: Relax sanity checks during enslavement
Since commit 25cc72a338 ("mlxsw: spectrum: Forbid linking to devices that
have uppers") the driver forbids enslavement to netdevs that already
have uppers of their own, as this can result in various ordering
problems.

This requirement proved to be too strict for some users who need to be
able to enslave ports to a bridge that already has uppers. In this case,
we can allow the enslavement if the bridge is already known to us, as
any configuration performed on top of the bridge was already reflected
to the device.

Fixes: 25cc72a338 ("mlxsw: spectrum: Forbid linking to devices that have uppers")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 12:38:26 -05:00
Ido Schimmel
8764a8267b mlxsw: spectrum_router: Fix NULL pointer deref
When we remove the neighbour associated with a nexthop we should always
refuse to write the nexthop to the adjacency table. Regardless if it is
already present in the table or not.

Otherwise, we risk dereferencing the NULL pointer that was set instead
of the neighbour.

Fixes: a7ff87acd9 ("mlxsw: spectrum_router: Implement next-hop routing")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 12:37:16 -05:00
David S. Miller
6bb8824732 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
net/ipv6/ip6_gre.c is a case of parallel adds.

include/trace/events/tcp.h is a little bit more tricky.  The removal
of in-trace-macro ifdefs in 'net' paralleled with moving
show_tcp_state_name and friends over to include/trace/events/sock.h
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-29 15:42:26 -05:00
Linus Torvalds
19286e4a7a Third pull request for 4.15-rc
- cxgb4 fix for an iser testing failure as debugged by Steve and Sagi.
   The problem was a driver bug in the handling of shutting down a QP.
 - Various vmw_pvrdma fixes for bogus WARN_ON, missed resource free on error
   unwind and a use after free bug
 - Improper congestion counter values on mlx5 when link aggregation is enabled
 - ipoib lockdep regression introduced in this merge window
 - hfi1 regression supporting the device in a VM introduced in a recent patch
 - Typo that breaks future uAPI compatibility in the verbs core
 - More SELinux related oops fixing
 - Fix an oops during error unwind in mlx5
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJaRIC/AAoJEDht9xV+IJsaJfQP/1Z97/kDlJGIJQ4vBJ52xdHV
 LfRdmCBqU5nrAihEBpFLRc2S+kaSJbYAY48tRn28Jx6s9dmSvU6v2J2IqhmnM6p6
 ruWLR0Yqjg+xHcw+eaEoscJjRw+jDUEeVOgfbYc0HViWwvMNTrBB32HpAV48HuAl
 aCbM/qrQYXdYuJBImM4glERIpjlvYKoxv4D9xCJhJRRQvTnKOymHzZpKbqNujWxl
 dzCmZeOrw+HVxNW9MHHtUxClBoLNnykfRVKzMcdDjsqJ+Fdo2bY3ksgMvgiatRwY
 NxGfixhouhOz9vjN/ljpWXxTV5TTm6Nrib8XcHuOWjcYn/AFwJMMRsM+1w1AuCKs
 Zviq7QVApZzYuvHw1ewupRGvDX+P13sufD5sbc6cfVUT3w6ZX0Clpspl4++JN4ER
 WvBZikozaviL3w9ir0drlZ6k9BDnjQ6P7wZcBjDZC/j0zXKM65rISZrTsK7TeiTk
 lBNdLCkwZhO0dvafCNwA910tTaXEPhqqAh8Okob2A5U5lUAewd0AEHJusL/iCmSl
 uXnnxu8ik61QzOqwneEHSyVMkOSLEC+kk13fiFAq/LjPUSm9N/MihZd4JNxwSa6W
 4Rah7IKdh9F6qEnaKLPEfHxPhfghhb7O51zCA8mwA/JNCneqc4Gqi0U2JXkuloml
 395aK2aZSShIkZvIwbI8
 =IkGi
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "This is the next batch of for-rc patches from RDMA. It includes the
  fix for the ipoib regression I mentioned last time, and the result of
  a fairly major debugging effort to get iser working reliably on cxgb4
  hardware - it turns out the cxgb4 driver was not handling QP error
  flushing properly causing iser to fail.

   - cxgb4 fix for an iser testing failure as debugged by Steve and
     Sagi. The problem was a driver bug in the handling of shutting down
     a QP.

   - Various vmw_pvrdma fixes for bogus WARN_ON, missed resource free on
     error unwind and a use after free bug

   - Improper congestion counter values on mlx5 when link aggregation is
     enabled

   - ipoib lockdep regression introduced in this merge window

   - hfi1 regression supporting the device in a VM introduced in a
     recent patch

   - Typo that breaks future uAPI compatibility in the verbs core

   - More SELinux related oops fixing

   - Fix an oops during error unwind in mlx5"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/mlx5: Fix mlx5_ib_alloc_mr error flow
  IB/core: Verify that QP is security enabled in create and destroy
  IB/uverbs: Fix command checking as part of ib_uverbs_ex_modify_qp()
  IB/mlx5: Serialize access to the VMA list
  IB/hfi: Only read capability registers if the capability exists
  IB/ipoib: Fix lockdep issue found on ipoib_ib_dev_heavy_flush
  IB/mlx5: Fix congestion counters in LAG mode
  RDMA/vmw_pvrdma: Avoid use after free due to QP/CQ/SRQ destroy
  RDMA/vmw_pvrdma: Use refcount_dec_and_test to avoid warning
  RDMA/vmw_pvrdma: Call ib_umem_release on destroy QP path
  iw_cxgb4: when flushing, complete all wrs in a chain
  iw_cxgb4: reflect the original WR opcode in drain cqes
  iw_cxgb4: Only validate the MSN for successful completions
2017-12-28 23:06:01 -08:00
David S. Miller
d367341b25 mlx5-shared-4.16-1
mlx5 shared code for both rdma-next and net-next trees.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJaRXPQAAoJEEg/ir3gV/o+H4wH/2CkV3tOLfRekNd4CFSoH78A
 zH0Gjwa3P7aXybTmhXbMNCYLEoVEZ5pSlToOmjz1FrmxhH62JQ80WyKOcYtiHMBg
 3x5tFZboLc9tMGwPhyBJBjyiH+Gh9ZMoD6hBFgSvIG/hNPUb1W48/Pc+R61gOrMw
 6ADU+6mIf5cHNQ4c/V/SBlfiQjSXN4Y38knhTeZy8dLcZZVg1eMn+pj7W/haAyb6
 t3IMEaUmlDYwQmtxTT2snK4VutEPfxYGv1gyKSkZXmY74aRvSzlgV7PqXM3qsV4W
 8ZEhEHZJGi6NXC2hk5FQSSPWhQOhAmpjTHm8aImK0SIf68YajjzaZnT9S+eMmdY=
 =uMjj
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-shared-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 E-Switch updates 2017-12-19

This series includes updates for mlx5 E-Switch infrastructures,
to be merged into net-next and rdma-next trees.

Mark's patches provide E-Switch refactoring that generalize the mlx5
E-Switch vf representors interfaces and data structures. The serious is
mainly focused on moving ethernet (netdev) specific representors logic out
of E-Switch (eswitch.c) into mlx5e representor module (en_rep.c), which
provides better separation and allows future support for other types of vf
representors (e.g. RDMA).

Gal's patches at the end of this serious, provide a simple syntax fix and
two other patches that handles vport ingress/egress ACL steering name
spaces to be aligned with the Firmware/Hardware specs.

V1->V2:
 - Addressed coding style comments in patches #1 and #7
 - The series is still based on rc4, as now I see net-next is also @rc4.

V2->V3:
 - Fixed compilation warning, reported by Dave.

Please pull and let me know if there's any problem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 19:32:59 -05:00
Gal Pressman
9b93ab981e net/mlx5: Separate ingress/egress namespaces for each vport
Each vport has its own root flow table for the ACL flow tables and root
flow table is per namespace, therefore we should create a namespace for
each vport.

Fixes: efdc810ba3 ("net/mlx5: Flow steering, Add vport ACL support")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-29 00:43:52 +02:00
Gal Pressman
4484e29948 net/mlx5: Fix ingress/egress naming mistake
The functions names do not represent their actions, switch the mistaken
ingress/egress naming.

Fixes: fba53f7b57 ("net/mlx5: Introduce mlx5_flow_steering structure")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-29 00:43:52 +02:00
Gal Pressman
18a89ab766 net/mlx5e: E-Switch, Use the name of static array instead of its address
Using the address of a static array is the same as using its name (in
this specific use-case), but it's confusing and makes the code less
readable.

Fixes: 1bd27b11c1 ("net/mlx5: Introduce E-switch QoS management")
Fixes: bd77bf1cb5 ("net/mlx5: Add SRIOV VF max rate configuration support")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-29 00:43:51 +02:00
Mark Bloch
2c47bf80e8 net/mlx5e: E-Switch, Move send-to-vport rule struct to en_rep
Move struct mlx5_esw_sq which keeps send-to-vport rule to from the eswitch
code to mlx5e and rename it to better reflect where it belongs

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-29 00:43:51 +02:00
Mark Bloch
a4b97ab421 net/mlx5: E-Switch, Create generic header struct to be used by representors
Now that we don't store type dependent data in struct mlx5_eswitch_rep
we can create a generic interface, and representor type.

struct mlx5_eswitch_rep will store an array of interfaces, each
interface is used by a different representor type.

Once we moved to a more generic interface, rdma driver representors can
be added and utilize the same mechanism as the Ethernet driver
representors use.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-29 00:43:50 +02:00
Moni Shoua
a42b63c1ac net/mlx4_en: Change default QoS settings
Change the default mapping between TC and TCG as follows:

Prio     |             TC/TCG
         |      from             to
         |    (set by FW)      (set by SW)
---------+-----------------------------------
0        |      0/0              0/7
1        |      1/0              0/6
2        |      2/0              0/5
3        |      3/0              0/4
4        |      4/0              0/3
5        |      5/0              0/2
6        |      6/0              0/1
7        |      7/0              0/0

These new settings cause that a pause frame for any prio stops
traffic for all prios.

Fixes: 564c274c3d ("net/mlx4_en: DCB QoS support")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:24:05 -05:00
Tariq Toukan
fd4a3e2828 net/mlx4_core: Cleanup FMR unmapping flow
Remove redundant and not essential operations in fmr unmap/free.
According to device spec, in FMR unmap it is sufficient to set
ownership bit to SW. This allows remapping afterwards.

Fixes: 8ad11fb6b0 ("IB/mlx4: Implement FMRs")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:24:05 -05:00
Tariq Toukan
dc484851ed net/mlx4_en: RX csum, reorder branches
Use early goto commands, and save else branches.
This uses less indentations and brackets, making the code
more readable.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:24:05 -05:00
Tariq Toukan
345ef18c24 net/mlx4_en: RX csum, remove redundant branches and checks
Do not check IPv6 bit in cqe status if CONFIG_IPV6 is not enabled.
Function check_csum() is reached only with IPv4 or IPv6 set (if enabled),
if IPv6 is not set (or is not enabled) it is redundant to test the
IPv4 bit.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:24:05 -05:00
Mark Bloch
5ed99fb421 net/mlx5e: Move ethernet representors data into separate struct
Ethernet representors have a need to store data which is applicable
only for them. Create a priv void pointer in struct mlx5_eswitch_rep
and move mlx5e to store the relevant data there. As part of this change
we also initialize rep_if in mlx5e_rep_register_vf_vports() as otherwise the
E-Switch code will copy a priv value which is garbage.

We also rename mlx5_eswitch_get_uplink_netdev() to
mlx5_eswitch_get_uplink_priv() and make it return void *.
This way E-Switch code doesn't need to deal with net devices and
we leave the task of getting it to mlx5e.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-28 12:36:33 +02:00
Mark Bloch
159fe63922 net/mlx5: E-Switch, Create a dedicated send to vport rule deletion function
In order for representors to send packets directly to VFs we use an
E-Switch function which insert special rules into the HW. For symmetry
create an E-Switch function that deletes these rules as well.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-28 12:36:33 +02:00
Mark Bloch
f7a68945a5 net/mlx5: E-Switch, Move mlx5e only logic outside E-Switch
In our pursuit to cleanup e-switch sub-module from mlx5e specific code,
we move the functions that insert/remove the flow steering rules that
allow mlx5e representors to send packets directly to VFs into the EN
driver code.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-28 12:36:33 +02:00
Mark Bloch
4c66df01f5 net/mlx5: E-Switch, Simplify representor load/unload callback API
In the load() callback for loading representors we don't really need
struct mlx5_eswitch but struct mlx5_core_dev, pass it directly.

In the unload() callback for unloading representors we don't need the
struct mlx5_eswitch argument, remove it.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-28 12:36:33 +02:00
Mark Bloch
6ed1803abe net/mlx5: E-Switch, Refactor load/unload of representors
Refactor the load/unload stages for better code reuse.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-28 12:36:33 +02:00
Mark Bloch
e8d31c4d65 net/mlx5: E-Switch, Refactor vport representors initialization
Refactor the init stage of vport representors registration.
vport number and hw id can be assigned by the E-Switch driver and not by
the netdevice driver. While here, make the error path of mlx5_eswitch_init()
a reverse order of the good path, also use kcalloc to allocate an array
instead of kzalloc.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-28 12:36:33 +02:00
Jason Gunthorpe
76a895d9e1 Merge branch 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
Patches for 4.16 that are dependent on patches sent to 4.15-rc.

These are small clean ups for the vmw_pvrdma and i40iw drivers.

* 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git:
  RDMA/vmw_pvrdma: Remove usage of BIT() from UAPI header
  RDMA/vmw_pvrdma: Use refcount_t instead of atomic_t
  RDMA/vmw_pvrdma: Use more specific sizeof in kcalloc
  RDMA/vmw_pvrdma: Clarify QP and CQ is_kernel logic
  RDMA/vmw_pvrdma: Add UAR SRQ macros in ABI header file
  i40iw: Change accelerated flag to bool
2017-12-27 21:50:46 -07:00
David S. Miller
fba961ab29 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Lots of overlapping changes.  Also on the net-next side
the XDP state management is handled more in the generic
layers so undo the 'net' nfp fix which isn't applicable
in net-next.

Include a necessary change by Jakub Kicinski, with log message:

====================
cls_bpf no longer takes care of offload tracking.  Make sure
netdevsim performs necessary checks.  This fixes a warning
caused by TC trying to remove a filter it has not added.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-22 11:16:31 -05:00
Majd Dibbiny
71a0ff65a2 IB/mlx5: Fix congestion counters in LAG mode
Congestion counters are counted and queried per physical function.
When working in LAG mode, CNP packets can be sent or received on both
of the functions, thus congestion counters should be aggregated from
the two physical functions.

Fixes: e1f24a79f4 ("IB/mlx5: Support congestion related counters")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-21 16:06:07 -07:00
Moshe Shemesh
a2fba188fd net/mlx5: Stay in polling mode when command EQ destroy fails
During unload, on mlx5_stop_eqs we move command interface from events
mode to polling mode, but if command interface EQ destroy fail we move
back to events mode.
That's wrong since even if we fail to destroy command interface EQ, we
do release its irq, so no interrupts will be received.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:05 +02:00
Moshe Shemesh
d6b2785cd5 net/mlx5: Cleanup IRQs in case of unload failure
When mlx5_stop_eqs fails to destroy any of the eqs it returns with an error.
In such failure flow the function will return without
releasing all EQs irqs and then pci_free_irq_vectors will fail.
Fix by only warn on destroy EQ failure and continue to release other
EQs and their irqs.

It fixes the following kernel trace:
kernel: kernel BUG at drivers/pci/msi.c:352!
...
...
kernel: Call Trace:
kernel: pci_disable_msix+0xd3/0x100
kernel: pci_free_irq_vectors+0xe/0x20
kernel: mlx5_load_one.isra.17+0x9f5/0xec0 [mlx5_core]

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:05 +02:00
Maor Gottlieb
139ed6c6c4 net/mlx5: Fix steering memory leak
Flow steering priority and namespace are software only objects that
didn't have the proper destructors and were not freed during steering
cleanup.

Fix it by adding destructor functions for these objects.

Fixes: bd71b08ec2 ("net/mlx5: Support multiple updates of steering rules in parallel")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:04 +02:00
Gal Pressman
0c1cc8b221 net/mlx5e: Prevent possible races in VXLAN control flow
When calling add/remove VXLAN port, a lock must be held in order to
prevent race scenarios when more than one add/remove happens at the
same time.
Fix by holding our state_lock (mutex) as done by all other parts of the
driver.
Note that the spinlock protecting the radix-tree is still needed in
order to synchronize radix-tree access from softirq context.

Fixes: b3f63c3d5e ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:03 +02:00
Gal Pressman
23f4cc2cd9 net/mlx5e: Add refcount to VXLAN structure
A refcount mechanism must be implemented in order to prevent unwanted
scenarios such as:
- Open an IPv4 VXLAN interface
- Open an IPv6 VXLAN interface (different socket)
- Remove one of the interfaces

With current implementation, the UDP port will be removed from our VXLAN
database and turn off the offloads for the other interface, which is
still active.
The reference count mechanism will only allow UDP port removals once all
consumers are gone.

Fixes: b3f63c3d5e ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:03 +02:00
Gal Pressman
6323514116 net/mlx5e: Fix possible deadlock of VXLAN lock
mlx5e_vxlan_lookup_port is called both from mlx5e_add_vxlan_port (user
context) and mlx5e_features_check (softirq), but the lock acquired does
not disable bottom half and might result in deadlock. Fix it by simply
replacing spin_lock() with spin_lock_bh().
While at it, replace all unnecessary spin_lock_irq() to spin_lock_bh().

lockdep's WARNING: inconsistent lock state
[  654.028136] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[  654.028229] swapper/5/0 [HC0[0]:SC1[9]:HE1:SE0] takes:
[  654.028321]  (&(&vxlan_db->lock)->rlock){+.?.}, at: [<ffffffffa06e7f0e>] mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core]
[  654.028528] {SOFTIRQ-ON-W} state was registered at:
[  654.028607]   _raw_spin_lock+0x3c/0x70
[  654.028689]   mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core]
[  654.028794]   mlx5e_vxlan_add_port+0x2e/0x120 [mlx5_core]
[  654.028878]   process_one_work+0x1e9/0x640
[  654.028942]   worker_thread+0x4a/0x3f0
[  654.029002]   kthread+0x141/0x180
[  654.029056]   ret_from_fork+0x24/0x30
[  654.029114] irq event stamp: 579088
[  654.029174] hardirqs last  enabled at (579088): [<ffffffff818f475a>] ip6_finish_output2+0x49a/0x8c0
[  654.029309] hardirqs last disabled at (579087): [<ffffffff818f470e>] ip6_finish_output2+0x44e/0x8c0
[  654.029446] softirqs last  enabled at (579030): [<ffffffff810b3b3d>] irq_enter+0x6d/0x80
[  654.029567] softirqs last disabled at (579031): [<ffffffff810b3c05>] irq_exit+0xb5/0xc0
[  654.029684] other info that might help us debug this:
[  654.029781]  Possible unsafe locking scenario:

[  654.029868]        CPU0
[  654.029908]        ----
[  654.029947]   lock(&(&vxlan_db->lock)->rlock);
[  654.030045]   <Interrupt>
[  654.030090]     lock(&(&vxlan_db->lock)->rlock);
[  654.030162]
 *** DEADLOCK ***

Fixes: b3f63c3d5e ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:02 +02:00
Moni Shoua
dbff26e44d net/mlx5: Fix error flow in CREATE_QP command
In error flow, when DESTROY_QP command should be executed, the wrong
mailbox was set with data, not the one that is written to hardware,
Fix that.

Fixes: 09a7d9eca1 '{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc'
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:02 +02:00
Eugenia Emantayev
777ec2b2a3 net/mlx5: Fix misspelling in the error message and comment
Fix misspelling in word syndrome.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:01 +02:00
Eugenia Emantayev
696a97cf9f net/mlx5e: Fix defaulting RX ring size when not needed
Fixes the bug when turning on/off CQE compression mechanism
resets the RX rings size to default value when it is not
needed.

Fixes: 2fc4bfb725 ("net/mlx5e: Dynamic RQ type infrastructure")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:00 +02:00
Gal Pressman
2989ad1ec0 net/mlx5e: Fix features check of IPv6 traffic
The assumption that the next header field contains the transport
protocol is wrong for IPv6 packets with extension headers.
Instead, we should look the inner-most next header field in the buffer.
This will fix TSO offload for tunnels over IPv6 with extension headers.

Performance testing: 19.25x improvement, cool!
Measuring bandwidth of 16 threads TCP traffic over IPv6 GRE tap.
CPU: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
NIC: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
TSO: Enabled
Before: 4,926.24  Mbps
Now   : 94,827.91 Mbps

Fixes: b3f63c3d5e ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:00 +02:00
Huy Nguyen
ff0891915c net/mlx5e: Fix ETS BW check
Fix bug that allows ets bw sum to be 0% when ets tc type exists.

Fixes: 08fb1dacdd ('net/mlx5e: Support DCBNL IEEE ETS')
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:23:59 +02:00
Eran Ben Elisha
37e92a9d4f net/mlx5: Fix rate limit packet pacing naming and struct
In mlx5_ifc, struct size was not complete, and thus driver was sending
garbage after the last defined field. Fixed it by adding reserved field
to complete the struct size.

In addition, rename all set_rate_limit to set_pp_rate_limit to be
compliant with the Firmware <-> Driver definition.

Fixes: 7486216b3a ("{net,IB}/mlx5: mlx5_ifc updates")
Fixes: 1466cc5b23 ("net/mlx5: Rate limit tables support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:23:58 +02:00
Saeed Mahameed
231243c827 Revert "mlx5: move affinity hints assignments to generic code"
Before the offending commit, mlx5 core did the IRQ affinity itself,
and it seems that the new generic code have some drawbacks and one
of them is the lack for user ability to modify irq affinity after
the initial affinity values got assigned.

The issue is still being discussed and a solution in the new generic code
is required, until then we need to revert this patch.

This fixes the following issue:
echo <new affinity> > /proc/irq/<x>/smp_affinity
fails with  -EIO

This reverts commit a435393aca.
Note: kept mlx5_get_vector_affinity in include/linux/mlx5/driver.h since
it is used in mlx5_ib driver.

Fixes: a435393aca ("mlx5: move affinity hints assignments to generic code")
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jes Sorensen <jsorensen@fb.com>
Reported-by: Jes Sorensen <jsorensen@fb.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:23:58 +02:00
Kamal Heib
bae115a2bb net/mlx5: FPGA, return -EINVAL if size is zero
Currently, if a size of zero is passed to
mlx5_fpga_mem_{read|write}_i2c()
the "err" return value will not be initialized, which triggers gcc
warnings:

[..]/mlx5/core/fpga/sdk.c:87 mlx5_fpga_mem_read_i2c() error:
uninitialized symbol 'err'.
[..]/mlx5/core/fpga/sdk.c:115 mlx5_fpga_mem_write_i2c() error:
uninitialized symbol 'err'.

fix that.

Fixes: a9956d35d1 ('net/mlx5: FPGA, Add SBU infrastructure')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:23:57 +02:00
Petr Machata
8ba6b30ef7 mlxsw: spectrum_router: Remove batch neighbour deletion causing FW bug
This reverts commit 63dd00fa3e.

RAUHT DELETE_ALL seems to trigger a bug in FW. That manifests by later
calls to RAUHT ADD of an IPv6 neighbor to fail with "bad parameter"
error code.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Fixes: 63dd00fa3e ("mlxsw: spectrum_router: Add batch neighbour deletion")
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19 11:08:27 -05:00
David S. Miller
c30abd5e40 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Three sets of overlapping changes, two in the packet scheduler
and one in the meson-gxl PHY driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-16 22:11:55 -05:00
Yuval Mintz
fccff08628 mlxsw: spectrum: Disable MAC learning for ovs port
Learning is currently enabled for ports which are OVS slaves -
even though OVS doesn't need this indication.
Since we're not associating a fid with the port, HW would continuously
notify driver of learned [& aged] MACs which would be logged as errors.

Fixes: 2b94e58df5 ("mlxsw: spectrum: Allow ports to work under OVS master")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 10:47:36 -05:00
Eran Ben Elisha
5a1647c391 net/mlx4_en: Fill all counters under one call of stats lock
Before this patch, the stats_lock was acquired twice. In between the
locks Driver sent command to gather some more statistics (per priority
and counter statistics). If the stats lock was acquired by get
statistics NDO in between we would have report out of sync counters.

Fix this by collecting all stats from Firmware in advance and then
fill the Software structs under one lock.

Fixes: 0b131561a7 ("net/mlx4_en: Add Flow control statistics display via ethtool")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 16:38:37 -05:00
Eran Ben Elisha
0bb9fc4f54 net/mlx4_core: Fix wrong calculation of free counters
The field res_free indicates the total number of counters which are
available for allocation (reserved and unreserved). Fixed a bug where
the reserved counters were subtracted from res_free before any
allocation was performed.

Before this fix, free counters which were not reserved could not be
allocated.

Fixes: 9de92c60be ("net/mlx4_core: Adjust counter grant policy in the resource tracker")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 16:38:36 -05:00
Eugenia Emantayev
78034f5fdd net/mlx4_en: Fix selftest for small MTUs
Set the minimal MTU threshold for running loopback selftest.
MTU should be big enough to include packet payload, NET_IP_ALIGN,
Ethernet headers and preamble length.

Fixes: e7c1c2c462 ("mlx4_en: Added self diagnostics test implementation")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 16:38:36 -05:00
Jiri Pirko
9454d9307e mlxsw: spectrum: handle NETIF_F_HW_TC changes correctly
Currently, whenever the NETIF_F_HW_TC feature changes, we silently
always allow it, but we actually do not disable the flows in HW
on disable. That breaks user's expectations. So just forbid
the feature disable in case there are any filters offloaded.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-06 15:11:17 -05:00
Cong Wang
9f8a739e72 act_mirred: get rid of tcfm_ifindex from struct tcf_mirred
tcfm_dev always points to the correct netdev and we already
hold a refcnt, so no need to use tcfm_ifindex to lookup again.

If we would support moving target netdev across netns, using
pointer would be better than ifindex.

This also fixes dumping obsolete ifindex, now after the
target device is gone we just dump 0 as ifindex.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-06 14:50:13 -05:00