Commit Graph

3713 Commits

Author SHA1 Message Date
Gal Pressman
75b81ce719 net/mlx5e: Don't override netdev features field unless in error flow
The set features function sets dev->features in order to keep track of which
features were successfully changed and which weren't (in case the user
asks for more than one change in a single command).

This breaks the logic in __netdev_update_features which assumes that
dev->features is not changed on success and checks for diffs between
features and dev->features (diffs that might not exist at this point
because of the driver override).

The solution is to keep track of successful/failed feature changes and
assign them to dev->features in case of failure only.

Fixes: 0e405443e8 ("net/mlx5e: Improve set features ndo resiliency")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:49 +02:00
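As a rough illustration of the approach described in 75b81ce719, the sketch below tracks successful changes in a local copy and writes that copy back only in the error flow. It is a userspace approximation, not the driver code: `sketch_netdev`, `feature_handler_t`, `feature_ok`, `feature_fail` and `sketch_set_features` are all hypothetical names introduced here.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct net_device; only the field used here. */
struct sketch_netdev {
    uint64_t features;
};

/* Hypothetical per-feature handler: returns 0 on success, negative on failure. */
typedef int (*feature_handler_t)(int enable);

/* Demo handlers for the illustration. */
int feature_ok(int enable)   { (void)enable; return 0; }
int feature_fail(int enable) { (void)enable; return -1; }

/*
 * Try to apply every requested change. `applied` starts as the current
 * feature set and is updated only for bits whose handler succeeded.
 * dev->features is written only when at least one handler failed; on full
 * success it is left alone so that the caller can diff and update it.
 */
int sketch_set_features(struct sketch_netdev *dev, uint64_t wanted,
                        const uint64_t *bits,
                        const feature_handler_t *handlers, int count)
{
    uint64_t applied = dev->features;
    uint64_t changed = dev->features ^ wanted;
    int failed = 0;

    for (int i = 0; i < count; i++) {
        int enable = (wanted & bits[i]) != 0;

        if (!(changed & bits[i]))
            continue;               /* this feature was not asked to change */
        if (handlers[i](enable)) {
            failed = 1;             /* keep the old state of this bit */
            continue;
        }
        if (enable)
            applied |= bits[i];
        else
            applied &= ~bits[i];
    }

    if (failed) {
        dev->features = applied;    /* error flow: record partial progress */
        return -1;
    }
    return 0;                       /* success: dev->features left untouched */
}
```

With two requested bits where only the first handler succeeds, the function returns an error and `dev->features` holds just the first bit; when both succeed, it returns 0 and `dev->features` is unchanged.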
Tariq Toukan
4b7d4363f1 net/mlx5e: Check support before TC swap in ETS init
Do not perform the following swap between TCs 0 and 1
when the maximum number of TCs is 1:
tclass[prio=0]=1, tclass[prio=1]=0, tclass[prio=i]=i (for i>1)

Fixes: 08fb1dacdd ("net/mlx5e: Support DCBNL IEEE ETS")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:49 +02:00
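The guard described in 4b7d4363f1 can be sketched as below: the default priority-to-TC mapping swaps TCs 0 and 1 only when more than one TC is available. This is a simplified illustration; `SKETCH_MAX_TC` and `sketch_ets_init_prio_tc` are hypothetical names, and the bound of 8 is an assumption.

```c
#include <stdint.h>

#define SKETCH_MAX_TC 8   /* assumption: upper bound on traffic classes */

/*
 * Fill prio_tc[] with the default priority-to-TC mapping for `max_tc`
 * traffic classes (1 <= max_tc <= SKETCH_MAX_TC).
 */
void sketch_ets_init_prio_tc(uint8_t prio_tc[SKETCH_MAX_TC], int max_tc)
{
    for (int i = 0; i < max_tc; i++)
        prio_tc[i] = (uint8_t)i;

    /* Swapping TCs 0 and 1 only makes sense when TC 1 exists. */
    if (max_tc > 1) {
        prio_tc[0] = 1;
        prio_tc[1] = 0;
    }
}
```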
Tariq Toukan
97c8c3aa48 net/mlx5e: Add error print in ETS init
ETS initialization might fail, add a print to indicate
such failures.

Fixes: 08fb1dacdd ("net/mlx5e: Support DCBNL IEEE ETS")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:48 +02:00
Gal Pressman
e556f6dd47 net/mlx5e: Keep updating ethtool statistics when the interface is down
ethtool statistics should be updated even when the interface is down,
since they show more than just netdev counters and might change while
the logical link is down.
One useful use case, for example, is when running RoCE traffic over the
interface (while the logical link is down, but physical link is up) and
examining rx_prioX_bytes.

Fixes: f62b8bb8f2 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:48 +02:00
Maor Gottlieb
259bbc575c net/mlx5: Fix error handling in load one
The result of mlx5_init_once was not stored, so mlx5_load_one
returned success on error.  Fix that.

Fixes: 59211bd3b6 ("net/mlx5: Split the load/unload flow into hardware and software flows")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:47 +02:00
Eran Ben Elisha
72f36be061 net/mlx5: Fix mlx5_get_uars_page to return error code
Change mlx5_get_uars_page to return ERR_PTR in case of
allocation failure. Change all callers accordingly to
check the IS_ERR(ptr) instead of NULL.

Fixes: 59211bd3b6 ("net/mlx5: Split the load/unload flow into hardware and software flows")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:47 +02:00
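The ERR_PTR convention that 72f36be061 switches to can be illustrated in userspace as follows. The three helpers re-create the kernel's error-pointer idea under the assumption that the top 4095 addresses are never valid pointers; `sketch_page`, `sketch_get_page` and `sketch_use_page` are hypothetical stand-ins and not the mlx5 functions.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-creation of the kernel's error-pointer helpers. */
#define SKETCH_MAX_ERRNO 4095

static inline void *ERR_PTR(long error)      { return (void *)(intptr_t)error; }
static inline long  PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int   IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical object standing in for a UAR page. */
struct sketch_page {
    int index;
};

/* Returns a valid page or an encoded errno, never NULL. */
struct sketch_page *sketch_get_page(int simulate_failure)
{
    struct sketch_page *page;

    if (simulate_failure)
        return ERR_PTR(-ENOMEM);

    page = malloc(sizeof(*page));
    if (!page)
        return ERR_PTR(-ENOMEM);
    page->index = 0;
    return page;
}

/* Caller side: a NULL check would miss the failure, IS_ERR() does not. */
int sketch_use_page(int simulate_failure)
{
    struct sketch_page *page = sketch_get_page(simulate_failure);

    if (IS_ERR(page))
        return (int)PTR_ERR(page);

    free(page);
    return 0;
}
```

Note that `IS_ERR(NULL)` is false, which is why callers that previously tested for NULL had to be changed together with the function.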
Alaa Hleihel
b6908c2960 net/mlx5: Fix memory leak in bad flow of mlx5_alloc_irq_vectors
Fix a memory leak: if pci_alloc_irq_vectors failed,
priv->irq_info was not released.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 02:01:46 +02:00
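The kind of error-path cleanup that b6908c2960 describes can be sketched like this. It is an illustration only: `sketch_priv`, `sketch_irq_info`, `sketch_pci_alloc_vectors` and `sketch_alloc_irq_vectors` are hypothetical names, and the failure is injected through a parameter rather than coming from PCI.

```c
#include <errno.h>
#include <stdlib.h>

struct sketch_irq_info {
    int vector;
};

struct sketch_priv {
    struct sketch_irq_info *irq_info;
};

/*
 * Stand-in for pci_alloc_irq_vectors(): returns the number of vectors
 * granted, or a negative errno. Failure is injected for illustration.
 */
static int sketch_pci_alloc_vectors(int nvec, int simulate_failure)
{
    return simulate_failure ? -ENOSPC : nvec;
}

int sketch_alloc_irq_vectors(struct sketch_priv *priv, int nvec,
                             int simulate_failure)
{
    int got;

    priv->irq_info = calloc((size_t)nvec, sizeof(*priv->irq_info));
    if (!priv->irq_info)
        return -ENOMEM;

    got = sketch_pci_alloc_vectors(nvec, simulate_failure);
    if (got < 0)
        goto err_free_irq_info;

    return 0;

err_free_irq_info:
    /* Without this release the irq_info array would leak on failure. */
    free(priv->irq_info);
    priv->irq_info = NULL;
    return got;
}
```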
Eran Ben Elisha
8978cc921f {net,ib}/mlx5: Don't disable local loopback multicast traffic when needed
There are systems platform information management interfaces (such as
HOST2BMC) for which we cannot disable local loopback multicast traffic.

Separate disable_local_lb_mc and disable_local_lb_uc capability bits so
driver will not disable multicast loopback traffic if not supported.
(It is expected that Firmware will not set disable_local_lb_mc if
HOST2BMC is running for example.)

Function mlx5_nic_vport_update_local_lb makes a best effort to
disable/enable UC/MC loopback traffic and returns success only if it
succeeded in changing everything allowed by Firmware.

Adapt mlx5_ib and mlx5e to support the new cap bits.

Fixes: 2c43c5a036 ("net/mlx5e: Enable local loopback in loopback selftest")
Fixes: c85023e153 ("IB/mlx5: Add raw ethernet local loopback support")
Fixes: bded747bb4 ("net/mlx5: Add raw ethernet local loopback firmware command")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-12 00:52:42 +02:00
Nogah Frankel
56202ca4ed mlxsw: spectrum: qdiscs: Remove qdisc before setting a new one
If a qdisc is being replaced by another qdisc of the same type, it can
simply override its configuration.
However, if it replaces a qdisc of another type, it needs to be removed
before setting the new qdisc.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:41 -05:00
Nogah Frankel
9cf6c9c758 mlxsw: spectrum: qdiscs: Create a generic replace function
Create a generic qdisc replace function.
For that goal, add three functions to the qdisc ops struct:
* check_params: Checks if the given parameters are offloadable.
* replace: Offload the given parameters.
* clean_stats: Clean the qdisc stats for the offloaded qdisc.
Integrate RED offloading into the new internal replace API.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:41 -05:00
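The ops-based replace flow described in 9cf6c9c758, together with the remove-before-replace rule from 56202ca4ed, might look roughly like the sketch below. All names (`sketch_qdisc`, `sketch_qdisc_ops`, the demo RED callbacks) are hypothetical, the parameters are reduced to a single integer limit, and the exact ordering in the driver may differ.

```c
#include <stddef.h>

enum sketch_qdisc_type { SKETCH_QDISC_RED, SKETCH_QDISC_PRIO };

struct sketch_qdisc;

struct sketch_qdisc_ops {
    enum sketch_qdisc_type type;
    int  (*check_params)(const int *limit);              /* offloadable? */
    int  (*replace)(struct sketch_qdisc *q, const int *limit);
    void (*clean_stats)(struct sketch_qdisc *q);
    void (*destroy)(struct sketch_qdisc *q);
};

struct sketch_qdisc {
    unsigned int handle;
    const struct sketch_qdisc_ops *ops;
    unsigned long long drops;
};

/* Generic destroy: call the type-specific hook, then clear the metadata. */
void sketch_qdisc_destroy(struct sketch_qdisc *q)
{
    if (q->ops && q->ops->destroy)
        q->ops->destroy(q);
    q->handle = 0;
    q->ops = NULL;
}

/* Generic replace built on the three callbacks named in the commit. */
int sketch_qdisc_replace(struct sketch_qdisc *q, unsigned int handle,
                         const struct sketch_qdisc_ops *ops, const int *limit)
{
    int err;

    /* A qdisc of another type must be removed before the new one is set. */
    if (q->ops && q->ops->type != ops->type)
        sketch_qdisc_destroy(q);

    err = ops->check_params(limit);
    if (err)
        goto err_out;
    err = ops->replace(q, limit);
    if (err)
        goto err_out;

    if (q->handle != handle) {          /* a new qdisc, not a reconfiguration */
        q->ops = ops;
        if (ops->clean_stats)
            ops->clean_stats(q);
    }
    q->handle = handle;
    return 0;

err_out:
    sketch_qdisc_destroy(q);
    return err;
}

/* Demo RED callbacks, purely illustrative. */
static int  sketch_red_check(const int *limit) { return *limit > 0 ? 0 : -1; }
static int  sketch_red_replace(struct sketch_qdisc *q, const int *limit)
{
    (void)q; (void)limit;
    return 0;
}
static void sketch_red_clean(struct sketch_qdisc *q) { q->drops = 0; }

const struct sketch_qdisc_ops sketch_red_ops = {
    .type         = SKETCH_QDISC_RED,
    .check_params = sketch_red_check,
    .replace      = sketch_red_replace,
    .clean_stats  = sketch_red_clean,
    .destroy      = NULL,
};
```

Keeping the type comparison inside the generic function means that each qdisc type only has to supply its callbacks, which appears to be the reuse the series is aiming for.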
Nogah Frankel
9a37a59f71 mlxsw: spectrum: qdiscs: Create a generic destroy function
Add a destroy function to the qdiscs ops struct.
Create a generic qdisc destroy function that clears the qdisc metadata and
calls the specific qdisc destroy function.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:41 -05:00
Nogah Frankel
562ffbc4b3 mlxsw: spectrum: qdiscs: Add an ops struct
The Qdisc struct has the Qdisc_class_ops struct.
This patch introduces a similar ops struct, mlxsw_sp_qdisc_ops, for the
mlxsw_sp_qdisc
struct. It allows better readability as well as code reusability for the
common parts of some functions like destroy.
The first operations to be added are the statistics getters.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:41 -05:00
Nogah Frankel
cba7158ff1 mlxsw: spectrum: qdiscs: Unite all handle checks
Every qdisc op gets the qdisc handle ID as well as its location.  Each one
of them, besides replace, checks whether the handle matches the qdisc in
the given location and, if it does not, returns without running the actual op.
Unite these checks to one comparison function and avoid sending the handle
id to these ops.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:41 -05:00
Nogah Frankel
d56c89550b mlxsw: spectrum: qdiscs: Add tclass number to the mlxsw_sp_qdisc
Tclass number is needed for most of the operations related to the qdisc in
the driver. Create a field for it in the mlxsw_sp_qdisc instead of passing
it to every function as parameter.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:40 -05:00
Nogah Frankel
c2ed6db765 mlxsw: spectrum: qdiscs: Make the clean stats function to be for RED only
Improve readability by changing the clean stats function to handle only
RED. Qdiscs that will be offloaded in the future will have a clean stats
function of their own.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:40 -05:00
Nogah Frankel
4d1a4b8473 mlxsw: spectrum: qdiscs: Clean qdisc statistics structs
Clean RED offloaded stats and make them more generic by breaking the
generic qdisc stats out into a struct of their own.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:40 -05:00
Nogah Frankel
f8253df553 net: sch: red: Change offloaded xstats to be incremental
Change the value of the xstats requested from the driver for offloaded RED
to be incremental, like the normal stats.
It increases consistency - if a qdisc stops being offloaded its xstats
don't change.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:40 -05:00
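One way to picture the incremental xstats described in f8253df553 is a base snapshot kept next to the hardware counters, with only the delta since the last query added to the reported values. The struct and function names below are hypothetical and the counters are reduced to two fields.

```c
#include <stdint.h>

/* Hypothetical, reduced set of RED extended statistics. */
struct sketch_red_xstats {
    uint64_t prob_drop;
    uint64_t forced_drop;
};

/*
 * Add to `res` only what the hardware counted since the previous call,
 * then advance the base snapshot. `res` keeps whatever it already held,
 * so its values survive if the qdisc later stops being offloaded.
 */
void sketch_red_xstats_update(struct sketch_red_xstats *res,
                              struct sketch_red_xstats *base,
                              const struct sketch_red_xstats *hw)
{
    res->prob_drop   += hw->prob_drop   - base->prob_drop;
    res->forced_drop += hw->forced_drop - base->forced_drop;

    base->prob_drop   = hw->prob_drop;
    base->forced_drop = hw->forced_drop;
}
```

Calling the function twice with the same hardware counters leaves `res` unchanged on the second call, which is the incremental behaviour the commit describes.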
Nogah Frankel
f34b4aac46 net: sch: red: Change the name of the stats struct to be generic
Change the name of the stats struct to be generic, so it can be used for
other qdisc offloads that will be added in the next patches.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:40 -05:00
Nogah Frankel
371b437a32 mlxsw: spectrum: qdiscs: Move qdisc's declarations to its designated file
Move all the qdisc related data from the spectrum.h to spectrum_qdisc.c.
Create init and fini functions for the qdiscs.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:07:40 -05:00
Ido Schimmel
d016e13d80 mlxsw: spectrum: Fix typo in firmware upgrade message
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 16:05:00 -05:00
Jiri Pirko
db84924c4f mlxsw: spectrum_qdisc: Don't use variable array in mlxsw_sp_tclass_congestion_enable
Resolve the sparse warning:
"sparse: Variable length array is used."
Use 2 arrays for 2 PRM register accesses.

Fixes: 96f17e0776 ("mlxsw: spectrum: Support RED qdisc offload")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:58:23 -05:00
Yuval Mintz
8e033a93b3 mlxsw: pci: Wait after reset before accessing HW
After performing a reset, the driver polls a HW indication until it learns
that the reset is done. However, immediately after reset the device becomes
unresponsive, which might lead to a completion timeout on the first read.

Wait for 100ms before starting the polling.

Fixes: 233fa44bd6 ("mlxsw: pci: Implement reset done check")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:58:22 -05:00
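The settle-then-poll pattern from 8e033a93b3 can be sketched with a simulated clock so that it runs without hardware. Only the 100 ms wait comes from the commit message; the timeout, the poll interval and all `sketch_*` names are assumptions made for the illustration.

```c
#include <errno.h>

#define SKETCH_RESET_WAIT_MS     100   /* from the commit message */
#define SKETCH_RESET_TIMEOUT_MS 5000   /* assumption */
#define SKETCH_POLL_INTERVAL_MS   20   /* assumption */

/* Simulated device: becomes ready once `ready_at_ms` has elapsed. */
struct sketch_dev {
    unsigned int elapsed_ms;
    unsigned int ready_at_ms;
    unsigned int reads;
};

static void sketch_sleep_ms(struct sketch_dev *dev, unsigned int ms)
{
    dev->elapsed_ms += ms;              /* advance the simulated clock */
}

static int sketch_reset_done(struct sketch_dev *dev)
{
    dev->reads++;                       /* count register reads */
    return dev->elapsed_ms >= dev->ready_at_ms;
}

int sketch_wait_for_reset(struct sketch_dev *dev)
{
    unsigned int deadline;

    /* Give the device time to settle before the first read. */
    sketch_sleep_ms(dev, SKETCH_RESET_WAIT_MS);

    deadline = dev->elapsed_ms + SKETCH_RESET_TIMEOUT_MS;
    do {
        if (sketch_reset_done(dev))
            return 0;
        sketch_sleep_ms(dev, SKETCH_POLL_INTERVAL_MS);
    } while (dev->elapsed_ms < deadline);

    return -ETIMEDOUT;
}
```

For a device that is ready 90 ms after reset, the first read already succeeds because it happens only after the initial 100 ms wait.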
Wei Yongjun
e213f5b6dd net/mlx5e: fix error return code in mlx5e_alloc_rq()
Fix to return a negative error code from the xdp_rxq_info_reg() error
handling case instead of 0, as done elsewhere in this function.

Fixes: 0ddf543226 ("xdp/mlx5: setup xdp_rxq_info")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:54:36 -05:00
Andy Gospodarek
8115b750db net/dim: use struct net_dim_sample as arg to net_dim
Simplify the arguments to net_dim() by formatting them into a struct
net_dim_sample before calling the function.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Suggested-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:45 -05:00
Andy Gospodarek
4c4dbb4a73 net/mlx5e: Move dynamic interrupt coalescing code to include/linux
This move allows drivers to add private structure elements to track the
number of packets, bytes, and interrupt events per ring.  A driver
also defines a workqueue handler to act on this collected data once per
poll and modify the coalescing parameters per ring.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:44 -05:00
Andy Gospodarek
9a31742531 net/mlx5e: Change Mellanox references in DIM code
Change all appropriate mlx5_am* and MLX5_AM* references to net_dim and
NET_DIM, respectively, in code that handles dynamic interrupt
moderation.  Also change all references from 'am' to 'dim' when used as
local variables and add generic profile references.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:44 -05:00
Andy Gospodarek
b9c872f231 net/mlx5e: Move generic functions to new file
These functions were identified as ones that could be made generic and
used by multiple drivers.  Most of the contents of en_rx_am.c are moved
to net_dim.c.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:35 -05:00
Andy Gospodarek
f5e7f67d9b net/mlx5e: Move AM logic enums
More movement to help make this code more generic.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:35 -05:00
Andy Gospodarek
138968e997 net/mlx5e: Remove rq references in mlx5e_rx_am
This makes mlx5e_am_sample more generic so that it can be called easily
from a driver that does not use the same data structure to store these
values in a single structure.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:35 -05:00
Andy Gospodarek
f58ee099f3 net/mlx5e: Move interrupt moderation forward declarations
Move these to newly created file to prepare to move these functions to a
library.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:35 -05:00
Andy Gospodarek
98dd1edffc net/mlx5e: Move interrupt moderation structs to new file
Create new header file to prepare to move code that handles irq
moderation to a library that lives in a header file.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Acked-by: Tal Gilboa <talgi@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 15:27:35 -05:00
David S. Miller
65d51f2682 mlx5-updates-2018-01-08
Merge tag 'mlx5-updates-2018-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

mlx5-updates-2018-01-08

Four patches from Or that add Hairpin support to mlx5:
===========================================================
From:  Or Gerlitz <ogerlitz@mellanox.com>

We refer to the ability of NIC HW to forward packets received on one port to
the other port (also from a port to itself) as hairpin. The application API
is based on ingress tc/flower rules set on the NIC with the mirred redirect
action. Other actions can apply to packets during the redirect.

Hairpin allows offloading the data path of various SW DDoS gateways,
load balancers, etc. to HW. Packets go through all the required
processing in HW (header re-write, encap/decap, push/pop vlan) and are
then forwarded, while CPU usage stays at practically zero. HW flow counters
are used by the control plane for monitoring and accounting.

Hairpin is implemented by pairing a receive queue (RQ) with a send queue (SQ).
All the flows that share <recv NIC, mirred NIC> are redirected through
the same hairpin pair. Currently, only header-rewrite is supported as a
packet modification action.

I'd like to thank Elijah Shakkour <elijahs@mellanox.com> for implementing this
functionality on the HW simulator before it was available in the FW, so the
driver code could be tested early.
===========================================================

Three patches from Feras provide very small changes that allow IPoIB
to support RX timestamping for child interfaces, simply by hooking the mlx5e
timestamping PTP ioctl to the IPoIB child interface netdev profile.

One patch from Gal fixes a spelling mistake.

Two patches from Eugenia add drop counters to VF statistics,
to be reported as part of VF statistics in netlink (iproute2), and
implement them in mlx5 eswitch.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10 14:57:19 -05:00
Eugenia Emantayev
bacc794331 net/mlx5e: Remove redundant checks in set_ringparam
Since the checks are done in upper layer ethtool code,
checks in driver are not needed any more.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09 11:54:50 -05:00
Eugenia Emantayev
7589fd5c8c net/mlx4_en: Align behavior of set ring size flow via ethtool
In the current implementation, any requested RX/TX ring size value
that is less than the minimum is silently cast to the nearest valid value.
Update this behavior to align with mlx5 behavior by printing a warning
in dmesg and leaving the size unchanged.
Kernel is responsible for verifying against the maximum.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09 11:54:49 -05:00
David S. Miller
a0ce093180 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-01-09 10:37:00 -05:00
Eugenia Emantayev
b8a0dbe3a9 net/mlx5e: E-switch, Add steering drop counters
Add flow counters to count packets dropped due to drop rules
configured in eswitch egress and ingress ACLs.
These counters count VF violations and incoming traffic drops.
They are presented on the hypervisor via the standard 'ip -s link show' command.

Example: "ip -s link show dev enp5s0f0"

6: enp5s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 24:8a:07:a5:28:f0 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    0          0        0       0       0       2
    TX: bytes  packets  errors  dropped carrier collsns
    1406       17       0       0       0       0
    vf 0 MAC 00:00:ca:fe:ca:fe, vlan 5, spoof checking off, link-state auto, trust off, query_rss off
    RX: bytes  packets  mcast   bcast   dropped
    1666       29       14      32      0
    TX: bytes  packets  dropped
    2880       44       2412

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-09 07:40:48 +02:00
Gal Pressman
4312782479 net/mlx5e: IPoIB, Fix spelling mistake "functionts" -> "functions"
Fix trivial spelling mistake: "functionts" -> "functions".

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-09 07:40:48 +02:00
Feras Daoud
93b66472ce net/mlx5e: IPoIB, Add ethtool support to get child time stamping parameters
Add support to get time stamping capabilities using ethtool for
child interface.
Usage example:
	ethtool -T CHILD-DEVNAME

This change reuses the functionality of parent devices and does not
introduce any new logic.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-09 07:40:48 +02:00
Feras Daoud
08437c572c net/mlx5e: IPoIB, Add PTP ioctl support for child interface
Add support to control precision time protocol on child interfaces
using ioctl.

This commit changes the following:
- Change parent ioctl function to be non static
- Reuse the parent ioctl function in child devices

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-09 07:40:48 +02:00
Feras Daoud
36e564b76f net/mlx5e: IPoIB, Use correct timestamp in child receive flow
The current implementation takes the child timestamp object from
the parent since the rq in mlx5i_complete_rx_cqe belongs to the parent.
This change fixes the issue by taking the correct timestamp.

Fixes: 7e7f4780c3 ("net/mlx5e: IPoIB, Use hash-table to map between QPN to child netdev")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-09 07:40:48 +02:00
Or Gerlitz
5c65c564c9 net/mlx5e: Support offloading TC NIC hairpin flows
We refer to TC NIC rule that involves forwarding as "hairpin".

All hairpin rules from the current NIC device (called "func" in
the code) to a given NIC device ("peer") are steered into the
same hairpin RQ/SQ pair.

The hairpin pair is set on demand and removed when there are no
TC rules that need it.

Here's a TC rule that matches on icmp, does header re-write of the
dst mac and hairpin from RX/enp1s2f1 to TX/enp1s2f2 (enp1s2f1/2 are
two mlx5 devices):

tc filter add dev enp1s2f1 protocol ip parent ffff: prio 2 \
    flower skip_sw ip_proto icmp \
    action pedit ex munge eth dst set 10:22:33:44:55:66 pipe \
    action mirred egress redirect dev enp1s2f2

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-09 07:40:48 +02:00
Or Gerlitz
77ab67b7f0 net/mlx5e: Basic setup of hairpin object
Add the code to do basic setup for hairpin object which
will later serve offloading TC flows.

This includes calling the mlx5 core to create/destroy the hairpin
pair object and setting the HW transport objects that will be used
for steering matched flows to go through hairpin.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-09 07:40:48 +02:00
Or Gerlitz
18e568c390 net/mlx5: Hairpin pair core object setup
Low level code to setup hairpin pair core object, deals with:
 - create hairpin RQs/SQs
 - destroy hairpin RQs/SQs
 - modifying hairpin RQs/SQs - pairing (rst2rdy) and unpairing (rdy2rst)

Unlike conventional RQs/SQs, the memory used for the packet and descriptor
buffers is allocated by the firmware and not the driver. The driver sets
the overall data size (log).

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-01-09 07:40:48 +02:00
Gal Pressman
cd4a87dff7 net/mlx5e: Replace WARN_ONCE with netdev_WARN_ONCE
Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08 20:53:14 -05:00
Daniel Jurgens
c4b76d8d95 net/mlx5: Set num_vhca_ports capability
Set the current capability to the max capability. Doing so enables dual
port RoCE functionality if supported by the firmware.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:48 -07:00
Daniel Jurgens
cfe4e37fdc {net, IB}/mlx5: Change set_roce_gid to take a port number
When in dual port mode, setting a RoCE GID for any port flows through the
master port's mlx5_core_dev. Provide an interface to set the port when
sending this command.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:23 -07:00
Daniel Jurgens
32f69e4be2 {net, IB}/mlx5: Manage port association for multiport RoCE
When mlx5_ib_add is called determine if the mlx5 core device being
added is capable of dual port RoCE operation. If it is, determine
whether it is a master device or a slave device using the
num_vhca_ports and affiliate_nic_vport_criteria capabilities.

If the device is a slave, attempt to find a master device to affiliate it
with. Devices that can be affiliated will share a system image guid. If
none are found place it on a list of unaffiliated ports. If a master is
found bind the port to it by configuring the port affiliation in the NIC
vport context.

Similarly when mlx5_ib_remove is called determine the port type. If it's
a slave port, unaffiliate it from the master device, otherwise just
remove it from the unaffiliated port list.

The IB device is registered as a multiport device, even if a 2nd port is
not available for affiliation. When the 2nd port is affiliated later the
GID cache must be refreshed in order to get the default GIDs for the 2nd
port in the cache. Export roce_rescan_device to provide a mechanism to
refresh the cache after a new port is bound.

In a multiport configuration, all IB object (QP, MR, PD, etc.) related
commands should flow through the master mlx5_core_dev; other commands
must be sent to the slave port mlx5_core_mdev. An interface is provided
to get the correct mdev for non IB object commands.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:22 -07:00
Daniel Jurgens
8737f818ca net/mlx5: Set software owner ID during init HCA
Generate a unique 128bit identifier for each host and pass that value to
firmware in the INIT_HCA command if it reports the sw_owner_id
capability. Each device bound to the mlx5_core driver will have the same
software owner ID.

In subsequent patches mlx5_core devices will be bound via a new VPort
command so that they can operate together under a single InfiniBand
device. Only devices that have the same software owner ID can be bound,
to prevent traffic intended for one host arriving at another.

The INIT_HCA command length was expanded by 128 bits. The command
length is provided as an input to FW commands. Older FW does not have a
problem receiving this command in the new longer form.

Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:20 -07:00
Daniel Jurgens
734dc065fc net/mlx5: Fix race for multiple RoCE enable
There are two potential problems with the existing implementation.

1. Enable and disable can race after the atomic operations.
2. If a command fails the refcount is left in an inconsistent state.

Introduce a lock and perform error checking.

Fixes: a6f7d2aff6 ("net/mlx5: Add support for multiple RoCE enable")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:42:20 -07:00
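The fix outlined in 734dc065fc (take a lock, check the command result, keep the refcount consistent) can be sketched in userspace with a pthread mutex. The names `sketch_roce`, `sketch_hw_ok`, `sketch_hw_fail`, `sketch_roce_enable` and `sketch_roce_disable` are hypothetical, and the hardware command is replaced by a callback so that failures can be injected.

```c
#include <pthread.h>

struct sketch_roce {
    pthread_mutex_t lock;
    int refcount;
};

/* Demo stand-ins for the firmware command; 0 means success. */
int sketch_hw_ok(int enable)   { (void)enable; return 0; }
int sketch_hw_fail(int enable) { (void)enable; return -1; }

int sketch_roce_enable(struct sketch_roce *r, int (*hw_cmd)(int enable))
{
    int err = 0;

    pthread_mutex_lock(&r->lock);
    if (r->refcount == 0)
        err = hw_cmd(1);            /* first user turns RoCE on */
    if (!err)
        r->refcount++;              /* count only successful enables */
    pthread_mutex_unlock(&r->lock);
    return err;
}

int sketch_roce_disable(struct sketch_roce *r, int (*hw_cmd)(int enable))
{
    int err = 0;

    pthread_mutex_lock(&r->lock);
    if (r->refcount > 0) {
        r->refcount--;
        if (r->refcount == 0)
            err = hw_cmd(0);        /* last user turns RoCE off */
        if (err)
            r->refcount++;          /* command failed: restore the count */
    }
    pthread_mutex_unlock(&r->lock);
    return err;
}
```

Because the check of the count and the command are done under the same lock, an enable and a disable can no longer interleave between the two steps, and a failed command leaves the count as it was.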
Moni Shoua
dd44572aeb net/mlx5: Enable DC transport
Enable DC transport in the firmware to provide its functionality.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:38:50 -07:00
Moni Shoua
57cda166bb net/mlx5: Add DCT command interface
Add a missing command interface to work with a DCT. It includes creating,
destroying, and getting events for a DCT.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-01-08 11:38:49 -07:00
David S. Miller
7f0b800048 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-01-07

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add a start of a framework for extending struct xdp_buff without
   having the overhead of populating every data at runtime. Idea
   is to have a new per-queue struct xdp_rxq_info that holds read
   mostly data (currently that is, queue number and a pointer to
   the corresponding netdev) which is set up during rxqueue config
   time. When an XDP program is invoked, struct xdp_buff holds a
   pointer to struct xdp_rxq_info that the BPF program can then
   walk. The user facing BPF program that uses struct xdp_md for
   context can use these members directly, and the verifier rewrites
   context access transparently by walking the xdp_rxq_info and
   net_device pointers to load the data, from Jesper.

2) Redo the reporting of offload device information to user space
   such that it works in combination with network namespaces. The
   latter is reported through a device/inode tuple as similarly
   done in other subsystems as well (e.g. perf) in order to identify
   the namespace. For this to work, ns_get_path() has been generalized
   such that the namespace can be retrieved not only from a specific
   task (perf case), but also from a callback where we deduce the
   netns (ns_common) from a netdevice. bpftool support using the new
   uapi info and extensive test cases for test_offload.py in BPF
   selftests have been added as well, from Jakub.

3) Add two bpftool improvements: i) properly report the bpftool
   version such that it corresponds to the version from the kernel
   source tree. So pick the right linux/version.h from the source
   tree instead of the installed one. ii) fix bpftool and also
   bpf_jit_disasm build with binutils >= 2.9. The reason for the
   build breakage is that binutils library changed the function
   signature to select the disassembler. Given this is needed in
   multiple tools, add a proper feature detection to the
   tools/build/features infrastructure, from Roman.

4) Implement the BPF syscall command BPF_MAP_GET_NEXT_KEY for the
   stacktrace map. It is currently unimplemented, but there are
   use cases where user space needs to walk all stacktrace map
   entries e.g. for dumping or deleting map entries w/o having to
   close and recreate the map. Add BPF selftests along with it,
   from Yonghong.

5) A few follow-up cleanups for the bpftool cgroup code: i) rename
   the cgroup 'list' command into 'show' as we have it for other
   subcommands as well, ii) then alias the 'show' command such that
   'list' is accepted which is also common practice in iproute2,
   and iii) remove a couple of newlines from error messages using
   p_err(), from Jakub.

6) Two follow-up cleanups to sockmap code: i) remove the unused
   bpf_compute_data_end_sk_skb() function and ii) only build the
   sockmap infrastructure when CONFIG_INET is enabled since it's
   only aware of TCP sockets at this time, from John.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-07 21:26:31 -05:00
Jesper Dangaard Brouer
ae75415de1 mlx4: setup xdp_rxq_info
Driver hook points for xdp_rxq_info:
 * reg  : mlx4_en_create_rx_ring
 * unreg: mlx4_en_destroy_rx_ring

Tested on actual hardware.

Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05 15:21:21 -08:00
Jesper Dangaard Brouer
0ddf543226 xdp/mlx5: setup xdp_rxq_info
The mlx5 driver has a special drop-RQ queue (one per interface) that
simply drops all incoming traffic. It helps the driver keep other HW
objects (flow steering) alive upon down/up operations.  It is
temporarily pointed to by flow steering objects during the interface
setup, and when the interface is down. It lacks many fields that are set
in a regular RQ (for example its state is never switched to
MLX5_RQC_STATE_RDY). (Thanks to Tariq Toukan for the explanation).

The XDP RX-queue info for this drop-RQ is marked as unused, which
allows us to use the same takedown/free code path as other RX-queues.

Driver hook points for xdp_rxq_info:
 * reg   : mlx5e_alloc_rq()
 * unused: mlx5e_alloc_drop_rq()
 * unreg : mlx5e_free_rq()

Tested on actual hardware with samples/bpf program

Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Matan Barak <matanb@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05 15:21:20 -08:00
Arnd Bergmann
74bd5d56bf net/mlx5e: hide an unused variable
The uplink_rpriv variable was added at the start of the function but
only used inside of an #ifdef:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c: In function 'mlx5e_route_lookup_ipv6':
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:1549:25: error: unused variable 'uplink_rpriv' [-Werror=unused-variable]

This moves the declaration into that #ifdef as well.

Fixes: 5ed99fb421 ("net/mlx5e: Move ethernet representors data into separate struct")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-05 10:55:34 -05:00
Ido Schimmel
90045fc9c7 mlxsw: spectrum: Relax sanity checks during enslavement
Since commit 25cc72a338 ("mlxsw: spectrum: Forbid linking to devices that
have uppers") the driver forbids enslavement to netdevs that already
have uppers of their own, as this can result in various ordering
problems.

This requirement proved to be too strict for some users who need to be
able to enslave ports to a bridge that already has uppers. In this case,
we can allow the enslavement if the bridge is already known to us, as
any configuration performed on top of the bridge was already reflected
to the device.

Fixes: 25cc72a338 ("mlxsw: spectrum: Forbid linking to devices that have uppers")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 12:38:26 -05:00
Ido Schimmel
8764a8267b mlxsw: spectrum_router: Fix NULL pointer deref
When we remove the neighbour associated with a nexthop we should always
refuse to write the nexthop to the adjacency table, regardless of whether
it is already present in the table or not.

Otherwise, we risk dereferencing the NULL pointer that was set instead
of the neighbour.

Fixes: a7ff87acd9 ("mlxsw: spectrum_router: Implement next-hop routing")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02 12:37:16 -05:00
David S. Miller
6bb8824732 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
net/ipv6/ip6_gre.c is a case of parallel adds.

include/trace/events/tcp.h is a little bit more tricky.  The removal
of in-trace-macro ifdefs in 'net' paralleled with moving
show_tcp_state_name and friends over to include/trace/events/sock.h
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-29 15:42:26 -05:00
Linus Torvalds
19286e4a7a Third pull request for 4.15-rc
 - cxgb4 fix for an iser testing failure as debugged by Steve and Sagi.
   The problem was a driver bug in the handling of shutting down a QP.
 - Various vmw_pvrdma fixes for bogus WARN_ON, missed resource free on error
   unwind and a use after free bug
 - Improper congestion counter values on mlx5 when link aggregation is enabled
 - ipoib lockdep regression introduced in this merge window
 - hfi1 regression supporting the device in a VM introduced in a recent patch
 - Typo that breaks future uAPI compatibility in the verbs core
 - More SELinux related oops fixing
 - Fix an oops during error unwind in mlx5

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "This is the next batch of for-rc patches from RDMA. It includes the
  fix for the ipoib regression I mentioned last time, and the result of
  a fairly major debugging effort to get iser working reliably on cxgb4
  hardware - it turns out the cxgb4 driver was not handling QP error
  flushing properly causing iser to fail.

   - cxgb4 fix for an iser testing failure as debugged by Steve and
     Sagi. The problem was a driver bug in the handling of shutting down
     a QP.

   - Various vmw_pvrdma fixes for bogus WARN_ON, missed resource free on
     error unwind and a use after free bug

   - Improper congestion counter values on mlx5 when link aggregation is
     enabled

   - ipoib lockdep regression introduced in this merge window

   - hfi1 regression supporting the device in a VM introduced in a
     recent patch

   - Typo that breaks future uAPI compatibility in the verbs core

   - More SELinux related oops fixing

   - Fix an oops during error unwind in mlx5"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/mlx5: Fix mlx5_ib_alloc_mr error flow
  IB/core: Verify that QP is security enabled in create and destroy
  IB/uverbs: Fix command checking as part of ib_uverbs_ex_modify_qp()
  IB/mlx5: Serialize access to the VMA list
  IB/hfi: Only read capability registers if the capability exists
  IB/ipoib: Fix lockdep issue found on ipoib_ib_dev_heavy_flush
  IB/mlx5: Fix congestion counters in LAG mode
  RDMA/vmw_pvrdma: Avoid use after free due to QP/CQ/SRQ destroy
  RDMA/vmw_pvrdma: Use refcount_dec_and_test to avoid warning
  RDMA/vmw_pvrdma: Call ib_umem_release on destroy QP path
  iw_cxgb4: when flushing, complete all wrs in a chain
  iw_cxgb4: reflect the original WR opcode in drain cqes
  iw_cxgb4: Only validate the MSN for successful completions
2017-12-28 23:06:01 -08:00
David S. Miller
d367341b25 mlx5-shared-4.16-1
mlx5 shared code for both rdma-next and net-next trees.

Merge tag 'mlx5-shared-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 E-Switch updates 2017-12-19

This series includes updates for mlx5 E-Switch infrastructures,
to be merged into net-next and rdma-next trees.

Mark's patches provide E-Switch refactoring that generalizes the mlx5
E-Switch vf representors interfaces and data structures. The series is
mainly focused on moving ethernet (netdev) specific representors logic out
of E-Switch (eswitch.c) into mlx5e representor module (en_rep.c), which
provides better separation and allows future support for other types of vf
representors (e.g. RDMA).

Gal's patches at the end of this series provide a simple syntax fix and
two other patches that handle vport ingress/egress ACL steering name
spaces to be aligned with the Firmware/Hardware specs.

V1->V2:
 - Addressed coding style comments in patches #1 and #7
 - The series is still based on rc4, as now I see net-next is also @rc4.

V2->V3:
 - Fixed compilation warning, reported by Dave.

Please pull and let me know if there's any problem.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 19:32:59 -05:00
Gal Pressman
9b93ab981e net/mlx5: Separate ingress/egress namespaces for each vport
Each vport has its own root flow table for the ACL flow tables and root
flow table is per namespace, therefore we should create a namespace for
each vport.

Fixes: efdc810ba3 ("net/mlx5: Flow steering, Add vport ACL support")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-29 00:43:52 +02:00
Gal Pressman
4484e29948 net/mlx5: Fix ingress/egress naming mistake
The function names do not reflect their actions. Switch the mistaken
ingress/egress naming.

Fixes: fba53f7b57 ("net/mlx5: Introduce mlx5_flow_steering structure")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-29 00:43:52 +02:00
Gal Pressman
18a89ab766 net/mlx5e: E-Switch, Use the name of static array instead of its address
Using the address of a static array is the same as using its name (in
this specific use-case), but it's confusing and makes the code less
readable.

Fixes: 1bd27b11c1 ("net/mlx5: Introduce E-switch QoS management")
Fixes: bd77bf1cb5 ("net/mlx5: Add SRIOV VF max rate configuration support")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-29 00:43:51 +02:00
Mark Bloch
2c47bf80e8 net/mlx5e: E-Switch, Move send-to-vport rule struct to en_rep
Move struct mlx5_esw_sq, which keeps the send-to-vport rule, from the eswitch
code to mlx5e and rename it to better reflect where it belongs.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-29 00:43:51 +02:00
Mark Bloch
a4b97ab421 net/mlx5: E-Switch, Create generic header struct to be used by representors
Now that we don't store type dependent data in struct mlx5_eswitch_rep
we can create a generic interface, and representor type.

struct mlx5_eswitch_rep will store an array of interfaces, each
interface is used by a different representor type.

Once we moved to a more generic interface, rdma driver representors can
be added and utilize the same mechanism as the Ethernet driver
representors use.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-29 00:43:50 +02:00
Moni Shoua
a42b63c1ac net/mlx4_en: Change default QoS settings
Change the default mapping between TC and TCG as follows:

Prio     |             TC/TCG
         |      from             to
         |    (set by FW)      (set by SW)
---------+-----------------------------------
0        |      0/0              0/7
1        |      1/0              0/6
2        |      2/0              0/5
3        |      3/0              0/4
4        |      4/0              0/3
5        |      5/0              0/2
6        |      6/0              0/1
7        |      7/0              0/0

With these new settings, a pause frame for any prio stops
traffic for all prios.

Fixes: 564c274c3d ("net/mlx4_en: DCB QoS support")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:24:05 -05:00
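The table in the commit above can be reproduced with a short sketch. This is an illustrative Python model, not the driver's C code; the function names are hypothetical, and the sketch assumes the pattern visible in the table (firmware default TC = prio with TCG 0, software default TC 0 with TCG = 7 - prio).

```python
NUM_PRIOS = 8  # priorities 0..7, as listed in the table


def firmware_default(prio):
    """(TC, TCG) in the 'from' column: TC equals the priority, TCG is 0."""
    return (prio, 0)


def software_default(prio):
    """(TC, TCG) in the 'to' column: TC is 0, TCG counts down from 7."""
    return (0, NUM_PRIOS - 1 - prio)


def mapping_table():
    """Rows of (prio, from, to) matching the table in the commit message."""
    return [(p, firmware_default(p), software_default(p)) for p in range(NUM_PRIOS)]


if __name__ == "__main__":
    for prio, old, new in mapping_table():
        print(f"{prio}  {old[0]}/{old[1]}  ->  {new[0]}/{new[1]}")
```

After the change every priority shares TC 0, which is consistent with the commit's remark that a pause frame for any prio stops traffic for all prios.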
Tariq Toukan
fd4a3e2828 net/mlx4_core: Cleanup FMR unmapping flow
Remove redundant and not essential operations in fmr unmap/free.
According to device spec, in FMR unmap it is sufficient to set
ownership bit to SW. This allows remapping afterwards.

Fixes: 8ad11fb6b0 ("IB/mlx4: Implement FMRs")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:24:05 -05:00
Tariq Toukan
dc484851ed net/mlx4_en: RX csum, reorder branches
Use early goto commands, and save else branches.
This uses less indentations and brackets, making the code
more readable.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:24:05 -05:00
Tariq Toukan
345ef18c24 net/mlx4_en: RX csum, remove redundant branches and checks
Do not check IPv6 bit in cqe status if CONFIG_IPV6 is not enabled.
Function check_csum() is reached only with IPv4 or IPv6 set (if enabled),
if IPv6 is not set (or is not enabled) it is redundant to test the
IPv4 bit.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-28 12:24:05 -05:00
Mark Bloch
5ed99fb421 net/mlx5e: Move ethernet representors data into separate struct
Ethernet representors have a need to store data which is applicable
only for them. Create a priv void pointer in struct mlx5_eswitch_rep
and move mlx5e to store the relevant data there. As part of this change
we also initialize rep_if in mlx5e_rep_register_vf_vports() as otherwise the
E-Switch code will copy a priv value which is garbage.

We also rename mlx5_eswitch_get_uplink_netdev() to
mlx5_eswitch_get_uplink_priv() and make it return void *.
This way E-Switch code doesn't need to deal with net devices and
we leave the task of getting it to mlx5e.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-28 12:36:33 +02:00
Mark Bloch
159fe63922 net/mlx5: E-Switch, Create a dedicated send to vport rule deletion function
In order for representors to send packets directly to VFs we use an
E-Switch function which inserts special rules into the HW. For symmetry
create an E-Switch function that deletes these rules as well.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-28 12:36:33 +02:00
Mark Bloch
f7a68945a5 net/mlx5: E-Switch, Move mlx5e only logic outside E-Switch
In our pursuit to clean up the e-switch sub-module from mlx5e specific code,
we move the functions that insert/remove the flow steering rules that
allow mlx5e representors to send packets directly to VFs into the EN
driver code.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-28 12:36:33 +02:00
Mark Bloch
4c66df01f5 net/mlx5: E-Switch, Simplify representor load/unload callback API
In the load() callback for loading representors we don't really need
struct mlx5_eswitch but struct mlx5_core_dev, pass it directly.

In the unload() callback for unloading representors we don't need the
struct mlx5_eswitch argument, remove it.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-28 12:36:33 +02:00
Mark Bloch
6ed1803abe net/mlx5: E-Switch, Refactor load/unload of representors
Refactor the load/unload stages for better code reuse.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-28 12:36:33 +02:00
Mark Bloch
e8d31c4d65 net/mlx5: E-Switch, Refactor vport representors initialization
Refactor the init stage of vport representors registration.
vport number and hw id can be assigned by the E-Switch driver and not by
the netdevice driver. While here, make the error path of mlx5_eswitch_init()
a reverse order of the good path, also use kcalloc to allocate an array
instead of kzalloc.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-28 12:36:33 +02:00
Jason Gunthorpe
76a895d9e1 Merge branch 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
Patches for 4.16 that are dependent on patches sent to 4.15-rc.

These are small clean ups for the vmw_pvrdma and i40iw drivers.

* 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git:
  RDMA/vmw_pvrdma: Remove usage of BIT() from UAPI header
  RDMA/vmw_pvrdma: Use refcount_t instead of atomic_t
  RDMA/vmw_pvrdma: Use more specific sizeof in kcalloc
  RDMA/vmw_pvrdma: Clarify QP and CQ is_kernel logic
  RDMA/vmw_pvrdma: Add UAR SRQ macros in ABI header file
  i40iw: Change accelerated flag to bool
2017-12-27 21:50:46 -07:00
David S. Miller
fba961ab29 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Lots of overlapping changes.  Also on the net-next side
the XDP state management is handled more in the generic
layers so undo the 'net' nfp fix which isn't applicable
in net-next.

Include a necessary change by Jakub Kicinski, with log message:

====================
cls_bpf no longer takes care of offload tracking.  Make sure
netdevsim performs necessary checks.  This fixes a warning
caused by TC trying to remove a filter it has not added.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-22 11:16:31 -05:00
Majd Dibbiny
71a0ff65a2 IB/mlx5: Fix congestion counters in LAG mode
Congestion counters are counted and queried per physical function.
When working in LAG mode, CNP packets can be sent or received on both
of the functions, thus congestion counters should be aggregated from
the two physical functions.

Fixes: e1f24a79f4 ("IB/mlx5: Support congestion related counters")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2017-12-21 16:06:07 -07:00
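The aggregation described in the commit above can be sketched as follows. This is a simplified Python model under stated assumptions: the function name and the counter names in the usage example are illustrative, and the real driver queries each physical function through firmware commands rather than receiving dictionaries.

```python
def query_congestion_counters(per_pf_counters, lag_active):
    """Return congestion counters for a device.

    per_pf_counters: one dict of counter name -> value per physical function.
    lag_active: when True, sum each counter over all physical functions,
    since CNP packets may be sent or received on either of them.
    """
    if not lag_active:
        # Outside LAG mode only the device's own function is reported.
        return dict(per_pf_counters[0])

    total = {}
    for counters in per_pf_counters:
        for name, value in counters.items():
            total[name] = total.get(name, 0) + value
    return total
```

For example, with `[{"cnp_sent": 3, "cnp_handled": 1}, {"cnp_sent": 4, "cnp_handled": 0}]` and LAG active, the result is `{"cnp_sent": 7, "cnp_handled": 1}`.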
Moshe Shemesh
a2fba188fd net/mlx5: Stay in polling mode when command EQ destroy fails
During unload, on mlx5_stop_eqs we move the command interface from events
mode to polling mode, but if the command interface EQ destroy fails we move
back to events mode.
That's wrong since even if we fail to destroy the command interface EQ, we
do release its irq, so no interrupts will be received.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:05 +02:00
Moshe Shemesh
d6b2785cd5 net/mlx5: Cleanup IRQs in case of unload failure
When mlx5_stop_eqs fails to destroy any of the EQs it returns with an error.
In such a failure flow the function will return without
releasing all EQs' irqs and then pci_free_irq_vectors will fail.
Fix by only warning on EQ destroy failure and continuing to release the other
EQs and their irqs.

It fixes the following kernel trace:
kernel: kernel BUG at drivers/pci/msi.c:352!
...
...
kernel: Call Trace:
kernel: pci_disable_msix+0xd3/0x100
kernel: pci_free_irq_vectors+0xe/0x20
kernel: mlx5_load_one.isra.17+0x9f5/0xec0 [mlx5_core]

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:05 +02:00
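The "warn and continue" teardown described in the commit above might look like this in outline. It is a Python model with hypothetical names (`stop_eqs`, `destroy_eq`, `free_irq`, `warn` are passed in by the caller) and does not reproduce the driver's actual functions or error codes.

```python
def stop_eqs(eqs, destroy_eq, free_irq, warn):
    """Tear down every EQ and return how many destroy calls failed.

    destroy_eq(eq) returns 0 on success or a non-zero error code.
    """
    failed = 0
    for eq in eqs:
        err = destroy_eq(eq)
        if err:
            # Warn instead of returning early, so later EQs are still handled.
            warn(f"failed to destroy EQ {eq}: err {err}")
            failed += 1
        # The IRQ is released regardless of the destroy result.
        free_irq(eq)
    return failed
```

The point of the design is that every IRQ is released before the vectors are freed, even when one destroy command fails along the way.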
Maor Gottlieb
139ed6c6c4 net/mlx5: Fix steering memory leak
Flow steering priority and namespace are software only objects that
didn't have the proper destructors and were not freed during steering
cleanup.

Fix it by adding destructor functions for these objects.

Fixes: bd71b08ec2 ("net/mlx5: Support multiple updates of steering rules in parallel")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:04 +02:00
Gal Pressman
0c1cc8b221 net/mlx5e: Prevent possible races in VXLAN control flow
When calling add/remove VXLAN port, a lock must be held in order to
prevent race scenarios when more than one add/remove happens at the
same time.
Fix by holding our state_lock (mutex) as done by all other parts of the
driver.
Note that the spinlock protecting the radix-tree is still needed in
order to synchronize radix-tree access from softirq context.

Fixes: b3f63c3d5e ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:03 +02:00
Gal Pressman
23f4cc2cd9 net/mlx5e: Add refcount to VXLAN structure
A refcount mechanism must be implemented in order to prevent unwanted
scenarios such as:
- Open an IPv4 VXLAN interface
- Open an IPv6 VXLAN interface (different socket)
- Remove one of the interfaces

With current implementation, the UDP port will be removed from our VXLAN
database and turn off the offloads for the other interface, which is
still active.
The reference count mechanism will only allow UDP port removals once all
consumers are gone.

Fixes: b3f63c3d5e ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:03 +02:00
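The reference counting described in the commit above can be modelled with a small sketch. This is Python for illustration only; the class and method names are hypothetical and the real driver keeps its entries in a radix tree protected by locks, which is omitted here.

```python
class VxlanPortDb:
    """Tracks how many consumers use each offloaded UDP port."""

    def __init__(self):
        self._refcount = {}

    def add_port(self, port):
        """Take a reference. Returns True if the port is new (enable offload)."""
        count = self._refcount.get(port, 0)
        self._refcount[port] = count + 1
        return count == 0

    def del_port(self, port):
        """Drop a reference. Returns True if it was the last one (disable offload)."""
        count = self._refcount.get(port, 0)
        if count == 0:
            return False  # unknown port, nothing to remove
        if count == 1:
            del self._refcount[port]
            return True
        self._refcount[port] = count - 1
        return False

    def lookup(self, port):
        return port in self._refcount
```

In the scenario from the commit message, the IPv4 and IPv6 interfaces each call `add_port` for the same UDP port; removing one of them leaves the port in the database, and only the second `del_port` reports that the offload can be turned off.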
Gal Pressman
6323514116 net/mlx5e: Fix possible deadlock of VXLAN lock
mlx5e_vxlan_lookup_port is called both from mlx5e_add_vxlan_port (user
context) and mlx5e_features_check (softirq), but the lock acquired does
not disable bottom half and might result in deadlock. Fix it by simply
replacing spin_lock() with spin_lock_bh().
While at it, replace all unnecessary spin_lock_irq() with spin_lock_bh().

lockdep's WARNING: inconsistent lock state
[  654.028136] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[  654.028229] swapper/5/0 [HC0[0]:SC1[9]:HE1:SE0] takes:
[  654.028321]  (&(&vxlan_db->lock)->rlock){+.?.}, at: [<ffffffffa06e7f0e>] mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core]
[  654.028528] {SOFTIRQ-ON-W} state was registered at:
[  654.028607]   _raw_spin_lock+0x3c/0x70
[  654.028689]   mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core]
[  654.028794]   mlx5e_vxlan_add_port+0x2e/0x120 [mlx5_core]
[  654.028878]   process_one_work+0x1e9/0x640
[  654.028942]   worker_thread+0x4a/0x3f0
[  654.029002]   kthread+0x141/0x180
[  654.029056]   ret_from_fork+0x24/0x30
[  654.029114] irq event stamp: 579088
[  654.029174] hardirqs last  enabled at (579088): [<ffffffff818f475a>] ip6_finish_output2+0x49a/0x8c0
[  654.029309] hardirqs last disabled at (579087): [<ffffffff818f470e>] ip6_finish_output2+0x44e/0x8c0
[  654.029446] softirqs last  enabled at (579030): [<ffffffff810b3b3d>] irq_enter+0x6d/0x80
[  654.029567] softirqs last disabled at (579031): [<ffffffff810b3c05>] irq_exit+0xb5/0xc0
[  654.029684] other info that might help us debug this:
[  654.029781]  Possible unsafe locking scenario:

[  654.029868]        CPU0
[  654.029908]        ----
[  654.029947]   lock(&(&vxlan_db->lock)->rlock);
[  654.030045]   <Interrupt>
[  654.030090]     lock(&(&vxlan_db->lock)->rlock);
[  654.030162]
 *** DEADLOCK ***

Fixes: b3f63c3d5e ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:02 +02:00
Moni Shoua
dbff26e44d net/mlx5: Fix error flow in CREATE_QP command
In the error flow, when the DESTROY_QP command should be executed, the wrong
mailbox was set with data, not the one that is written to hardware.
Fix that.

Fixes: 09a7d9eca1 '{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc'
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:02 +02:00
Eugenia Emantayev
777ec2b2a3 net/mlx5: Fix misspelling in the error message and comment
Fix misspelling in word syndrome.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:01 +02:00
Eugenia Emantayev
696a97cf9f net/mlx5e: Fix defaulting RX ring size when not needed
Fix the bug where turning the CQE compression mechanism on/off
resets the RX ring size to its default value when that is not
needed.

Fixes: 2fc4bfb725 ("net/mlx5e: Dynamic RQ type infrastructure")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:00 +02:00
Gal Pressman
2989ad1ec0 net/mlx5e: Fix features check of IPv6 traffic
The assumption that the next header field contains the transport
protocol is wrong for IPv6 packets with extension headers.
Instead, we should look at the innermost next header field in the buffer.
This will fix TSO offload for tunnels over IPv6 with extension headers.

Performance testing: 19.25x improvement, cool!
Measuring bandwidth of 16 threads TCP traffic over IPv6 GRE tap.
CPU: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
NIC: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
TSO: Enabled
Before: 4,926.24  Mbps
Now   : 94,827.91 Mbps

Fixes: b3f63c3d5e ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:24:00 +02:00
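Walking the extension header chain, as the commit above describes, can be sketched like this. It is a Python illustration over a raw byte buffer, not the driver's code; the function name is hypothetical, and the sketch assumes the standard IPv6 extension header layouts (next-header byte first, length byte second) and does not treat non-first fragments specially.

```python
IPV6_HEADER_LEN = 40
HOP_BY_HOP, ROUTING, FRAGMENT, AUTH, DEST_OPTS = 0, 43, 44, 51, 60
EXTENSION_HEADERS = {HOP_BY_HOP, ROUTING, FRAGMENT, AUTH, DEST_OPTS}


def innermost_next_header(packet):
    """Return (protocol, offset) of the first non-extension header.

    Returns None if the buffer is too short to follow the chain.
    """
    if len(packet) < IPV6_HEADER_LEN:
        return None
    next_header = packet[6]          # Next Header field of the fixed header
    offset = IPV6_HEADER_LEN
    while next_header in EXTENSION_HEADERS:
        if offset + 2 > len(packet):
            return None
        following = packet[offset]   # Next Header of this extension header
        length_field = packet[offset + 1]
        if next_header == FRAGMENT:
            header_len = 8                        # fixed size
        elif next_header == AUTH:
            header_len = (length_field + 2) * 4   # counted in 4-byte units
        else:
            header_len = (length_field + 1) * 8   # counted in 8-byte units
        offset += header_len
        next_header = following
    return next_header, offset
```

Reading only `packet[6]` would report an extension header type rather than TCP or UDP for such packets, which is the wrong assumption the commit removes.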
Huy Nguyen
ff0891915c net/mlx5e: Fix ETS BW check
Fix bug that allows ets bw sum to be 0% when ets tc type exists.

Fixes: 08fb1dacdd ('net/mlx5e: Support DCBNL IEEE ETS')
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:23:59 +02:00
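The kind of check the commit above refers to can be sketched as follows. This is a Python model under assumptions: the function name is hypothetical, the TSA values follow the IEEE 802.1Qaz constants from the Linux DCB UAPI (strict = 0, ETS = 2), and the rule shown (ETS bandwidths must sum to 100% whenever at least one ETS TC exists) is an interpretation of the one-line commit message rather than a copy of the driver logic.

```python
TSA_STRICT = 0  # strict priority
TSA_ETS = 2     # enhanced transmission selection


def ets_bw_valid(tc_tsa, tc_tx_bw):
    """Validate per-TC bandwidth percentages.

    tc_tsa:   transmission selection algorithm per traffic class
    tc_tx_bw: bandwidth percentage per traffic class
    """
    ets_bw = [bw for tsa, bw in zip(tc_tsa, tc_tx_bw) if tsa == TSA_ETS]
    if not ets_bw:
        # No ETS traffic classes: nothing to sum.
        return True
    # With at least one ETS TC, a total of 0% is rejected as well.
    return sum(ets_bw) == 100
```

A check of the form "sum is 0 or 100" would accept an all-zero configuration even when ETS traffic classes are present, which is the case the commit says must be rejected.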
Eran Ben Elisha
37e92a9d4f net/mlx5: Fix rate limit packet pacing naming and struct
In mlx5_ifc, struct size was not complete, and thus driver was sending
garbage after the last defined field. Fixed it by adding reserved field
to complete the struct size.

In addition, rename all set_rate_limit to set_pp_rate_limit to be
compliant with the Firmware <-> Driver definition.

Fixes: 7486216b3a ("{net,IB}/mlx5: mlx5_ifc updates")
Fixes: 1466cc5b23 ("net/mlx5: Rate limit tables support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:23:58 +02:00
Saeed Mahameed
231243c827 Revert "mlx5: move affinity hints assignments to generic code"
Before the offending commit, mlx5 core did the IRQ affinity itself,
and it seems that the new generic code has some drawbacks. One
of them is that the user can no longer modify the irq affinity after
the initial affinity values got assigned.

The issue is still being discussed and a solution in the new generic code
is required, until then we need to revert this patch.

This fixes the following issue:
echo <new affinity> > /proc/irq/<x>/smp_affinity
fails with  -EIO

This reverts commit a435393aca.
Note: kept mlx5_get_vector_affinity in include/linux/mlx5/driver.h since
it is used in mlx5_ib driver.

Fixes: a435393aca ("mlx5: move affinity hints assignments to generic code")
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jes Sorensen <jsorensen@fb.com>
Reported-by: Jes Sorensen <jsorensen@fb.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:23:58 +02:00
Kamal Heib
bae115a2bb net/mlx5: FPGA, return -EINVAL if size is zero
Currently, if a size of zero is passed to
mlx5_fpga_mem_{read|write}_i2c()
the "err" return value will not be initialized, which triggers gcc
warnings:

[..]/mlx5/core/fpga/sdk.c:87 mlx5_fpga_mem_read_i2c() error:
uninitialized symbol 'err'.
[..]/mlx5/core/fpga/sdk.c:115 mlx5_fpga_mem_write_i2c() error:
uninitialized symbol 'err'.

Fix that.

Fixes: a9956d35d1 ('net/mlx5: FPGA, Add SBU infrastructure')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-12-19 23:23:57 +02:00
Petr Machata
8ba6b30ef7 mlxsw: spectrum_router: Remove batch neighbour deletion causing FW bug
This reverts commit 63dd00fa3e.

RAUHT DELETE_ALL seems to trigger a bug in FW. This manifests as later
calls to RAUHT ADD of an IPv6 neighbor failing with a "bad parameter"
error code.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Fixes: 63dd00fa3e ("mlxsw: spectrum_router: Add batch neighbour deletion")
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-19 11:08:27 -05:00
David S. Miller
c30abd5e40 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Three sets of overlapping changes, two in the packet scheduler
and one in the meson-gxl PHY driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-16 22:11:55 -05:00
Yuval Mintz
fccff08628 mlxsw: spectrum: Disable MAC learning for ovs port
Learning is currently enabled for ports which are OVS slaves -
even though OVS doesn't need this indication.
Since we're not associating a fid with the port, HW would continuously
notify driver of learned [& aged] MACs which would be logged as errors.

Fixes: 2b94e58df5 ("mlxsw: spectrum: Allow ports to work under OVS master")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 10:47:36 -05:00
Eran Ben Elisha
5a1647c391 net/mlx4_en: Fill all counters under one call of stats lock
Before this patch, the stats_lock was acquired twice. In between the two
lock sections, the driver sent commands to gather some more statistics
(per priority and counter statistics). If the stats lock was acquired by
the get statistics NDO in between, we would report out-of-sync counters.

Fix this by collecting all stats from Firmware in advance and then
fill the Software structs under one lock.
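
The approach described above can be sketched in userspace C as follows. This is an illustration under assumptions, not the driver's code: `sw_stats`, `fw_snapshot`, `query_firmware` and `update_stats` are hypothetical names, and a pthread mutex stands in for the driver's stats lock.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical software copy of the counters, guarded by stats_lock. */
struct sw_stats {
	pthread_mutex_t stats_lock;
	uint64_t rx_packets;
	uint64_t prio_rx_packets;
};

/* Hypothetical snapshot gathered from firmware; no lock is held here. */
struct fw_snapshot {
	uint64_t rx_packets;
	uint64_t prio_rx_packets;
};

/* Stand-in for the (slow) firmware commands; values are made up. */
static void query_firmware(struct fw_snapshot *snap)
{
	snap->rx_packets = 100;
	snap->prio_rx_packets = 40;
}

/* Gather everything first, then publish it under a single lock hold. */
static void update_stats(struct sw_stats *stats)
{
	struct fw_snapshot snap;

	query_firmware(&snap);

	pthread_mutex_lock(&stats->stats_lock);
	stats->rx_packets = snap.rx_packets;
	stats->prio_rx_packets = snap.prio_rx_packets;
	pthread_mutex_unlock(&stats->stats_lock);
}
```

A reader that takes the same lock then sees either the old set of counters or the new one, never a mix of both.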

Fixes: 0b131561a7 ("net/mlx4_en: Add Flow control statistics display via ethtool")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 16:38:37 -05:00
Eran Ben Elisha
0bb9fc4f54 net/mlx4_core: Fix wrong calculation of free counters
The field res_free indicates the total number of counters which are
available for allocation (reserved and unreserved). Fixed a bug where
the reserved counters were subtracted from res_free before any
allocation was performed.

Before this fix, free counters which were not reserved could not be
allocated.
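
A simplified sketch of the accounting described above, assuming a single pool with one reservation figure. The structure and function names are hypothetical and the real resource tracker is more involved; the point is only that `res_free` starts at the full total rather than at the total minus the reserved part.

```c
#include <errno.h>

/* Hypothetical pool: res_free counts ALL free counters, reserved or not. */
struct counter_pool {
	int res_free;     /* free counters, reserved and unreserved */
	int res_reserved; /* free counters still held back for guarantees */
};

static void pool_init(struct counter_pool *p, int total, int reserved)
{
	p->res_free = total; /* not total - reserved */
	p->res_reserved = reserved;
}

/* Allocate one counter that is not covered by any reservation. */
static int pool_alloc_unreserved(struct counter_pool *p)
{
	if (p->res_free - p->res_reserved <= 0)
		return -ENOSPC;
	p->res_free--;
	return 0;
}
```

If `pool_init()` subtracted the reserved amount up front, the check in `pool_alloc_unreserved()` would subtract it a second time and refuse allocations even though unreserved counters were free.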

Fixes: 9de92c60be ("net/mlx4_core: Adjust counter grant policy in the resource tracker")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 16:38:36 -05:00
Eugenia Emantayev
78034f5fdd net/mlx4_en: Fix selftest for small MTUs
Set the minimal MTU threshold for running loopback selftest.
MTU should be big enough to include packet payload, NET_IP_ALIGN,
Ethernet headers and preamble length.
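
The threshold described above can be sketched as a sum of its parts. All macro names and numeric values below are assumptions chosen for illustration and are not taken from the driver.

```c
#include <stdbool.h>

/* Values below are assumptions for illustration, not taken from the driver. */
#define SKETCH_LB_PAYLOAD   50 /* hypothetical loopback payload size */
#define SKETCH_NET_IP_ALIGN 2
#define SKETCH_ETH_HLEN     14
#define SKETCH_PREAMBLE_LEN 8

#define SKETCH_LB_MIN_MTU (SKETCH_LB_PAYLOAD + SKETCH_NET_IP_ALIGN + \
			   SKETCH_ETH_HLEN + SKETCH_PREAMBLE_LEN)

/* The loopback selftest is only attempted when the MTU is large enough. */
static bool loopback_test_allowed(int mtu)
{
	return mtu >= SKETCH_LB_MIN_MTU;
}
```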

Fixes: e7c1c2c462 ("mlx4_en: Added self diagnostics test implementation")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-13 16:38:36 -05:00
Jiri Pirko
9454d9307e mlxsw: spectrum: handle NETIF_F_HW_TC changes correctly
Currently, whenever the NETIF_F_HW_TC feature changes, we silently
always allow it, but we do not actually disable the flows in HW
on disable. That breaks users' expectations. So just forbid
disabling the feature while any filters are offloaded.
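
A minimal sketch of the check described above, assuming the port keeps a count of offloaded filters. `port_state` and `set_hw_tc` are hypothetical names, and the `-EINVAL` return value is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-port state: number of filters currently offloaded. */
struct port_state {
	unsigned int offloaded_filters;
	bool hw_tc_enabled;
};

/* Refuse to turn the feature off while offloaded filters still exist. */
static int set_hw_tc(struct port_state *port, bool enable)
{
	if (!enable && port->offloaded_filters > 0)
		return -EINVAL; /* error code is an assumption */
	port->hw_tc_enabled = enable;
	return 0;
}
```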

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-06 15:11:17 -05:00
Cong Wang
9f8a739e72 act_mirred: get rid of tcfm_ifindex from struct tcf_mirred
tcfm_dev always points to the correct netdev and we already
hold a refcnt, so no need to use tcfm_ifindex to lookup again.

If we would support moving target netdev across netns, using
pointer would be better than ifindex.

This also fixes dumping obsolete ifindex, now after the
target device is gone we just dump 0 as ifindex.

Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-06 14:50:13 -05:00
Jakub Kicinski
bd0b2e7fe6 net: xdp: make the stack take care of the tear down
Since day one of XDP, drivers have had to remember to free the program
on the remove path.  This leads to code duplication and is error
prone.  Make the stack query the installed programs on unregister
and if something is installed, remove the program.  Freeing of
program attached to XDP generic is moved from free_netdev() as well.

Because the remove will now be called before notifiers are
invoked, BPF offload state of the program will not get destroyed
before uninstall.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-03 00:27:57 +01:00
Petr Machata
09dbf6297f mlxsw: spectrum_router: Update nexthop RIF on update
The function mlxsw_sp_nexthop_rif_update() walks the list of nexthops
associated with a RIF, and updates the corresponding entries in the
switch. It is used in particular when a tunnel underlay netdevice moves
to a different VRF, and all the nexthops are migrated over to a new RIF.
The problem is that each nexthop holds a reference to its RIF, and that
is not updated. So after the old RIF is gone, further activity on these
nexthops (such as downing the underlay netdevice) dereferences a
dangling pointer.

Fix the issue by updating rif of impacted nexthops before calling
mlxsw_sp_nexthop_rif_update().

Fixes: 0c5f1cd5ba ("mlxsw: spectrum_router: Generalize __mlxsw_sp_ipip_entry_update_tunnel()")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-28 09:55:48 -05:00
Petr Machata
d97cda5f46 mlxsw: spectrum_router: Handle encap to demoted tunnels
Some tunnels that are offloadable on their own can nonetheless be
demoted to slow path if their local address is in conflict with that of
another tunnel. When a route is formed for such a tunnel,
mlxsw_sp_nexthop_ipip_init() fails to find the corresponding IPIP entry,
and that triggers a FIB abort.

Resolve the problem by not assuming that a tunnel for which
mlxsw_sp_ipip_ops.can_offload() holds also automatically has an IPIP
entry.

Fixes: af641713e9 ("mlxsw: spectrum_router: Onload conflicting tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-28 09:55:47 -05:00
Petr Machata
cab43d9c87 mlxsw: spectrum_router: Demote tunnels on VRF migration
The mlxsw driver currently doesn't offload GRE tunnels if they have the
same local address and use the same underlay VRF. When such a situation
arises, the tunnels in conflict are demoted to slow path.

However, the current code only verifies this condition on tunnel
creation and tunnel change, not when a tunnel is moved to a different
VRF. When the tunnel has no bound device, underlay and overlay are the
same. Thus moving a tunnel moves the underlay as well, and that can
cause local address conflict.

So modify mlxsw_sp_netdevice_ipip_ol_vrf_event() to check if there are
any conflicting tunnels, and demote them if yes.

Fixes: af641713e9 ("mlxsw: spectrum_router: Onload conflicting tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-28 09:55:47 -05:00
Petr Machata
57c77ce470 mlxsw: spectrum_router: Offload decap only for up tunnels
When a new local route is added, an IPIP entry is looked up to determine
whether the route should be offloaded as a tunnel decap or as a trap.
That decision should take into account whether the tunnel netdevice in
question is actually IFF_UP, and only install a decap offload if it is.

Fixes: 0063587d35 ("mlxsw: spectrum: Support decap-only IP-in-IP tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-28 09:55:47 -05:00
Ido Schimmel
bf4e9f24a8 mlxsw: spectrum: Do not try to create non-existing ports during unsplit
On some systems, when we unsplit a port we need to re-create two ports
instead. On other systems, only one needs to be re-created.

Do not try to create a port if during driver initialization it was
assigned a negative module number, which is invalid.

This avoids the following error during unsplit:
[  941.012478] mlxsw_spectrum 0000:01:00.0: Port 43: Failed to map module

The error is harmless and caused by the fact that a local port is
already mapped to module 0.

Fixes: be94535f95 ("mlxsw: spectrum: Make split flow match firmware requirements")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-21 20:15:22 +09:00
Linus Torvalds
7c225c69f8 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - a few misc bits

 - ocfs2 updates

 - almost all of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits)
  memory hotplug: fix comments when adding section
  mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP
  mm: simplify nodemask printing
  mm,oom_reaper: remove pointless kthread_run() error check
  mm/page_ext.c: check if page_ext is not prepared
  writeback: remove unused function parameter
  mm: do not rely on preempt_count in print_vma_addr
  mm, sparse: do not swamp log with huge vmemmap allocation failures
  mm/hmm: remove redundant variable align_end
  mm/list_lru.c: mark expected switch fall-through
  mm/shmem.c: mark expected switch fall-through
  mm/page_alloc.c: broken deferred calculation
  mm: don't warn about allocations which stall for too long
  fs: fuse: account fuse_inode slab memory as reclaimable
  mm, page_alloc: fix potential false positive in __zone_watermark_ok
  mm: mlock: remove lru_add_drain_all()
  mm, sysctl: make NUMA stats configurable
  shmem: convert shmem_init_inodecache() to void
  Unify migrate_pages and move_pages access checks
  mm, pagevec: rename pagevec drained field
  ...
2017-11-15 19:42:40 -08:00
Mel Gorman
453f85d43f mm: remove __GFP_COLD
As the page free path makes no distinction between cache hot and cold
pages, there is no real useful ordering of pages in the free list that
allocation requests can take advantage of.  Judging from the users of
__GFP_COLD, it is likely that a number of them are the result of copying
other sites instead of actually measuring the impact.  Remove the
__GFP_COLD parameter which simplifies a number of paths in the page
allocator.

This is potentially controversial but bear in mind that the size of the
per-cpu pagelists versus modern cache sizes means that the whole per-cpu
list can often fit in the L3 cache.  Hence, there is only a potential
benefit for microbenchmarks that alloc/free pages in a tight loop.  It's
even worse when THP is taken into account which has little or no chance
of getting a cache-hot page as the per-cpu list is bypassed and the
zeroing of multiple pages will thrash the cache anyway.

The truncate microbenchmarks are not shown as this patch affects the
allocation path and not the free path.  A page fault microbenchmark was
tested but it showed no significant difference which is not surprising
given that the __GFP_COLD branches are a minuscule percentage of the
fault path.

Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:06 -08:00
Linus Torvalds
ad0835a930 Updates for 4.15 kernel merge window
- Add iWARP support to qedr driver
 - Lots of misc fixes across subsystem
 - Multiple update series to hns roce driver
 - Multiple update series to hfi1 driver
 - Updates to vnic driver
 - Add kref to wait struct in cxgb4 driver
 - Updates to i40iw driver
 - Mellanox shared pull request
 - timer_setup changes
 - massive cleanup series from Bart Van Assche
 - Two series of SRP/SRPT changes from Bart Van Assche
 - Core updates from Mellanox
 - i40iw updates
 - IPoIB updates
 - mlx5 updates
 - mlx4 updates
 - hns updates
 - bnxt_re fixes
 - PCI write padding support
 - Sparse/Smatch/warning cleanups/fixes
 - CQ moderation support
 - SRQ support in vmw_pvrdma

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "This is a fairly plain pull request. Lots of driver updates across the
  stack, a huge number of static analysis cleanups including a close to
  50 patch series from Bart Van Assche, and a number of new features
  inside the stack such as general CQ moderation support.

  Nothing really stands out, but there might be a few conflicts as you
  take things in. In particular, the cleanups touched some of the same
  lines as the new timer_setup changes.

  Everything in this pull request has been through 0day and at least two
  days of linux-next (since Stephen doesn't necessarily flag new
  errors/warnings until day2). A few more items (about 30 patches) from
  Intel and Mellanox showed up on the list on Tuesday. I've excluded
  those from this pull request, and I'm sure some of them qualify as
  fixes suitable to send any time, but I still have to review them
  fully. If they contain mostly fixes and little or no new development,
  then I will probably send them through by the end of the week just to
  get them out of the way.

  There was a break in my acceptance of patches which coincides with the
  computer problems I had, and then when I got things mostly back under
  control I had a backlog of patches to process, which I did mostly last
  Friday and Monday. So there is a larger number of patches processed in
  that timeframe than I was striving for.

  Summary:
   - Add iWARP support to qedr driver
   - Lots of misc fixes across subsystem
   - Multiple update series to hns roce driver
   - Multiple update series to hfi1 driver
   - Updates to vnic driver
   - Add kref to wait struct in cxgb4 driver
   - Updates to i40iw driver
   - Mellanox shared pull request
   - timer_setup changes
   - massive cleanup series from Bart Van Assche
   - Two series of SRP/SRPT changes from Bart Van Assche
   - Core updates from Mellanox
   - i40iw updates
   - IPoIB updates
   - mlx5 updates
   - mlx4 updates
   - hns updates
   - bnxt_re fixes
   - PCI write padding support
   - Sparse/Smatch/warning cleanups/fixes
   - CQ moderation support
   - SRQ support in vmw_pvrdma"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (296 commits)
  RDMA/core: Rename kernel modify_cq to better describe its usage
  IB/mlx5: Add CQ moderation capability to query_device
  IB/mlx4: Add CQ moderation capability to query_device
  IB/uverbs: Add CQ moderation capability to query_device
  IB/mlx5: Exposing modify CQ callback to uverbs layer
  IB/mlx4: Exposing modify CQ callback to uverbs layer
  IB/uverbs: Allow CQ moderation with modify CQ
  iw_cxgb4: atomically flush the qp
  iw_cxgb4: only call the cq comp_handler when the cq is armed
  iw_cxgb4: Fix possible circular dependency locking warning
  RDMA/bnxt_re: report vlan_id and sl in qp1 recv completion
  IB/core: Only maintain real QPs in the security lists
  IB/ocrdma_hw: remove unnecessary code in ocrdma_mbx_dealloc_lkey
  RDMA/core: Make function rdma_copy_addr return void
  RDMA/vmw_pvrdma: Add shared receive queue support
  RDMA/core: avoid uninitialized variable warning in create_udata
  RDMA/bnxt_re: synchronize poll_cq and req_notify_cq verbs
  RDMA/bnxt_re: Flush CQ notification Work Queue before destroying QP
  RDMA/bnxt_re: Set QP state in case of response completion errors
  RDMA/bnxt_re: Add memory barriers when processing CQ/EQ entries
  ...
2017-11-15 14:54:53 -08:00
Linus Torvalds
5bbcc0f595 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Maintain the TCP retransmit queue using an rbtree, with 1GB
      windows at 100Gb this really has become necessary. From Eric
      Dumazet.

   2) Multi-program support for cgroup+bpf, from Alexei Starovoitov.

   3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew
      Lunn.

   4) Add meter action support to openvswitch, from Andy Zhou.

   5) Add a data meta pointer for BPF accessible packets, from Daniel
      Borkmann.

   6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet.

   7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli.

   8) More work to move the RTNL mutex down, from Florian Westphal.

   9) Add 'bpftool' utility, to help with bpf program introspection.
      From Jakub Kicinski.

  10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper
      Dangaard Brouer.

  11) Support 'blocks' of transformations in the packet scheduler which
      can span multiple network devices, from Jiri Pirko.

  12) TC flower offload support in cxgb4, from Kumar Sanghvi.

  13) Priority based stream scheduler for SCTP, from Marcelo Ricardo
      Leitner.

  14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg.

  15) Add RED qdisc offloadability, and use it in mlxsw driver. From
      Nogah Frankel.

  16) eBPF based device controller for cgroup v2, from Roman Gushchin.

  17) Add some fundamental tracepoints for TCP, from Song Liu.

  18) Remove garbage collection from ipv6 route layer, this is a
      significant accomplishment. From Wei Wang.

  19) Add multicast route offload support to mlxsw, from Yotam Gigi"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits)
  tcp: highest_sack fix
  geneve: fix fill_info when link down
  bpf: fix lockdep splat
  net: cdc_ncm: GetNtbFormat endian fix
  openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start
  netem: remove unnecessary 64 bit modulus
  netem: use 64 bit divide by rate
  tcp: Namespace-ify sysctl_tcp_default_congestion_control
  net: Protect iterations over net::fib_notifier_ops in fib_seq_sum()
  ipv6: set all.accept_dad to 0 by default
  uapi: fix linux/tls.h userspace compilation error
  usbnet: ipheth: prevent TX queue timeouts when device not ready
  vhost_net: conditionally enable tx polling
  uapi: fix linux/rxrpc.h userspace compilation errors
  net: stmmac: fix LPI transitioning for dwmac4
  atm: horizon: Fix irq release error
  net-sysfs: trigger netlink notification on ifalias change via sysfs
  openvswitch: Using kfree_rcu() to simplify the code
  openvswitch: Make local function ovs_nsh_key_attr_size() static
  openvswitch: Fix return value check in ovs_meter_cmd_features()
  ...
2017-11-15 11:56:19 -08:00
Ido Schimmel
63dd00fa3e mlxsw: spectrum_router: Add batch neighbour deletion
In commit 4a3c67a6e7 ("mlxsw: spectrum_router: Don't batch neighbour
deletion") I removed the support for batch deletion of neighbours on a
router interface (RIF) since at that time the firmware did not support
it for IPv6 neighbours.

This is now supported by the version enforced by the driver, so there is
no reason to delete neighbours one by one anymore.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-14 21:17:07 +09:00
Shalom Toledo
2f53fbd521 mlxsw: spectrum: Update minimum firmware version to 13.1530.152
This new firmware contains:
 - Support Spectrum A1 revision
 - Batch deletion of IPv6 neighbours
 - Remove incorrect VPD capability

Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-14 21:17:07 +09:00
Linus Torvalds
8e9a2dba86 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking updates from Ingo Molnar:
 "The main changes in this cycle are:

   - Another attempt at enabling cross-release lockdep dependency
     tracking (automatically part of CONFIG_PROVE_LOCKING=y), this time
     with better performance and fewer false positives. (Byungchul Park)

   - Introduce lockdep_assert_irqs_enabled()/disabled() and convert
     open-coded equivalents to lockdep variants. (Frederic Weisbecker)

   - Add down_read_killable() and use it in the VFS's iterate_dir()
     method. (Kirill Tkhai)

   - Convert remaining uses of ACCESS_ONCE() to
     READ_ONCE()/WRITE_ONCE(). Most of the conversion was Coccinelle
     driven. (Mark Rutland, Paul E. McKenney)

   - Get rid of lockless_dereference(), by strengthening Alpha atomics,
     strengthening READ_ONCE() with smp_read_barrier_depends() and thus
     being able to convert users of lockless_dereference() to
     READ_ONCE(). (Will Deacon)

   - Various micro-optimizations:

        - better PV qspinlocks (Waiman Long),
        - better x86 barriers (Michael S. Tsirkin)
        - better x86 refcounts (Kees Cook)

   - ... plus other fixes and enhancements. (Borislav Petkov, Juergen
     Gross, Miguel Bernal Marin)"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
  locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE
  rcu: Use lockdep to assert IRQs are disabled/enabled
  netpoll: Use lockdep to assert IRQs are disabled/enabled
  timers/posix-cpu-timers: Use lockdep to assert IRQs are disabled/enabled
  sched/clock, sched/cputime: Use lockdep to assert IRQs are disabled/enabled
  irq_work: Use lockdep to assert IRQs are disabled/enabled
  irq/timings: Use lockdep to assert IRQs are disabled/enabled
  perf/core: Use lockdep to assert IRQs are disabled/enabled
  x86: Use lockdep to assert IRQs are disabled/enabled
  smp/core: Use lockdep to assert IRQs are disabled/enabled
  timers/hrtimer: Use lockdep to assert IRQs are disabled/enabled
  timers/nohz: Use lockdep to assert IRQs are disabled/enabled
  workqueue: Use lockdep to assert IRQs are disabled/enabled
  irq/softirqs: Use lockdep to assert IRQs are disabled/enabled
  locking/lockdep: Add IRQs disabled/enabled assertion APIs: lockdep_assert_irqs_enabled()/disabled()
  locking/pvqspinlock: Implement hybrid PV queued/unfair locks
  locking/rwlocks: Fix comments
  x86/paravirt: Set up the virt_spin_lock_key after static keys get initialized
  block, locking/lockdep: Assign a lock_class per gendisk used for wait_for_completion()
  workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes
  ...
2017-11-13 12:38:26 -08:00
Slava Shwartsman
a1b8714593 net/mlx4: Use Kconfig flag to remove support of old gen2 Mellanox devices
Since Mellanox's focus is on newer adapters, we would like to have the
ability to disable the support for old gen2 adapters.

This can be done by turning off the MLX4_CORE_GEN2 Kconfig flag.
We keep it turned on by default.

Signed-off-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-13 10:27:51 +09:00
David S. Miller
fdae5f37a8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
2017-11-12 09:17:05 +09:00
Eugenia Emantayev
d1c61e6d79 net/mlx5e: Increase Striding RQ minimum size limit to 4 multi-packet WQEs
This is to prevent the case of working with a single MPWQE
(1 WQE is always reserved as RQ is linked-list).
When the WQE is fully consumed, HW should still have available buffer
in order not to drop packets.

Fixes: 461017cb00 ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-10 15:39:21 +09:00
Inbar Karmy
2e50b26195 net/mlx5e: Set page to null in case dma mapping fails
Currently, when dma mapping fails, put_page is called,
but the page is not set to null. Later, in the page_reuse treatment in
mlx5e_free_rx_descs(), mlx5e_page_release() is called for the second time,
improperly doing dma_unmap (for a non-mapped address) and an extra put_page.
Prevent this by nullifying the page pointer when dma_map fails.
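
The double release can be illustrated with a small userspace sketch. Everything here is a hypothetical stand-in (`struct page`, `dma_info`, `page_alloc_mapped`, `page_release`), and the DMA mapping step is simulated by a flag; only the idea of clearing the pointer on failure comes from the commit message.

```c
#include <stddef.h>

struct page { int refcount; }; /* hypothetical stand-in */
struct dma_info { struct page *page; };

static void put_page(struct page *p)
{
	p->refcount--;
}

/* Simulated allocation plus mapping step; fails when map_ok is zero. */
static int page_alloc_mapped(struct dma_info *di, struct page *p, int map_ok)
{
	p->refcount++;
	di->page = p;
	if (!map_ok) {
		put_page(p);
		di->page = NULL; /* so later cleanup skips this slot */
		return -1;
	}
	return 0;
}

/* Cleanup path: releases the page only if the slot still holds one. */
static void page_release(struct dma_info *di)
{
	if (!di->page)
		return;
	put_page(di->page);
	di->page = NULL;
}
```

Without the `di->page = NULL` assignment in the failure branch, the later `page_release()` call would drop the reference a second time.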

Fixes: accd588332 ("net/mlx5e: Introduce RX Page-Reuse")
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-10 15:39:21 +09:00
Saeed Mahameed
2a8d6065e7 net/mlx5e: Fix napi poll with zero budget
napi->poll can be called with budget 0, e.g. in netpoll scenarios
where the caller only wants to poll TX rings
(poll_one_napi@net/core/netpoll.c).

The commit below changed RX polling from a "while" loop to "do {} while",
which ignores the initial budget and always handles at least one RX
packet.
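
The difference between the two loop forms can be sketched as follows. `rx_queue` and both poll functions are hypothetical, and the up-front budget check is one possible way to avoid the problem; the commit's actual change may be structured differently.

```c
/* Hypothetical RX queue holding a number of pending packets. */
struct rx_queue { int pending; };

/* do {} while form: handles one packet before the budget is ever checked. */
static int poll_rx_do_while(struct rx_queue *rq, int budget)
{
	int work_done = 0;

	if (!rq->pending)
		return 0;
	do {
		rq->pending--;
		work_done++;
	} while (work_done < budget && rq->pending);
	return work_done;
}

/* Same loop with an up-front budget check, so budget 0 does no RX work. */
static int poll_rx_checked(struct rx_queue *rq, int budget)
{
	int work_done = 0;

	if (budget <= 0 || !rq->pending)
		return 0;
	do {
		rq->pending--;
		work_done++;
	} while (work_done < budget && rq->pending);
	return work_done;
}
```

With a budget of 0, the first variant reports one unit of work, which is more than the budget and is the kind of result that triggers the "exceeded budget in poll" warning.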

This fixes the following warning:
[ 2852.049194] mlx5e_napi_poll+0x0/0x260 [mlx5_core] exceeded budget in poll
[ 2852.049195] ------------[ cut here ]------------
[ 2852.049195] WARNING: CPU: 0 PID: 25691 at net/core/netpoll.c:171 netpoll_poll_dev+0x18a/0x1a0

Fixes: 4b7dfc9925 ("net/mlx5e: Early-return on empty completion queues")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Martin KaFai Lau <kafai@fb.com>
Tested-by: Martin KaFai Lau <kafai@fb.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-10 15:39:20 +09:00
Huy Nguyen
d2aa060d40 net/mlx5: Cancel health poll before sending panic teardown command
After the panic teardown firmware command, health_care detects the error
in PCI bus and calls the mlx5_pci_err_detected. This health_care flow is
no longer needed because the panic teardown firmware command will bring
down the PCI bus communication with the HCA.

The solution is to cancel the health care timer and its pending
workqueue request before sending panic teardown firmware command.

Kernel trace:
mlx5_core 0033:01:00.0: Shutdown was called
mlx5_core 0033:01:00.0: health_care:154:(pid 9304): handling bad device here
mlx5_core 0033:01:00.0: mlx5_handle_bad_state:114:(pid 9304): NIC state 1
mlx5_core 0033:01:00.0: mlx5_pci_err_detected was called
mlx5_core 0033:01:00.0: mlx5_enter_error_state:96:(pid 9304): start
mlx5_3:mlx5_ib_event:3061:(pid 9304): warning: event on port 0
mlx5_core 0033:01:00.0: mlx5_enter_error_state:104:(pid 9304): end
Unable to handle kernel paging request for data at address 0x0000003f
Faulting instruction address: 0xc0080000434b8c80

Fixes: 8812c24d28 ('net/mlx5: Add fast unload support in shutdown flow')
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-10 15:39:20 +09:00
Huy Nguyen
b8cce68bf1 net/mlx5: Loop over temp list to release delay events
list_splice_init() reinitializes waiting_events_list after splicing it
onto the temp list, so we should loop over the temp list to fire the events.
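
The behaviour of the splice can be shown with a small userspace re-creation of a circular doubly linked list. This is a sketch modelled on the kernel helpers, not the kernel's `list.h`; `list_init` and `list_count` are hypothetical helper names.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Move every node from src to the front of dst, then reinitialize src. */
static void list_splice_init(struct list_head *src, struct list_head *dst)
{
	if (src->next != src) {
		struct list_head *first = src->next, *last = src->prev;

		first->prev = dst;
		last->next = dst->next;
		dst->next->prev = last;
		dst->next = first;
		list_init(src);
	}
}

/* Count the nodes on a list, excluding the head itself. */
static int list_count(const struct list_head *h)
{
	const struct list_head *pos;
	int n = 0;

	for (pos = h->next; pos != h; pos = pos->next)
		n++;
	return n;
}
```

After the splice the source list is empty, so iterating over it visits nothing; the entries are only reachable through the destination list.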

Fixes: 4ca637a20a ("net/mlx5: Delay events till mlx5 interface's add complete for pci resume")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-10 15:39:20 +09:00
David S. Miller
4fdc3023c6 Merge tag 'mlx5-updates-2017-11-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5-updates-2017-11-09

This series introduces vlan offloads related improvements for mlx5
ethernet netdev driver, from Gal Pressman.

 - Add support for 802.1ad vlan filter
 - Add support for 802.1ad vlan insertion
 - Add vlan offloads statistics to ethtool (inserted/stripped vlans)
 - CHECKSUM_COMPLETE support for vlan traffic when vlan stripping is off! (Finally)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-10 13:44:46 +09:00
David S. Miller
4dc6758d78 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Simple cases of overlapping changes in the packet scheduler.

Much easier to resolve this time.

Which probably means that I screwed it up somehow.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-10 10:00:18 +09:00
Gal Pressman
f938daeee9 net/mlx5e: CHECKSUM_COMPLETE offload for VLAN/QinQ packets
When the VLAN tag is present in the packet buffer (i.e. VLAN stripping disabled, QinQ)
the driver will currently report CHECKSUM_UNNECESSARY.
Instead of using CHECKSUM_COMPLETE offload for packets with first
ethertype of IPv4/6, use it for packets with last ethertype of IPv4/6 to
cover the former cases as well.

The checksum field present in the CQE is calculated from the IP header
until the end of the packet. When the first ethertype is different than
IPv4/6 (for ex. 802.1Q VLAN) a checksum of the VLAN header/s should be
added. The small header/s checksum calculation will allow us to use
CHECKSUM_COMPLETE instead of CHECKSUM_UNNECESSARY.
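
The arithmetic behind this can be sketched with a plain ones' complement sum. The helper names below are hypothetical and the code is a userspace illustration, not the driver's implementation; it assumes even-length buffers and a single 4-byte VLAN header in front of the data the hardware already summed.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones' complement sum of an even-length buffer, as 16-bit big-endian words. */
static uint32_t csum_words(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	return sum;
}

/* Fold the 32-bit accumulator down to 16 bits with end-around carry. */
static uint16_t csum_fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

#define VLAN_HDR_LEN 4

/*
 * hw_csum covers the data after the VLAN header; adding the sum of the
 * VLAN header bytes yields a checksum over header plus data.
 */
static uint16_t csum_with_vlan(const uint8_t *vlan_hdr, uint16_t hw_csum)
{
	return csum_fold16(csum_words(vlan_hdr, VLAN_HDR_LEN, hw_csum));
}
```

Because only the few header bytes need to be summed in software, the extra cost per packet is small compared with summing the whole payload.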

Testing bandwidth of one and 8 TCP streams to a single RQ,
LRO and VLAN stripping offloads disabled:
CPU: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
NIC: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

Before:
+--------------+--------------------+---------------------+----------------------+
| Traffic type | 1 Stream BW [Mbps] | 8 Streams BW [Mbps] |   Checksum offload   |
+--------------+--------------------+---------------------+----------------------+
| Untagged     |          28,247.35 |           24,716.88 | CHECKSUM_COMPLETE    |
| VLAN         |          27,516.69 |           23,752.26 | CHECKSUM_UNNECESSARY |
| QinQ         |           6,961.30 |           20,667.04 | CHECKSUM_UNNECESSARY |
+--------------+--------------------+---------------------+----------------------+

Now:
+--------------+--------------------+---------------------+-------------------+
| Traffic type | 1 Stream BW [Mbps] | 8 Streams BW [Mbps] | Checksum offload  |
+--------------+--------------------+---------------------+-------------------+
| Untagged     |          28,521.28 |           24,926.32 | CHECKSUM_COMPLETE |
| VLAN         |          27,389.37 |           23,715.34 | CHECKSUM_COMPLETE |
| QinQ         |           6,901.77 |           20,845.73 | CHECKSUM_COMPLETE |
+--------------+--------------------+---------------------+-------------------+

No performance degradation observed.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-09 13:28:29 +09:00
Gal Pressman
f24686e878 net/mlx5e: Add VLAN offloads statistics
The following counters are now exposed through ethtool -S:
rx[i]_removed_vlan_packets (per channel)
rx_removed_vlan_packets
tx[i]_added_vlan_packets (per channel)
tx_added_vlan_packets

rx_removed_vlan_packets: The number of packets that had their
outer VLAN header stripped to the CQE by the hardware.
tx_added_vlan_packets: The number of packets that had their
outer VLAN header inserted by the hardware.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-09 13:28:22 +09:00
Gal Pressman
4382c7b92a net/mlx5e: Add 802.1ad VLAN insertion support
Report VLAN insertion support for S-tagged packets and add support by
choosing the correct VLAN type in the WQE.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-09 13:27:35 +09:00
Gal Pressman
7d92d58033 net/mlx5e: Add 802.1ad VLAN filter steering rules
When a user chooses to use 802.1ad VLAN the proper steering rules will
be added to the VLAN flow table (matching the specific S-tag VID).
Due to current hardware limitation, when using 802.1ad, we must disable
C-tag VLAN stripping on the RQs.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-09 13:27:08 +09:00
Gal Pressman
03eda9541f net/mlx5e: Declare bitmap using kernel macro
Replace explicit declaration of bitmap with DECLARE_BITMAP kernel macro.
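
For readers unfamiliar with the macro, the following is a user-space re-creation of what DECLARE_BITMAP does, assuming the usual kernel definition (an array of unsigned long sized with BITS_TO_LONGS). The struct and field names are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG      (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(nr)  (((nr) + BITS_PER_LONG - 1) / BITS_PER_LONG)
#define DECLARE_BITMAP(name, bits) unsigned long name[BITS_TO_LONGS(bits)]

#define VLAN_N_VID 4096

struct vlan_state_sketch {
	/* explicit form: unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)]; */
	DECLARE_BITMAP(active_vlans, VLAN_N_VID);
};

/* The bitmap must hold at least one bit per VLAN ID. */
static_assert(sizeof(((struct vlan_state_sketch *)0)->active_vlans) * CHAR_BIT
	      >= VLAN_N_VID, "bitmap too small");
```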

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-09 13:27:02 +09:00
Gal Pressman
355368d530 net/mlx5e: Add rollback on add VLAN failure
When add VLAN rule fails the active vlan bit should be cleared.
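
The rollback pattern can be sketched in user space as follows. This is an illustration only: the struct, the helper names and the hw_fail knob are hypothetical, and add_rule() merely stands in for installing a steering rule.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

#define N_VID 4096
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

struct vlan_table_sketch {
	unsigned long active[N_VID / (sizeof(unsigned long) * CHAR_BIT)];
	bool hw_fail;	/* test knob: force the rule installation to fail */
};

static void vid_set(struct vlan_table_sketch *t, unsigned int vid)
{
	t->active[vid / BITS_PER_WORD] |= 1UL << (vid % BITS_PER_WORD);
}

static void vid_clear(struct vlan_table_sketch *t, unsigned int vid)
{
	t->active[vid / BITS_PER_WORD] &= ~(1UL << (vid % BITS_PER_WORD));
}

static bool vid_test(const struct vlan_table_sketch *t, unsigned int vid)
{
	return t->active[vid / BITS_PER_WORD] & (1UL << (vid % BITS_PER_WORD));
}

/* Stand-in for installing a steering rule in hardware. */
static int add_rule(struct vlan_table_sketch *t, unsigned int vid)
{
	(void)vid;
	return t->hw_fail ? -ENOMEM : 0;
}

/* Mark the VID active, then undo the mark if the rule cannot be added. */
static int vlan_add(struct vlan_table_sketch *t, unsigned int vid)
{
	int err;

	assert(vid < N_VID);
	vid_set(t, vid);
	err = add_rule(t, vid);
	if (err)
		vid_clear(t, vid);	/* roll back so the bitmap matches hardware */
	return err;
}
```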

Fixes: afb736e933 ("net/mlx5: Ethernet resource handling files")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-09 13:26:56 +09:00
Gal Pressman
2b52a28390 net/mlx5e: Rename VLAN related variables and functions
Rename VLAN related symbols to better reflect the fact that they
are associated with C-tag VLAN.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-09 13:26:32 +09:00
Wei Yongjun
d86fd113eb mlxsw: spectrum: Fix error return code in mlxsw_sp_port_create()
Fix to return a negative error code from the VID create error handling
case instead of 0, as done elsewhere in this function.

Fixes: c57529e1d5 ("mlxsw: spectrum: Replace vPorts with Port-VLAN")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08 13:25:15 +09:00
Nogah Frankel
3670756fe6 mlxsw: spectrum: Support general qdisc stats
Add support for ndo_setup_tc with enum tc_setup_type value of
TC_SETUP_QDISC_STATS. This call updates the generic qdisc stats from the
cache if the requested handle ID matches the root qdisc ID, and fails
otherwise.
Currently qlen and requeues are not supported.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08 12:23:38 +09:00
Nogah Frankel
861fb8294d mlxsw: spectrum: Support RED xstats
Add support for ndo_setup_tc with enum tc_setup_type value of
TC_SETUP_RED_XSTATS. This call returns the RED qdisc xstats from the cache
if the requested handle ID matches the root qdisc ID, and fails
otherwise.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08 12:23:38 +09:00
Nogah Frankel
075ab8adaf mlxsw: spectrum: Collect tclass related stats periodically
Add more statistics to be collected from the HW periodically. These stats
are tclass based (except ECN-marked packets, which are counted only per port).
They are needed to expose RED qdisc stats and xstats correctly.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08 12:23:38 +09:00
Yuval Mintz
0afc1221ff mlxsw: reg: Add ext and tc-cong counter groups
This adds the definitions for 2 new counter groups
which are necessary for obtaining ECN & WRED counters.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08 12:23:38 +09:00
Nogah Frankel
96f17e0776 mlxsw: spectrum: Support RED qdisc offload
Add support for ndo_setup_tc with enum tc_setup_type value of TC_SETUP_RED.
This call sets RED qdisc on a traffic class.
This patch supports RED qdisc only as a root qdisc and sets it on the
default tclass. It can be set with or without ECN.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08 12:23:38 +09:00
Nogah Frankel
ad53fa06c1 mlxsw: reg: Add cwtp & cwtpm registers
This patch adds 2 new registers:
 - Congestion WRED ECN TClass Profile Register [CWTP]
 - Congestion WRED ECN TClass and Pool Mapping Register [CWTPM]

These registers would later be needed to offload RED-related
functionality to the HW.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08 12:23:38 +09:00
Nogah Frankel
575ed7d39e net_sch: mqprio: Change TC_SETUP_MQPRIO to TC_SETUP_QDISC_MQPRIO
Change TC_SETUP_MQPRIO to TC_SETUP_QDISC_MQPRIO to match the new
convention.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08 12:23:38 +09:00
Gustavo A. R. Silva
39a4b86f0d net/mlx5e/core/en_fs: fix pointer dereference after free in mlx5e_execute_l2_action
hn is being kfree'd in mlx5e_del_l2_from_hash and then dereferenced
by accessing hn->ai.addr

Fix this by copying the MAC address into a local variable for its safe use
in all possible execution paths within function mlx5e_execute_l2_action.
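
The shape of the fix can be shown with a small user-space sketch. It is not the driver function: the struct and function names are hypothetical, and free() stands in for kfree(). The point is only that the address is copied before the node is released.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ETH_ALEN 6

struct hash_node_sketch {
	uint8_t addr[ETH_ALEN];
};

/* Releases the node, like the delete helper named in the message. */
static void del_node(struct hash_node_sketch *hn)
{
	free(hn);
}

/* Copy the MAC address out first, so nothing reads hn after the free. */
static void execute_del(struct hash_node_sketch *hn, uint8_t out[ETH_ALEN])
{
	uint8_t mac[ETH_ALEN];

	assert(hn != NULL);
	memcpy(mac, hn->addr, ETH_ALEN);
	del_node(hn);
	/* any later use (logging, further commands) reads the local copy */
	memcpy(out, mac, ETH_ALEN);
}
```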

Addresses-Coverity-ID: 1417789
Fixes: eeb66cdb68 ("net/mlx5: Separate between E-Switch and MPFS")
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08 10:41:32 +09:00
Ingo Molnar
8c5db92a70 Merge branch 'linus' into locking/core, to resolve conflicts
Conflicts:
	include/linux/compiler-clang.h
	include/linux/compiler-gcc.h
	include/linux/compiler-intel.h
	include/uapi/linux/stddef.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-07 10:32:44 +01:00
David S. Miller
488e5b30d3 Merge tag 'mlx5-updates-2017-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:

====================
mlx5-updates-2017-11-04

This series includes:

From Huy: dscp to priority mapping for Ethernet packet.

===================================================
First six patches enable differentiated services code point (dscp) to
priority mapping for Ethernet packet. Once this feature is
enabled, the packet is routed to the corresponding priority based on its
dscp. User can combine this feature with priority flow control (pfc)
feature to have priority flow control based on the dscp.

Firmware interface:
Mellanox firmware provides two control knobs for this feature:
  QPTS register allows changing the trust state between dscp and
  pcp mode. The default is pcp mode. Once in dscp mode, firmware will
  route the packet based on its dscp value if the dscp field exists.

  QPDPM register allows mapping a specific dscp (0 to 63) to a
  specific priority (0 to 7). By default, all the dscps are mapped to
  priority zero.

Software interface:
This feature is controlled via application priority TLV. IEEE
specification P802.1Qcd/D2.1 defines priority selector id 5 for
application priority TLV. This APP TLV selector defines DSCP to priority
map. This APP TLV can be sent by the switch or can be set locally using
software such as lldptool. In mlx5 drivers, we add the support for net
dcb's getapp and setapp call back. Mlx5 driver only handles the selector
id 5 application entry (dscp application priority application entry).
If the user sends multiple dscp to priority APP TLV entries for the same
dscp, the last one sent will take effect. All previously sent entries
will be deleted.

The firmware trust state (in QPTS register) is changed based on the
number of dscp to priority application entries. When the first dscp to
priority application entry is added by the user, the trust state is
changed to dscp. When the last dscp to priority application entry is
deleted by the user, the trust state is changed to pcp.

When the port is in DSCP trust state, the transmit queue is selected
based on the dscp of the skb.

When the port is in DSCP trust state and vport inline mode is not NONE,
firmware requires mlx5 driver to copy the IP header to the
wqe ethernet segment inline header if the skb has it.
This is done by changing the transmit queue sq's min inline mode to L3.
Note that the min inline mode of sqs that belong to other features
such as xdpsq, icosq are not modified.
===================================================

In addition to the dscp series, some small misc changes are included as well:

From Inbar, Ethtool msglvl support and some debug prints in DCBNL logic
From Or Gerlitz, Enlarge the NIC TC offload table size
From Rabie, Initialize destination_flow struct to 0
From Feras, Add inner TTC table to IPoIB flow steering
From Tal, Enable CQE based moderation on TX CQ
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05 23:25:02 +09:00
Jakub Kicinski
f4e63525ee net: bpf: rename ndo_xdp to ndo_bpf
ndo_xdp is a control path callback for setting up XDP in the
driver.  We can reuse it for other forms of communication
between the eBPF stack and the drivers.  Rename the callback
and associated structures and definitions.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05 22:26:18 +09:00
Tal Gilboa
0088cbbc4b net/mlx5e: Enable CQE based moderation on TX CQ
By using CQE based moderation on TX CQ we can reduce the TX
interrupt rate. Besides the benefit of fewer interrupts, this also
allows the kernel to better utilize TSO. Since TSO has some CPU overhead,
it might not aggregate when CPU is under high stress. By reducing the
interrupt rate and the CPU utilization, we can get better aggregation
and better overall throughput.
The feature is enabled by default and has a private flag in ethtool
for control.

Throughput, interrupt rate and TSO utilization improvements:
(ConnectX-4Lx 40GbE, unidirectional, 1/16 TCP streams, 64B packets)
---------------------------------------------------------
Metric   | Streams | CQE Based | EQE Based | improvement
---------------------------------------------------------
BW       |    1    |  2.4Gb/s  | 2.15Gb/s  |  +11.6%
IR       |    1    |  27Kips   | 50.6Kips  |  -46.7%
TSO Util |    1    |  74.6%    | 71%       |  +5%
BW       |    16   |  29Gb/s   | 25.85Gb/s |  +12.2%
IR       |    16   |  482Kips  | 745Kips   |  -35.3%
TSO Util |    16   |  69.1%    | 49%       |  +41.1%

*BW = Bandwidth, IR = Interrupt rate, ips = interrupt per second.
TSO Util = bytes in TSO sessions / all bytes transferred

Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04 21:27:15 -07:00
Feras Daoud
458821c72b net/mlx5e: IPoIB, Add inner TTC table to IPoIB flow steering
For supported platforms, add inner TTC flow table to enhanced IPoIB
flow steering.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04 21:27:11 -07:00
Rabie Loulou
4c5009c525 net/mlx5: Initialize destination_flow struct to 0
This is needed in order to enlarge it with more members that will get
value of 0 when not set.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04 21:27:06 -07:00
Or Gerlitz
21b9c1449d net/mlx5: Enlarge the NIC TC offload table size
The NIC TC offload table size was hard coded to 1k. Change it to be

      min(max NIC RX table size,
	  min(max flow counters, 64k) * num flow groups)

where the max values are read from the firmware and the number of
flow groups is hard-coded as before this change.
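
As a worked example of the formula above, here is a small sketch in plain C. The function and parameter names are hypothetical; in the driver the maximum values come from firmware capabilities.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_COUNTED_FLOWS (64u * 1024u)	/* the 64k cap from the formula */

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* min(max NIC RX table size, min(max flow counters, 64k) * num flow groups) */
static uint64_t tc_table_size(uint64_t max_rx_table_size,
			      uint64_t max_flow_counters,
			      uint32_t num_groups)
{
	assert(num_groups > 0);
	return min_u64(max_rx_table_size,
		       min_u64(max_flow_counters, MAX_COUNTED_FLOWS) * num_groups);
}
```

For instance, with a 16M-entry RX table limit, 1M flow counters and 4 groups, the result is 64k * 4 = 256k entries; with only 1000 flow counters it is 4000.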

We don't know upfront the division of flows to groups (== different masks).
This setup allows each group to grow up to the size we are aiming for
(when supported, all offloaded flows use counters). Thus, we don't expect
multiple occurrences of a group, which in turn would add steering hops.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04 21:27:01 -07:00
Inbar Karmy
5da8bc3eff net/mlx5e: DCBNL, Add debug messages log
Add debug print when changing the configuration of QoS through dcbnl.
Use ethtool -s <devname> msglvl hw on/off to toggle debug messages.

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04 21:26:56 -07:00
Gal Pressman
79c48764e1 net/mlx5e: Add support for ethtool msglvl support
Use ethtool -s <devname> msglvl <type> on/off to toggle debug messages.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04 21:26:47 -07:00
Huy Nguyen
fbcb127e89 net/mlx5e: Support DSCP trust state to Ethernet's IP packet on SQ
If the port is in DSCP trust state, packets are placed in the right
priority queue based on the dscp value. This is done by selecting
the transmit queue based on the dscp of the skb.
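
A minimal sketch of the lookup, under stated assumptions: the DSCP is the upper six bits of the IPv4 TOS / IPv6 traffic class byte, and the dscp2prio table and function names below are hypothetical stand-ins for whatever the driver keeps internally.

```c
#include <assert.h>
#include <stdint.h>

#define DSCP_MAX 64

/* DSCP occupies the upper six bits of the DS field; the low two are ECN. */
static uint8_t dsfield_to_dscp(uint8_t dsfield)
{
	return dsfield >> 2;
}

/* Look up the priority for a packet from its DS field. */
static uint8_t dscp_to_prio(const uint8_t dscp2prio[DSCP_MAX], uint8_t dsfield)
{
	uint8_t dscp = dsfield_to_dscp(dsfield);

	assert(dscp < DSCP_MAX);
	return dscp2prio[dscp];
}
```

The returned priority would then be used to pick the transmit queue belonging to the matching traffic class.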

Until now select_queue honors priority only from the vlan header.
However that is not sufficient in cases where port trust state is DSCP
mode as packet might not even contain vlan header. Therefore if the port
is in dscp trust state and vport's min inline mode is not NONE,
copy the IP header to the eseg's inline header if the skb has it.
This is done by changing the transmit queue sq's min inline mode to L3.
Note that the min inline mode of sqs that belong to other features such
as xdpsq, icosq are not modified.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04 21:26:42 -07:00
Huy Nguyen
2a5e7a1344 net/mlx5e: Add dcbnl dscp to priority support
This patch implements dcbnl hooks to set and delete DSCP to priority map
as defined by the DCB subsystem. Device maintains internal trust state
which needs to be set to DSCP state for performing DSCP to priority mapping.

When the first dscp to priority APP entry is added by the user, the
trust state is changed to dscp.

When the last dscp to priority APP entry is deleted by the user, the
trust state is changed to pcp.

If the user sends multiple dscp to priority APP entries for the same dscp,
the last one sent will take effect. All previously sent entries will be
deleted.
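
The rules above can be modelled with a small state machine. This is a user-space sketch with hypothetical names (struct dscp_app_sketch, app_set(), app_del()); it does not use the dcbnl API and only mirrors the behaviour described in this message.

```c
#include <assert.h>
#include <stdint.h>

enum trust_state { TRUST_PCP, TRUST_DSCP };

#define DSCP_MAX   64
#define PRIO_UNSET 0xff

struct dscp_app_sketch {
	uint8_t prio[DSCP_MAX];	/* PRIO_UNSET when no APP entry exists */
	unsigned int num_entries;
	enum trust_state trust;
};

static void app_init(struct dscp_app_sketch *s)
{
	for (int i = 0; i < DSCP_MAX; i++)
		s->prio[i] = PRIO_UNSET;
	s->num_entries = 0;
	s->trust = TRUST_PCP;	/* pcp is the default trust state */
}

/* Add an entry; a later entry for the same dscp replaces the earlier one. */
static void app_set(struct dscp_app_sketch *s, uint8_t dscp, uint8_t prio)
{
	assert(dscp < DSCP_MAX && prio < 8);
	if (s->prio[dscp] == PRIO_UNSET)
		s->num_entries++;
	s->prio[dscp] = prio;
	s->trust = TRUST_DSCP;	/* first entry switches the port to dscp */
}

/* Delete an entry; removing the last one switches back to pcp. */
static void app_del(struct dscp_app_sketch *s, uint8_t dscp)
{
	assert(dscp < DSCP_MAX);
	if (s->prio[dscp] == PRIO_UNSET)
		return;
	s->prio[dscp] = PRIO_UNSET;
	if (--s->num_entries == 0)
		s->trust = TRUST_PCP;
}
```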

The dscp to priority APP entries are added and deleted in the net/dcb
APP database using dcb_ieee_setapp/getapp.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04 21:26:31 -07:00
Huy Nguyen
415a64aa8d net/mlx5: QPTS and QPDPM register firmware command support
The QPTS register allows changing the priority trust state between pcp and
dscp. Add support to get/set trust state from device. When the port is
in pcp/dscp trust state, packet is routed by hardware to matching priority
based on its pcp/dscp value respectively.

The QPDPM register allows changing the dscp to priority mapping. Add support
to get/set dscp to priority mapping from device.
Note that to change a dscp mapping, the "e" bit of this dscp structure
must be set in the QPDPM firmware command.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04 21:26:21 -07:00
Huy Nguyen
c02762eb20 net/mlx5: QCAM register firmware command support
The QCAM register provides capability bit for all the QoS registers
using ACCESS_REG command.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-11-04 21:24:14 -07:00
David S. Miller
2a171788ba Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Files removed in 'net-next' had their license header updated
in 'net'.  We take the remove from 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:26:51 +09:00
Petr Machata
44b0fff1d8 mlxsw: spectrum_router: Handle down of tunnel underlay
When the bound device of a tunnel device is down, encapsulated packets
are not egressed anymore, but tunnel decap still works. Extend
mlxsw_sp_nexthop_rif_update() to take IFF_UP into consideration when
deciding whether a given next hop should be offloaded.

Because the new logic was added to mlxsw_sp_nexthop_rif_update(), this
fixes the case where a newly-added tunnel has a down bound device, which
would previously be fully offloaded. Now the down state of the bound
device is noted and next hops forwarding to such tunnel are not
offloaded.

In addition to that, notice NETDEV_UP and NETDEV_DOWN of a bound device
to force refresh of tunnel encap route offloads.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:15:18 +09:00
Petr Machata
89c2b7daba mlxsw: spectrum_ipip: Handle underlay device change
When a bound device of an IP-in-IP tunnel changes, such as through
'ip tunnel change name $name dev $dev', the loopback backing the tunnel
needs to be recreated.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:15:18 +09:00
Petr Machata
4cf04f3ff4 mlxsw: spectrum: Handle NETDEV_CHANGE on L3 tunnels
Changes to L3 tunnel netdevices (through `ip tunnel change' as well as
`ip link set') lead to NETDEV_CHANGE being generated on the tunnel
device. Because what is relevant for the tunnel in question depends on
the tunnel type, handling of the event is dispatched to the IPIP module
through a newly-added interface mlxsw_sp_ipip_ops.ol_netdev_change().

IPIP tunnels now remember the last set of tunnel parameters in struct
mlxsw_sp_ipip_entry.parms, and use it to figure out what exactly has
changed.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:15:18 +09:00
Petr Machata
61481f2fce mlxsw: spectrum: Support IPIP underlay VRF migration
When a bound device of a tunnel netdevice changes VRF, the loopback RIF
that backs the tunnel needs to be updated and existing encapsulating
routes need to be refreshed.

Note that several tunnels can share the same bound device, in which case
all the impacted tunnels need to be updated.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:15:18 +09:00
Petr Machata
af641713e9 mlxsw: spectrum_router: Onload conflicting tunnels
The approach for offloading IP tunnels implemented currently by mlxsw
doesn't allow two tunnels that have the same local IP address in the
same (underlay) VRF. Previously, offloads were introduced on demand as
encap routes were formed. When such a route was created that would cause
offload of a conflicting tunnel, mlxsw_sp_ipip_entry_create() would
detect it and return -EEXIST, which would propagate up and cause FIB
abort.

Now however IPIP entries are created as soon as an offloadable netdevice
is created, and the failure prevents creation of such device.
Furthermore, if the driver is installed at the point where such
conflicting tunnels exist, the failure actually prevents successful
modprobe.

Furthermore, follow-up patches implement handling of NETDEV_CHANGE due
to the local address change. However, NETDEV_CHANGE can't be vetoed. The
failure merely means that the offloads weren't updated, but the change
in Linux configuration is not rolled back. It is thus desirable to have
a robust way of handling these conflicts, which can later be reused for
handling NETDEV_CHANGE as well.

To fix this, when a conflicting tunnel is created, instead of failing,
simply pull the old tunnel to slow path and reject offloading the
new one.
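
The policy in the previous paragraph could look roughly like this in a user-space model. The struct and function are hypothetical and much simpler than the driver: a conflict here means the same local address in the same underlay VRF, and every tunnel in the array is assumed to use the same underlay protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct tunnel_sketch {
	uint32_t saddr;		/* local (source) IPv4 address */
	uint32_t ul_vrf;	/* underlay VRF id */
	bool offloaded;
};

/* Handle creation of new_tun. On a conflict, demote the existing tunnel(s)
 * to the slow path and leave the new one unoffloaded; no error is raised.
 * Returns true if the new tunnel ends up offloaded. */
static bool tunnel_created(struct tunnel_sketch *tuns, size_t n,
			   struct tunnel_sketch *new_tun)
{
	bool conflict = false;

	assert(new_tun != NULL);
	for (size_t i = 0; i < n; i++) {
		if (tuns[i].saddr == new_tun->saddr &&
		    tuns[i].ul_vrf == new_tun->ul_vrf) {
			tuns[i].offloaded = false;	/* pull to slow path */
			conflict = true;
		}
	}
	new_tun->offloaded = !conflict;
	return new_tun->offloaded;
}
```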

Introduce two functions: mlxsw_sp_ipip_entry_demote_tunnel() and
mlxsw_sp_ipip_demote_tunnel_by_saddr() to handle this. Make them both
public, because they will be useful later on in this patchset.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:15:18 +09:00
Petr Machata
4526cc8aed mlxsw: spectrum_router: Fix saddr deduction in mlxsw_sp_ipip_entry_create()
When trying to determine whether there are other offloaded tunnels with
the same local address, mlxsw_sp_ipip_entry_create() should look for a
tunnel with matching UL protocol, matching saddr, in the same VRF.
However instead of taking into account the UL protocol of the tunnel
netdevice (which mlxsw_sp_ipip_entry_saddr_matches() then compares to
the UL protocol of inspected IPIP entry), it deduces the UL protocol
from the inspected IPIP entry (and that's compared to itself).

This is currently immaterial, because only one tunnel type is offloaded,
and therefore the UL protocol always matches, but introducing support
for a tunnel with IPv6 underlay would uncover this error.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:15:17 +09:00
Petr Machata
0c5f1cd5ba mlxsw: spectrum_router: Generalize __mlxsw_sp_ipip_entry_update_tunnel()
The work that needs to be done to update HW configuration in response to
changes is similar to what __mlxsw_sp_ipip_entry_update_tunnel() already
does, but with a number of twists: each change requires a different
subset of things to happen. Extend the function to support all these
uses, and allow finely-grained configuration of what should happen at
each call through a suite of function arguments.

Publish the updated function to allow use from the spectrum_ipip module.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:15:17 +09:00
Petr Machata
65a6121b30 mlxsw: spectrum_router: Extract __mlxsw_sp_ipip_entry_update_tunnel()
The work that's done by mlxsw_sp_netdevice_ipip_ol_vrf_event() is a good
basis for a more versatile function that would take care of all sorts of
tunnel updates requests: __mlxsw_sp_ipip_entry_update_tunnel(). Extract
that function. Factor out a helper mlxsw_sp_ipip_entry_ol_lb_update() as
well.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:15:17 +09:00
Petr Machata
7e75af6366 mlxsw: spectrum: Propagate extack for tunnel events
The function mlxsw_sp_rif_create() takes an extack parameter. So far,
for creation of loopback interfaces, NULL was passed. For some events
however the extack can be extracted and passed along. So do that for
NETDEV_CHANGEUPPER handler.

Use the opportunity to update the type of info argument that
mlxsw_sp_netdevice_ipip_ol_event() takes. Follow-up patches will
introduce handling of more changes, and some of them carry an extack as
well, but in an info structure of a different type. Though not strictly
erroneous (the pointer could be cast whichever way), it makes no sense
to pretend the value is always of a certain type, when in fact it isn't.
So change the prototype of the above-mentioned function as well.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:15:17 +09:00
Petr Machata
47518ca5d2 mlxsw: spectrum_router: Extract mlxsw_sp_ipip_entry_ol_up_event()
The piece of logic to promote decap route, if any, is useful for generic
tunnel updates, not just for handling of NETDEV_UP events on tunnel
interfaces. Extract it to a separate function.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:15:17 +09:00
Petr Machata
6d4de44550 mlxsw: spectrum_router: Make mlxsw_sp_netdevice_ipip_ol_up_event() void
This function only ever returns 0, so don't pretend it returns anything
useful and just make it void.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:15:17 +09:00
Petr Machata
a3fe198ecd mlxsw: spectrum_router: Extract mlxsw_sp_ipip_entry_ol_down_event()
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:15:17 +09:00
Petr Machata
9fb7bd77d1 mlxsw: spectrum_ipip: Split accessor functions
To implement NETDEV_CHANGE notifications on IP-in-IP tunnels, the
handler needs to figure out what actually changed, to understand how
exactly to update the offloads. It will do so by storing struct
ip_tunnel_parm with previous configuration, and comparing that to the
new version.

To facilitate these comparisons, extract the code that operates on
struct ip_tunnel_parm from the existing accessor functions, and make
those a thin wrapper that extracts tunnel parameters and dispatches.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:15:17 +09:00
Petr Machata
474f0ff618 mlxsw: spectrum: Move mlxsw_sp_ipip_netdev_{s, d}addr{, 4}()
These functions ideologically belong to the IPIP module, and some
follow-up work will benefit from their presence there.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:15:17 +09:00
Petr Machata
cafdb2a0d4 mlxsw: spectrum_router: Extract mlxsw_sp_netdevice_ipip_can_offload()
Some of the code down the road needs this logic as well.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:15:17 +09:00
Petr Machata
796ec7769d mlxsw: spectrum: Rename IPIP-related netdevice handlers
To distinguish between events related to tunnel device itself and its
bound device, rename a number of functions related to handling tunneling
netdevice events to include _ol_ (for "overlay") in the name. That
leaves room in the namespace for underlay-related functions, which would
have _ul_ in the name.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:15:17 +09:00
Ido Schimmel
28678f07f1 mlxsw: spectrum_router: Update multipath hash parameters upon netevents
Make sure the device and the kernel are performing the multipath hash
according to the same parameters by updating the device whenever the
relevant netevent is generated.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03 15:40:41 +09:00
Ido Schimmel
af658b6a0e mlxsw: spectrum_router: Align multipath hash parameters with kernel's
Up until now we used the hardware's defaults for multipath hash
computation. This patch aligns the hardware's multipath parameters with
the kernel's.

For IPv4 packets, the parameters are determined according to the
'fib_multipath_hash_policy' sysctl during module initialization. In case
L3-mode is requested, only the source and destination IP addresses are
used. There is no special handling of ICMP error packets.

In case L4-mode is requested, a 5-tuple is used: source and destination
IP addresses, source and destination ports and IP protocol. Note that
the layer 4 fields are not considered for fragmented packets.

For IPv6 packets, the source and destination IP addresses are used, as
well as the flow label and the next header fields.
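
The field selection described above can be summarized in a short sketch. The flag names and functions are hypothetical, not register fields. The policy values follow the sysctl (0 for L3, 1 for L4). Keeping the IP protocol for fragmented packets in L4 mode is an assumption of this sketch, since the message only says the layer 4 fields are dropped.

```c
#include <assert.h>
#include <stdbool.h>

enum {
	HASH_SIP        = 1 << 0,
	HASH_DIP        = 1 << 1,
	HASH_SPORT      = 1 << 2,
	HASH_DPORT      = 1 << 3,
	HASH_PROTO      = 1 << 4,
	HASH_FLOW_LABEL = 1 << 5,
	HASH_NEXT_HDR   = 1 << 6,
};

/* policy: 0 = L3 (addresses only), 1 = L4 (5-tuple). */
static unsigned int ipv4_hash_fields(int policy, bool fragmented)
{
	unsigned int fields = HASH_SIP | HASH_DIP;

	assert(policy == 0 || policy == 1);
	if (policy == 1) {
		fields |= HASH_PROTO;
		if (!fragmented)	/* ports are skipped for fragments */
			fields |= HASH_SPORT | HASH_DPORT;
	}
	return fields;
}

/* IPv6 always uses addresses, flow label and next header. */
static unsigned int ipv6_hash_fields(void)
{
	return HASH_SIP | HASH_DIP | HASH_FLOW_LABEL | HASH_NEXT_HDR;
}
```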

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03 15:40:41 +09:00
Ido Schimmel
e471859b72 mlxsw: reg: Add Router ECMP Configuration Register Version 2
The RECRv2 register is used for setting up the router's ECMP hash
configuration.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03 15:40:41 +09:00
Ido Schimmel
ceb8881ddf mlxsw: spectrum_router: Properly name netevent work struct
The struct containing the work item queued from the netevent handler is
named after the only event it is currently used for, which is neighbour
updates.

Use a more appropriate name for the struct, as we are going to use it
for more events.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03 15:40:41 +09:00
Ido Schimmel
48fac88526 mlxsw: spectrum_router: Embed netevent notifier block in router struct
We are going to need to respond to netevents notifying us about
multipath hash updates by configuring the device's hash parameters.

Embed the netevent notifier in the router struct so that we could
retrieve it upon notifications and use it to configure the device.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-03 15:40:41 +09:00
Linus Torvalds
ead751507d License cleanup: add SPDX license identifiers to some files

Merge tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull initial SPDX identifiers from Greg KH:
 "License cleanup: add SPDX license identifiers to some files

  Many source files in the tree are missing licensing information, which
  makes it harder for compliance tools to determine the correct license.

  By default all files without license information are under the default
  license of the kernel, which is GPL version 2.

  Update the files which contain no license information with the
  'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally
  binding shorthand, which can be used instead of the full boiler plate
  text.

  This patch is based on work done by Thomas Gleixner and Kate Stewart
  and Philippe Ombredanne.

  How this work was done:

  Patches were generated and checked against linux-4.14-rc6 for a subset
  of the use cases:

   - file had no licensing information in it,

   - file was a */uapi/* one with no licensing information in it,

   - file was a */uapi/* one with existing licensing information,

  Further patches will be generated in subsequent months to fix up cases
  where non-standard license headers were used, and references to
  license had to be inferred by heuristics based on keywords.

  The analysis to determine which SPDX License Identifier to be applied
  to a file was done in a spreadsheet of side by side results from
  the output of two independent scanners (ScanCode & Windriver)
  producing SPDX tag:value files created by Philippe Ombredanne.
  Philippe prepared the base worksheet, and did an initial spot review
  of a few 1000 files.

  The 4.13 kernel was the starting point of the analysis with 60,537
  files assessed. Kate Stewart did a file by file comparison of the
  scanner results in the spreadsheet to determine which SPDX license
  identifier(s) to be applied to the file. She confirmed any
  determination that was not immediately clear with lawyers working with
  the Linux Foundation.

  Criteria used to select files for SPDX license identifier tagging were:

   - Files considered eligible had to be source code files.

   - Make and config files were included as candidates if they contained
     >5 lines of source

   - File already had some variant of a license header in it (even if <5
     lines).

  All documentation files were explicitly excluded.

  The following heuristics were used to determine which SPDX license
  identifiers to apply.

   - when both scanners couldn't find any license traces, file was
     considered to have no license information in it, and the top level
     COPYING file license applied.

     For non */uapi/* files that summary was:

       SPDX license identifier                            # files
       ---------------------------------------------------|-------
       GPL-2.0                                              11139

     and resulted in the first patch in this series.

     If that file was a */uapi/* path one, it was "GPL-2.0 WITH
     Linux-syscall-note", otherwise it was "GPL-2.0". The results
     were:

       SPDX license identifier                            # files
       ---------------------------------------------------|-------
       GPL-2.0 WITH Linux-syscall-note                        930

     and resulted in the second patch in this series.

   - if a file had some form of licensing information in it, and was one
     of the */uapi/* ones, it was denoted with the Linux-syscall-note if
     any GPL family license was found in the file or had no licensing in
     it (per prior point). Results summary:

       SPDX license identifier                            # files
       ---------------------------------------------------|------
       GPL-2.0 WITH Linux-syscall-note                       270
       GPL-2.0+ WITH Linux-syscall-note                      169
       ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
       ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
       LGPL-2.1+ WITH Linux-syscall-note                      15
       GPL-1.0+ WITH Linux-syscall-note                       14
       ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
       LGPL-2.0+ WITH Linux-syscall-note                       4
       LGPL-2.1 WITH Linux-syscall-note                        3
       ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
       ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

     and that resulted in the third patch in this series.

   - when the two scanners agreed on the detected license(s), that
     became the concluded license(s).

   - when there was disagreement between the two scanners (one detected
     a license but the other didn't, or they both detected different
     licenses) a manual inspection of the file occurred.

   - In most cases a manual inspection of the information in the file
     resulted in a clear resolution of the license that should apply
     (and which scanner probably needed to revisit its heuristics).

   - When it was not immediately clear, the license identifier was
     confirmed with lawyers working with the Linux Foundation.

   - If there was any question as to the appropriate license identifier,
     the file was flagged for further research and to be revisited later
     in time.

  In total, over 70 hours of logged manual review was done on the
  spreadsheet to determine the SPDX license identifiers to apply to the
  source files by Kate, Philippe, Thomas and, in some cases,
  confirmation by lawyers working with the Linux Foundation.

  Kate also obtained a third independent scan of the 4.13 code base from
  FOSSology, and compared selected files where the other two scanners
  disagreed against that SPDX file, to see if there were new insights.
  The Windriver scanner is based on an older version of FOSSology in
  part, so they are related.

  Thomas did random spot checks in about 500 files from the spreadsheets
  for the uapi headers and agreed with SPDX license identifier in the
  files he inspected. For the non-uapi files Thomas did random spot
  checks in about 15000 files.

  In the initial set of patches against 4.14-rc6, 3 files were found to have
  copy/paste license identifier errors, and have been fixed to reflect
  the correct identifier.

  Additionally Philippe spent 10 hours this week doing a detailed manual
  inspection and review of the 12,461 patched files from the initial
  patch version early this week with:

   - a full scancode scan run, collecting the matched texts, detected
     license ids and scores

   - reviewing anything where there was a license detected (about 500+
     files) to ensure that the applied SPDX license was correct

   - reviewing anything where there was no detection but the patch
     license was not GPL-2.0 WITH Linux-syscall-note to ensure that the
     applied SPDX license was correct

  This produced a worksheet with 20 files needing minor correction. This
  worksheet was then exported into 3 different .csv files for the
  different types of files to be modified.

  These .csv files were then reviewed by Greg. Thomas wrote a script to
  parse the csv files and add the proper SPDX tag to the file, in the
  format that the file expected. This script was further refined by Greg
  based on the output to detect more types of files automatically and to
  distinguish between header and source .c files (which need different
  comment types.) Finally Greg ran the script using the .csv files to
  generate the patches.

  Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
  Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
  Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  License cleanup: add SPDX license identifier to uapi header files with a license
  License cleanup: add SPDX license identifier to uapi header files with no license
  License cleanup: add SPDX GPL-2.0 license identifier to files with no license
2017-11-02 10:04:46 -07:00
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information in it,
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging were:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note", otherwise it was "GPL-2.0".  The results were:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.
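
The heuristics listed above amount to a small decision procedure, which can be sketched roughly as follows. This is an illustration only and not the script used for the patches: the function names and the `MANUAL_REVIEW` sentinel are hypothetical, only single (non-compound) identifiers are handled, and the manual-review and lawyer-confirmation steps are collapsed into one outcome.

```python
from typing import Optional

DEFAULT = "GPL-2.0"
SYSCALL_NOTE = "WITH Linux-syscall-note"
MANUAL_REVIEW = "MANUAL-REVIEW"  # hypothetical sentinel, not an SPDX identifier


def is_uapi(path: str) -> bool:
    """Approximate the */uapi/* path test from the text."""
    return "/uapi/" in path


def conclude_license(path: str,
                     scan_a: Optional[str],
                     scan_b: Optional[str]) -> str:
    """Combine the results of two scanners into one concluded identifier."""
    if scan_a is None and scan_b is None:
        # No license traces: the top-level COPYING license applies.
        return f"{DEFAULT} {SYSCALL_NOTE}" if is_uapi(path) else DEFAULT
    if scan_a == scan_b:
        license_id = scan_a
        # uapi files with a GPL-family license get the syscall note.
        if (is_uapi(path)
                and license_id.startswith(("GPL", "LGPL"))
                and SYSCALL_NOTE not in license_id):
            license_id = f"{license_id} {SYSCALL_NOTE}"
        return license_id
    # Scanners disagree: a person has to inspect the file.
    return MANUAL_REVIEW
```

For example, `conclude_license("include/uapi/linux/foo.h", None, None)` yields the uapi default, while two differing scanner results for any path fall through to manual review.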

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Jiri Pirko
44ae12a768 net: sched: move the can_offload check from binding phase to rule insertion phase
This restores the original behaviour before the block callbacks were
introduced. Allow the drivers to do binding of block always, no matter
if the NETIF_F_HW_TC feature is on or off. Move the check to the block
callback which is called for rule insertion.

Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-02 16:10:39 +09:00
David S. Miller
ed29668d1a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Smooth Cong Wang's bug fix into 'net-next'.  Basically put
the bulk of the tcf_block_put() logic from 'net' into
tcf_block_put_ext(), but after the offload unbind.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-02 15:23:39 +09:00
Vadim Pasternak
d70eaa386b mlxsw: i2c: Fix buffer increment counter for write transaction
Fix a problem with the last chunk, where 'chunk_size' is smaller than
MLXSW_I2C_BLK_MAX and data is copied to the wrong offset, overwriting
previous data.
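
One plausible way such a symptom can arise is sketched below; the driver code itself is not shown here, so this is an assumption about the shape of the bug rather than a copy of it. `BLK_MAX` stands in for MLXSW_I2C_BLK_MAX with an assumed value, and `chunk_plan` is a hypothetical helper that lists the (offset, size) pairs a chunked write would use.

```python
BLK_MAX = 32  # stand-in for MLXSW_I2C_BLK_MAX; the value here is assumed


def chunk_plan(total_len: int, fixed: bool):
    """Return (source_offset, chunk_size) for each chunk of a write.

    With fixed=False the offset is derived from the current chunk's size,
    which goes wrong once the last chunk is shorter than BLK_MAX.
    With fixed=True the offset always advances by the full block size.
    """
    plan = []
    remaining = total_len
    num_chunks = (total_len + BLK_MAX - 1) // BLK_MAX
    for i in range(num_chunks):
        chunk_size = min(BLK_MAX, remaining)
        stride = BLK_MAX if fixed else chunk_size
        plan.append((stride * i, chunk_size))
        remaining -= chunk_size
    return plan
```

For a 70-byte transfer, the buggy variant places the final 6-byte chunk at offset 12, inside data that was already handled, while the fixed variant places it at offset 64.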

Fixes: 6882b0aee1 ("mlxsw: Introduce support for I2C bus")
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-01 20:40:58 +09:00
Ido Schimmel
62b0e9243f mlxsw: reg: Add high and low temperature thresholds
The ASIC has the ability to generate events whenever a sensor indicates
the temperature goes above or below its high or low thresholds,
respectively.

In new firmware versions the firmware enforces a minimum of 5
degrees Celsius difference between both thresholds. Make the driver
conform to this requirement.

Note that this is required even when the events are disabled, as in
certain systems interrupts are generated via GPIO based on these
thresholds.

Fixes: 85926f8770 ("mlxsw: reg: Add definition of temperature management registers")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-01 12:25:43 +09:00
David Ahern
1f279233af mlxsw: spectrum_router: Return extack message on abort due to fib rules
Adding a FIB rule on a spectrum platform silently aborts FIB offload:
    $ ip ru add pref 99 from all to 192.168.1.1 table 10
    $ dmesg -c
    [  623.144736] mlxsw_spectrum 0000:03:00.0: FIB abort triggered. Note that FIB entries are no longer being offloaded to this device.

This patch reworks FIB rule handling to return a message to the user:
    $ ip ru add pref 99 from all to 8.8.8.8 table 11
    Error: spectrum: FIB rules not supported. Aborting offload.

spectrum currently only checks whether the FIB rule is a default rule or
an l3mdev rule, both of which it knows how to handle. For any other rule
it aborts FIB offload. Move the processing so that the rule type is
checked inline with the user request. If the rule is unsupported, a work
queue entry is used to abort the offload. Change the rule delete handling
to just return, since it does nothing at the moment.
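
The reworked flow can be pictured with the following sketch. It is a simplified model under stated assumptions, not the driver's implementation: `FibRule`, `Router`, and the string-based work queue are hypothetical, and the message is returned directly instead of going through extack.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FibRule:
    """Hypothetical rule descriptor with only the two properties checked."""
    is_default: bool = False
    is_l3mdev: bool = False


@dataclass
class Router:
    """Hypothetical router state; pending_work models the work queue."""
    pending_work: List[str] = field(default_factory=list)

    def rule_add(self, rule: FibRule) -> Optional[str]:
        """Check the rule type inline; return an error message or None."""
        if rule.is_default or rule.is_l3mdev:
            return None  # supported rule, nothing to do
        # Unsupported rule: defer the abort to a work item and tell the user.
        self.pending_work.append("abort_fib_offload")
        return "FIB rules not supported. Aborting offload."

    def rule_del(self, rule: FibRule) -> Optional[str]:
        """Deletion just returns, as it does nothing at the moment."""
        return None
```

The point of the inline check is that the caller gets the message synchronously, while the actual abort still runs later from the queued work.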

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-01 11:50:43 +09:00
Kamal Heib
1fe850062c net/mlx5e: Switch channels counters to use stats group API
Switch the channels counters to use the new stats group API.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-31 14:20:48 -07:00
Kamal Heib
e185d43f59 net/mlx5e: Switch ipsec counters to use stats group API
Switch the ipsec counters to use the new stats group API.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-31 14:20:47 -07:00
Kamal Heib
0e6f01a49d net/mlx5e: Switch pme counters to use stats group API
Switch the pme counters to use the new stats group API.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-31 14:20:47 -07:00
Kamal Heib
4377bea276 net/mlx5e: Switch per prio pfc counters to use stats group API
Switch the per prio pfc counters to use the new stats group API.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-31 14:20:47 -07:00
Kamal Heib
e6000651cf net/mlx5e: Switch per prio traffic counters to use stats group API
Switch the per prio traffic counters to use the new stats group API.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-31 14:20:47 -07:00
Kamal Heib
9fd2b5f137 net/mlx5e: Switch pcie counters to use stats group API
Switch the pcie counters to use the new stats group API.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-31 14:20:47 -07:00
Kamal Heib
3488bd4c35 net/mlx5e: Switch ethernet extended counters to use stats group API
Switch the ethernet extended counters to use the new stats group API.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-31 14:20:47 -07:00
Kamal Heib
2e4df0b241 net/mlx5e: Switch physical statistical counters to use stats group API
Switch the physical statistical counters to use the new stats group API.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-31 14:20:47 -07:00
Kamal Heib
e0e0def9e2 net/mlx5e: Switch RFC 2819 counters to use stats group API
Switch the RFC 2819 counters to use the new stats group API.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-31 14:20:47 -07:00
Kamal Heib
fc8e64a311 net/mlx5e: Switch RFC 2863 counters to use stats group API
Switch the RFC 2863 counters to use the new stats group API.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-31 14:20:47 -07:00
Kamal Heib
6e6ef814d2 net/mlx5e: Switch IEEE 802.3 counters to use stats group API
Switch the IEEE 802.3 counters to use the new stats group API.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-31 14:20:47 -07:00
Kamal Heib
40cab9f16c net/mlx5e: Switch vport counters to use the stats group API
Switch the vport counters to use the new stats group API.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-31 14:20:46 -07:00
Kamal Heib
fd8dcdb8d2 net/mlx5e: Switch Q counters to use the stats group API
Switch the Q counters to use the new stats group API.

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-31 14:20:46 -07:00
Kamal Heib
c0752f2bd6 net/mlx5e: Introduce stats group API
Currently the mlx5e driver has multiple groups of stats, each group is
used for different purposes and it may depend on hardware capabilities
or not. The problem with the current implementation is that there is no
clear API to create a new group of stats.

This change defines a new API to create a group of stats and simplifies
the way of handling them by defining a new struct "mlx5e_stats_grp", which
has the following three function pointers:
- get_num_stats() - return the number of counters in the group.
- fill_strings() - fill counters strings within the group.
- fill_stats() - fill counters values within the group.

The above function pointers are used within the ethtool callbacks when
calling "ethtool -S" from userspace. This change also switches the SW
group to use the new API.
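
The shape of such a group API can be sketched as below. This is a loose model and not the kernel struct: the three callback names come from the text, but their signatures, the dict-based `dev` argument, and the `make_group` and `ethtool_dump` helpers are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StatsGroup:
    """Model of a stats group: three callbacks, as named in the text."""
    get_num_stats: Callable[[dict], int]
    fill_strings: Callable[[dict, List[str]], None]
    fill_stats: Callable[[dict, List[int]], None]


def make_group(key: str, names: List[str]) -> StatsGroup:
    """Hypothetical helper: build a group whose counters live in dev[key]."""
    return StatsGroup(
        get_num_stats=lambda dev: len(names),
        fill_strings=lambda dev, out: out.extend(names),
        fill_stats=lambda dev, out: out.extend(dev[key][n] for n in names),
    )


def ethtool_dump(dev: dict, groups: List[StatsGroup]) -> Dict[str, int]:
    """Walk every group in order, as an 'ethtool -S' style callback might."""
    strings: List[str] = []
    values: List[int] = []
    for group in groups:
        group.fill_strings(dev, strings)
        group.fill_stats(dev, values)
    return dict(zip(strings, values))
```

With this layout, adding a new kind of counter means adding one more group to the list rather than touching every ethtool callback.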

Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-31 14:20:46 -07:00
David S. Miller
e1ea2f9856 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several conflicts here.

NFP driver bug fix adding nfp_netdev_is_nfp_repr() check to
nfp_fl_output() needed some adjustments because the code block is in
an else block now.

Parallel additions to net/pkt_cls.h and net/sch_generic.h

A bug fix in __tcp_retransmit_skb() conflicted with some of
the rbtree changes in net-next.

The tc action RCU callback fixes in 'net' had some overlap with some
of the recent tcf_block reworking.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-30 21:09:24 +09:00
Kees Cook
0365b047de drivers/net: mellanox: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Matan Barak <matanb@mellanox.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: netdev@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-28 19:09:49 +09:00
Nogah Frankel
3e8c1fd318 mlxsw: reg: Avoid magic number in PPCNT
Replace recurring magic number in PPCNT register with a define.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-27 23:25:55 +09:00
Nogah Frankel
9deef43ddf mlxsw: spectrum: Change stats cache to be local
Change the HW stats cache to be local. Rename it for better clarity.
It holds the most recent HW stats, which are read periodically, so that
stats requests can be answered immediately.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-27 23:25:55 +09:00
Huy Nguyen
be0f161ef1 net/mlx5e: DCBNL, Implement tc with ets type and zero bandwidth
Previously, a tc with ets type and zero bandwidth was not accepted
by the driver. This behavior does not follow the IEEE 802.1Qaz spec.

If there are tcs with ets type and zero bandwidth, these tcs are
assigned to the lowest priority tc_group #0. We equally distribute
100% bw of the tc_group #0 to these zero bandwidth ets tcs.
Also, the non zero bandwidth ets tcs are assigned to tc_group #1.

If there is no zero bandwidth ets tc, the non zero bandwidth ets tcs
are assigned to tc_group #0.
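
The grouping rule above can be sketched in Python as follows. This is only an
illustration for ETS-type tcs: the function name assign_ets_groups() is
hypothetical, and the handling of the integer remainder (given to the last
zero-bandwidth tc) is an assumption of this sketch, not something the commit
message states.

```python
from typing import List, Tuple


def assign_ets_groups(bw: List[int]) -> List[Tuple[int, int]]:
    """Return (tc_group, bandwidth_percent) for each ETS traffic class.

    bw holds the configured bandwidth of each ETS tc; 0 means zero bandwidth.
    """
    zero_tcs = [i for i, b in enumerate(bw) if b == 0]
    if not zero_tcs:
        # No zero-bandwidth tc: all ETS tcs stay in tc_group #0.
        return [(0, b) for b in bw]

    # Split 100% of tc_group #0 equally among the zero-bandwidth tcs.
    # Assumption of this sketch: any integer remainder goes to the last one.
    share, rem = divmod(100, len(zero_tcs))
    result = []
    for i, b in enumerate(bw):
        if b == 0:
            extra = rem if i == zero_tcs[-1] else 0
            result.append((0, share + extra))
        else:
            # Non-zero-bandwidth tcs move to tc_group #1.
            result.append((1, b))
    return result
```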

Fixes: cdcf11212b ("net/mlx5e: Validate BW weight values of ETS")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-26 00:47:27 -07:00
Or Gerlitz
3c37745ec6 net/mlx5e: Properly deal with encap flows add/del under neigh update
Currently, the encap action offload is handled in the actions parse
function and not in mlx5e_tc_add_fdb_flow() where we deal with all
the other aspects of offloading actions (vlan, modify header) and
the rule itself.

When the neigh update code (mlx5e_tc_encap_flows_add()) recreates the
encap entry and offloads the related flows, we wrongly call again into
mlx5e_tc_add_fdb_flow(). This by itself causes us to repeat the
offloading of vlans and header re-write, which puts things in an
inconsistent state and steps on freed memory (e.g. the modify
header parse buffer, which is already freed).

Since on error, mlx5e_tc_add_fdb_flow() detaches and may release the
encap entry, it causes a corruption at the neigh update code which goes
over the list of flows associated with this encap entry, or double free
when the tc flow is later deleted by user-space.

When neigh update (mlx5e_tc_encap_flows_del()) unoffloads the flows related
to an encap entry which is now invalid, we do a partial repeat of the eswitch
flow removal code which is wrong too.

To fix things up we do the following:

(1) handle the encap action offload in the eswitch flow add function
    mlx5e_tc_add_fdb_flow() as done for the other actions and the rule itself.

(2) modify the neigh update code (mlx5e_tc_encap_flows_add/del) to only
    deal with the encap entry and rules delete/add and not with any of
    the other offloaded actions.

Fixes: 232c001398 ('net/mlx5e: Add support to neighbour update flow')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-26 00:47:27 -07:00
Huy Nguyen
4ca637a20a net/mlx5: Delay events till mlx5 interface's add complete for pci resume
mlx5_ib_add is called during mlx5_pci_resume after a pci error.
Before mlx5_ib_add completes, there are multiple events which trigger
function mlx5_ib_event. This causes a kernel panic because mlx5_ib_event
accesses uninitialized resources.

The fix is to extend Erez Shitrit's patch <97834eba7c19>
("net/mlx5: Delay events till ib registration ends") to cover
the pci resume code path.

Trace:
mlx5_core 0001:01:00.6: mlx5_pci_resume was called
mlx5_core 0001:01:00.6: firmware version: 16.20.1011
mlx5_core 0001:01:00.6: mlx5_attach_interface:164:(pid 779):
mlx5_ib_event:2996:(pid 34777): warning: event on port 1
mlx5_ib_event:2996:(pid 34782): warning: event on port 1
Unable to handle kernel paging request for data at address 0x0001c104
Faulting instruction address: 0xd000000008f411fc
Oops: Kernel access of bad area, sig: 11 [#1]
...
...
Call Trace:
[c000000fff77bb70] [d000000008f4119c] mlx5_ib_event+0x64/0x470 [mlx5_ib] (unreliable)
[c000000fff77bc60] [d000000008e67130] mlx5_core_event+0xb8/0x210 [mlx5_core]
[c000000fff77bd10] [d000000008e4bd00] mlx5_eq_int+0x528/0x860[mlx5_core]

Fixes: 97834eba7c ("net/mlx5: Delay events till ib registration ends")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-26 00:47:27 -07:00
Moshe Shemesh
6377ed0bba net/mlx5: Fix health work queue spin lock to IRQ safe
spin_lock/unlock of health->wq_lock should be IRQ safe.
It was changed to spin_lock_irqsave since adding commit 0179720d6b
("net/mlx5: Introduce trigger_health_work function") which uses
spin_lock from asynchronous event (IRQ) context.
Thus, all spin_lock/unlock of health->wq_lock should have been moved
to IRQ safe mode.
However, one occurrence in new code using this lock missed that
change, resulting in a possible deadlock:
  kernel: Possible unsafe locking scenario:
  kernel:       CPU0
  kernel:       ----
  kernel:  lock(&(&health->wq_lock)->rlock);
  kernel:  <Interrupt>
  kernel:    lock(&(&health->wq_lock)->rlock);
  kernel: #012 *** DEADLOCK ***

Fixes: 2a0165a034 ("net/mlx5: Cancel delayed recovery work when unloading the driver")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-26 00:47:27 -07:00
Mark Rutland
6aa7de0591 locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
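
For readers unfamiliar with Coccinelle, the two rewrite rules above can be
approximated with a small Python sketch. This is a text-level regex
approximation, not a semantic patch: it handles only single-line statements
with no nested parentheses inside ACCESS_ONCE(), and the function name
convert() is made up for this example.

```python
import re

# Rule 1: "ACCESS_ONCE(E1) = E2;" becomes "WRITE_ONCE(E1, E2);".
# The (?!=) lookahead keeps "==" comparisons from being treated as stores.
_WRITE = re.compile(r"ACCESS_ONCE\(([^()]+)\)\s*=(?!=)\s*([^;]+);")
# Rule 2: every remaining "ACCESS_ONCE(E)" becomes "READ_ONCE(E)".
_READ = re.compile(r"\bACCESS_ONCE\(")


def convert(line: str) -> str:
    """Apply the store rule first, then the load rule, as the script does."""
    line = _WRITE.sub(r"WRITE_ONCE(\1, \2);", line)
    return _READ.sub("READ_ONCE(", line)
```

The order matters: stores are rewritten first so that the catch-all load rule
only sees the ACCESS_ONCE() uses that remain.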

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-25 11:01:08 +02:00
Yotam Gigi
ea00aa3a27 mlxsw: spectrum: mr_tcam: Include the mr_tcam header file
Make the spectrum_mr_tcam.c include the spectrum_mr_tcam.h header file.

Cleans up sparse warning:
symbol 'mlxsw_sp_mr_tcam_ops' was not declared. Should it be static?

Fixes: 0e14c7777a ("mlxsw: spectrum: Add the multicast routing hardware logic")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 19:07:13 +09:00
Yotam Gigi
6a30dc29a4 mlxsw: spectrum: mr: Make the function mlxsw_sp_mr_dev_vif_lookup static
The function is only used internally in spectrum_mr.c and is not declared
in the header file, thus make it static.

Cleans up sparse warning:
symbol 'mlxsw_sp_mr_dev_vif_lookup' was not declared. Should it be static?

Fixes: c011ec1bbf ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 19:07:13 +09:00
Yotam Gigi
de3872cd18 mlxsw: spectrum: mr: Fix various endianness issues
Fix various endianness issues in comparisons and assignments. The fix is
entirely cosmetic as all the values fixed are endianness-agnostic.

Cleans up sparse warnings:
spectrum_mr.c:156:49: warning: restricted __be32 degrades to integer
spectrum_mr.c:206:26: warning: restricted __be32 degrades to integer
spectrum_mr.c:212:31: warning: incorrect type in assignment (different
  base types)
spectrum_mr.c:212:31:    expected restricted __be32 [usertype] addr4
spectrum_mr.c:212:31:    got unsigned int
spectrum_mr.c:214:32: warning: incorrect type in assignment (different
  base types)
spectrum_mr.c:214:32:    expected restricted __be32 [usertype] addr4
spectrum_mr.c:214:32:    got unsigned int
spectrum_mr.c:461:16: warning: restricted __be32 degrades to integer
spectrum_mr.c:461:49: warning: restricted __be32 degrades to integer

Fixes: c011ec1bbf ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 19:07:13 +09:00
Arkadi Sharshevsky
69715dd50d mlxsw: spectrum_dpipe: Fix entries dump of the adjacency table
During the dump, the per-netlink-packet entry counter should be zeroed out
when a new packet is created.
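
The class of bug being fixed can be illustrated with a short Python sketch of
a dump loop that splits entries across packets. The function name
dump_entries() and the list-based packet model are hypothetical; the point is
only where the per-packet counter is reset.

```python
from typing import List


def dump_entries(entries: List[str], max_per_packet: int) -> List[List[str]]:
    """Split table entries into packets of at most max_per_packet entries."""
    packets: List[List[str]] = []
    current: List[str] = []
    count = 0  # entries placed in the packet currently being built
    for entry in entries:
        if count == max_per_packet:
            packets.append(current)
            current = []
            count = 0  # zero the counter whenever a new packet is started
        current.append(entry)
        count += 1
    if current:
        packets.append(current)
    return packets
```

If the reset were omitted, the counter would keep the value from the previous
packet and every later packet would be accounted for incorrectly.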

Fixes: 190d38a52a ("mlxsw: spectrum_dpipe: Add support for adjacency table dump")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-24 19:02:02 +09:00
Ido Schimmel
330e2cc65d mlxsw: spectrum: Add another partition to KVD linear
The KVD linear is currently partitioned into two partitions. One for
single entries and another for groups of 32 entries.

Add another partition consisting of groups of 512 entries which will
allow us to more accurately represent the nexthop weights in non-equal
cost multi-path routing.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23 05:23:06 +01:00
Ido Schimmel
f11fbaf8b5 mlxsw: spectrum: Increase number of linear entries
The memory region where adjacency entries (nexthops) are stored is
called the KVD linear and is configured during initialization with a
size of 64K.

Extend this area with 32K more entries, that will be partitioned into 64
groups of 0.5K entries, thereby allowing us to support weighted nexthops
with high accuracy.

Change the ratio between both types of hash entries, so as to prevent
reduction in the number of double hash entries, which are used for IPv6
neighbours and routes with a prefix length greater than 64.

Note that the user will be able to control all these sizes once the
devlink resource manager is introduced.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23 05:23:06 +01:00
Ido Schimmel
eb789980d0 mlxsw: spectrum_router: Populate adjacency entries according to weights
Up until now the driver assumed all the nexthops have an equal weight
and wrote each to a single adjacency entry.

This patch takes the `weight` parameter into account and populates the
adjacency group according to the relative weight of each nexthop.

Specifically, the weights of all the nexthops that should be offloaded
are first normalized and then used to calculate the upper adjacency
index of each nexthop. This is done according to the hash-threshold
algorithm used by the kernel for IPv4 multi-path routing.

Adjacency groups are currently limited to 32 entries which limits the
weights that can be used, but follow-up patches will introduce groups of
512 entries.
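
A minimal Python sketch of the weight-to-entries mapping described above
follows. It assumes GCD normalization and round-to-nearest for the cumulative
bound, and it uses exclusive upper bounds; the function name
adjacency_ranges() is hypothetical and the driver's exact arithmetic may
differ.

```python
from functools import reduce
from math import gcd
from typing import List


def adjacency_ranges(weights: List[int], group_size: int) -> List[range]:
    """Map nexthop weights to contiguous index ranges in an adjacency group."""
    # Normalize the weights by their greatest common divisor.
    g = reduce(gcd, weights)
    norm = [w // g for w in weights]
    total = sum(norm)

    ranges = []
    lower = 0
    cumulative = 0
    for w in norm:
        cumulative += w
        # Upper bound of this nexthop: its cumulative share of the group,
        # rounded to the nearest integer (assumed rounding rule).
        upper = (cumulative * group_size + total // 2) // total
        ranges.append(range(lower, upper))
        lower = upper
    return ranges
```

In this sketch, weights 3 and 1 in a 32-entry group yield 24 and 8 entries.
Weights 255 and 1 in the same group leave the second nexthop with no entries
at all, which shows why a 32-entry group limits the usable weights.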

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23 05:23:06 +01:00
Ido Schimmel
425a08c673 mlxsw: spectrum_router: Prepare for large adjacency groups
The device has certain restrictions regarding the size of an adjacency
group.

Have the router determine the size of the adjacency group according to
available KVDL allocation sizes and these restrictions.

This was not needed until now since only allocations of up to 32 entries
were supported and these are all valid sizes for an adjacency group.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23 05:23:06 +01:00
Ido Schimmel
408bd946bf mlxsw: spectrum_router: Store weight in nexthop struct
As the first step towards non-equal-cost multi-path support, store each
nexthop's weight.

For IPv6 nexthops always set the weight to 1, as IPv6 only supports ECMP.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23 05:23:06 +01:00
Ido Schimmel
d672aec45f mlxsw: spectrum: Add ability to query KVDL allocation size
The current KVDL allocation API allows the user to specify the requested
number of entries, but the user has no way of knowing how many entries
were actually allocated.

This works because existing users (e.g., router) request the exact
number they end up using. With the introduction of large adjacency
groups, this will change, as the router will have the ability to choose
from several allocation sizes, where larger allocations provide higher
accuracy with respect to requested weights and better resilience against
nexthop failures.

One option is to have the router try several allocations of descending
size until one succeeds, but a better way is to simply allow it to query
the actual allocation size and then size its request accordingly.
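
The query idea can be sketched in a few lines of Python. The function name
query_alloc_size() is hypothetical, and the rule that a request is served by
the smallest allocation size that fits is an assumption of this sketch.

```python
from typing import Optional, Sequence


def query_alloc_size(requested: int, alloc_sizes: Sequence[int]) -> Optional[int]:
    """Return how many entries an allocation of `requested` would really get."""
    for size in sorted(alloc_sizes):
        if requested <= size:
            return size
    return None  # The request is larger than any partition can serve.
```

A caller can then probe candidate sizes up front instead of attempting
allocations of descending size until one succeeds.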

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23 05:23:06 +01:00
Ido Schimmel
a875a2ee2d mlxsw: spectrum: Better represent KVDL partitions
The KVD linear (KVDL) allocator currently consists of a very large
bitmap that reflects the KVDL's usage. The boundaries of each partition
as well as their allocation size are represented using defines.

This representation requires us to patch all the functions that act on a
partition whenever the partitioning scheme is changed. In addition, it
does not enable the dynamic configuration of the KVDL using the
up-coming resource manager.

Add objects to represent these partitions as well as the accompanying
code that acts on them to perform allocations and de-allocations.

In the following patches, this will allow us to easily add another
partition as well as new operations to act on these partitions.
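
To make the "objects instead of defines" idea concrete, here is a small Python
model. The names Partition and kvdl_alloc(), and the slot-based usage map, are
hypothetical simplifications chosen for this sketch, not the driver's data
structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Partition:
    """One partition: a base index, an allocation size and a usage map."""
    base: int
    alloc_size: int
    num_slots: int
    used: List[bool] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.used = [False] * self.num_slots

    def alloc(self) -> Optional[int]:
        # Return the first index of a free slot, or None when full.
        for slot, busy in enumerate(self.used):
            if not busy:
                self.used[slot] = True
                return self.base + slot * self.alloc_size
        return None

    def free(self, index: int) -> None:
        self.used[(index - self.base) // self.alloc_size] = False


def kvdl_alloc(parts: List[Partition], count: int) -> Optional[int]:
    # Pick the first partition (ordered by allocation size) that fits.
    for part in parts:
        if count <= part.alloc_size:
            return part.alloc()
    return None
```

With this shape, adding a partition is a matter of appending one more
Partition object; kvdl_alloc() itself needs no change.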

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23 05:23:06 +01:00
Ido Schimmel
e69cd9d75e mlxsw: spectrum_dpipe: Add adjacency group size
The adjacency group size is part of the match on the adjacency group and
should therefore be exposed using dpipe.

When non-equal-cost multi-path support is introduced, the group's
size will help users understand the exact number of adjacency entries
each nexthop occupies, as a nexthop will no longer correspond to a
single entry.

For example, the output for a multi-path route with two nexthops, one
with weight 255 and the other with weight 1, will be:

$ devlink dpipe table dump pci/0000:01:00.0 name mlxsw_adj
pci/0000:01:00.0:
  index 0
  match_value:
    type field_exact header mlxsw_meta field adj_index value 65536
    type field_exact header mlxsw_meta field adj_size value 512
    type field_exact header mlxsw_meta field adj_hash_index value 0
  action_value:
    type field_modify header ethernet field destination mac value e4:1d:2d:a5:f3:64
    type field_modify header mlxsw_meta field erif_port mapping ifindex mapping_value 3 value 1

  index 1
  match_value:
    type field_exact header mlxsw_meta field adj_index value 65536
    type field_exact header mlxsw_meta field adj_size value 512
    type field_exact header mlxsw_meta field adj_hash_index value 510
  action_value:
    type field_modify header ethernet field destination mac value e4:1d:2d:a5:f3:65
    type field_modify header mlxsw_meta field erif_port mapping ifindex mapping_value 4 value 2

Thus, the first nexthop occupies 510 adjacency entries and the second 2,
which leads to a ratio of 255 to 1.
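
The arithmetic behind these numbers can be checked with a short Python
sketch. It assumes that each dumped row shows the first adj_hash_index of a
nexthop's range and that the range runs up to the next row's index (or to
adj_size for the last row); the function name entries_per_nexthop() is
hypothetical.

```python
from typing import List


def entries_per_nexthop(hash_indexes: List[int], adj_size: int) -> List[int]:
    """Entries owned by each nexthop, given the first hash index of each."""
    bounds = sorted(hash_indexes) + [adj_size]
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]
```

For the dump above, the indexes 0 and 510 with an adj_size of 512 give 510 and
2 entries, and 510 / 2 = 255.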

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-23 05:23:05 +01:00
David S. Miller
f8ddadc4db Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
There were quite a few overlapping sets of changes here.

Daniel's bug fix for off-by-ones in the new BPF branch instructions,
along with the added allowances for "data_end > ptr + x" forms
collided with the metadata additions.

Along with those three changes came verifier test cases, which in
their final form I tried to group together properly.  If I had just
trimmed GIT's conflict tags as-is, this would have split up the
meta tests unnecessarily.

In the socketmap code, a set of preemption disabling changes
overlapped with the rename of bpf_compute_data_end() to
bpf_compute_data_pointers().

Changes were made to the mv88e6060.c driver set addr method
which got removed in net-next.

The hyperv transport socket layer had a locking change in 'net'
which overlapped with a change of socket state macro usage
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 13:39:14 +01:00
Elena Reshetova
dd8e19456d drivers, net, mlx5: convert fs_node.refcount from atomic_t to refcount_t
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.

The variable fs_node.refcount is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.
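
The properties listed above can be modelled with a toy Python class. This is
not the kernel's refcount_t API: the class name RefCount, the saturation
ceiling, and the choice to raise on underflow are assumptions made for this
sketch.

```python
class RefCount:
    """Toy model of a saturating reference counter (not the kernel API)."""

    SATURATED = 2**32 - 1  # assumed ceiling for this sketch

    def __init__(self) -> None:
        self.value = 1  # counters start at 1, as in the list above

    def inc_not_zero(self) -> bool:
        if self.value == 0:
            return False  # the object is being freed; refuse to revive it
        if self.value < self.SATURATED:
            self.value += 1  # at the ceiling the counter sticks, it never wraps
        return True

    def dec_and_test(self) -> bool:
        if self.value == 0:
            raise RuntimeError("refcount underflow")
        if self.value == self.SATURATED:
            return False  # a saturated counter is never released
        self.value -= 1
        return self.value == 0
```

A plain integer counter would wrap on overflow and eventually reach zero
while references still exist; the sticky ceiling trades a possible leak for
the absence of a use-after-free.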

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:22:39 +01:00
Elena Reshetova
a4b51a9f83 drivers, net, mlx5: convert mlx5_cq.refcount from atomic_t to refcount_t
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.

The variable mlx5_cq.refcount is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:22:38 +01:00
Elena Reshetova
17ac99b2b8 drivers, net, mlx4: convert mlx4_srq.refcount from atomic_t to refcount_t
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.

The variable mlx4_srq.refcount is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:22:38 +01:00
Elena Reshetova
0068895ff8 drivers, net, mlx4: convert mlx4_qp.refcount from atomic_t to refcount_t
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.

The variable mlx4_qp.refcount is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:22:38 +01:00
Elena Reshetova
ff61b5e3f0 drivers, net, mlx4: convert mlx4_cq.refcount from atomic_t to refcount_t
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to a use-after-free situation and be exploitable.

The variable mlx4_cq.refcount is used as a pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:22:38 +01:00
Petr Machata
dcbda2820f mlxsw: spectrum_router: Configure TIGCR on init
Spectrum tunnels do not default to ttl of "inherit" like the Linux ones
do. Configure TIGCR on router init so that the TTL of tunnel packets is
copied from the overlay packets.

Fixes: ee954d1a91 ("mlxsw: spectrum_router: Support GRE tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:19:03 +01:00
Petr Machata
14aefd9011 mlxsw: reg: Add Tunneling IPinIP General Configuration Register
The TIGCR register is used for setting up the IPinIP Tunnel
configuration.

Fixes: ee954d1a91 ("mlxsw: spectrum_router: Support GRE tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 02:19:03 +01:00
Jiri Pirko
8d26d5636d net: sched: avoid ndo_setup_tc calls for TC_SETUP_CLS*
All drivers are converted to use block callbacks for TC_SETUP_CLS*.
So it is now safe to remove the calls to ndo_setup_tc from cls_*

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 03:04:08 +01:00
Jiri Pirko
855afa0932 mlx5e_rep: Convert ndo_setup_tc offloads to block callbacks
Benefit from the newly introduced block callback infrastructure and
convert ndo_setup_tc calls for flower offloads to block callbacks.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 03:04:08 +01:00
Jiri Pirko
d6c862baaf mlx5e: Convert ndo_setup_tc offloads to block callbacks
Benefit from the newly introduced block callback infrastructure and
convert ndo_setup_tc calls for flower offloads to block callbacks.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 03:04:07 +01:00
Jiri Pirko
eb49cfaa6b mlxsw: spectrum: Convert ndo_setup_tc offloads to block callbacks
Benefit from the newly introduced block callback infrastructure and
convert ndo_setup_tc calls for matchall and flower offloads to block
callbacks.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 03:04:07 +01:00
David Ahern
3c75f9b1b4 spectrum: Convert fib event handlers to use container_of on info arg
Use container_of to convert the generic fib_notifier_info into
the event specific data structure.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-21 01:45:17 +01:00
David Ahern
f8fa9b4e6d mlxsw: spectrum_router: Add extack message for RIF and VRF overflow
Add extack argument down to mlxsw_sp_rif_create and mlxsw_sp_vr_create
to set an error message on RIF or VR overflow. Now on overflow of
either resource the user gets an informative message as opposed to
failing with EBUSY.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20 13:15:07 +01:00
David Ahern
89d5dd2efd mlxsw: spectrum: router: Add support for address validator notifier
Add support for inetaddr_validator and inet6addr_validator. The
notifiers provide a means for validating ipv4 and ipv6 addresses
before the addresses are installed and on failure the error
is propagated back to the user.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-20 13:15:07 +01:00
Doug Ledford
894b82c427 Merge branch 'timer_setup' into for-next
Conflicts:
	drivers/infiniband/hw/cxgb4/cm.c
	drivers/infiniband/hw/qib/qib_driver.c
	drivers/infiniband/hw/qib/qib_mad.c

There were minor fixups needed in these files.  Just minor context diffs
due to patches from independent sources touching the same basic area.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 13:12:09 -04:00
Doug Ledford
754137a769 Merge branch 'for-next-early' into for-next
The early for-next branch was based on v4.14-rc2, while the shared pull
request I got from Mellanox used a v4.14-rc4 base.  I'm making the
branch that was the shared Mellanox pull request the new for-next branch
and merging the early for-next branch into it.

Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-18 13:07:13 -04:00
Ido Schimmel
d965465b60 mlxsw: core: Fix possible deadlock
When an EMAD is transmitted, a timeout work item is scheduled with a
delay of 200ms, so that the EMAD can be retried on timeout, up to a
maximum of five retries.

In certain situations, it's possible for the function waiting on the
EMAD to be associated with a work item that is queued on the same
workqueue (`mlxsw_core`) as the timeout work item. This results in
flushing a work item on the same workqueue.

According to commit e159489baa ("workqueue: relax lockdep annotation
on flush_work()") the above may lead to a deadlock in case the workqueue
has only one worker active or if the system is under memory pressure and
the rescue worker is in use. The latter explains the very rare and
random nature of the lockdep splats we have been seeing:

[   52.730240] ============================================
[   52.736179] WARNING: possible recursive locking detected
[   52.742119] 4.14.0-rc3jiri+ #4 Not tainted
[   52.746697] --------------------------------------------
[   52.752635] kworker/1:3/599 is trying to acquire lock:
[   52.758378]  (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c4fa4>] flush_work+0x3a4/0x5e0
[   52.767837]
               but task is already holding lock:
[   52.774360]  (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0
[   52.784495]
               other info that might help us debug this:
[   52.791794]  Possible unsafe locking scenario:
[   52.798413]        CPU0
[   52.801144]        ----
[   52.803875]   lock(mlxsw_core_driver_name);
[   52.808556]   lock(mlxsw_core_driver_name);
[   52.813236]
                *** DEADLOCK ***
[   52.819857]  May be due to missing lock nesting notation
[   52.827450] 3 locks held by kworker/1:3/599:
[   52.832221]  #0:  (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0
[   52.842846]  #1:  ((&(&bridge->fdb_notify.dw)->work)){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0
[   52.854537]  #2:  (rtnl_mutex){+.+.}, at: [<ffffffff822ad8e7>] rtnl_lock+0x17/0x20
[   52.863021]
               stack backtrace:
[   52.867890] CPU: 1 PID: 599 Comm: kworker/1:3 Not tainted 4.14.0-rc3jiri+ #4
[   52.875773] Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
[   52.886267] Workqueue: mlxsw_core mlxsw_sp_fdb_notify_work [mlxsw_spectrum]
[   52.894060] Call Trace:
[   52.909122]  __lock_acquire+0xf6f/0x2a10
[   53.025412]  lock_acquire+0x158/0x440
[   53.047557]  flush_work+0x3c4/0x5e0
[   53.087571]  __cancel_work_timer+0x3ca/0x5e0
[   53.177051]  cancel_delayed_work_sync+0x13/0x20
[   53.182142]  mlxsw_reg_trans_bulk_wait+0x12d/0x7a0 [mlxsw_core]
[   53.194571]  mlxsw_core_reg_access+0x586/0x990 [mlxsw_core]
[   53.225365]  mlxsw_reg_query+0x10/0x20 [mlxsw_core]
[   53.230882]  mlxsw_sp_fdb_notify_work+0x2a3/0x9d0 [mlxsw_spectrum]
[   53.237801]  process_one_work+0x8f1/0x12f0
[   53.321804]  worker_thread+0x1fd/0x10c0
[   53.435158]  kthread+0x28e/0x370
[   53.448703]  ret_from_fork+0x2a/0x40
[   53.453017] mlxsw_spectrum 0000:01:00.0: EMAD retries (2/5) (tid=bf4549b100000774)
[   53.453119] mlxsw_spectrum 0000:01:00.0: EMAD retries (5/5) (tid=bf4549b100000770)
[   53.453132] mlxsw_spectrum 0000:01:00.0: EMAD reg access failed (tid=bf4549b100000770,reg_id=200b(sfn),type=query,status=0(operation performed))
[   53.453143] mlxsw_spectrum 0000:01:00.0: Failed to get FDB notifications

Fix this by creating another workqueue for EMAD timeouts, thereby
preventing the situation of a work item trying to flush a work item
queued on the same workqueue.
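
The hazard and the fix can be modeled in user space. What follows is a minimal Python sketch, not the driver code: single-worker thread pools stand in for workqueues with one active worker, and the names `timeout_work`, `wait_on_timeout_work` and `run_scenario` are hypothetical. A work item that waits for another item queued on its own pool cannot make progress, whereas a dedicated pool for the timeout item lets the wait complete.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def timeout_work():
    """Stand-in for the EMAD timeout work item."""
    return "timeout handled"


def wait_on_timeout_work(timeout_pool):
    """Stand-in for a work item that must wait for the timeout work to finish."""
    pending = timeout_pool.submit(timeout_work)
    try:
        # A bounded wait keeps the sketch finite; a real flush would wait forever.
        return pending.result(timeout=0.5)
    except TimeoutError:
        return "stuck"


def run_scenario(shared):
    """Run the waiter on a one-worker pool. The timeout work goes to the same
    pool when `shared` is true, or to a dedicated pool otherwise."""
    with ThreadPoolExecutor(max_workers=1) as main_pool:
        if shared:
            return main_pool.submit(wait_on_timeout_work, main_pool).result()
        with ThreadPoolExecutor(max_workers=1) as timeout_pool:
            return main_pool.submit(wait_on_timeout_work, timeout_pool).result()
```

In the shared case the only worker is busy running the waiter, so the timeout item never starts while it is being waited on. With a second pool the two items no longer compete for the same worker.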

Fixes: caf7297e7a ("mlxsw: core: Introduce support for asynchronous EMAD register access")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18 12:19:15 +01:00
Petr Machata
4cccb737d2 mlxsw: spectrum: Drop refcounting of IPIP entries
Formerly, IPIP entries were created lazily by next hops that referenced
an offloadable IP-in-IP netdevice. However now that they are created
eagerly as a reaction to events on such netdevices, the reference
counting is useless. Hence drop it.

The routes whose next hops reference an offloaded IP-in-IP netdevice
actually linger around a bit after their device is unregistered.
However, mlxsw_sp_ipip_entry_destroy() also destroys the backing
loopback, and mlxsw_sp_rif_destroy() transitively (via
mlxsw_sp_nexthop_rif_gone_sync()) calls mlxsw_sp_nexthop_ipip_fini(),
which unlinks the IPIP entry from a next hop. Thus no dangling pointers
are left behind during the brief window in which the netdevice is gone but
the routes are not.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16 21:30:33 +01:00
Petr Machata
f63ce4e54a mlxsw: spectrum: Support IPIP overlay VRF migration
IPIP entries are created as soon as an offloadable device is created.
That means that when such a device is later moved to a different VRF,
the loopback device that backs the tunnel is wrong.

Thus when an offloadable encapsulating netdevice moves from one VRF to
another, make sure that the loopback is updated as necessary.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16 21:30:33 +01:00
Petr Machata
0063587d35 mlxsw: spectrum: Support decap-only IP-in-IP tunnels
Current code for offloading IP-in-IP tunneling assumes that there is no
decap without encap. But that's never true for IPv6 overlays, and is not
true for IPv4 ones either, if net.ipv4.conf.*.rp_filter is unset.

To support decap-only tunnels, an IPIP entry is now created as soon as
an offloadable tunneling device is created. When that netdevice is up'd,
a decap route is looked up and possibly offloaded. Thus decap is not
handled implicitly as part of mlxsw_sp_ipip_entry_get() call anymore,
but needs to be done explicitly after the get, if desired.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16 21:30:32 +01:00
Petr Machata
6698c168bf mlxsw: spectrum_router: Move mlxsw_sp_netdev_ipip_type()
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16 21:30:32 +01:00
Petr Machata
c30f5d012e mlxsw: spectrum: Move netdevice NB to struct mlxsw_sp
So far, all netdevice notifications that the driver cared about were
related to its own ports, and mlxsw_sp could be retrieved from the
netdevice's private data. For IP-in-IP offloading however, the driver
cares about events on foreign netdevices, and getting at mlxsw_sp or
router data structures from the handler is inconvenient.

Therefore move the netdevice notifier blocks from global scope to struct
mlxsw_sp to allow retrieval from the notifier block pointer itself.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16 21:30:32 +01:00
David S. Miller
af28f6f26a Merge tag 'mlx5-updates-2017-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:

====================
mlx5-updates-2017-10-11: IPoIB Multi Pkey support

This series provides the support for IPoIB Multi Pkey.
InfiniBand Pkeys are the equivalent of Ethernet vlans.
Currently IPoIB device driver supports only default Pkey and IPoIB Pkey child
interfaces are not supported with IPoIB offloads mode, this series will add
the support for that by allowing creating mlx5 multiple IPoIB netdevices with
a non-default Pkey.

mlx5 IPoIB Pkey child interface is a smaller version of mlx5i IPoIB interfaces and shares
most of its resources with the parent IPoIB interface, namely RX steering and ring
queue resources.

The only mlx5 resources a child Pkey interface will be creating are the TX rings,
since they should be assigned to a specific Pkey.

mlx5i Pkey netdev is implemented via new mlx5e netdev profile implemented in
mlx5/core/ipoib/ipoib_vlan.c.

The series starts with a refactoring of mlx5e PTP and mlx5 clock implementation
to move the code to be part of mlx5 core rather than mlx5e netdevice, in order to
make mlx5 clock and PTP registration part of the core to be shared with mlx5e
master Ethernet netdev/IPoIB parent netdev and mlx5_ib in the near future.

Add the support for attaching multiple underlay QPs for the different Pkeys
in mlx5 core RX steering.

Add Pkey index to rdma_netdev to add the ability to set PKEY index to lower
IPoIB offload netdev.

Use hash-table to map between DQPN (Destination QP number) to child netdev
for the IPoIB parent netdev to forward RX packets to the corresponding
child Pkey netdev, since the RX rings are shared.

The rest of the series adds the ipoib child Pkey: mlx5e netdev profile,
netdev ndos implementation and a minimal set of ethtool callbacks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16 05:42:41 +01:00
Alex Vesker
b5ae577741 net/mlx5e: IPoIB, Modify rdma netdev allocate and free to support PKEY
Resources such as FT, QPN HT and mdev resources should be allocated
only by parent netdev. Shared resources are allocated and freed by the
parent interface since the parent is always present and created
before the IPoIB PKEY sub-interface.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
2017-10-14 11:22:12 -07:00
Alex Vesker
6a910233c1 net/mlx5e: IPoIB, Add PKEY child interface ethtool ops
Similar to VLAN interfaces, child interfaces have limited ethtool
support. In current code the main limitation that does not
allow child interface ethtool configuration is due to shared
resources which are managed by the parent.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
2017-10-14 11:22:12 -07:00
Alex Vesker
af98cebcb3 net/mlx5e: IPoIB, Add PKEY child interface ndos
Child interface ndos will be called to support child interface
specific behaviour.

ndo_init flow:
-Acquire shared QPN to net-device HT from parent
-Continue with the same flow as parent interface

ndo_open flow:
-Initialize child underlay QP and connect to shared FT
-Create child send TIS
-Open child send channels

Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
2017-10-14 11:22:11 -07:00
Alex Vesker
4c6c615e3f net/mlx5e: IPoIB, Add PKEY child interface nic profile
Child interface profile will be called to support child interface
specific behaviour. The child code is sparse compared to the parent
since the RX channels are shared between the interfaces.
Creating a separate profile for child and parent will make for smoother
code with a better ability for future expansion.
The profile struct is exposed to the parent using a getter function.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
2017-10-14 11:22:10 -07:00
Alex Vesker
7e7f4780c3 net/mlx5e: IPoIB, Use hash-table to map between QPN to child netdev
This change is needed for PKEY support, since the RQs are shared
between the child interface and the parent. The parent is responsible
for NAPI and the processing of RX completions. Using the dqpn in the
completion descriptor we set the corresponding child IPoIB netdevice
on the SKB.
The mapping between the dqpn and the netdevice is done using a HT,
each mlx5 IPoIB interface registers its mapping on creation.
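
The mapping described above can be sketched as follows. This is a Python model rather than the driver's hash table; `QpnTable`, `handle_rx_completion` and the dictionary standing in for the SKB are hypothetical names, and returning `False` on a lookup miss is an assumption, since the message does not say what happens to packets with an unknown dqpn.

```python
class QpnTable:
    """Maps a QP number to the netdevice that registered it."""

    def __init__(self):
        self._by_qpn = {}

    def register(self, qpn, netdev):
        # Each interface registers its own mapping when it is created.
        if qpn in self._by_qpn:
            raise ValueError(f"QPN {qpn:#x} is already registered")
        self._by_qpn[qpn] = netdev

    def unregister(self, qpn):
        self._by_qpn.pop(qpn, None)

    def lookup(self, qpn):
        return self._by_qpn.get(qpn)


def handle_rx_completion(table, dqpn, skb):
    """Set the receiving netdevice on the packet from the completion's dqpn.
    Returns False when no interface registered that QPN (assumed behaviour)."""
    netdev = table.lookup(dqpn)
    if netdev is None:
        return False
    skb["dev"] = netdev
    return True
```

Because the parent processes completions for the shared RQs, a single lookup per completion is enough to hand the packet to the right child interface.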

Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
2017-10-14 11:22:10 -07:00
Alex Vesker
da34f1a85b net/mlx5e: IPoIB, Support for setting PKEY index to underlay QP
Added a function to set PKEY index to IPoIB device driver using the
already present set_id function. PKEY index is attached to the QP
during state modification.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
2017-10-14 11:22:09 -07:00
Alex Vesker
dae37456c8 net/mlx5: Support for attaching multiple underlay QPs to root flow table
Previous support allowed connecting only a single QPN to the FT.
Now using a linked list multiple QPNs can be attached to the same FT.

Supporting attaching multiple underlay QPs is required for PKEY
support in which child and parent share the same FT.

The actual attaching/detaching FW commands will be called inside the
function symmetrically.

This change requires a change in IPoIB open and close functions, the
attaching/detaching to/from the FT is done each time we open/close.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2017-10-14 11:22:07 -07:00
Alex Vesker
c8249eda7f net/mlx5e: IPoIB, Move underlay QP init/uninit to separate functions
During the creation of the underlay QP the PKEY index is unknown, the
PKEY index is known only when calling ndo_open.
The PKEY index is attached to the QP during state modification.

Splitting the functions will also make the code symmetric and more
readable. This split is also required for later PKEY support to be
called with the PKEY index during ndo_open.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2017-10-14 11:22:06 -07:00
Feras Daoud
7c39afb394 net/mlx5: PTP code migration to driver core section
PTP code is moved to core section of mlx5 driver in order to share
it between ethernet and infiniband. This movement involves the following
changes:
- Change mlx5e_ prefix to be mlx5_
- Add clock structs to Core
- Add clock object to mlx5_core_dev
- Call Init/Uninit clock from core init/cleanup
- Rename mlx5e_tstamp to be mlx5_clock

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Eitan Rabin <rabin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-14 11:22:06 -07:00
Feras Daoud
ae904beaea net/mlx5: File renaming towards ptp core implementation
en_clock.c renamed clock.c and moved to lib/ as first step
towards relocating code to core part of the driver to allow
sharing between Ethernet and Infiniband.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Eitan Rabin <rabin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-10-14 11:22:05 -07:00
Tariq Toukan
f025fd6061 net/mlx4_en: XDP_TX, assign constant values of TX descs on ring creation
In XDP_TX, some fields in tx_info and tx_desc are constants across
all entries of the different XDP_TX rings.
Assign values to these fields on ring creation time, rather than in
data-path.

Patchset performance tests:
Tested on ConnectX3Pro, Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Single queue no-RSS optimization ON.

XDP_TX packet rate:
------------------------------
Before    | After     | Gain |
13.7 Mpps | 14.0 Mpps | 2.2% |
------------------------------

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 20:21:23 -07:00
Tariq Toukan
f6f0aa9741 net/mlx4_en: Obsolete call to generic write_desc in XDP xmit flow
Function mlx4_en_tx_write_desc() is not optimized for use in XDP xmit.
Use the relevant parts inline instead.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 20:21:23 -07:00
Tariq Toukan
5dad61b838 net/mlx4_en: Replace netdev parameter with priv in XDP xmit function
The struct net_device parameter was passed only to extract
struct mlx4_en_priv out of it.
Here we pass the priv parameter directly.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 20:21:23 -07:00
Jiri Pirko
717503b9cf net: sched: convert cls_flower->egress_dev users to tc_setup_cb_egdev infra
The only user of cls_flower->egress_dev is mlx5. So do the conversion
there alongside with the code originating the call in cls_flower
function fl_hw_replace_filter to the newly introduced egress device
callback infrastructure.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-11 20:15:43 -07:00
Inbar Karmy
80a8dc75ee net/mlx4_en: Increase number of default RX rings
Remove limitation of netif_get_num_default_rss_queues()
from logic of RX rings default number.

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10 13:11:22 -07:00
Inbar Karmy
b8d394367a net/mlx4_en: Limit the number of RX rings
Limit the number of RX rings by the number of cores
in the system.

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10 13:11:22 -07:00
Inbar Karmy
7e1dc5e926 net/mlx4_en: Limit the number of TX rings
Limit the number of TX rings per UP by the number of cores
in the system.

Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-10 13:11:22 -07:00
David S. Miller
d93fa2ba64 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-10-09 20:11:09 -07:00
Tariq Toukan
7ba5e7bd64 net/mlx4_en: Use __force to fix a sparse warning in TX datapath
In TX data-path, we intentionally do not byte-swap, as documented
in code and in the cited commit log.
This fixes sparse warning:
en_tx.c:720:23: warning: incorrect type in argument 1 (different base types)
en_tx.c:720:23:    expected unsigned int [unsigned] [usertype] <noident>
en_tx.c:720:23:    got restricted __be32 [usertype] doorbell_qpn

Fixes: 492f5add4b ("net/mlx4_en: Doorbell is byteswapped in Little Endian archs")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09 10:33:05 -07:00
Tariq Toukan
b71322d9db net/mlx4_core: Fix cast warning in fw.c
Fix the following SPARSE warning, in MLX4_GET() macro:
drivers/net/ethernet/mellanox/mlx4/fw.c:233:9: warning: cast to restricted __be64

Fixes: 17d5ceb6e4 ("net/mlx4_core: Fix unaligned accesses")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09 10:33:05 -07:00
Tariq Toukan
bb428a5c4d net/mlx4: Fix endianness issue in qp context params
Should take care of the endianness before assigning to params2 field.

Fixes: 53f33ae295 ("net/mlx4_core: Port aggregation upper layer interface")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09 10:33:05 -07:00
Yotam Gigi
593bc28ae2 mlxsw: spectrum_switchdev: Support bridge mrouter notifications
Support the SWITCHDEV_ATTR_ID_BRIDGE_MROUTER port attribute switchdev
notification.

To do that, add the mrouter flag to struct mlxsw_sp_bridge_device, which
indicates whether the bridge device was set to be an mrouter port. This field
is set when:
 - A new bridge is created, where the value is taken from the kernel
   bridge value.
 - A switchdev SWITCHDEV_ATTR_ID_BRIDGE_MROUTER notification is sent.

In addition, change the bridge MID entries to include the router port when
the bridge device is configured to be an mrouter port. The MID entries are
updated in the following cases:
 - When a new MID entry is created, update the router port according to the
   bridge mrouter state.
 - When a SWITCHDEV_ATTR_ID_BRIDGE_MROUTER notification is sent, update all
   the bridge's MID entries.

This is aligned with the case where a bridge slave is configured to be
an mrouter port.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09 10:18:11 -07:00
Yotam Gigi
c4db953f00 mlxsw: spectrum_switchdev: Add support for router port in SMID entries
In Spectrum, MDB entries point to MID entries, that indicate which ports a
packet should be forwarded to. Add support for creating MID entries that
forward the packet to the Spectrum router port.

This will be later used to handle the bridge mrouter port switchdev
notifications.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09 10:18:11 -07:00
Yotam Gigi
b35750f191 mlxsw: spectrum: router: Export the mlxsw_sp_router_port function
In Spectrum hardware, the router port is a virtual port that is the gateway
to the routing mechanism. Hence, in order for a packet to be L3 forwarded,
it must first be L2 forwarded to the router port inside the hardware.

Further patches in this patchset are going to introduce support for a bridge
device used as an mrouter port. In this case, the router port index will be
needed in order to update the MDB entries to include the router port. Thus,
export the mlxsw_sp_router_port function, which returns the index of the
Spectrum router port.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-09 10:18:11 -07:00
Kees Cook
55c0fcc3de net/mlx4_core: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.

Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: netdev@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-10-09 12:19:41 -04:00
David S. Miller
51a0c00c6b Merge tag 'mlx5-updates-2017-10-06' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:

====================
Mellanox, mlx5 updates 2017-10-06

This series includes some shared code updates for kernel 4.15 to both
net-next and rdma-next trees.

The series includes mlx5 low level flow steering updates and optimizations
to support firmware command parallelism for flow steering requests from
Maor Gottlieb and two other small fixes from Matan and Maor.

One fix from Matan adds error handling for when the destination
list of the flow steering rule is full.

Maor introduced a patch to avoid NULL pointer dereference on steering cleanup.

Then come some refactoring patches needed by the series for code sharing purposes,
which also split the Flow Table Entry (FTE) and Flow Group (FG) creation code into two parts:
    1) Object allocation - allocate the steering node and initialize
    its resources.

    2) The firmware command execution.

This change will give us the ability to take write lock on the
parent node (e.g. FG for FTE creating) only on the software data struct allocation
and creation part of the procedure where the synchronization is really required,
and will allow us to execute multiple firmware commands simultaneously and overcome the
firmware bottleneck.

Refactor the locking scheme of the mlx5 core flow steering as follows:

1) Replace the mutex lock with readers-writers semaphore and take
    the write lock only when necessary (e.g. allocating a new flow
    table entry index or adding a node to the parent's children list).
    When we try to find a suitable child in the parent's children list
    (e.g. search for flow group with the same match_criteria of the rule)
    then we only take the read lock.

2) Add versioning mechanism - each steering entity (FT, FG, FTE, DST)
    will have an incremental version. The version is increased when the
    entity is changed (e.g. when a new FTE was added to FG - the FG's
    version is increased).
    Versioning is used in order to determine if the last traverse of an
    entity's children is valid or a rescan under write lock is required.
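
The version check in item 2 can be illustrated with a small Python sketch. It is a model under stated assumptions, not the mlx5 code: `SteeringNode` and `find_or_add` are hypothetical names, a plain `threading.Lock` stands in for the write side of the readers-writers semaphore, the read lock of the first phase is omitted, and the `between_phases` hook exists only so that a concurrent writer can be simulated deterministically.

```python
import threading


class SteeringNode:
    """A parent entity (e.g. a flow group) whose version is bumped on every change."""

    def __init__(self):
        self.lock = threading.Lock()  # stand-in for the write side of the rwsem
        self.version = 0
        self.children = []

    def add_child(self, child):
        """Caller must hold self.lock."""
        self.children.append(child)
        self.version += 1


def find_or_add(node, match, between_phases=None):
    """Return (child, rescanned). `between_phases` runs after the optimistic
    search, at the point where a concurrent writer could slip in."""
    # Phase 1: optimistic search; remember the version that was traversed.
    seen = node.version
    if match in node.children:
        return match, False

    if between_phases is not None:
        between_phases()

    # Phase 2: exclusive section; rescan only if the entity changed meanwhile.
    with node.lock:
        rescanned = node.version != seen
        if rescanned and match in node.children:
            return match, True
        node.add_child(match)
        return match, rescanned
```

When the version is unchanged, the earlier traversal is still valid and the child can be added without searching again; only a version mismatch forces the rescan under the exclusive lock.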

The last patch adds a memory pool for FGs and FTEs. It is useful because these objects
are not small and could be allocated/deallocated many times.

This support improves the insertion rate of steering rules
from ~5k/sec to ~40k/sec.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08 21:07:11 -07:00
Ido Schimmel
9b63ef88d3 mlxsw: spectrum: Propagate extack further for bridge enslavements
The code that actually takes care of bridge offload introduces a few
more non-trivial constraints with regards to bridge enslavements.
Propagate extack there to indicate the reason.

$ ip link add link enp1s0np1 name enp1s0np1.10 type vlan id 10
$ ip link add link enp1s0np1 name enp1s0np1.20 type vlan id 20
$ ip link add name br0 type bridge
$ ip link set dev enp1s0np1.10 master br0
$ ip link set dev enp1s0np1.20 master br0
Error: spectrum: Can not bridge VLAN uppers of the same port.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08 10:07:21 -07:00
Ido Schimmel
c1f2c6d025 mlxsw: spectrum: Add extack for VLAN enslavements
Similar to physical ports, enslavement of VLAN devices can also fail.
Use extack to indicate why the enslavement failed.

$ ip link add link enp1s0np1 name enp1s0np1.10 type vlan id 10
$ ip link add name bond0 type bond mode 802.3ad
$ ip link set dev enp1s0np1.10 master bond0
Error: spectrum: VLAN devices only support bridge and VRF uppers.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08 10:07:21 -07:00
Ido Schimmel
a69518cf0b mlxsw: spectrum_router: Avoid expensive lookup during route removal
In commit fc922bb0dd ("mlxsw: spectrum_router: Use one LPM tree for
all virtual routers") I increased the scale of supported VRFs by having
all of them share the same LPM tree.

In order to avoid look-ups for prefix lengths that don't exist, each
route removal would trigger an aggregation across all the active virtual
routers to see which prefix lengths are in use and which aren't and
structure the tree accordingly.
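
As a rough illustration of why this aggregation is costly, here is a Python sketch. The data layout is an assumption made for the example (a dictionary from virtual router name to a list of `(prefix, prefix_len)` routes) and `prefix_lengths_in_use` is a hypothetical name; the driver's actual structures differ.

```python
def prefix_lengths_in_use(virtual_routers):
    """Union of the prefix lengths used by any virtual router sharing the tree."""
    used = set()
    for routes in virtual_routers.values():
        # Every route of every virtual router is visited on each call.
        used.update(prefix_len for _prefix, prefix_len in routes)
    return used
```

In this model each call walks every route of every virtual router, so repeating it for each removed route grows with both the number of VRFs and the number of routes.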

With the way the data structures are currently laid out, this is a very
expensive operation. When performed repeatedly - due to the invocation
of the abort mechanism - and with enough VRFs, this can result in a hung
task.

For now, avoid this optimization until it can be properly re-added in
net-next.

Fixes: fc922bb0dd ("mlxsw: spectrum_router: Use one LPM tree for all virtual routers")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-08 10:05:27 -07:00
David S. Miller
53954cf8c5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Just simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-05 18:19:22 -07:00
David Ahern
e58376e1df mlxsw: spectrum: Add extack messages for enslave failures
mlxsw fails device enslavement for a number of reasons. Use the extack
facility to return an error message to the user stating why the enslave
is failing.

Messages are prefixed with "spectrum" so users know it is a constraint
imposed by the hardware driver. For example:
    $ ip li add br0.11 link br0 type vlan id 11
    $ ip li set swp11 master br0
    Error: spectrum: Enslaving a port to a device that already has an upper device is not supported.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04 21:39:34 -07:00
Dan Carpenter
b5c7d4e54c mlxsw: spectrum: Add missing error code on allocation failure
We accidentally return success if the kmalloc_array() call fails.

Fixes: 0e14c7777a ("mlxsw: spectrum: Add the multicast routing hardware logic")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 10:26:58 -07:00
Dan Carpenter
b508e0b6e4 mlxsw: spectrum: Fix check for IS_ERR() instead of NULL
mlxsw_afa_block_create() doesn't return error pointers, it returns NULL
on error.

Fixes: 0e14c7777a ("mlxsw: spectrum: Add the multicast routing hardware logic")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 10:26:58 -07:00
Yotam Gigi
f60c254998 mlxsw: spectrum: mr: Support trap-and-forward routes
Add the support of trap-and-forward route action in the multicast routing
offloading logic. A route will be set to trap-and-forward action if one (or
more) of its output interfaces is not offload-able, i.e. does not have a
valid Spectrum RIF.

This way, a route with mixed output VIFs list, which contains both
offload-able and un-offload-able devices can go through partial offloading
in hardware, and the rest will be done in the kernel ipmr module.
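
The rule above can be sketched in a few lines of Python. This is an illustration only: `mr_route_action` and `hw_output_list` are hypothetical names, each output interface is modeled as a `(name, has_valid_rif)` pair, and the other existing route actions are left out.

```python
def mr_route_action(output_vifs):
    """Pick the action for a multicast route from its output interfaces.
    Each entry is a (name, has_valid_rif) pair."""
    if all(has_rif for _name, has_rif in output_vifs):
        return "forward"
    # At least one output interface cannot be offloaded.
    return "trap-and-forward"


def hw_output_list(output_vifs):
    """Interfaces the hardware can forward to; the rest are left to the kernel."""
    return [name for name, has_rif in output_vifs if has_rif]
```

For a mixed list, the hardware forwards to the interfaces that have a RIF while the trapped copy lets the kernel handle the remaining ones.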

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 10:06:30 -07:00
Yotam Gigi
607feadef8 mlxsw: spectrum: mr_tcam: Add trap-and-forward multicast route
In addition to the current multicast route actions, which include trap
route action and a forward route action, add the trap-and-forward multicast
route action, and implement it in the multicast routing hardware logic.

To implement that, add a trap-and-forward ACL action as the last action in
the route flexible action set. The used trap is the ACL2 trap, which marks
the packets with offload_mr_fwd_mark, to prevent the packet from being
forwarded again by the kernel.

Note: At this stage the offloading logic does not support trap-and-forward
multicast routes. This patch adds the support only in the hardware logic.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 10:06:30 -07:00
Yotam Gigi
a0040c8c93 mlxsw: spectrum: Add trap for multicast trap-and-forward routes
When a multicast route is configured with trap-and-forward action, the
packets should be marked with skb->offload_mr_fwd_mark, in order to prevent
the packets from being forwarded again by the kernel ipmr module.

Due to this, it is not possible to use the already existing multicast trap
(MLXSW_TRAP_ID_ACL1) as the packet should be marked differently. Add the
MLXSW_TRAP_ID_ACL2 which is for trap-and-forward multicast routes, and set
the offload_mr_fwd_mark skb field in its handler.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 10:06:30 -07:00
Yotam Gigi
2678724355 mlxsw: acl: Introduce ACL trap and forward action
Use trap/discard flex action to implement trap and forward. The action will
later be used for multicast routing, as the multicast routing mechanism is
done using ACL flexible actions in Spectrum hardware. Using that action, it
will be possible to implement a trap-and-forward route.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-03 10:06:30 -07:00
Petr Machata
85f44a15b1 mlxsw: spectrum_router: Drop a redundant condition
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-02 11:20:22 -07:00
Petr Machata
7ff176f81d mlxsw: spectrum_router: Fix a typo
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-02 11:20:22 -07:00
Petr Machata
de0f43c01a mlxsw: spectrum_router: Track RIF of IPIP next hops
When considering whether to set RTNH_F_OFFLOAD flag on an IPv6 route,
mlxsw_sp_fib6_entry_offload_set() looks up the mlxsw_sp_nexthop
corresponding to a given route, and decides based on whether the next
hop's offloaded flag was set. When looking for the matching next hop, it
also takes into account the device of the route, which must match next
hop's RIF.

IPIP next hops however hitherto didn't set the RIF. As a result, IPv6
routes forwarding traffic to IP-in-IP netdevices are never marked as
offloaded, even when they actually are.

Thus track RIF of IPIP next hops the same way as that of ETHERNET next
hops.

Fixes: 8f28a30976 ("mlxsw: spectrum_router: Support IPv6 overlay encap")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-02 11:18:57 -07:00
Petr Machata
28a04c7b7b mlxsw: spectrum_router: Move VRF refcounting
When creating a new RIF, bumping RIF count of the containing VR is the
last thing to be done. Symmetrically, when destroying a RIF, RIF count
is first dropped and only then the rest of the cleanup proceeds.

That's a problem for loopback RIFs. Those hold two VR references: one
for overlay and one for underlay. mlxsw_sp_rif_destroy() releases the
overlay one, and the deconfigure() callback the underlay one. But if
both overlay and underlay are the same, and if there are no other
artifacts holding the VR alive, this put actually destroys the VR. Later
on, when mlxsw_sp_rif_destroy() calls mlxsw_sp_vr_put() for the same VR,
the VR will already have been released and the kernel crashes with NULL
pointer dereference.

The underlying problem is that the RIF under destruction ends up
referencing the overlay VR much longer than it claims: all the way until
the call to mlxsw_sp_vr_put(). So line up the reference counting
properly to reflect this. Make corresponding changes in
mlxsw_sp_rif_create() as well for symmetry.

Fixes: 6ddb7426a7 ("mlxsw: spectrum_router: Introduce loopback RIFs")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-02 11:18:57 -07:00
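Note on 28a04c7b7b: the ordering problem described above can be reproduced in a small model. This is only a sketch under stated assumptions, not the mlxsw code: toy_vr, toy_vr_put() and toy_rif_destroy() are hypothetical names, and the model assumes that a put destroys the VR as soon as no RIF is counted against it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtual router; not the mlxsw structure. */
struct toy_vr {
	int rif_count; /* RIFs currently counted against this VR */
	bool alive;    /* false once the VR has been destroyed */
};

/* Assumed put semantics: destroy the VR when nothing uses it any more.
 * Returns false when called on an already destroyed VR (the crash case).
 */
static bool toy_vr_put(struct toy_vr *vr)
{
	assert(vr != NULL);
	if (!vr->alive)
		return false;
	if (vr->rif_count == 0)
		vr->alive = false;
	return true;
}

/* Destroy a loopback RIF whose overlay and underlay are the same VR.
 * fixed_order == false: drop the RIF count first (the old order).
 * fixed_order == true:  drop the RIF count just before the final put.
 */
static bool toy_rif_destroy(struct toy_vr *vr, bool fixed_order)
{
	bool ok;

	if (!fixed_order)
		vr->rif_count--;
	ok = toy_vr_put(vr);          /* underlay put from deconfigure() */
	if (fixed_order)
		vr->rif_count--;
	return toy_vr_put(vr) && ok;  /* overlay put at the end of destroy */
}
```

With a single RIF on the VR, the old order makes the underlay put destroy the VR, so the later overlay put hits a dead object. The fixed order keeps the count held until the last put.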
Colin Ian King
45bfbc013b mlxsw: spectrum: fix uninitialized value in err
In the unlikely event that mfc->mfc_un.res.ttls[i] is 255 for all
values of i from 0 to MAXVIFS-1, the err is not set at all and hence
has a garbage value on the error return at the end of the function,
so initialize it to 0.  Also, the error return check on err and goto
to err: inside the for loop makes it impossible for err to be zero
at the end of the for loop, so we can remove the redundant err check
at the end of the loop.

Detected by CoverityScan CID#1457207 ("Uninitialized scalar value")

Fixes: c011ec1bbf ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01 23:05:54 -07:00
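Note on 45bfbc013b: a minimal sketch of the pattern being fixed, assuming a loop that skips every entry whose TTL is 255 and a single exit path that returns err. TOY_MAXVIFS, toy_route_build() and the fail_at parameter are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAXVIFS  4   /* hypothetical; stands in for the kernel's VIF limit */
#define TOY_TTL_SKIP 255 /* entries with this TTL are skipped */

/* Walk the TTL table and "process" every entry that is not skipped.
 * fail_at names the index whose processing fails (-1 for none).
 */
static int toy_route_build(const unsigned char *ttls, int fail_at)
{
	int err = 0; /* without this, err is garbage when all entries are skipped */

	assert(ttls != NULL);
	for (int i = 0; i < TOY_MAXVIFS; i++) {
		if (ttls[i] == TOY_TTL_SKIP)
			continue;
		err = (i == fail_at) ? -5 : 0; /* stand-in for the real per-VIF call */
		if (err)
			goto out;
	}
	/* err is always 0 here, so no extra check is needed after the loop */
out:
	return err;
}
```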
Or Gerlitz
353f59f4d4 net/mlx5: Fix wrong indentation in enable SRIOV code
Smatch is screaming:

drivers/net/ethernet/mellanox/mlx5/core/sriov.c:112
	mlx5_device_enable_sriov() warn: inconsistent indenting

Fix that.

Fixes: 7ecf6d8ff1 ('IB/mlx5: Restore IB guid/policy for virtual functions')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28 07:23:10 +03:00
Matan Barak
480df991b8 net/mlx5: Fix static checker warning on steering tracepoints code
Fix this sparse complaint:

drivers/net/ethernet/mellanox/mlx5/core/./diag/fs_tracepoint.h:172:1:
	warning: odd constant _Bool cast (ffffffffffffffff becomes 1)

Fixes: d9fea79171ee ('net/mlx5: Add tracepoints')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28 07:23:10 +03:00
Gal Pressman
603e1f5bd3 net/mlx5e: Fix calculated checksum offloads counters
Instead of calculating the offloads counters, count them explicitly.
The calculations done for these counters would result in bugs in some
cases, for example:
When running TCP traffic over a VXLAN tunnel with TSO enabled the following
counters would increase:
       tx_csum_partial: 1,333,284
       tx_csum_partial_inner: 29,286
       tx4_csum_partial_inner: 384
       tx7_csum_partial_inner: 8
       tx9_csum_partial_inner: 34
       tx10_csum_partial_inner: 26,807
       tx11_csum_partial_inner: 287
       tx12_csum_partial_inner: 27
       tx16_csum_partial_inner: 6
       tx25_csum_partial_inner: 1,733

Seems like tx_csum_partial increased out of nowhere.
The issue is in the following calculation in mlx5e_update_sw_counters:
s->tx_csum_partial = s->tx_packets - tx_offload_none - s->tx_csum_partial_inner;

While tx_packets increases by the number of GSO segments for each SKB,
tx_csum_partial_inner will only increase by one, resulting in wrong
tx_csum_partial counter.

Fixes: bfe6d8d1d4 ("net/mlx5e: Reorganize ethtool statistics")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28 07:23:10 +03:00
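Note on 603e1f5bd3: the arithmetic behind the wrong counter can be checked with a toy model. The struct and function names below are hypothetical. The only facts taken from the commit message are that tx_packets grows by the number of GSO segments per SKB while tx_csum_partial_inner grows by one, and the formula that used to derive tx_csum_partial.

```c
#include <assert.h>
#include <stddef.h>

enum toy_csum { TOY_CSUM_NONE, TOY_CSUM_PARTIAL, TOY_CSUM_PARTIAL_INNER };

struct toy_tx_stats {
	unsigned long packets;
	unsigned long csum_none;
	unsigned long csum_partial;       /* counted explicitly, per SKB */
	unsigned long csum_partial_inner; /* counted explicitly, per SKB */
};

/* Account one SKB that is segmented into gso_segs packets on the wire. */
static void toy_tx_account(struct toy_tx_stats *s, unsigned int gso_segs,
			   enum toy_csum kind)
{
	assert(s != NULL && gso_segs >= 1);
	s->packets += gso_segs;
	switch (kind) {
	case TOY_CSUM_NONE:
		s->csum_none++;
		break;
	case TOY_CSUM_PARTIAL:
		s->csum_partial++;
		break;
	case TOY_CSUM_PARTIAL_INNER:
		s->csum_partial_inner++;
		break;
	}
}

/* The old derivation: everything that is neither "none" nor "inner". */
static unsigned long toy_derived_csum_partial(const struct toy_tx_stats *s)
{
	return s->packets - s->csum_none - s->csum_partial_inner;
}
```

For one tunneled SKB with 45 segments, the derived value comes out as 44 although no outer partial checksum was requested, while the explicit csum_partial counter stays at 0.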
Gal Pressman
1456f69ff5 net/mlx5e: Don't add/remove 802.1ad rules when changing 802.1Q VLAN filter
Toggling of C-tag VLAN filter should not affect the "any S-tag" steering rule.

Fixes: 8a271746a2 ("net/mlx5e: Receive s-tagged packets in promiscuous mode")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28 07:23:10 +03:00
Gal Pressman
b20eab15a1 net/mlx5e: Print netdev features correctly in error message
Use the correct formatting for netdev features.

Fixes: 0e405443e8 ("net/mlx5e: Improve set features ndo resiliency")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28 07:23:10 +03:00
Vlad Buslov
b281208911 net/mlx5e: Check encap entry state when offloading tunneled flows
Encap entries cached by the driver could be invalidated due to
tunnel destination neighbour state changes.
When attempting to offload a flow that uses a cached encap entry,
we must check the entry validity and defer the offloading
if the entry exists but is not valid.

When EAGAIN is returned, the offloading of the flow to hardware is done
by the neigh update code once the tunnel destination neighbour
becomes connected.

Fixes: 232c001398 ("net/mlx5e: Add support to neighbour update flow")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28 07:23:10 +03:00
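Note on b281208911: the deferral described above might look roughly like the following. This is a sketch, not the mlx5e code path; toy_encap_entry, toy_flow and both functions are hypothetical names, and the "neigh update" step is reduced to a single call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_encap_entry {
	bool valid; /* false while the tunnel destination neighbour is not connected */
};

struct toy_flow {
	bool offloaded;
};

/* Try to offload a flow. cached may be NULL when no entry exists yet. */
static int toy_offload_flow(struct toy_flow *flow,
			    const struct toy_encap_entry *cached)
{
	assert(flow != NULL);
	if (cached && !cached->valid)
		return -EAGAIN; /* entry exists but is invalid: defer */
	flow->offloaded = true;
	return 0;
}

/* Neighbour became connected: mark the entry valid and offload the
 * flow that was deferred earlier.
 */
static int toy_neigh_connected(struct toy_encap_entry *e,
			       struct toy_flow *deferred)
{
	assert(e != NULL);
	e->valid = true;
	return toy_offload_flow(deferred, e);
}
```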
Or Gerlitz
bdd66ac0ae net/mlx5e: Disallow TC offloading of unsupported match/action combinations
When offloading header re-write, the HW may need to adjust checksums along
the packet. For IP traffic, and a case where we are asked to modify fields in
the IP header, current HW supports that only for TCP and UDP. Enforce it: in
this case, fail the offloading attempt for non-TCP/UDP packets.

Fixes: d7e75a325c ('net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions')
Fixes: 2f4fe4cab0 ('net/mlx5e: Add offloading of NIC TC pedit (header re-write) actions')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28 07:23:09 +03:00
Paul Blakey
ace743214e net/mlx5e: Fix erroneous freeing of encap header buffer
In case the neighbour for the tunnel destination isn't valid,
we send a neighbour update request but we free the encap
header buffer. This is wrong, because we still need it for
allocating a HW encap entry once the neighbour is available.

Fix that by skipping freeing it if we wait for neighbour.

Fixes: 232c001398 ('net/mlx5e: Add support to neighbour update flow')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28 07:23:09 +03:00
Raed Salem
16f1c5bb3e net/mlx5: Check device capability for maximum flow counters
Add a check for the maximal number of flow counters attached
to a rule (FTE).

Fixes: bd5251dbf1 ('net/mlx5_core: Introduce flow steering destination of type counter')
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28 07:23:09 +03:00
Inbar Karmy
99d3cd27f7 net/mlx5: Fix FPGA capability location
Currently, FPGA capability is located in (mdev)->caps.hca_cur,
change the location to be (mdev)->caps.fpga,
since hca_cur is reserved for HCA device capabilities.

Fixes: e29341fb3a ("net/mlx5: FPGA, Add basic support for Innova")
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28 07:23:09 +03:00
Roi Dayan
38e8a5c040 net/mlx5e: IPoIB, Fix access to invalid memory address
When cleaning rdma netdevice we need to save the mdev pointer
because priv is released when we release netdev.

This bug was found using the kernel address sanitizer (KASAN).
use-after-free in mlx5_rdma_netdev_free+0xe3/0x100 [mlx5_core]

Fixes: 48935bbb7a ("net/mlx5e: IPoIB, Add netdevice profile skeleton")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-28 07:23:09 +03:00
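Note on 38e8a5c040: the fix pattern is to copy the pointer out of the private area before the area is released. The sketch below uses hypothetical toy_* types and plain malloc()/free(); it does not reproduce the real mlx5 structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_mdev {
	int cleaned; /* set when the device-level cleanup has run */
};

struct toy_priv {
	struct toy_mdev *mdev;
};

struct toy_netdev {
	struct toy_priv *priv; /* released together with the netdev */
};

/* Releasing the netdev also releases its private area. */
static void toy_netdev_release(struct toy_netdev *nd)
{
	free(nd->priv);
	nd->priv = NULL;
}

static void toy_rdma_netdev_free(struct toy_netdev *nd)
{
	struct toy_mdev *mdev;

	assert(nd != NULL && nd->priv != NULL);
	mdev = nd->priv->mdev;  /* save it while priv is still valid */
	toy_netdev_release(nd); /* priv is gone after this call */
	mdev->cleaned = 1;      /* uses the saved copy, never nd->priv->mdev */
}
```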
Yotam Gigi
664375e956 mlxsw: spectrum: router: Don't ignore IPMR notifications
Make the Spectrum router logic not ignore the RTNL_FAMILY_IPMR FIB
notifications.

Past commits added the IPMR VIF and MFC add/del notifications via the
fib_notifier chain. In addition, code for handling these notifications in
the Spectrum router logic was added. Make the Spectrum router logic not
ignore these notifications and forward the requests to the Spectrum
multicast router offloading logic.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27 11:33:28 -07:00
Yotam Gigi
fd890fe98f mlxsw: spectrum: Notify multicast router on RIF MTU changes
Multicast routes hold the minimum MTU of all the
egress RIFs and trap packets that don't meet it, so notify the multicast
router code on RIF MTU changes.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27 11:33:28 -07:00
Yotam Gigi
d42b0965b1 mlxsw: spectrum_router: Add multicast routes notification handling functionality
Add functionality for calling the multicast routing offloading logic upon
MFC and VIF add and delete notifications. In addition, call the multicast
routing upon RIF addition and deletion events.

As the multicast routing offload logic may sleep, the actual calls are done
in a deferred work. To ensure the MFC object is not freed in that interval,
a reference is held to it. In case of a failure, the abort mechanism is
used, which ejects all the routes from the hardware and triggers the
traffic to flow through the kernel.

Note: At that stage, the FIB notifications are still ignored, and will be
enabled in a further patch.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27 11:33:28 -07:00
Yotam Gigi
7e50d43575 mlxsw: spectrum: router: Squash the default route table to main
Currently, the mlxsw Spectrum driver offloads only either the RT_TABLE_MAIN
FIB table or the VRF tables, so the RT_TABLE_LOCAL table is squashed to the
RT_TABLE_MAIN table to allow local routes to be offloaded too.

By default, multicast MFC routes which are not assigned to any user
requested table are put in the RT_TABLE_DEFAULT table.

Since support for offloading multicast MFC routes in the Spectrum
router logic is going to be introduced soon, squash the default table to
MAIN too.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27 11:33:28 -07:00
Yotam Gigi
0e14c7777a mlxsw: spectrum: Add the multicast routing hardware logic
Implement the multicast routing hardware API introduced in previous patch
for the specific spectrum hardware.

The spectrum hardware multicast routes are written using the RMFT2 register
and point to an ACL flexible action set. The actions used for multicast
routes are:
 - Counter action, which allows counting bytes and packets on multicast
   routes.
 - Multicast route action, which provides the RPF check and does the actual
   packet duplication to a list of RIFs.
 - Trap action, in case the route action specified by the caller is
   trap.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27 11:33:28 -07:00
Yotam Gigi
c011ec1bbf mlxsw: spectrum: Add the multicast routing offloading logic
Add the multicast router offloading logic, which is in charge of handling
the VIF and MFC notifications and translating them to the hardware logic API.

The offloading logic has to overcome several obstacles in order to safely
comply with the kernel multicast router user API:
 - It must keep track of the mapping between VIFs to netdevices. The user
   can add an MFC cache entry pointing to a VIF, delete the VIF and
   re-add it with a different netdevice. The offloading logic has to handle
   this in order to be compatible with the kernel logic.
 - It must keep track of the mapping between netdevices to spectrum RIFs,
   as the current hardware implementation assumes having a RIF for every
   port in a multicast router.
 - It must handle routes pointing to pimreg device to be trapped to the
   kernel, as the packet should be delivered to userspace.
 - It must handle routes pointing to tunnel VIFs. The current implementation
   does not support multicast forwarding to tunnels, thus routes that point
   to a tunnel should be trapped to the kernel.
 - It must be aware of proxy multicast routes, which include both (*,*)
   routes and duplicate routes. Currently proxy routes are not offloaded
   and trigger the abort mechanism: removal of all routes from hardware and
   triggering the traffic to go through the kernel.

The multicast routing offloading logic also updates the counters of the
offloaded MFC routes in a periodic work.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-27 11:33:28 -07:00
Jiri Pirko
b2925957ec mlxsw: spectrum_flower: Offload "ok" termination action
If action is "gact_ok", offload it to HW.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 20:26:45 -07:00
Jiri Pirko
2a52a8c6e5 mlxsw: spectrum_acl: Propagate errors from mlxsw_afa_block_jump/continue
Propagate error instead of doing WARN_ON right away.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 20:26:45 -07:00
Arkadi Sharshevsky
427e652aa3 mlxsw: spectrum_dpipe: Add support for controlling nexthop counters
Add support for controlling nexthop counters via dpipe.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 20:04:35 -07:00
Arkadi Sharshevsky
190d38a52a mlxsw: spectrum_dpipe: Add support for adjacency table dump
Add support for adjacency table dump.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 20:04:35 -07:00
Arkadi Sharshevsky
a5390278a5 mlxsw: spectrum: Add support for setting counters on nexthops
Add support for setting counters on nexthops based on dpipe's adjacency
table counter status. This patch also adds the ability for getting the
counter value, which will be used by the dpipe adjacency table dump
implementation in the next patches.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 20:04:35 -07:00
Arkadi Sharshevsky
f4de25fb53 mlxsw: reg: Add support for counters on RATR
In order to add the ability for setting counters on nexthops the RATR
register should be extended.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 20:04:35 -07:00
Arkadi Sharshevsky
c538adb3c6 mlxsw: spectrum_dpipe: Add initial support for the router adjacency table
Add initial support for router adjacency table. The table does lookup
based on the nexthop-group index and the local nexthop offset. After
locating the nexthop entry it sets the destination MAC address and the
egress RIF.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 20:04:35 -07:00
Arkadi Sharshevsky
c556cd2893 mlxsw: spectrum_router: Add helpers for nexthop access
This is done as a preparation before introducing the ability to dump the
adjacency table via dpipe, and to count the table size. The current table
implementation avoids tunnel entries, thus a helper for checking if
the nexthop group contains tunnel entries is also provided. The mlxsw's
nexthop representative struct stays private to the router module.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 20:04:35 -07:00
Arkadi Sharshevsky
ec2437f42b mlxsw: spectrum_router: Use helper to check for last neighbor
Use list_is_last helper to check for last neighbor.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 20:04:35 -07:00
Arkadi Sharshevsky
dbe4598c1e mlxsw: spectrum_router: Keep nexthops in a linked list
Keep nexthops in a linked list for easy access.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 20:04:35 -07:00
Arkadi Sharshevsky
c0859d697c mlxsw: Add fields for mlxsw's meta header for adjacency table
This patch adds fields for mlxsw's meta header which will be used to
describe the match/action behavior of the adjacency table.

The fields are:
1. Adj_index - The global index of the nexthop group in the adjacency
   table.

2. Adj_hash_index - Local index offset which is based on the packet's hash
   mod the nexthop group size.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 20:04:35 -07:00
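Note on c0859d697c: the two fields can be read as a base plus an offset. The sketch below assumes that the selected adjacency entry is the group's global index plus the hash-derived local offset. The commit message does not state that combination, so treat it as an illustration only; toy_adj_entry() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* adj_index:  global index of the nexthop group in the adjacency table
 * group_size: number of nexthops in the group
 * pkt_hash:   hash computed over the packet
 * Returns the assumed index of the selected adjacency entry.
 */
static uint32_t toy_adj_entry(uint32_t adj_index, uint32_t group_size,
			      uint32_t pkt_hash)
{
	uint32_t adj_hash_index;

	assert(group_size > 0);
	adj_hash_index = pkt_hash % group_size; /* local offset in the group */
	return adj_index + adj_hash_index;
}
```

For a group of four nexthops starting at index 100, a packet hash of 10 gives a local offset of 2 and therefore entry 102.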
Arkadi Sharshevsky
be2336ebfd mlxsw: spectrum_dpipe: Fix indentation in header description
Fix indentation in mlxsw_meta header's description.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 20:04:35 -07:00
Daniel Borkmann
de8f3a83b0 bpf: add meta pointer for direct access
This work enables generic transfer of metadata from XDP into skb. The
basic idea is that we can make use of the fact that the resulting skb
must be linear and already comes with a larger headroom for supporting
bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work
on a similar principle and introduce a small helper bpf_xdp_adjust_meta()
for adjusting a new pointer called xdp->data_meta. Thus, the packet has
a flexible and programmable room for meta data, followed by the actual
packet data. struct xdp_buff is therefore laid out such that we first point
to data_hard_start, then data_meta directly prepended to data followed
by data_end marking the end of packet. bpf_xdp_adjust_head() takes into
account whether we have meta data already prepended and if so, memmove()s
this along with the given offset provided there's enough room.

xdp->data_meta is optional and programs are not required to use it. The
rationale is that when we process the packet in XDP (e.g. as DoS filter),
we can push further meta data along with it for the XDP_PASS case, and
give the guarantee that a clsact ingress BPF program on the same device
can pick this up for further post-processing. Since we work with skb
there, we can also set skb->mark, skb->priority or other skb meta data
out of BPF, thus having this scratch space generic and programmable
allows for more flexibility than defining a direct 1:1 transfer of
potentially new XDP members into skb (it's also more efficient as we
don't need to initialize/handle each of such new members). The facility
also works together with GRO aggregation. The scratch space at the head
of the packet can be a multiple of 4 bytes, up to 32 bytes large. Drivers not
yet supporting xdp->data_meta can simply be set up with xdp->data_meta
as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out,
such that the subsequent match against xdp->data for later access is
guaranteed to fail.

The verifier treats xdp->data_meta/xdp->data the same way as we treat
xdp->data/xdp->data_end pointer comparisons. The requirement for doing
the compare against xdp->data is that it hasn't been modified from its
original address we got from ctx access. It may have a range marking
already from prior successful xdp->data/xdp->data_end pointer comparisons
though.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 13:36:44 -07:00
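Note on de8f3a83b0: the layout and the size rules described above can be modelled in user space. This is a simplified sketch and not the kernel helper: toy_xdp_buff and toy_adjust_meta() are hypothetical, the error codes are illustrative choices, and only the rules stated in the message are checked (meta area directly in front of data, a multiple of 4 bytes, at most 32 bytes, and data_meta set to data + 1 meaning "not supported").

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* hard_start <= data_meta <= data <= data_end when metadata is supported */
struct toy_xdp_buff {
	unsigned char *data_hard_start;
	unsigned char *data_meta;
	unsigned char *data;
	unsigned char *data_end;
};

/* Move data_meta by offset bytes (negative grows the meta area). */
static int toy_adjust_meta(struct toy_xdp_buff *xdp, int offset)
{
	ptrdiff_t headroom, new_meta, metalen;

	assert(xdp->data_hard_start <= xdp->data && xdp->data <= xdp->data_end);
	if (xdp->data_meta > xdp->data)
		return -ENOTSUP; /* driver set data_meta = data + 1 */

	headroom = xdp->data - xdp->data_hard_start;
	new_meta = (xdp->data_meta - xdp->data_hard_start) + offset;
	if (new_meta < 0 || new_meta > headroom)
		return -EINVAL;  /* outside [hard_start, data] */

	metalen = headroom - new_meta;
	if (metalen % 4 != 0 || metalen > 32)
		return -EACCES;  /* not a multiple of 4 or larger than 32 */

	xdp->data_meta = xdp->data_hard_start + new_meta;
	return 0;
}
```

The offsets are computed as integers relative to data_hard_start so that no out-of-range pointer is ever formed in the model.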
Maor Gottlieb
a369d4ac4d net/mlx5: Add FGs and FTEs memory pool
Add memory pool allocation for flow groups and flow
table entry.

It is useful because these objects are not small and could
be allocated/deallocated many times.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-26 20:52:05 +03:00
Maor Gottlieb
f5c2ff179f net/mlx5: Allocate FTE object without lock
Allocating a new FTE is a heavy operation, and part of
it can be done without taking the flow group write lock.
Split the FTE allocation into two functions: one for the actions that
must run under the lock and one for the actions that need not.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-26 20:52:04 +03:00
Maor Gottlieb
bd71b08ec2 net/mlx5: Support multiple updates of steering rules in parallel
Most of the time spent on adding new flow steering rule
is executing the firmware command.
The most common action is adding a new flow steering entry.
In order to enhance the update rate we parallelize the
commands by doing the following:

1) Replace the mutex lock with readers-writers semaphore and take
the write lock only when necessary (e.g. allocating a new flow
table entry index or adding a node to the parent's children list).
When we try to find a suitable child in the parent's children list
(e.g. search for flow group with the same match_criteria of the rule)
then we only take the read lock.

2) Add versioning mechanism - each steering entity (FT, FG, FTE, DST)
will have an incremental version. The version is increased when the
entity is changed (e.g. when a new FTE was added to FG - the FG's
version is increased).
Versioning is used in order to determine if the last traverse of an
entity's children is valid or a rescan under write lock is required.

This support improves the insertion rate of steering rules
from ~5k/sec to ~40k/sec.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-26 20:52:03 +03:00
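Note on bd71b08ec2: the versioning idea can be shown without real locks. In the sketch below the read and write locks appear only as comments, and toy_node, toy_find() and toy_insert_after_scan() are hypothetical names. The point is the decision taken under the write lock: rescan only if the version moved since the read-side scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 8

struct toy_node {
	unsigned int version; /* bumped on every change to the children */
	int nchildren;
	int children[TOY_MAX_CHILDREN];
};

/* Read side (would run under the read lock): look for a matching child. */
static int toy_find(const struct toy_node *n, int key)
{
	for (int i = 0; i < n->nchildren; i++)
		if (n->children[i] == key)
			return i;
	return -1;
}

static void toy_add(struct toy_node *n, int key)
{
	assert(n->nchildren < TOY_MAX_CHILDREN);
	n->children[n->nchildren++] = key;
	n->version++;
}

/* Write side (would run under the write lock). seen_version is the
 * version observed during the earlier read-side scan that found nothing.
 * Returns true if a rescan was required.
 */
static bool toy_insert_after_scan(struct toy_node *n, int key,
				  unsigned int seen_version)
{
	bool rescanned = false;

	if (n->version != seen_version) {
		rescanned = true;          /* the earlier scan is stale */
		if (toy_find(n, key) >= 0)
			return rescanned;  /* someone else added it meanwhile */
	}
	toy_add(n, key);
	return rescanned;
}
```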
Maor Gottlieb
c7784b1c8a net/mlx5: Replace fs_node mutex with reader/writer semaphore
Currently, the steering object is protected by a mutex lock. Replace
the mutex lock with a reader/writer semaphore.
In this patch we still use only write semaphore. In downstream
patches we will switch part of the write locks to read locks.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-26 20:52:03 +03:00
Maor Gottlieb
19f100fef4 net/mlx5: Refactor FTE and FG creation code
Split the creation code to two parts:
1) Object allocation - allocate the steering node and initialize
its resources.

2) The firmware command execution.

Add an active flag to each node. This flag indicates whether the
object exists in the hardware; if not, we don't free
the hardware resource in the error flow.

This change will give us the ability to take the write lock on the
parent node (e.g. FG for FTE creation) only in the first part.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-26 20:52:02 +03:00
Maor Gottlieb
46719d77d5 net/mlx5: Export building of matched flow groups list
Refactor the code and export the build of the matched flow groups
list to separate function.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-26 20:52:02 +03:00
Maor Gottlieb
75d1d187b2 net/mlx5: Move the entry index allocator to flow group
When new flow table entry is added, we search for free index
in the flow group and not in the flow table, therefore we can move
the allocator from flow table to flow group.
In downstream patches it will enable us to lock smaller part
of the steering tree.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-26 20:52:01 +03:00
Maor Gottlieb
800350a3f1 net/mlx5: Avoid NULL pointer dereference on steering cleanup
On cleanup, when the node is the last child of its parent, it calls
tree_put_node on the parent. If the parent's reference count
is decremented to 0 (e.g. when deleting the last destination of an FTE),
then we free the parent as well and vice versa. In such a case
we will try to free the parent node again.
Incrementing the parent reference count before cleaning its children
prevents implicit release of the parent object.

Fixes: 0da2d66666 ('net/mlx5: Properly remove all steering objects')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-26 20:52:00 +03:00
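Note on 800350a3f1: a toy model of the double release and of the fix. The names are hypothetical and the frees counter stands in for the real free. The model assumes that each child holds one reference on its parent and that the cleanup path performs a final put on the parent itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_fs_node {
	int refs;
	int frees; /* how often the node was "freed"; more than 1 is a double free */
	struct toy_fs_node *parent;
};

static void toy_get(struct toy_fs_node *n)
{
	n->refs++;
}

/* Drop a reference. At zero, "free" the node and drop its reference
 * on the parent.
 */
static void toy_put(struct toy_fs_node *n)
{
	assert(n != NULL);
	if (--n->refs > 0)
		return;
	n->frees++;
	if (n->parent)
		toy_put(n->parent);
}

/* Clean all children, then release the parent. */
static void toy_clean_tree(struct toy_fs_node *parent,
			   struct toy_fs_node **children, int count,
			   bool hold_parent)
{
	if (hold_parent)
		toy_get(parent); /* the fix: pin the parent first */
	for (int i = 0; i < count; i++)
		toy_put(children[i]);
	toy_put(parent);
}
```

Without the extra reference, putting the last child already frees the parent, and the final put frees it a second time.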
Matan Barak
b92af5a72c net/mlx5: Fix creating a new FTE when an existing but full FTE exists
Currently, when a flow steering rule is added, we look for a FTE with
an identical value. If we find a match, we try to merge the required
destinations with the existing ones. In a case where the existing
destination list is full, the code should return an error to its
consumer. However, the current code just tries to create another FTE.
Fix that by returning an error in this special scenario.

Fixes: f478be79a22e ("net/mlx5: Add hash table for flow groups in flow table")
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-26 20:51:51 +03:00
Tobias Klauser
92978ee801 net/mlx5: Remove redundant unlikely()
IS_ERR() already implies unlikely(), so it can be omitted.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 10:15:44 -07:00
Allen Pais
d2a0012e76 drivers: net: mlx4: use setup_timer() helper.
Use setup_timer function instead of initializing timer with the
function and data fields.

Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 11:44:42 -07:00
Allen Pais
590deff6e7 drivers: net: mlx5: use setup_timer() helper.
Use setup_timer function instead of initializing timer with the
function and data fields.

Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 11:44:42 -07:00
Nogah Frankel
ded711c87a mlxsw: spectrum_switchdev: Consider mrouter status for mdb changes
When a mrouter is registered or leaves a mid, don't update the HW.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20 18:03:13 -07:00
Nogah Frankel
0166277706 mlxsw: spectrum_switchdev: Remove mrouter flood in mdb flush
In mdb flush, the port is removed from all the mids it is registered
to. But if the port is an mrouter, all the mids flood to it.
This patch also removes mrouter ports from the mids they are not
registered to in the mdb flush.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20 18:03:13 -07:00
Nogah Frankel
3ddda1178e mlxsw: spectrum_switchdev: Update the mdb of mrouter port change
Whenever a port starts / stops being mrouter, update all the mdb entries
in the HW to flood / stop flooding mc packets there.
The change should happen only if the port is not in the mid. (If it is,
the mid should flood mc packets to this port anyway)

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20 18:03:13 -07:00
Nogah Frankel
3fba877cb6 mlxsw: spectrum_switchdev: Flood all mc packets to mrouter ports
When mc is enabled, whenever a mc packet doesn't hit any mdb entry it is
flooded to the ports marked as mrouters. However, all mc packets should
be flooded to them even if they match an entry in the mdb.
This patch adds the mrouter ports to every mdb entry that is being written
to the HW.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20 18:03:13 -07:00
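Note on 3fba877cb6 and 3ddda1178e: with ports kept as bits in a mask, the rule "write the mrouter ports into every mdb entry" is a bitwise OR, and an mrouter port that stops being an mrouter is cleared only if it is not a registered member. The sketch assumes at most 64 ports and uses hypothetical helper names; the driver's own data structures are not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Ports that an mdb entry floods to once mrouter ports are included. */
static uint64_t toy_mdb_hw_ports(uint64_t member_ports, uint64_t mrouter_ports)
{
	return member_ports | mrouter_ports;
}

/* A port stops being an mrouter: stop flooding to it unless it is
 * registered to the mid anyway.
 */
static uint64_t toy_mrouter_leave(uint64_t hw_ports, uint64_t member_ports,
				  unsigned int port)
{
	uint64_t bit;

	assert(port < 64);
	bit = (uint64_t)1 << port;
	if (member_ports & bit)
		return hw_ports; /* still a member, keep flooding */
	return hw_ports & ~bit;
}
```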
Nogah Frankel
bb5355b27c mlxsw: spectrum_switchdev: Flush the mdb when a port is being removed
When a port is being removed from a bridge, flush the bridge mdb to remove
the mids of that port.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20 18:03:13 -07:00
Nogah Frankel
9dad51bdaa mlxsw: spectrum_switchdev: Flood mc when mc is disabled by user flag
When multicast is disabled, flood mc packets only to ports that are marked
BR_MCAST_FLOOD (instead of to all ports).

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20 18:03:13 -07:00
Nogah Frankel
218a8f8a63 mlxsw: spectrum_switchdev: Use generic mc flood function
Use the generic mc flood function to decide whether to flood mc to a port
when mc is being enabled / disabled.
Move this function in the file to avoid forward declaration.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20 18:03:12 -07:00
Nogah Frankel
2e3496cd34 mlxsw: spectrum_switchdev: Disable mdb when mc is disabled
Remove all the mdb entries from the HW when mc is being disabled and
re-write them when it is being enabled.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20 18:03:12 -07:00
Nogah Frankel
846fd8a0e7 mlxsw: spectrum_switchdev: Don't write mids to the HW when mc is disabled
Don't write multicast related data to the HW when mc is disabled.
Also, don't allocate a mid id to new mids (so the remove function can know
that they weren't written to the HW).

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20 18:03:12 -07:00
Nogah Frankel
061e55bfb8 mlxsw: spectrum_switchdev: Break mid deletion into two function
Break mid deletion into two functions, so it will be possible in the future
to delete a mid entry for reasons other than a switchdev command (like port
deletion).

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20 18:03:12 -07:00
Nogah Frankel
73b433e803 mlxsw: spectrum_switchdev: Attach mid id allocation to HW write
Attach getting and releasing the mid id to the HW write / remove, and add
a flag to indicate whether the mid is in the HW. This is done because the
mid id is also the HW index of this mid.
This change allows the following patches to add the ability to have a
mid in the mdb cache but not in the HW. It will be useful for being able
to disable multicast.
It means that the mdb is written to / deleted from the HW in the mid
allocation / removal functions, not after them.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20 18:03:12 -07:00
Nogah Frankel
5f9abc597c mlxsw: spectrum_switchdev: Break smid write function
Break the smid write function into two, one that cleans the ports that
might still be written there and one that changes an existing mid entry.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20 18:03:12 -07:00
Nogah Frankel
b80888a919 mlxsw: spectrum_switchdev: Save mids list per bridge device
Instead of saving all the mids in the same list, save them per vlan
device. This change allows a more efficient mid find.
Also, the next patches will add many loops over all the mids in a bridge
device for multicast disable, mrouter change and ndb flush.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20 18:03:12 -07:00
Nogah Frankel
0161b9505a mlxsw: spectrum_switchdev: Remove reference count from mid
Since there is a bitmap for the ports registered to each mid, there is no
need for a ref count, since it will always be the number of set bits in
this bitmap. Any check of the ref count was replaced with checking if the
bitmap is empty.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20 18:03:12 -07:00
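A minimal sketch of the idea in 0161b9505a above: once each mid carries a
bitmap of member ports, "reference count is zero" and "bitmap is empty" are
the same condition, so the counter is redundant. All names and sizes below
(ex_mid, EX_MAX_PORTS, ...) are hypothetical and chosen for illustration;
this is plain userspace C, not the driver code.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes and names, for illustration only. */
#define EX_MAX_PORTS 64
#define EX_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define EX_WORDS ((EX_MAX_PORTS + EX_BITS_PER_WORD - 1) / EX_BITS_PER_WORD)

struct ex_mid {
	unsigned long ports_in_mid[EX_WORDS]; /* one bit per member port */
};

/* Register a port with the mid. */
static void ex_mid_port_set(struct ex_mid *mid, unsigned int port)
{
	mid->ports_in_mid[port / EX_BITS_PER_WORD] |=
		1UL << (port % EX_BITS_PER_WORD);
}

/* Unregister a port from the mid. */
static void ex_mid_port_clear(struct ex_mid *mid, unsigned int port)
{
	mid->ports_in_mid[port / EX_BITS_PER_WORD] &=
		~(1UL << (port % EX_BITS_PER_WORD));
}

/* Stands in for "ref_count == 0": the mid is unused when no bit is set. */
static bool ex_mid_is_unused(const struct ex_mid *mid)
{
	for (size_t i = 0; i < EX_WORDS; i++)
		if (mid->ports_in_mid[i])
			return false;
	return true;
}
```

Keeping only the bitmap also removes the possibility of the counter and the
bitmap drifting out of sync.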
Nogah Frankel
4cdc35e4eb mlxsw: spectrum_switchdev: Add a ports bitmap to the mid db
Add a bitmap of ports to the mid struct to hold the ports that are
registered to this mid.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20 18:03:12 -07:00
Nogah Frankel
dff37b58ca mlxsw: spectrum_switchdev: Change mc_router to mrouter
Change the naming of mc_router to mrouter to keep consistency.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-20 18:03:12 -07:00
Yotam Gigi
b48cfc80ce mlxsw: spectrum: Add multicast router traps and trap groups
Add three new traps needed for multicast routing:
 - PIM: Trap for PIM protocol control packets.
 - RPF: Trap for packets that fail the RPF check on a specific hardware
   route entry.
 - MULTICAST: Generic trap for multicast. It is used for routes that trap
   the packets to the CPU.

The RPF and MULTICAST traps have rate limiters as these traps may receive
packets at line rate. The PIM trap has a rate limiter, similar to other L3
control protocols. The rate limiters are implemented by adding three new
trap groups for the newly introduced traps.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 14:21:40 -07:00
Yotam Gigi
91e4d59a46 mlxsw: spectrum_router: Export RIF dev access function
The mlxsw_sp_rif struct, defined as private struct in spectrum_router.c
will be used in the multicast router source file. Due to the fact that the
dev field will be needed by the multicast router logic, add an access
function to it.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 14:21:40 -07:00
Yotam Gigi
4af5964e58 mlxsw: reg: Configure RIF to forward IPv4 multicast packets by default
Turn on two bits on the Spectrum RIF configuration:
 - IPv4 multicast: when a multicast packet arrives on a RIF, send it to go
   through multicast routes lookup.
 - IPv4 multicast forwarding enable: when multicast packet arrives on a
   RIF, allow it to be forwarded by multicast routes. If this bit is not
   set, multicast packets will go through multicast routing lookup but will
   be dropped at the egress of the ports.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 14:21:40 -07:00
Yotam Gigi
4fc92846f6 mlxsw: reg: Add Router Rules Copy Register
The RRCR register is used for copying and moving TCAM multicast routes
from different offsets. It will be used to allow routes relocation for
parman ops as part of the multicast router offloading logic.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 14:21:40 -07:00
Yotam Gigi
2e654e33c5 mlxsw: reg: Add the Router Multicast Forwarding Table Version 2 register
The RMFT-V2 register is used to configure and query the multicast table and
will be used by the multicast router offloading logic.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 14:21:40 -07:00
Yotam Gigi
771ced742a mlxsw: resources: Add multicast ERIF list entries resource
The multicast ERIF list entries resource indicates the number of entries
that can be put in one rigr2 register operation. While the register can
hold up to MLXSW_REG_RIGR2_MAX_ERIFS ( = 32) ERIF entries, the actual
number allowed by firmware is indicated with this resource.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 14:21:40 -07:00
Yotam Gigi
5080c7e917 mlxsw: reg: Add the Router Interface Group Version 2 register
The RIGR-V2 register is used to add, remove and query egress interface list
of a multicast forwarding entry and it will be used by the multicast
router offloading logic.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 14:21:40 -07:00
Yotam Gigi
46a7054eba mlxsw: reg: Add The Router TCAM Allocation register
This register is used for allocation of regions in the TCAM table and it
will be used by the multicast router offloading logic.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 14:21:40 -07:00
Yotam Gigi
5872656551 mlxsw: reg: Rename the flexible action set length field
The MLXSW_REG_PXXX_FLEX_ACTION_SET_LEN is relevant for the multicast router
registers too, so rename it to have a general name which is not bound to a
specific register.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 14:21:40 -07:00
Yotam Gigi
9cb3fa940e mlxsw: acl: Change trap ACL action to get the trap_id as a parameter
Allow the trap ACL action to be configured with different traps. This
allows the multicast router offloading code to use that same ACL action
with the multicast router traps. By using different traps, the multicast
router can have different trap policies and can handle the packet
differently.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 14:21:40 -07:00
Yotam Gigi
4b8a79ff27 mlxsw: acl: Introduce mcrouter ACL action
The Spectrum multicast forwarding is done using an ACL action. Add the
mcrouter ACL action that will be used to offload the multicast router
logic.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 14:21:40 -07:00
Yotam Gigi
d3b939b8f9 mlxsw: spectrum: Move ACL flexible actions instance to spectrum
A flexible action instance allows, given a set of ops, creating, committing
and sharing a set of ACL action blocks. The flexible action instance in
question is using the spectrum KVD linear space to store the flexible
action sets.

Move this flexible action instance to the common spectrum struct to allow
other users (such as multicast router) to get that functionality.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 14:21:40 -07:00
Yotam Gigi
e2b2d35a05 mlxsw: spectrum: Change init order
The multicast router offloading code is going to require the counter_pools
initialization to occur before the router initialization, thus, change the
spectrum initialization order to fix it.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-19 14:21:39 -07:00
Ido Schimmel
8e29f97979 mlxsw: spectrum_router: Only handle IPv4 and IPv6 events
The driver doesn't support events from address families other than IPv4
and IPv6, so ignore them. Otherwise, we risk queueing a work item before
it's initialized.

This can happen in case a VRF is configured when MROUTE_MULTIPLE_TABLES
is enabled, as the VRF driver will try to add an l3mdev rule for the
IPMR family.

Fixes: 65e65ec137 ("mlxsw: spectrum_router: Don't ignore IPv6 notifications")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Andreas Rammhold <andreas@rammhold.de>
Reported-by: Florian Klink <flokli@flokli.de>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-16 09:21:43 -07:00
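The guard described in 8e29f97979 above can be sketched roughly as follows:
check the address family first and bail out before any work item is
touched. This is a userspace illustration under stated assumptions; the
ex_* names, the event struct and the EX_FAMILY_IPMR value are hypothetical
stand-ins, not the driver's notifier code.

```c
#include <stdbool.h>
#include <sys/socket.h> /* AF_INET, AF_INET6 */

/* Illustrative stand-in for a family the handler does not support (the
 * commit mentions IPMR); the value is an assumption of this sketch. */
#define EX_FAMILY_IPMR 128

struct ex_event {
	int family;
};

/* Only IPv4 and IPv6 events are supported. */
static bool ex_router_handles_family(int family)
{
	return family == AF_INET || family == AF_INET6;
}

/* Returns true if work was queued, false if the event was ignored. */
static bool ex_router_fib_event(const struct ex_event *ev,
				unsigned int *queued)
{
	if (!ex_router_handles_family(ev->family))
		return false; /* ignore before touching any work item */

	(*queued)++; /* stands in for initializing and queueing the work */
	return true;
}
```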
Yuval Mintz
6399ebcccf mlxsw: spectrum: Prevent mirred-related crash on removal
When removing the offloading of mirred actions under
matchall classifiers, mlxsw would find the destination port
associated with the offloaded action and utilize it for undoing
the configuration.

Depending on the order by which ports are removed, it's possible that
the destination port would get removed before the source port.
In such a scenario, when actions would be flushed for the source port
mlxsw would perform an illegal dereference as the destination port is
no longer listed.

Since the only item necessary for undoing the configuration on the
destination side is the port-id and that in turn is already maintained
by mlxsw on the source-port, simply stop trying to access the
destination port and use the port-id directly instead.

Fixes: 763b4b70af ("mlxsw: spectrum: Add support in matchall mirror TC offloading")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-12 20:42:29 -07:00
Arkadi Sharshevsky
4400081b63 mlxsw: spectrum: Fix EEPROM access in case of SFP/SFP+
The current code does not correctly handle access to the upper page of an
SFP/SFP+ EEPROM. In that case the offset should be local to that page and
the I2C address should be changed.

Fixes: 2ea109039c ("mlxsw: spectrum: Add support for access cable info via ethtool")
Reported-by: Florian Klink <flokli@flokli.de>
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-11 10:40:59 -07:00
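A hedged sketch of the addressing rule behind 4400081b63 above. It assumes
the SFF-8472 layout, in which an SFP/SFP+ module exposes two 256-byte areas
behind two I2C addresses (A0h and A2h, i.e. 7-bit 0x50 and 0x51), while the
caller supplies one flat offset. The ex_* names and constants are
hypothetical and are not the driver's identifiers.

```c
#include <stdint.h>

#define EX_EEPROM_PAGE_LEN 256  /* bytes reachable behind one I2C address */
#define EX_I2C_ADDR_LOW    0x50 /* 7-bit address, A0h in SFF-8472 terms */
#define EX_I2C_ADDR_HIGH   0x51 /* 7-bit address, A2h in SFF-8472 terms */

struct ex_eeprom_access {
	uint8_t i2c_addr;
	uint16_t offset; /* offset local to the selected I2C address */
};

/* Map a flat EEPROM offset to an (I2C address, local offset) pair. */
static struct ex_eeprom_access ex_sfp_eeprom_map(uint16_t flat_offset)
{
	struct ex_eeprom_access acc = { EX_I2C_ADDR_LOW, flat_offset };

	if (flat_offset >= EX_EEPROM_PAGE_LEN) {
		/* Upper area: switch address and make the offset local. */
		acc.i2c_addr = EX_I2C_ADDR_HIGH;
		acc.offset = (uint16_t)(flat_offset - EX_EEPROM_PAGE_LEN);
	}
	return acc;
}
```

A read that straddles the 256-byte boundary would additionally have to be
split into two accesses; the sketch leaves that out.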
Linus Torvalds
aae3dbb477 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Support ipv6 checksum offload in sunvnet driver, from Shannon
    Nelson.

 2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric
    Dumazet.

 3) Allow generic XDP to work on virtual devices, from John Fastabend.

 4) Add bpf device maps and XDP_REDIRECT, which can be used to build
    arbitrary switching frameworks using XDP. From John Fastabend.

 5) Remove UFO offloads from the tree, gave us little other than bugs.

 6) Remove the IPSEC flow cache, from Florian Westphal.

 7) Support ipv6 route offload in mlxsw driver.

 8) Support VF representors in bnxt_en, from Sathya Perla.

 9) Add support for forward error correction modes to ethtool, from
    Vidya Sagar Ravipati.

10) Add time filter for packet scheduler action dumping, from Jamal Hadi
    Salim.

11) Extend the zerocopy sendmsg() used by virtio and tap to regular
    sockets via MSG_ZEROCOPY. From Willem de Bruijn.

12) Significantly rework value tracking in the BPF verifier, from Edward
    Cree.

13) Add new jump instructions to eBPF, from Daniel Borkmann.

14) Rework rtnetlink plumbing so that operations can be run without
    taking the RTNL semaphore. From Florian Westphal.

15) Support XDP in tap driver, from Jason Wang.

16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal.

17) Add Huawei hinic ethernet driver.

18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan
    Delalande.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits)
  i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq
  i40e: avoid NVM acquire deadlock during NVM update
  drivers: net: xgene: Remove return statement from void function
  drivers: net: xgene: Configure tx/rx delay for ACPI
  drivers: net: xgene: Read tx/rx delay for ACPI
  rocker: fix kcalloc parameter order
  rds: Fix non-atomic operation on shared flag variable
  net: sched: don't use GFP_KERNEL under spin lock
  vhost_net: correctly check tx avail during rx busy polling
  net: mdio-mux: add mdio_mux parameter to mdio_mux_init()
  rxrpc: Make service connection lookup always check for retry
  net: stmmac: Delete dead code for MDIO registration
  gianfar: Fix Tx flow control deactivation
  cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6
  cxgb4: Fix pause frame count in t4_get_port_stats
  cxgb4: fix memory leak
  tun: rename generic_xdp to skb_xdp
  tun: reserve extra headroom only when XDP is set
  net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping
  net: dsa: bcm_sf2: Advertise number of egress queues
  ...
2017-09-06 14:45:08 -07:00
Thomas Meyer
691223ec97 net/mlx4_core: Use ARRAY_SIZE macro
Use ARRAY_SIZE macro, rather than explicitly coding some variant of it
yourself.
Found with: find -type f -name "*.c" -o -name "*.h" | xargs perl -p -i -e
's/\bsizeof\s*\(\s*(\w+)\s*\)\s*\ /\s*sizeof\s*\(\s*\1\s*\[\s*0\s*\]\s*\)
/ARRAY_SIZE(\1)/g' and manual check/verification.

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-05 11:49:16 -07:00
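For reference, the kind of substitution 691223ec97 above performs looks
roughly like this. The macro below is a simplified userspace version (the
kernel's ARRAY_SIZE additionally rejects non-array arguments at compile
time), and ex_table and the two helper functions are hypothetical names
used only for the comparison.

```c
#include <stddef.h>

/* Simplified userspace version of the macro; an assumption of this
 * sketch, without the kernel's extra "must be an array" check. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

static const int ex_table[] = { 10, 20, 30, 40, 50 };

/* Before: the element count is open-coded. */
static size_t ex_count_open_coded(void)
{
	return sizeof(ex_table) / sizeof(ex_table[0]);
}

/* After: the same count, expressed through the macro. */
static size_t ex_count_macro(void)
{
	return ARRAY_SIZE(ex_table);
}
```

Both forms yield the same value; the macro states the intent and cannot be
mistyped with mismatched array names.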
David S. Miller
18a4ded9d1 mlx5-updates-2017-09-03
Merge tag 'mlx5-updates-2017-09-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2017-09-03

This series from Tariq includes micro data path optimization for mlx5e
netdevice driver.

Mainly Tariq introduces the following changes to NAPI and RX handling
path of the driver:
 - RX ring structure reorganizing
 - Trivial code refactoring and optimization
 - NAPI busy-poll for when fast UMR is in progress
 - Non-atomic state operations in NAPI context
 - Remove unnecessary fields from fast path structures
 - page-cache micro optimization
 - Rely on NAPI to avoid missing an IRQ for RX/TX shared NAPI contexts
 - Stop NAPI when irq changes affinity
 - Distribute RSS table among all RX rings
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 21:17:07 -07:00
Petr Machata
ee954d1a91 mlxsw: spectrum_router: Support GRE tunnels
This patch introduces callbacks and tunnel type to offload GRE tunnels.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:26 -07:00
Petr Machata
92107cfb41 mlxsw: spectrum_router: Add loopback accessors
struct mlxsw_sp_rif is a router-private structure, and therefore
everything related to it is as well: parameters, and derived RIF types
including loopbacks. IPIP module needs access to some details of
loopback interfaces, but exporting all the RIF shebang would create too
large an interface.

So instead export just the bare minimum necessary: accessors for RIF
index and underlay VRF ID.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:26 -07:00
Petr Machata
86484de2c9 mlxsw: spectrum: Register for IPIP_DECAP_ERROR trap
These traps are generated for packets that fail checks for source IP,
encapsulation type, or GRE key. Trap these packets to CPU for follow-up
handling by the kernel, which will send ICMP destination unreachable
responses.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:26 -07:00
Petr Machata
1cc38fb144 mlxsw: spectrum_router: Use existing decap route
The local route that points at IPIP's underlay device (decap route) can
be present long before the GRE device. Thus when an encap route is
added, it's necessary to look inside the underlay FIB if the decap route
is already present. If so, the current trap offload needs to be
withdrawn and replaced with a decap offload.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:26 -07:00
Petr Machata
4607f6d269 mlxsw: spectrum_router: Support IPv4 underlay decap
Unlike encapsulation, which is represented by a next hop forwarding to
an IPIP tunnel, decapsulation is a type of local route. It is created
for local routes whose prefix corresponds to the local address of one of
offloaded IPIP tunnels. When the tunnel is removed (i.e. all the encap
next hops are removed), the decap offload is migrated back to a trap for
resolution in slow path.

This patch assumes that decap route is already present when encap route
is added. A follow-up patch will fix this issue.

Note that this patch only supports IPv4 underlay. Support for IPv6
underlay will be subject to follow-up work apart from this patchset.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:26 -07:00
Petr Machata
8f28a30976 mlxsw: spectrum_router: Support IPv6 overlay encap
Add the missing bits to recognize IPv6 next hops as IPIP ones to enable
offloading of IPv6 overlay encapsulation.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:26 -07:00
Petr Machata
1012b9ac28 mlxsw: spectrum_router: Support IPv4 overlay encap
This introduces some common code for tracking of offloaded IP-in-IP
tunnels, and support for offloading IPv4 overlay encapsulating routes in
particular. A follow-up patch will introduce IPv6 overlay as well.

Offloaded tunnels are kept in a linked list of mlxsw_sp_ipip_entry
objects hooked up in mlxsw_sp_router. A network device that represents
the tunnel is used as a key to look up the corresponding IPIP entry.
Note that in the future, more general keying mechanism will be needed,
because parts of the tunnel information can be provided by the route.

IPIP entries are reference counted, because several next hops may end up
using the same tunnel, and we only want to offload it once.

Encapsulation path hooks into next hop handling. Routes that forward to
a tunnel are now considered gateway routes, thus giving them the same
treatment that other remote routes get. An IPIP next hop type is
introduced.

Details of individual tunnel types are kept in an array of
mlxsw_sp_ipip_ops objects. If a tunnel type doesn't match any of the
known tunnel types, the next-hop is not considered an IPIP next hop.

The list of IPIP tunnel types is currently empty; follow-up patches will
add support for GRE. Traffic to IPIP tunnel types that are not
explicitly recognized by the driver is trapped and handled in slow path.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:25 -07:00
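Two mechanisms from 1012b9ac28 above, sketched under stated assumptions: an
ops array that decides whether a netdevice is a known IPIP tunnel type, and
a reference count so that a tunnel shared by several next hops is offloaded
only once. Every ex_* name is hypothetical. The commit itself leaves the ops
list empty; one GRE entry is included here only so the lookup has something
to match.

```c
#include <stdbool.h>
#include <stddef.h>

enum ex_tunnel_kind { EX_TUNNEL_GRE, EX_TUNNEL_OTHER };

struct ex_netdev {
	enum ex_tunnel_kind kind;
};

struct ex_ipip_ops {
	enum ex_tunnel_kind kind;
	const char *name;
};

/* Known tunnel types (illustrative content, see the note above). */
static const struct ex_ipip_ops ex_ipip_ops_arr[] = {
	{ EX_TUNNEL_GRE, "gre" },
};

/* NULL means "not an IPIP next hop": such traffic stays on the slow path. */
static const struct ex_ipip_ops *ex_ipip_ops_find(const struct ex_netdev *dev)
{
	size_t n = sizeof(ex_ipip_ops_arr) / sizeof(ex_ipip_ops_arr[0]);

	for (size_t i = 0; i < n; i++)
		if (ex_ipip_ops_arr[i].kind == dev->kind)
			return &ex_ipip_ops_arr[i];
	return NULL;
}

struct ex_ipip_entry {
	const struct ex_netdev *ol_dev; /* tunnel netdevice, the lookup key */
	unsigned int ref_count;
};

/* Returns true for the first user, i.e. when the tunnel must be offloaded. */
static bool ex_ipip_entry_get(struct ex_ipip_entry *e)
{
	return e->ref_count++ == 0;
}

/* Returns true for the last user, i.e. when the offload must be undone. */
static bool ex_ipip_entry_put(struct ex_ipip_entry *e)
{
	return --e->ref_count == 0;
}
```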
Petr Machata
35225e4740 mlxsw: spectrum_router: Make nexthops typed
In the router, some next hops may reference an encapsulating netdevice,
such as GRE or IPIP. To properly offload these next hops, mlxsw needs to
keep track of whether a given next hop is a regular Ethernet entry, or
an IP-in-IP tunneling entry.

To facilitate this book-keeping, add a type field to struct
mlxsw_sp_nexthop. There is, as of this patch, only one next hop type:
MLXSW_SP_NEXTHOP_TYPE_ETH. Follow-up patches will introduce the IP-in-IP
variant.

There are several places where next hops are initialized in the IPv4
path. Instead of replicating the logic at every one of them, factor it
out to a function mlxsw_sp_nexthop4_type_init(). The corresponding fini
is actually protocol-neutral, so put it to mlxsw_sp_nexthop_type_fini(),
but create a corresponding protocoled _fini function that dispatches to
the protocol-neutral one.

The IPv6 path is simpler, but for symmetry with IPv4, create the same
suite of functions with corresponding logic.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:25 -07:00
Petr Machata
f6050ee6f4 mlxsw: spectrum_router: Extract mlxsw_sp_rt6_is_gateway()
IPv6 counterpart of the previous patch: introduce a function to
determine whether a given route is a gateway route.

The new function takes a mlxsw_sp argument which follow-up patches will
use. Thus mlxsw_sp_fib6_entry_type_set() got that argument as well.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:25 -07:00
Petr Machata
9b01451ad5 mlxsw: spectrum_router: Extract mlxsw_sp_fi_is_gateway()
For IPv4 IP-in-IP offload, routes that direct traffic to IP-in-IP
devices need to be considered gateway routes as well. That involves a
bit more logic, so extract the current test to a separate function,
where the logic can be later added.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:25 -07:00
Petr Machata
6ddb7426a7 mlxsw: spectrum_router: Introduce loopback RIFs
When offloading L3 tunnels, an adjacency entry is created that loops the
packet back into the underlay router. Loopback interfaces then hold the
corresponding information and are created for IP-in-IP netdevices.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:25 -07:00
Petr Machata
010cadf916 mlxsw: spectrum_router: Support FID-less RIFs
Loopback RIFs, which will be introduced in a follow-up patch, differ
from other RIFs in that they do not have a FID associated with them.

To support this, demote FID allocation from mlxsw_sp_rif_create to
configure op of the existing RIF types, and likewise the FID release
from mlxsw_sp_rif_destroy to deconfigure op.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:25 -07:00
Petr Machata
38ebc0f454 mlxsw: spectrum_router: Add mlxsw_sp_ipip_ops
Details of individual tunnel types are kept in an array of
mlxsw_sp_ipip_ops objects. Follow-up patches will use the list to
determine whether a constructed RIF should be a loopback, and to decide
whether a next hop references a tunnel.

The list is currently empty; follow-up patches will add support for GRE.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:25 -07:00
Petr Machata
ff1f06ce9d mlxsw: spectrum_router: Publish mlxsw_sp_l3proto
The spectrum_ipip module that will be introduced in the follow-up
patches needs to know the data type.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:25 -07:00
Petr Machata
89e419828f mlxsw: reg: Give mlxsw_reg_ratr_pack a type parameter
To support IPIP, the driver needs to be able to construct an IPIP
adjacency. Change mlxsw_reg_ratr_pack to take an adjacency type as an
argument. Adjust the one existing caller.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:25 -07:00
Petr Machata
9571e828f4 mlxsw: reg: Extract mlxsw_reg_ritr_mac_pack()
Unlike other interface types, loopback RIFs do not have MAC address. So
drop the corresponding argument from mlxsw_reg_ritr_pack() and move it
to a new function. Call that from callers of mlxsw_reg_ritr_pack.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:25 -07:00
Petr Machata
1e659ebf58 mlxsw: reg: Add Routing Tunnel Decap Properties Register
The RTDP register is used for configuring the tunnel decap properties of
NVE and IPinIP.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:24 -07:00
Petr Machata
a43da820c8 mlxsw: reg: Add mlxsw_reg_ralue_act_ip2me_tun_pack()
To implement IP-in-IP decapsulation, Spectrum uses LPM entries of type
IP2ME with tunnel validity bit and tunnel pointer set. The necessary
register fields are already available, so add a function to pack the
RALUE as appropriate.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:24 -07:00
Petr Machata
6c4153b1e7 mlxsw: reg: Move enum mlxsw_reg_ratr_trap_id
This enum is used with reg_ratr_trap_id, so move it next to the register
definition.

While at it, drop the enumerator initializers.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:24 -07:00
Petr Machata
7c819de438 mlxsw: reg: Update RATR to support IP-in-IP tunnels
So far, adjacencies have always been of type Ethernet (with value of 0),
and thus there was no need to explicitly support RATR type. However to
support IP-in-IP adjacencies, this type and a suite of IP-in-IP-specific
attributes need to be added.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:24 -07:00
Petr Machata
99ae8e3e5e mlxsw: reg: Update RITR to support loopback device
Update the register so that loopback RIFs can be created and loopback
properties specified.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 20:23:24 -07:00
Linus Torvalds
aa9d4648c2 Updates for 4.14 kernel merge window
- Lots of hfi1 driver updates (mixed with a few qib and core updates as
   well)
 - rxe updates
 - various mlx updates
 - Set default roce type to RoCEv2
 - Several larger fixes for bnxt_re that were too big for -rc
 - Several larger fixes for qedr that, likewise, were too big for -rc
 - Misc core changes
 - Make the hns_roce driver compilable on arches other than aarch64 so we
   can more easily debug build issues related to it
 - Add rdma-netlink infrastructure updates
 - Add automatic IRQ affinity infrastructure
 - Add 32bit lid support
 - Lots of misc fixes across the subsystem from random people
 - Autoloading of RDMA netlink modules
 - PCI pool cleanups from Romain Perier
 - mlx5 driver feature additions and fixes
 - Hardware tag matching feature
 - Fix sleeping in atomic when resolving roce ah
 - Add experimental ioctl interface as posted to linux-api@

Merge tag 'for-linus-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "This is a big pull request.

  Of note is that I'm sending you the new ioctl API for the rdma
  subsystem. We put it up on linux-api@, but didn't get much response.
  The API is complex, but it solves two different problems in one go:

   1) The bi-directional nature of the RDMA file write calls, which
      created the security hole we had to handle (and for which the fix
      is now causing problems for systems in production, we were a bit
      overzealous in the fix and the ability to open a device, then
      fork, then create new queue pairs on the device and use them is
      broken).

   2) The bloat caused by different vendors implementing extensions to
      the base verbs API. Each vendor's hardware is slightly different,
      and the hardware might be suitable for one extension but not
      another.

      By the time we add generic extensions for all the different ways
      that the different hardware can offload things, the API becomes
      bloated. Things like our completion structs have started to exceed
      a cache line in size because of all the elements needed to support
      this. That in turn shows up heavily in the performance graphs with
      a noticeable drop in performance on 100Gigabit links as our
      completion structs go from occupying one cache line to 1+.

      This API makes things like the completion structs modular in a
      very similar way to netlink so that your structs can only include
      the items needed for the offloads/features you are actually using
      on a given queue pair. In that way we support everything, but only
      use what we need, and our structs stay smaller.

  The ioctl API is better explained by the posting on linux-api@ than I
  can explain it here, so I'll just leave it at that.

  The rest of the pull request is typical stuff.

  Updates for 4.14 kernel merge window

   - Lots of hfi1 driver updates (mixed with a few qib and core updates
     as well)

   - rxe updates

   - various mlx updates

   - Set default roce type to RoCEv2

   - Several larger fixes for bnxt_re that were too big for -rc

   - Several larger fixes for qedr that, likewise, were too big for -rc

   - Misc core changes

   - Make the hns_roce driver compilable on arches other than aarch64 so
     we can more easily debug build issues related to it

   - Add rdma-netlink infrastructure updates

   - Add automatic IRQ affinity infrastructure

   - Add 32bit lid support

   - Lots of misc fixes across the subsystem from random people

   - Autoloading of RDMA netlink modules

   - PCI pool cleanups from Romain Perier

   - mlx5 driver feature additions and fixes

   - Hardware tag matching feature

   - Fix sleeping in atomic when resolving roce ah

   - Add experimental ioctl interface as posted to linux-api@"

* tag 'for-linus-ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (328 commits)
  IB/core: Expose ioctl interface through experimental Kconfig
  IB/core: Assign root to all drivers
  IB/core: Add completion queue (cq) object actions
  IB/core: Add legacy driver's user-data
  IB/core: Export ioctl enum types to user-space
  IB/core: Explicitly destroy an object while keeping uobject
  IB/core: Add macros for declaring methods and attributes
  IB/core: Add uverbs merge trees functionality
  IB/core: Add DEVICE object and root tree structure
  IB/core: Declare an object instead of declaring only type attributes
  IB/core: Add new ioctl interface
  RDMA/vmw_pvrdma: Fix a signedness
  RDMA/vmw_pvrdma: Report network header type in WC
  IB/core: Add might_sleep() annotation to ib_init_ah_from_wc()
  IB/cm: Fix sleeping in atomic when RoCE is used
  IB/core: Add support to finalize objects in one transaction
  IB/core: Add a generic way to execute an operation on a uobject
  Documentation: Hardware tag matching
  IB/mlx5: Support IB_SRQT_TM
  net/mlx5: Add XRQ support
  ...
2017-09-03 17:49:17 -07:00
Colin Ian King
942e7e5fc1 net/mlx4_core: fix incorrect size allocation for dev->caps.spec_qps
The current allocation for dev->caps.spec_qps is for the size of the
pointer and not the size of the actual mlx4_spec_qps structure. Fix
this by using the correct size. Also split the allocation over a few
lines to make it cppcheck clean on overly wide lines.
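
The class of bug described above can be sketched in userspace C. This is only an illustration: struct spec_qps and its fields are hypothetical stand-ins for the driver's structure, and calloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-port special QP structure. */
struct spec_qps {
    unsigned int qp0_proxy;
    unsigned int qp0_tunnel;
    unsigned int qp1_proxy;
    unsigned int qp1_tunnel;
    unsigned int qp0_qkey;
};

/* Buggy sizing: multiplies by the size of a pointer. */
size_t wrong_alloc_size(size_t nports)
{
    return nports * sizeof(struct spec_qps *);
}

/* Correct sizing: multiplies by the size of the structure itself. */
size_t right_alloc_size(size_t nports)
{
    return nports * sizeof(struct spec_qps);
}

/* Using sizeof(*qps) keeps the size tied to the pointed-to type. */
struct spec_qps *alloc_spec_qps(size_t nports)
{
    struct spec_qps *qps;

    assert(nports > 0);
    qps = calloc(nports, sizeof(*qps));
    return qps;
}
```

Writing sizeof(*ptr) rather than naming the type makes this mistake harder to repeat if the pointer's type changes later.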

Detected by CoverityScan, CID#1455222 ("Wrong sizeof argument")

Fixes: c73c8b1e47 ("net/mlx4_core: Dynamically allocate structs at mlx4_slave_cap")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 10:57:10 -07:00
Colin Ian King
542deb88b0 net/mlx4_core: fix memory leaks on error exit path
The structures hca_param and func_cap are not being kfree'd on an error
exit path causing two memory leaks. Fix this by jumping to the existing
free memory error exit path.
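
A minimal userspace sketch of the pattern, assuming placeholder structure contents, a simulated query failure (fail_query) and a live_allocs counter that exists only to make leaks observable. None of these are taken from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical placeholders for the two structures named above. */
struct hca_param { int global_caps; };
struct func_cap  { int num_ports; };

/* Number of live allocations; nonzero after return means a leak. */
int live_allocs;

static void *tracked_zalloc(size_t size)
{
    void *p = calloc(1, size);

    if (p)
        live_allocs++;
    return p;
}

static void tracked_free(void *p)
{
    if (!p)
        return;
    assert(live_allocs > 0);
    live_allocs--;
    free(p);
}

/* fail_query simulates a failing firmware query (an assumption). */
int slave_cap_sketch(int fail_query)
{
    struct hca_param *hca_param = tracked_zalloc(sizeof(*hca_param));
    struct func_cap *func_cap = tracked_zalloc(sizeof(*func_cap));
    int err = 0;

    if (!hca_param || !func_cap) {
        err = -ENOMEM;
        goto free_mem;
    }

    if (fail_query) {
        err = -EIO;
        goto free_mem;  /* a bare 'return err;' here would leak both */
    }

    /* ... use hca_param and func_cap ... */

free_mem:
    tracked_free(hca_param);
    tracked_free(func_cap);
    return err;
}
```

Routing every exit through the single free_mem label keeps the cleanup in one place, so a newly added error check cannot skip it.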

Detected by CoverityScan, CID#1455219, CID#1455224 ("Resource Leak")

Fixes: c73c8b1e47 ("net/mlx4_core: Dynamically allocate structs at mlx4_slave_cap")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-03 10:57:10 -07:00
Tariq Toukan
d4b6c48800 net/mlx5e: Distribute RSS table among all RX rings
By default, uniformly distribute the RSS indirection table entries
among all RX rings, rather than restricting this only to the rings
on the close NUMA node. irqbalancer would anyway dynamically override
the default affinities set to the RX rings.
This gives better multi-stream performance and CPU utilization.
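
One simple way to spread entries uniformly is a round-robin fill. The sketch below assumes that scheme; the function name and table layout are hypothetical, and the driver's actual code may differ.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill an indirection table so that entry i points at ring
 * (i mod num_rings). Every ring gets either floor(size / num_rings)
 * or one more entry, regardless of which NUMA node it sits on.
 */
void fill_indir_table(unsigned int *table, size_t size,
                      unsigned int num_rings)
{
    size_t i;

    assert(table != NULL && num_rings > 0);
    for (i = 0; i < size; i++)
        table[i] = (unsigned int)(i % num_rings);
}
```

With a table of 8 entries and 3 rings, this yields 0 1 2 0 1 2 0 1.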

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03 06:34:09 +03:00
Tariq Toukan
a8c2eb1579 net/mlx5e: Stop NAPI when irq balancer changes affinity
NAPI context keeps rescheduling on the same CPU as long as it's busy.
This doesn't give changes in irq affinities the opportunity
to take effect.
Fix that by calling napi_complete_done() upon a change in affinity.
This would stop the NAPI and reschedule it on the new CPU.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03 06:34:09 +03:00
Tariq Toukan
7b33aaeaae net/mlx5e: Use kernel's mechanism to avoid missing NAPIs
We used a channel state bit MLX5E_CHANNEL_NAPI_SCHED to make
sure no NAPI is missed when a channel's napi_schedule() is called
for completion events of the different channel's resources/rings
while NAPI is currently running.
Now that a similar mechanism is implemented in the kernel
("39e6c8208d7b net: solve a NAPI race"),
we obsolete our own implementation and rely on the return value
of napi_complete_done().

This patch removes a redundant overhead of atomic bit operations.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03 06:34:09 +03:00
Tariq Toukan
29c2849e0d net/mlx5e: Slightly increase RX page-cache size
In XDP_TX flow, we now get back quicker to each page in page-cache,
and on some occasions refcount does not get back to 1 on time, causing
some costly page allocations.
Slightly increase the size of RX page-cache to significantly decrease
the chances for this to happen.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03 06:34:09 +03:00
Tariq Toukan
70871f1ec4 net/mlx5e: Don't recycle page if moved to far NUMA
Avoid recycling an RX page if it moved to another NUMA node.
Add an ethtool counter to count such events.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03 06:34:09 +03:00
Tariq Toukan
3b56f7b2af net/mlx5e: Remove unnecessary fields in ICO SQ
As of the current design, in each NAPI, only a single UMR WQE
completion could be available in the completion queue of the
internal control operations (ICO) send queue, in addition
to nop operations that require no actions upon completion.
This renders the consume index obsolete, as the wqe_counter
field in CQE is sufficient.

This helps remove a memory barrier, and obsoletes the need
for tracking the num_wqebbs to update the consumer counter.

In addition, remove other unused fields in icosq struct:
pdev, dma_fifo_pc, and prev_cc.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03 06:34:09 +03:00
Tariq Toukan
7cc6d77bb5 net/mlx5e: Type-specific optimizations for RX post WQEs function
Separate the RX post WQEs function of the different RQ types.
This enables RQ type-specific optimizations in data-path.

Poll the ICOSQ completion queue only for Striding RQ,
and only when a UMR post completion could be possibly available.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03 06:34:09 +03:00
Tariq Toukan
a071cb9f25 net/mlx5e: Non-atomic RQ state indicator for UMR WQE in progress
The indication for a UMR WQE in progress is needed only within
the NAPI context, hence no races are possible and there is no need
for atomic operations.
The only place the flag is read outside of NAPI context is
in the closure flow, after the RQ is disabled and the flag is no
longer accessed in NAPI.
Use a boolean instead of a bit in ring state, so that its
non-atomic set operations do not race with the atomic sets of
the other bits.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03 06:34:09 +03:00
Tariq Toukan
a1eaba4c5c net/mlx5e: Non-atomic indicator for ring enabled state
Rings enabled state change occurs in control path only, and is always
followed by a napi_synchronize(), so that following NAPIs read the
new value. This read does not need to be atomic.

The RQ auto-moderation bit is not set/cleared in data-path.
No need for atomic read, a regular read operation is sufficient.
At RQ creation time as well, no other threads are trying
to access it yet, hence a regular read can be used.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03 06:34:09 +03:00
Tariq Toukan
604acb193b net/mlx5e: Refactor data-path lro header function
Refactor function mlx5e_lro_update_hdr() to reduce number of
branches.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03 06:34:09 +03:00
Tariq Toukan
4b7dfc9925 net/mlx5e: Early-return on empty completion queues
NAPI context handles different kinds of completion queues
(RX, TX, and others). Hence, upon a poll trial, some of them
might be empty.
Here we early-return upon empty completion queues, as well as
full rx buffer, and save unnecessary logic and memory barriers.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03 06:34:08 +03:00
Tariq Toukan
4cbb755801 net/mlx5e: NAPI busy-poll when UMR post is in progress
If a UMR post is in progress, it means that there's a missing
WQE in RQ, and that a completion will be shortly available in
ICO SQ completion queue. Prefer busy-poll to handle it as soon
as possible.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03 06:34:08 +03:00
Tariq Toukan
4c2af5cc2b net/mlx5e: Small enhancements for RX MPWQE allocation and free
The DMA offset of an MPWQE (Multi-Packet WQE) in the memory region
is fixed for all rounds. Calculate it once at creation time,
instead of at runtime. This also obsoletes the wqe argument in
the function.

In addition, optimize dma_info iterator calculation.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03 06:34:08 +03:00
Tariq Toukan
9bafe2adab net/mlx5e: Use memset to init skbs_frags array to zeros
In RX data-path, use memset() instead of loop assignment
to init the whole skbs_frags array.
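
The two forms can be compared in a small sketch. The array length NUM_FRAGS and the element type are assumptions made for illustration, not the driver's definitions.

```c
#include <assert.h>
#include <string.h>

#define NUM_FRAGS 64  /* hypothetical array length */

/* Loop form: one store per element. */
void clear_frags_loop(unsigned char *frags)
{
    int i;

    assert(frags != NULL);
    for (i = 0; i < NUM_FRAGS; i++)
        frags[i] = 0;
}

/* memset form: a single call that clears the whole array. */
void clear_frags_memset(unsigned char *frags)
{
    assert(frags != NULL);
    memset(frags, 0, NUM_FRAGS * sizeof(*frags));
}
```

Both functions leave the array in the same state. The memset() form hands the whole range to the library or compiler in one step.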

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03 06:34:08 +03:00
Tariq Toukan
b681c481f1 net/mlx5e: Remove unnecessary wqe_sz field from RQ buffer
Field is used only locally within the RQ create function.
The use of a local variable is sufficient.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03 06:34:08 +03:00
Tariq Toukan
89e89f7a9f net/mlx5e: Replace multiplication by stride size with a shift
In RX data-path, use shift operations instead of a regular multiplication
by stride size, as it is a power of two.
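
A small sketch of the equivalence, using hypothetical helper names: when the stride size is a power of two, its log2 can be computed once at setup time and the per-packet multiplication becomes a left shift.

```c
#include <assert.h>
#include <stdint.h>

/* Setup-time helper: log2 of a value that must be a power of two. */
unsigned int log2_of_pow2(uint32_t v)
{
    unsigned int shift = 0;

    assert(v != 0 && (v & (v - 1)) == 0);
    while (v > 1) {
        v >>= 1;
        shift++;
    }
    return shift;
}

/* Data-path offset computed with a multiplication. */
uint32_t stride_offset_mul(uint32_t stride_ix, uint32_t stride_sz)
{
    return stride_ix * stride_sz;
}

/* Same offset computed with a shift by the precomputed log2. */
uint32_t stride_offset_shift(uint32_t stride_ix, unsigned int log_stride_sz)
{
    return stride_ix << log_stride_sz;
}
```

For a 64-byte stride, log2_of_pow2(64) is 6, and stride index 5 maps to offset 320 with either function.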

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03 06:34:08 +03:00
Tariq Toukan
b45d8b50b8 net/mlx5e: Reorganize struct mlx5e_rq
Bring fast-path fields together, and combine mutually exclusive
RX WQE fields into a union.

Page-reuse and XDP are mutually exclusive and cannot be used at
the same time.
Use a union to combine their footprints.
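
The idea can be sketched as follows. The struct and field names here are hypothetical and are not the layout of struct mlx5e_rq. A flag records which union member is currently valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-mode state; only one mode is active at a time. */
struct xdp_state {
    void *prog;
    bool doorbell_pending;
};

struct page_reuse_state {
    unsigned int reused;
    unsigned int offset;
};

struct rq_sketch {
    bool xdp_enabled;  /* selects the valid union member */
    union {
        struct xdp_state xdp;
        struct page_reuse_state page_reuse;
    } u;
};

void rq_enable_xdp(struct rq_sketch *rq, void *prog)
{
    assert(rq != NULL);
    rq->xdp_enabled = true;
    rq->u.xdp.prog = prog;
    rq->u.xdp.doorbell_pending = false;
}

void rq_enable_page_reuse(struct rq_sketch *rq)
{
    assert(rq != NULL);
    rq->xdp_enabled = false;
    rq->u.page_reuse.reused = 0;
    rq->u.page_reuse.offset = 0;
}
```

The union occupies the size of its largest member rather than the sum of both, which is the footprint saving the message refers to.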

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-09-03 06:34:08 +03:00
David S. Miller
6026e043d0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Three cases of simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 17:42:05 -07:00