Commit Graph

6374 Commits

Author SHA1 Message Date
Leon Romanovsky
bb7664d369 net/mlx5: Update gid.c new cmd interface
Do mass update of gid.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-23 21:42:09 +03:00
Leon Romanovsky
5d19395f69 net/mlx5: Update lag.c new cmd interface
Do mass update of lag.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-23 21:42:08 +03:00
Leon Romanovsky
59ad21c21f net/mlx5: Update fw.c new cmd interface
Do mass update of fw.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-23 21:42:07 +03:00
Leon Romanovsky
31a0956ea9 net/mlx5: Update fs_core new cmd interface
Do mass update of fs_core to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-23 21:42:07 +03:00
Leon Romanovsky
b316e1866f net/mlx5: Update FPGA to new cmd interface
Do mass update of FPGA to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-23 21:42:06 +03:00
Leon Romanovsky
e08a6832f9 net/mlx5: Update eswitch to new cmd interface
Do mass update of eswitch to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-23 21:42:05 +03:00
Leon Romanovsky
a184cda1bb net/mlx5: Update statistics to new cmd interface
Do mass update of statistics to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-23 21:42:05 +03:00
Leon Romanovsky
49d7fcd127 net/mlx5: Update eq.c to new cmd interface
Do mass update of eq.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-23 21:42:04 +03:00
Leon Romanovsky
9aa536ad45 net/mlx5: Update ecpf.c to new cmd interface
Do mass update of ecpf.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-23 21:42:04 +03:00
Leon Romanovsky
e36fb468d2 net/mlx5: Update debugfs.c to new cmd interface
Do mass update of debugfs.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-23 21:42:03 +03:00
Leon Romanovsky
d1f620500c net/mlx5: Update cq.c to new cmd interface
Do mass update of cq.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-23 21:42:03 +03:00
Leon Romanovsky
5d1c9a114a net/mlx5: Update vport.c to new cmd interface
Do mass update of vport.c to reuse newly introduced
mlx5_cmd_exec_in*() interfaces.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-23 21:42:02 +03:00
Zhu Yanjun
dcdf4ce0ff net/mlx5e: Get the latest values from counters in switchdev mode
In the switchdev mode, when running "cat
/sys/class/net/NIC/statistics/tx_packets", the ppcnt register is
accessed to get the latest values. But currently this command can
not get the correct values from ppcnt.

From firmware manual, before getting the 802_3 counters, the 802_3
data layout should be set to the ppcnt register.

When the command "cat /sys/class/net/NIC/statistics/tx_packets" is
run, before updating 802_3 data layout with ppcnt register, the
monitor counters are tested. The test result will decide the
802_3 data layout is updated or not.

Actually the monitor counters do not support to monitor rx/tx
stats of 802_3 in switchdev mode. So the rx/tx counters change
will not trigger monitor counters. So the 802_3 data layout will
not be updated in ppcnt register. Finally this command can not get
the latest values from ppcnt register with 802_3 data layout.

Fixes: 5c7e8bbb02 ("net/mlx5e: Use monitor counters for update stats")
Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-20 14:30:22 -07:00
Saeed Mahameed
96c34151d1 net/mlx5: Kconfig: convert imply usage to weak dependency
MLX5_CORE uses the 'imply' keyword to depend on VXLAN, PTP_1588_CLOCK,
MLXFW and PCI_HYPERV_INTERFACE.

This was useful to force vxlan, ptp, etc.. to be reachable to mlx5
regardless of their config states.

Due to the changes in the cited commit below, the semantics of 'imply'
was changed to not force any restriction on the implied config.

As a result of this change, the compilation of MLX5_CORE=y and VXLAN=m
would result in undefined references, as VXLAN now would stay as 'm'.

To fix this we change MLX5_CORE to have a weak dependency on
these modules/configs and make sure they are reachable, by adding:
depend on symbol || !symbol.

For example: VXLAN=m MLX5_CORE=y, this will force MLX5_CORE to m

Fixes: def2fbffe6 ("kconfig: allow symbols implied by y to become m")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
2020-04-20 14:30:22 -07:00
Maxim Mikityanskiy
e7e0004abd net/mlx5e: Don't trigger IRQ multiple times on XSK wakeup to avoid WQ overruns
XSK wakeup function triggers NAPI by posting a NOP WQE to a special XSK
ICOSQ. When the application floods the driver with wakeup requests by
calling sendto() in a certain pattern that ends up in mlx5e_trigger_irq,
the XSK ICOSQ may overflow.

Multiple NOPs are not required and won't accelerate the process, so
avoid posting a second NOP if there is one already on the way. This way
we also avoid increasing the queue size (which might not help anyway).

Fixes: db05815b36 ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-20 14:30:22 -07:00
Paul Blakey
70840b66da net/mlx5: CT: Change idr to xarray to protect parallel tuple id allocation
After allowing parallel tuple insertion, we get the following trace:

[ 5505.142249] ------------[ cut here ]------------
[ 5505.148155] WARNING: CPU: 21 PID: 13313 at lib/radix-tree.c:581 delete_node+0x16c/0x180
[ 5505.295553] CPU: 21 PID: 13313 Comm: kworker/u50:22 Tainted: G           OE     5.6.0+ #78
[ 5505.304824] Hardware name: Supermicro Super Server/X10DRT-P, BIOS 2.0b 03/30/2017
[ 5505.313740] Workqueue: nf_flow_table_offload flow_offload_work_handler [nf_flow_table]
[ 5505.323257] RIP: 0010:delete_node+0x16c/0x180
[ 5505.349862] RSP: 0018:ffffb19184eb7b30 EFLAGS: 00010282
[ 5505.356785] RAX: 0000000000000000 RBX: ffff904ac95b86d8 RCX: ffff904b6f938838
[ 5505.365190] RDX: 0000000000000000 RSI: ffff904ac954b908 RDI: ffff904ac954b920
[ 5505.373628] RBP: ffff904b4ac13060 R08: 0000000000000001 R09: 0000000000000000
[ 5505.382155] R10: 0000000000000000 R11: 0000000000000040 R12: 0000000000000000
[ 5505.390527] R13: ffffb19184eb7bfc R14: ffff904b6bef5800 R15: ffff90482c1203c0
[ 5505.399246] FS:  0000000000000000(0000) GS:ffff904c2fc80000(0000) knlGS:0000000000000000
[ 5505.408621] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5505.415739] CR2: 00007f5d27006010 CR3: 0000000058c10006 CR4: 00000000001626e0
[ 5505.424547] Call Trace:
[ 5505.428429]  idr_alloc_u32+0x7b/0xc0
[ 5505.433803]  mlx5_tc_ct_entry_add_rule+0xbf/0x950 [mlx5_core]
[ 5505.441354]  ? mlx5_fc_create+0x23c/0x370 [mlx5_core]
[ 5505.448225]  mlx5_tc_ct_block_flow_offload+0x874/0x10b0 [mlx5_core]
[ 5505.456278]  ? mlx5_tc_ct_block_flow_offload+0x63d/0x10b0 [mlx5_core]
[ 5505.464532]  nf_flow_offload_tuple.isra.21+0xc5/0x140 [nf_flow_table]
[ 5505.472286]  ? __kmalloc+0x217/0x2f0
[ 5505.477093]  ? flow_rule_alloc+0x1c/0x30
[ 5505.482117]  flow_offload_work_handler+0x1d0/0x290 [nf_flow_table]
[ 5505.489674]  ? process_one_work+0x17c/0x580
[ 5505.494922]  process_one_work+0x202/0x580
[ 5505.500082]  ? process_one_work+0x17c/0x580
[ 5505.505696]  worker_thread+0x4c/0x3f0
[ 5505.510458]  kthread+0x103/0x140
[ 5505.514989]  ? process_one_work+0x580/0x580
[ 5505.520616]  ? kthread_bind+0x10/0x10
[ 5505.525837]  ret_from_fork+0x3a/0x50
[ 5505.570841] ---[ end trace 07995de9c56d6831 ]---

This happens from parallel deletes/adds to idr, as idr isn't protected.
Fix that by using xarray as the tuple_ids allocator instead of idr.

Fixes: 7da182a998 ("netfilter: flowtable: Use work entry per offload command")
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-20 14:30:21 -07:00
Niklas Schnelle
a019b36123 net/mlx5: Fix failing fw tracer allocation on s390
On s390 FORCE_MAX_ZONEORDER is 9 instead of 11, thus a larger kzalloc()
allocation as done for the firmware tracer will always fail.

Looking at mlx5_fw_tracer_save_trace(), it is actually the driver itself
that copies the debug data into the trace array and there is no need for
the allocation to be contiguous in physical memory. We can therefor use
kvzalloc() instead of kzalloc() and get rid of the large contiguous
allcoation.

Fixes: f53aaa31cc ("net/mlx5: FW tracer, implement tracer logic")
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-20 14:30:21 -07:00
Hu Haowen
6533380dfd net/mlx5: improve some comments
Replaced "its" with "it's".

Signed-off-by: Hu Haowen <xianfengting221@163.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-20 14:20:21 -07:00
Parav Pandit
c89da067a2 net/mlx5: Read embedded cpu bit only once
Embedded CPU bit doesn't change with PCI resume/suspend.
Hence read it only once while probing the PCI device.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-20 14:20:21 -07:00
Maxim Mikityanskiy
fa3748775b net/mlx5e: Handle errors from netif_set_real_num_{tx,rx}_queues
netif_set_real_num_tx_queues and netif_set_real_num_rx_queues may fail.
Now that mlx5e supports handling errors in the preactivate hook, this
commit leverages that functionality to handle errors from those
functions and roll back all changes on failure.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-20 14:20:20 -07:00
Roi Dayan
d7a42ad062 net/mlx5e: Allow partial data mask for tunnel options
We use mapping to save and restore the tunnel options.
Save also the tunnel options mask.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-20 14:20:20 -07:00
Tariq Toukan
82fe299641 net/mlx5e: Set of completion request bit should not clear other adjacent bits
In notify HW (ring doorbell) flow, we set the bit to request a completion
on the TX descriptor.
When doing so, we should not unset other bits in the same byte.
Currently, this does not fix a real issue, as we still don't have a flow
where both MLX5_WQE_CTRL_CQ_UPDATE and any adjacent bit are set together.

Fixes: 542578c679 ("net/mlx5e: Move helper functions to a new txrx datapath header")
Fixes: 864b2d7153 ("net/mlx5e: Generalize tx helper functions for different SQ types")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-20 14:20:20 -07:00
Raed Salem
7dfee4b1d7 net/mlx5: IPsec, Refactor SA handle creation and destruction
Currently the SA handle is created and managed as part of the common
code for different IPsec supporting HW, this handle is passed to HW
to be used on Rx to identify the SA handle that was used to
return the xfrm state to stack.

The above implementation pose a limitation on managing this handle.

Refactor by moving management of this field to the specific HW code.

Downstream patches will introduce the Connect-X support for IPsec that
will use this handle differently than current implementation.

Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-20 14:20:20 -07:00
Raed Salem
0aab3e1b04 net/mlx5e: IPSec, Expose IPsec HW stat only for supporting HW
The current HW counters are supported only by Innova, split the ipsec
stats group into two groups, one for HW and one for SW. And expose
the HW counters to ethtool only if Innova HW is used for IPsec offload.

Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-20 14:20:19 -07:00
Raed Salem
1dbd51d0a7 net/mlx5: Refactor mlx5_accel_esp_create_hw_context parameter list
Currently the FPGA IPsec is the only hw implementation of the IPsec
acceleration api, and so the mlx5_accel_esp_create_hw_context was
wrongly made to suit this HW api, among other in its parameter list
and some of its parameter endianness.

This implementation might not be suitable for different HW.

Refactor by group and pass all function arguments of
mlx5_accel_esp_create_hw_context in common mlx5_accel_esp_xfrm_attrs
struct field of mlx5_accel_esp_xfrm struct and correct the endianness
according to the HW being called.

Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-20 14:20:19 -07:00
Raed Salem
9425c595bd net/mlx5e: en_accel, Add missing net/geneve.h include
The cited commit relies on include <net/geneve.h> being included
implicitly prior to include "en_accel/en_accel.h".
This mandates that all files that needs to include en_accel.h
to redantantly include net/geneve.h.

Include net/geneve.h explicitly at "en_accel/en_accel.h" to avoid
undesired constrain as above.

Fixes: e3cfc7e6b7 ("net/mlx5e: TX, Add geneve tunnel stateless offload support")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-20 14:20:19 -07:00
Raed Salem
8c8eea07c1 net/mlx5: Use the correct IPsec capability function for FPGA ops
Currently the IPsec acceleration capability function is also used
at IPsec fpga capable device code.

This could cause a future bug as the acceleration layer is agnostic
to the device implementing its API.

Fix by using the IPsec FPGA capability function instead of acceleration
layer capability function in case of FPGA IPsec only related operations.

Downstream patches will add support for Connect-X IPsec, this can avoid
a future bug.

Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-20 14:20:19 -07:00
Ido Schimmel
b7f03b0b2a mlxsw: reg: Increase register field length to 13 bits
The Infrastructure Entry Delete Register (IEDR) is used to delete
entries stored in the KVD linear database. Currently, it is only
possible to delete entries of size up to 2048. Future firmware versions
will support deletion of entries of size up to 4096.

Increase the size of the field so that the driver will be able to
perform such deletions in the future, when required.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-20 11:57:33 -07:00
Ido Schimmel
cec2500d44 mlxsw: spectrum_router: Re-increase scale of IPv6 nexthop groups
As explained in commit fc25996e6f ("mlxsw: spectrum_router: Increase
scale of IPv6 nexthop groups"), each nexthop group is hashed by XOR-ing
the interface indexes of all the member nexthop devices.

To avoid many different nexthop groups ending up using the same key, the
above commit started hashing the interface indexes themselves before
they are XOR-ed.

However, in cases in which there are many nexthop groups that all use
the same nexthop device and only differ in the gateway IP, we can still
end up in a situation in which all the groups are using the same key.
This eventually leads to -EBUSY error from rhashtable during insertion.

Improve the situation by also making the gateway IP part of the key.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alex Veber <alexve@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Alex Veber <alexve@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-20 11:57:33 -07:00
Mark Zhang
59e9e8e4fe net/mlx5: Enable SW-defined RoCEv2 UDP source port
When this is enabled, UDP source port for RoCEv2 packets are defined
by software instead of firmware.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:57:53 +03:00
Leon Romanovsky
a2a322f447 net/mlx5: Refactor HCA capability set flow
Reduce the amount of kzalloc/kfree cycles by allocating
command structure in the parent function and leverage the
knowledge that set_caps() is called for HCA capabilities
only with specific HW structure as parameter to calculate
mailbox size.

Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:57:52 +03:00
Leon Romanovsky
333fbaa025 net/mlx5: Move QP logic to mlx5_ib
The mlx5_core doesn't need any functionality coded in qp.c, so move
that file to drivers/infiniband/ be under mlx5_ib responsibility.

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:21 +03:00
Leon Romanovsky
9c275ee4ad net/mlx5: Delete not-used cmd header
The structures defined in the cmd header are not used and can be safely
removed from the driver. This patch removes that file and deletes all
relevant includes.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:20 +03:00
Leon Romanovsky
66247fbb28 net/mlx5: Remove Q counter low level helper APIs
mlx5 core users are encouraged to use low level API (mlx5_cmd_exec)
without the need of helper functions, do this for q counters, remove
helper functions and call mlx5_cmd_exec directly from users.

This will help reduce the total amount of code and reduction of the
mlx5_core symbol table.

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:20 +03:00
Leon Romanovsky
57a6c5e992 net/mlx5: Replace hand written QP context struct with automatic getters
By changing debugfs to not use any QP related API, convert the debugfs
code to use MLX5_GET() directly to access QP context instead of hand
written struct.

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:19 +03:00
Leon Romanovsky
f93f4f4f31 net/mlx5: Remove extra indirection while storing QPN
The FPGA, SW steering and IPoIB need to have only QPN from the
mlx5_core_qp struct, so reduce memory footprint by storing QPN
directly.

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:19 +03:00
Leon Romanovsky
a452e0e436 net/mlx5: Open-code modify QP in the IPoIB module
Remove dependency on qp.c from the IPoIB by open coding
modify QP interface.

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:19 +03:00
Leon Romanovsky
a6532fd925 net/mlx5: Open-code modify QP in the FPGA module
Remove dependency on qp.c from the FPGA by open coding
modify QP interface.

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:18 +03:00
Leon Romanovsky
acab4b88e9 net/mlx5: Open-code modify QP in steering module
Remove dependency on qp.c from SW steering by open
coding modify QP interface.

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:18 +03:00
Leon Romanovsky
73a75b96fc net/mlx5: Remove empty QP and CQ events handlers
The QP and CQ events functions do nothing except printing some debug
messages. There is nothing to do with this knowledge and such events,
so remove them.

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:18 +03:00
Leon Romanovsky
ec44e72b73 net/mlx5: Open-code create and destroy QP calls
FPGA, IPoIB and SW steering don't need anything from the
mlx5_core_create_qp() and mlx5_core_destroy_qp() except calls
to mlx5_cmd_exec().

Let's open-code it, so we will be able to move qp.c to mlx5_ib.

Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-04-19 15:53:18 +03:00
Eric Dumazet
310660a14b net/mlx4_en: avoid indirect call in TX completion
Commit 9ecc2d8617 ("net/mlx4_en: add xdp forwarding and data write support")
brought another indirect call in fast path.

Use INDIRECT_CALL_2() helper to avoid the cost of the indirect call
when/if CONFIG_RETPOLINE=y

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Willem de Bruijn <willemb@google.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-18 15:30:22 -07:00
Paul Blakey
9808dd0a2a net/mlx5e: CT: Use rhashtable's ct entries instead of a separate list
Fixes CT entries list corruption.

After allowing parallel insertion/removals in upper nf flow table
layer, unprotected ct entries list can be corrupted by parallel add/del
on the same flow table.

CT entries list is only used while freeing a ct zone flow table to
go over all the ct entries offloaded on that zone/table, and flush
the table.

As rhashtable already provides an api to go over all the inserted entries,
fix the race by using the rhashtable iteration instead, and remove the list.

Fixes: 7da182a998 ("netfilter: flowtable: Use work entry per offload command")
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-08 15:46:54 -07:00
Parav Pandit
230a1bc247 net/mlx5e: Fix devlink port netdev unregistration sequence
In cited commit netdevice is registered after devlink port.

Unregistration flow should be mirror sequence of registration flow.
Hence, unregister netdevice before devlink port.

Fixes: 31e87b39ba ("net/mlx5e: Fix devlink port register sequence")
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-08 15:46:51 -07:00
Parav Pandit
7482d9cb5b net/mlx5e: Fix pfnum in devlink port attribute
Cited patch missed to extract PCI pf number accurately for PF and VF
port flavour. It considered PCI device + function number.
Due to this, device having non zero device number shown large pfnum.

Hence, use only PCI function number; to avoid similar errors, derive
pfnum one time for all port flavours.

Fixes: f60f315d33 ("net/mlx5e: Register devlink ports for physical link, PCI PF, VFs")
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-08 15:46:49 -07:00
Roi Dayan
d5a3c2b640 net/mlx5e: Fix missing pedit action after ct clear action
With ct clear action we should not allocate the action in hw
and not release the mod_acts parsed in advance.
It will be done when handling the ct clear action.

Fixes: 1ef3018f5a ("net/mlx5e: CT: Support clear action")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-08 15:46:46 -07:00
Dmytro Linkin
70f478ca08 net/mlx5e: Fix nest_level for vlan pop action
Current value of nest_level, assigned from net_device lower_level value,
does not reflect the actual number of vlan headers, needed to pop.
For ex., if we have untagged ingress traffic sended over vlan devices,
instead of one pop action, driver will perform two pop actions.
To fix that, calculate nest_level as difference between vlan device and
parent device lower_levels.

Fixes: f3b0a18bb6 ("net: remove unnecessary variables and callback")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-08 15:46:44 -07:00
Eran Ben Elisha
d19987ccf5 net/mlx5e: Add missing release firmware call
Once driver finishes flashing the firmware image, it should release it.

Fixes: 9c8bca2637 ("mlx5: Move firmware flash implementation to devlink")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-08 15:46:41 -07:00
Eli Cohen
84be2fdae4 net/mlx5: Fix condition for termination table cleanup
When we destroy rules from slow path we need to avoid destroying
termination tables since termination tables are never created in slow
path. By doing so we avoid destroying the termination table created for the
slow path.

Fixes: d8a2034f15 ("net/mlx5: Don't use termination tables in slow path")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-08 15:46:38 -07:00
Moshe Shemesh
8c702a53bb net/mlx5: Fix frequent ioread PCI access during recovery
High frequency of PCI ioread calls during recovery flow may cause the
following trace on powerpc:

[ 248.670288] EEH: 2100000 reads ignored for recovering device at
location=Slot1 driver=mlx5_core pci addr=0000:01:00.1
[ 248.670331] EEH: Might be infinite loop in mlx5_core driver
[ 248.670361] CPU: 2 PID: 35247 Comm: kworker/u192:11 Kdump: loaded
Tainted: G OE ------------ 4.14.0-115.14.1.el7a.ppc64le #1
[ 248.670425] Workqueue: mlx5_health0000:01:00.1 health_recover_work
[mlx5_core]
[ 248.670471] Call Trace:
[ 248.670492] [c00020391c11b960] [c000000000c217ac] dump_stack+0xb0/0xf4
(unreliable)
[ 248.670548] [c00020391c11b9a0] [c000000000045818]
eeh_check_failure+0x5c8/0x630
[ 248.670631] [c00020391c11ba50] [c00000000068fce4]
ioread32be+0x114/0x1c0
[ 248.670692] [c00020391c11bac0] [c00800000dd8b400]
mlx5_error_sw_reset+0x160/0x510 [mlx5_core]
[ 248.670752] [c00020391c11bb60] [c00800000dd75824]
mlx5_disable_device+0x34/0x1d0 [mlx5_core]
[ 248.670822] [c00020391c11bbe0] [c00800000dd8affc]
health_recover_work+0x11c/0x3c0 [mlx5_core]
[ 248.670891] [c00020391c11bc80] [c000000000164fcc]
process_one_work+0x1bc/0x5f0
[ 248.670955] [c00020391c11bd20] [c000000000167f8c]
worker_thread+0xac/0x6b0
[ 248.671015] [c00020391c11bdc0] [c000000000171618] kthread+0x168/0x1b0
[ 248.671067] [c00020391c11be30] [c00000000000b65c]
ret_from_kernel_thread+0x5c/0x80

Reduce the PCI ioread frequency during recovery by using msleep()
instead of cond_resched()

Fixes: 3e5b72ac2f ("net/mlx5: Issue SW reset on FW assert")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-04-08 15:46:36 -07:00
Linus Torvalds
479a72c0c6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Slave bond and team devices should not be assigned ipv6 link local
    addresses, from Jarod Wilson.

 2) Fix clock sink config on some at803x PHY devices, from Oleksij
    Rempel.

 3) Uninitialized stack space transmitted in slcan frames, fix from
    Richard Palethorpe.

 4) Guard HW VLAN ops properly in stmmac driver, from Jose Abreu.

 5) "=" --> "|=" fix in aquantia driver, from Colin Ian King.

 6) Fix TCP fallback in mptcp, from Florian Westphal. (accessing a plain
    tcp_sk as if it were an mptcp socket).

 7) Fix cavium driver in some configurations wrt. PTP, from Yue Haibing.

 8) Make ipv6 and ipv4 consistent in the lower bound allowed for
    neighbour entry retrans_time, from Hangbin Liu.

 9) Don't use private workqueue in pegasus usb driver, from Petko
    Manolov.

10) Fix integer overflow in mlxsw, from Colin Ian King.

11) Missing refcnt init in cls_tcindex, from Cong Wang.

12) One too many loop iterations when processing cmpri entries in ipv6
    rpl code, from Alexander Aring.

13) Disable SG and TSO by default in r8169, from Heiner Kallweit.

14) NULL deref in macsec, from Davide Caratti.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (42 commits)
  macsec: fix NULL dereference in macsec_upd_offload()
  skbuff.h: Improve the checksum related comments
  net: dsa: bcm_sf2: Ensure correct sub-node is parsed
  qed: remove redundant assignment to variable 'rc'
  wimax: remove some redundant assignments to variable result
  mlxsw: spectrum_flower: Do not stop at FLOW_ACTION_VLAN_MANGLE
  mlxsw: spectrum_flower: Do not stop at FLOW_ACTION_PRIORITY
  r8169: change back SG and TSO to be disabled by default
  net: dsa: bcm_sf2: Do not register slave MDIO bus with OF
  ipv6: rpl: fix loop iteration
  tun: Don't put_page() for all negative return values from XDP program
  net: dsa: mt7530: fix null pointer dereferencing in port5 setup
  mptcp: add some missing pr_fmt defines
  net: phy: micrel: kszphy_resume(): add delay after genphy_resume() before accessing PHY registers
  net_sched: fix a missing refcnt in tcindex_init()
  net: stmmac: dwmac1000: fix out-of-bounds mac address reg setting
  mlxsw: spectrum_trap: fix unintention integer overflow on left shift
  pegasus: Remove pegasus' own workqueue
  neigh: support smaller retrans_time settting
  net: openvswitch: use hlist_for_each_entry_rcu instead of hlist_for_each_entry
  ...
2020-04-07 12:03:32 -07:00
Petr Machata
ccfc569347 mlxsw: spectrum_flower: Do not stop at FLOW_ACTION_VLAN_MANGLE
The handler for FLOW_ACTION_VLAN_MANGLE ends by returning whatever the
lower-level function that it calls returns. If there are more actions lined
up after this action, those are never offloaded. Fix by only bailing out
when the called function returns an error.

Fixes: a150201a70 ("mlxsw: spectrum: Add support for vlan modify TC action")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-06 10:14:00 -07:00
Petr Machata
0be0ae1441 mlxsw: spectrum_flower: Do not stop at FLOW_ACTION_PRIORITY
The handler for FLOW_ACTION_PRIORITY ends by returning whatever the
lower-level function that it calls returns. If there are more actions lined
up after this action, those are never offloaded. Fix by only bailing out
when the called function returns an error.

Fixes: 463957e3fb ("mlxsw: spectrum_flower: Offload FLOW_ACTION_PRIORITY")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-06 10:14:00 -07:00
Colin Ian King
468c2a1002 mlxsw: spectrum_trap: fix unintention integer overflow on left shift
Shifting the integer value 1 is evaluated using 32-bit
arithmetic and then used in an expression that expects a 64-bit
value, so there is potentially an integer overflow. Fix this
by using the BIT_ULL macro to perform the shift and avoid the
overflow.

Addresses-Coverity: ("Unintentional integer overflow")
Fixes: 13f2e64b94 ("mlxsw: spectrum_trap: Add devlink-trap policer support")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-02 18:00:01 -07:00
Linus Torvalds
919dce2470 RDMA 5.7 pull request
The majority of the patches are cleanups, refactorings and clarity
 improvements
 
 - Various driver updates for siw, bnxt_re, rxe, efa, mlx5, hfi1
 
 - Lots of cleanup patches for hns
 
 - Convert more places to use refcount
 
 - Aggressively lock the RDMA CM code that syzkaller says isn't working
 
 - Work to clarify ib_cm
 
 - Use the new ib_device lifecycle model in bnxt_re
 
 - Fix mlx5's MR cache which seems to be failing more often with the new
   ODP code
 
 - mlx5 'dynamic uar' and 'tx steering' user interfaces
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl6CSr0ACgkQOG33FX4g
 mxrtKg//XovbOfYAO7nC05FtGz9iEkIUBiwQOjUojgNSi6RMNDqRW1bmqKUugm1o
 9nXA6tw+fueEvUNSD541SCcxkUZJzWvubO9wHB6N3Fgy68N3Vf2rKV3EBBTh99rK
 Cb7rnmTTN6izRyI1wdyP2sjDJGyF8zvsgIrG2sibzLnqlyScrnD98YS0FdPZUfOa
 a1mmXBN/T7eaQ4TbE3lLLzGnifRlGmZ5vxEvOQmAlOdqAfIKQdbbW7oCRLVIleso
 gfQlOOvIgzHRwQ3VrFa3i6ETYtywXq7EgmQxCjqPVJQjCA79n5TLBkP1iRhvn8xi
 3+LO4YCkiSJ/NjTA2d9KwT6K4djj3cYfcueuqo2MoXXr0YLiY6TLv1OffKcUIq7c
 LM3d4CSwIAG+C2FZwaQrdSGa2h/CNfLAEeKxv430zggeDNKlwHJPV5w3rUJ8lT56
 wlyT7Lzosl0O9Z/1BSLYckTvbBCtYcmanVyCfHG8EJKAM1/tXy5LS8btJ3e51rPu
 XekR9ELrTTA2CTuoSCQGP6J0dBD2U7qO4XRCQ9N5BYLrI6RdP7Z4xYzzey49Z3Cx
 JaF86eurM7nS5biUszTtwww8AJMyYicB+0VyjBfk+mhv90w8tS1vZ1aZKzaQ1L6Z
 jWn8WgIN4rWY0YGQs6PiovT1FplyGs3p1wNmjn92WO0wZZ3WsmQ=
 =ae+a
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "The majority of the patches are cleanups, refactorings and clarity
  improvements.

  This cycle saw some more activity from Syzkaller, I think we are now
  clean on all but one of those bugs, including the long standing and
  obnoxious rdma_cm locking design defect. Continue to see many drivers
  getting cleanups, with a few new user visible features.

  Summary:

   - Various driver updates for siw, bnxt_re, rxe, efa, mlx5, hfi1

   - Lots of cleanup patches for hns

   - Convert more places to use refcount

   - Aggressively lock the RDMA CM code that syzkaller says isn't
     working

   - Work to clarify ib_cm

   - Use the new ib_device lifecycle model in bnxt_re

   - Fix mlx5's MR cache which seems to be failing more often with the
     new ODP code

   - mlx5 'dynamic uar' and 'tx steering' user interfaces"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (144 commits)
  RDMA/bnxt_re: make bnxt_re_ib_init static
  IB/qib: Delete struct qib_ivdev.qp_rnd
  RDMA/hns: Fix uninitialized variable bug
  RDMA/hns: Modify the mask of QP number for CQE of hip08
  RDMA/hns: Reduce the maximum number of extend SGE per WQE
  RDMA/hns: Reduce PFC frames in congestion scenarios
  RDMA/mlx5: Add support for RDMA TX flow table
  net/mlx5: Add support for RDMA TX steering
  IB/hfi1: Call kobject_put() when kobject_init_and_add() fails
  IB/hfi1: Fix memory leaks in sysfs registration and unregistration
  IB/mlx5: Move to fully dynamic UAR mode once user space supports it
  IB/mlx5: Limit the scope of struct mlx5_bfreg_info to mlx5_ib
  IB/mlx5: Extend QP creation to get uar page index from user space
  IB/mlx5: Extend CQ creation to get uar page index from user space
  IB/mlx5: Expose UAR object and its alloc/destroy commands
  IB/hfi1: Get rid of a warning
  RDMA/hns: Remove redundant judgment of qp_type
  RDMA/hns: Remove redundant assignment of wc->smac when polling cq
  RDMA/hns: Remove redundant qpc setup operations
  RDMA/hns: Remove meaningless prints
  ...
2020-04-01 18:18:18 -07:00
Ido Schimmel
39defcbba0 mlxsw: spectrum_trap: Add support for setting of packet trap group parameters
Implement support for setting of packet trap group parameters by
invoking the trap_group_init() callback with the new parameters.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 17:54:59 -07:00
Ido Schimmel
d12d846821 mlxsw: spectrum_trap: Switch to use correct packet trap group
Some packet traps are currently exposed to user space as being member of
"l3_drops" trap group, but internally they are member of a different
group.

Switch these traps to use the correct group so that they are all subject
to the same policer, as exposed to user space.

Set the trap priority of packets trapped due to loopback error during
routing to the lowest priority. Such packets are not routed again by the
kernel and therefore should not mask other traps (e.g., host miss) that
should be routed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 17:54:59 -07:00
Ido Schimmel
bc82521e3b mlxsw: spectrum_trap: Do not initialize dedicated discard policer
The policer is now initialized as part of the registration with devlink,
so there is no need to initialize it before the registration.

Remove the initialization.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 17:54:59 -07:00
Ido Schimmel
13f2e64b94 mlxsw: spectrum_trap: Add devlink-trap policer support
Register supported packet trap policers with devlink and implement
callbacks to change their parameters and read their counters.

Prevent user space from passing invalid policer parameters down to the
device by checking their validity and communicating the failure via an
appropriate extack message.

v2:
* Remove the max/min validity checks from __mlxsw_sp_trap_policer_set()

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 17:54:59 -07:00
Ido Schimmel
4561705b17 mlxsw: spectrum_trap: Prepare policers for registration with devlink
Prepare an array of policer IDs to register with devlink and their
associated parameters.

The array is composed from both policers that are currently bound to
exposed trap groups and policers that are not bound to any trap group.

v2:
* Provide max/min rate/burst size when registering policers

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 17:54:59 -07:00
Ido Schimmel
03484e49e7 mlxsw: spectrum: Track used packet trap policer IDs
During initialization the driver configures various packet trap groups
and binds policers to them.

Currently, most of these groups are not exposed to user space and
therefore their policers should not be exposed as well. Otherwise, user
space will be able to alter policer parameters without knowing which
packet traps are policed by the policer.

Use a bitmap to track the used policer IDs so that these policers will
not be registered with devlink in a subsequent patch.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 17:54:59 -07:00
Ido Schimmel
2b84d7c3f6 mlxsw: reg: Extend QPCR register
The QoS Policer Configuration Register (QPCR) is used to configure
hardware policers. Extend this register with following fields and
defines which will be used by subsequent patches:

1. Violate counter: reads number of packets dropped by the policer
2. Clear counter: to ensure we start counting from 0
3. Rate and burst size limits

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 17:54:59 -07:00
Ido Schimmel
f9f54392d2 devlink: Add packet trap group parameters support
Packet trap groups are used to aggregate logically related packet traps.
Currently, these groups allow user space to batch operations such as
setting the trap action of all member traps.

In order to prevent the CPU from being overwhelmed by too many trapped
packets, it is desirable to bind a packet trap policer to these groups.
For example, to limit all the packets that encountered an exception
during routing to 10Kpps.

Allow device drivers to bind default packet trap policers to packet trap
groups when the latter are registered with devlink.

The next patch will enable user space to change this default binding.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 17:54:58 -07:00
Ido Schimmel
ea315c5507 mlxsw: spectrum_ptp: Fix build warnings
Cited commit extended the enums 'hwtstamp_tx_types' and
'hwtstamp_rx_filters' with values that were not accounted for in the
switch statements, resulting in the build warnings below.

Fix by adding a default case.

drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c: In function ‘mlxsw_sp_ptp_get_message_types’:
drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c:915:2: warning: enumeration value ‘__HWTSTAMP_TX_CNT’ not handled in switch [-Wswitch]
  915 |  switch (tx_type) {
      |  ^~~~~~
drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.c:927:2: warning: enumeration value ‘__HWTSTAMP_FILTER_CNT’ not handled in switch [-Wswitch]
  927 |  switch (rx_filter) {
      |  ^~~~~~

Fixes: f76510b458 ("ethtool: add timestamping related string sets")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 11:20:33 -07:00
Eran Ben Elisha
ba7d16c779 devlink: Implicitly set auto recover flag when registering health reporter
When health reporter is registered to devlink, devlink will implicitly set
auto recover if and only if the reporter has a recover method. No reason
to explicitly get the auto recover flag from the driver.

Remove this flag from all drivers that called
devlink_health_reporter_create.

All existing health reporters set auto recovery to true if they have a
recover method.

Yet, administrator can unset auto recover via netlink command as prior to
this patch.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 11:17:34 -07:00
Jiri Pirko
93a129eb8c net: sched: expose HW stats types per action used by drivers
It may be up to the driver (in case ANY HW stats is passed) to select
which type of HW stats he is going to use. Add an infrastructure to
expose this information to user.

$ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 flower dst_ip 192.168.1.1 action drop
$ tc -s filter show dev enp3s0np1 ingress
filter protocol ip pref 1 flower chain 0
filter protocol ip pref 1 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 192.168.1.1
  in_hw in_hw_count 2
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 10 sec used 10 sec
        Action statistics:
        Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
        backlog 0b 0p requeues 0
        used_hw_stats immediate     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 11:06:49 -07:00
wenxu
07c264ab8e net/mlx5e: add mlx5e_rep_indr_setup_ft_cb support
Add mlx5e_rep_indr_setup_ft_cb to support indr block setup
in FT mode.
Both tc rules and flow table rules are of the same format,
It can re-use tc parsing for that, and move the flow table rules
to their steering domain(the specific chain_index), the indr
block offload in FT also follow this scenario.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-29 23:42:27 -07:00
wenxu
5a37a8df80 net/mlx5e: refactor indr setup block
Refactor indr setup block for support ft indr setup in the
next patch. The function mlx5e_rep_indr_offload exposes
'flags' in order set additional flag for FT in next patch.
Rename mlx5e_rep_indr_setup_tc_block to mlx5e_rep_indr_setup_block
and add flow_setup_cb_t callback parameters in order set the
specific callback for FT in next patch.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-29 23:42:24 -07:00
Saeed Mahameed
49964352ca net/mlx5: E-Switch: Move eswitch chains to a new directory
eswitch_offloads_chains.{c,h} were just introduced this kernel release
cycle, eswitch is in high development demand right now and many
features are planned to be added to it. eswitch deserves its own
directory and here we move these new files to there, in preparation for
upcoming eswitch features and new files.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
2020-03-29 23:42:22 -07:00
Mark Zhang
6838a35a45 net/mlx5: Use a separate work queue for fib event handling
In VF lag mode when remove the bonding module without bring down the
bond device first, we could potentially have circular dependency when we
unload IB devices and also handle fib events:
1. The bond work starts first;
2. The "modprobe -rv bonding" process tries to release the bond device,
   with the "pernet_ops_rwsem" lock hold;
3. The bond work blocks in unregister_netdevice_notifier() and waits for
the lock because fib event came right before;
4. The kernel fib module tries to free all the fib entries by broadcasting
   the "FIB_EVENT_NH_DEL" event;
5. Upon the fib event this lag_mp module holds the fib lock and queue a
   fib work.
So:
   bond work -> modprobe task -> kernel fib module -> lag_mp -> bond work

Today we either reload IB devices in roce lag in nic mode or either handle
fib events in switchdev mode, but a new feature could change that we'll
need to reload IB devices also in switchdev mode so this is a future proof
fix as one may not notice this later.

Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-29 23:42:20 -07:00
Saeed Mahameed
e999a7343d Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  mlx5: Remove uninitialized use of key in mlx5_core_create_mkey
  {IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
  {IB,net}/mlx5: Assign mkey variant in mlx5_ib only
  {IB,net}/mlx5: Setup mkey variant before mr create command invocation

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-29 23:42:11 -07:00
Jacob Keller
41b145024c mlx4: fix "initializer element not constant" compiler error
A recent commit e893768179 ("devlink: prepare to support region
operations") used the region_cr_space_str and region_fw_health_str
variables as initializers for the devlink_region_ops structures.

This can result in compiler errors:
drivers/net/ethernet/mellanox//mlx4/crdump.c:45:10: error: initializer
element is not constant
   .name = region_cr_space_str,
           ^
drivers/net/ethernet/mellanox//mlx4/crdump.c:45:10: note: (near
initialization for ‘region_cr_space_ops.name’)
drivers/net/ethernet/mellanox//mlx4/crdump.c:50:10: error: initializer
element is not constant
   .name = region_fw_health_str,

The variables were made to be "const char * const", indicating that both
the pointer and data were constant. This was enough to resolve this on
recent GCC (gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1) for this author).

Unfortunately this is not enough for older compilers to realize that the
variable can be treated as a constant expression.

Fix this by introducing macros for the string and use those instead of
the variable name in the region ops structures.

Reported-by: tanhuazhong <tanhuazhong@huawei.com>
Fixes: e893768179 ("devlink: prepare to support region operations")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 22:09:39 -07:00
David S. Miller
f0b5989745 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor comment conflict in mac80211.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-29 21:25:29 -07:00
Ido Schimmel
a84acf7830 mlxsw: spectrum_router: Avoid uninitialized symbol errors
Suppress the following smatch errors. None of these are actually
possible with current code paths.

drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:1220
mlxsw_sp_ipip_entry_find_decap() error: uninitialized symbol 'saddrp'.
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:1220
mlxsw_sp_ipip_entry_find_decap() error: uninitialized symbol
'saddr_len'.
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:1221
mlxsw_sp_ipip_entry_find_decap() error: uninitialized symbol
'saddr_prefix_len'.

drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:1390
mlxsw_sp_netdevice_ipip_ol_reg_event() error: uninitialized symbol
'ipipt'.

drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3255
mlxsw_sp_nexthop_group_update() error: uninitialized symbol 'err'.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-27 15:06:43 -07:00
Ido Schimmel
cfe9701a25 mlxsw: switchx2: Remove unnecessary conversion to bool
Suppress following warning from coccinelle:

drivers/net/ethernet/mellanox/mlxsw//switchx2.c:183:63-68: WARNING:
conversion to bool not needed here

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-27 15:06:43 -07:00
Ido Schimmel
e1da9618b7 mlxsw: core_acl: Avoid defining static variable in header file
The static array 'mlxsw_afk_element_infos' in 'core_acl_flex_keys.h' is
copied to each file that includes the header, but not all use it. This
results in the following warnings when compiling with W=1:

drivers/net/ethernet/mellanox/mlxsw//core_acl_flex_keys.h:76:44:
warning: ‘mlxsw_afk_element_infos’ defined but not used
[-Wunused-const-variable=]

One way to suppress the warning is to mark the array with
'__maybe_unused', but another option is to remove it from the header
file entirely.

Change 'struct mlxsw_afk_element_inst' to store the key to the array
('element') instead of the array value keyed by 'element'. Adjust the
different users accordingly.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-27 15:06:43 -07:00
Ido Schimmel
bdb373cf5b mlxsw: spectrum: Remove unused RIF and FID families
In merge commit 50853808ff ("Merge branch
'mlxsw-Prepare-for-VLAN-aware-bridge-w-VxLAN'") I flipped mlxsw to use
emulated 802.1Q FIDs and correspondingly emulated VLAN RIFs. This means
that the non-emulated variants are no longer used. Remove them and
suppress the following warnings when compiling with W=1:

drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:7572:38: warning:
‘mlxsw_sp_rif_vlan_ops’ defined but not used [-Wunused-const-variable=]

drivers/net/ethernet/mellanox/mlxsw//spectrum_fid.c:584:41: warning:
‘mlxsw_sp_fid_8021q_family’ defined but not used
[-Wunused-const-variable=]

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-27 15:06:43 -07:00
Ido Schimmel
f0a66984c1 mlxsw: spectrum_router: Add proper function documentation
Suppress following warnings when compiling with W=1:

drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:1552: warning:
Function parameter or member 'mlxsw_sp' not described in
'__mlxsw_sp_ipip_entry_update_tunnel'
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:1552: warning:
Function parameter or member 'ipip_entry' not described in
'__mlxsw_sp_ipip_entry_update_tunnel'
drivers/net/ethernet/mellanox/mlxsw//spectrum_router.c:1552: warning:
Function parameter or member 'extack' not described in
'__mlxsw_sp_ipip_entry_update_tunnel'

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-27 15:06:43 -07:00
Ido Schimmel
5769e39c6a mlxsw: i2c: Add missing field documentation
Suppress following warning when compiling with W=1:

drivers/net/ethernet/mellanox/mlxsw//i2c.c:78: warning: Function
parameter or member 'cmd' not described in 'mlxsw_i2c'

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-27 15:06:43 -07:00
Jason Gunthorpe
dbdf8909d0 Merge branch 'mlx5_tx_steering' into rdma.git for-next
Leon Romanovsky says:

====================
Those two patches from Michael extends mlx5_core and mlx5_ib flow steering
to support RDMA TX in similar way to already supported RDMA RX.
====================

Based on the mlx5-next branch at
 git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies

* branch 'mlx5_tx_steering':
  RDMA/mlx5: Add support for RDMA TX flow table
  net/mlx5: Add support for RDMA TX steering
2020-03-27 13:26:59 -03:00
Michael Guralnik
24670b1a31 net/mlx5: Add support for RDMA TX steering
Add new RDMA TX flow steering namespace. Flow steering rules in
this namespace are used to filter transmitted RDMA traffic.

Link: https://lore.kernel.org/r/20200324061425.1570190-2-leon@kernel.org
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-03-27 13:24:48 -03:00
Jacob Keller
12102436ac devlink: track snapshot id usage count using an xarray
Each snapshot created for a devlink region must have an id. These ids
are supposed to be unique per "event" that caused the snapshot to be
created. Drivers call devlink_region_snapshot_id_get to obtain a new id
to use for a new event trigger. The id values are tracked per devlink,
so that the same id number can be used if a triggering event creates
multiple snapshots on different regions.

There is no mechanism for snapshot ids to ever be reused. Introduce an
xarray to store the count of how many snapshots are using a given id,
replacing the snapshot_id field previously used for picking the next id.

The devlink_region_snapshot_id_get() function will use xa_alloc to
insert an initial value of 1 value at an available slot between 0 and
U32_MAX.

The new __devlink_snapshot_id_increment() and
__devlink_snapshot_id_decrement() functions will be used to track how
many snapshots currently use an id.

Drivers must now call devlink_snapshot_id_put() in order to release
their reference of the snapshot id after adding region snapshots.

By tracking the total number of snapshots using a given id, it is
possible for the decrement() function to erase the id from the xarray
when it is not in use.

With this method, a snapshot id can become reused again once all
snapshots that referred to it have been deleted via
DEVLINK_CMD_REGION_DEL, and the driver has finished adding snapshots.

This work also paves the way to introduce a mechanism for userspace to
request a snapshot.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-26 19:39:26 -07:00
Jacob Keller
7ef19d3b1d devlink: report error once U32_MAX snapshot ids have been used
The devlink_snapshot_id_get() function returns a snapshot id. The
snapshot id is a u32, so there is no way to indicate an error code.

A future change is going to possibly add additional cases where this
function could fail. Refactor the function to return the snapshot id in
an argument, so that it can return zero or an error value.

This ensures that snapshot ids cannot be confused with error values, and
aids in the future refactor of snapshot id allocation management.

Because there is no current way to release previously used snapshot ids,
add a simple check ensuring that an error is reported in case the
snapshot_id would over flow.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-26 19:39:26 -07:00
Jacob Keller
a0a09f6bb2 devlink: convert snapshot destructor callback to region op
It does not makes sense that two snapshots for a given region would use
different destructors. Simplify snapshot creation by adding
a .destructor op for regions.

This operation will replace the data_destructor for the snapshot
creation, and makes snapshot creation easier.

Noticed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-26 19:39:26 -07:00
Jacob Keller
e893768179 devlink: prepare to support region operations
Modify the devlink region code in preparation for adding new operations
on regions.

Create a devlink_region_ops structure, and move the name pointer from
within the devlink_region structure into the ops structure (similar to
the devlink_health_reporter_ops).

This prepares the regions to enable support of additional operations in
the future such as requesting snapshots, or accessing the region
directly without a snapshot.

In order to re-use the constant strings in the mlx4 driver their
declaration must be changed to 'const char * const' to ensure the
compiler realizes that both the data and the pointer cannot change.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-26 19:39:26 -07:00
Ido Schimmel
f6bf1bafdc mlxsw: spectrum_mr: Fix list iteration in error path
list_for_each_entry_from_reverse() iterates backwards over the list from
the current position, but in the error path we should start from the
previous position.

Fix this by using list_for_each_entry_continue_reverse() instead.

This suppresses the following error from coccinelle:

drivers/net/ethernet/mellanox/mlxsw//spectrum_mr.c:655:34-38: ERROR:
invalid reference to the index variable of the iterator on line 636

Fixes: c011ec1bbf ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-26 12:00:45 -07:00
Petr Machata
9b4b16bba2 mlxsw: spectrum_flower: Offload FLOW_ACTION_MANGLE
Offload action pedit ex munge when used with a flower classifier. Only
allow setting of DSCP, ECN, or the whole DSField in IPv4 and IPv6 packets.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-26 11:55:40 -07:00
Petr Machata
50e4ee4b92 mlxsw: core: Add DSCP, ECN, dscp_rw to QOS_ACTION
The QOS_ACTION is used for manipulating the QOS attributes of the packet.
Add the defines and helpers related to DSCP and ECN fields, and dscp_rw.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-26 11:55:40 -07:00
Petr Machata
571ca1f1de mlxsw: core: Rename mlxsw_afa_qos_cmd to mlxsw_afa_qos_switch_prio_cmd
The original idea was to reuse this set of actions for ECN rewrite as well,
but on second look, it's not such a great idea. These two items should each
have its own command. Rename the existing enum to make it obvious that it
belongs to switch_prio_cmd.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-26 11:55:40 -07:00
David S. Miller
14340219b8 mlx5-updates-2020-03-25
1) Cleanups from Dan Carpenter and wenxu.
 
 2) Paul and Roi, Some minor updates and fixes to E-Switch to address
 issues introduced in the previous reg_c0 updates series.
 
 3) Eli Cohen simplifies and improves flow steering matching group searches
 and flow table entries version management.
 
 4) Parav Pandit, improves devlink eswitch mode changes thread safety.
 By making devlink rely on driver for thread safety and introducing mlx5
 eswitch mode change protection.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl58SW8ACgkQSD+KveBX
 +j4AxQf8DdrFrBD0NFTcAILS4bnTJC0I3xKRPb/2oYtWLVyJ9G5XAZqHC0DAG7xs
 jy8xhIFbeUxgLEdcx0la5vdR1mPlzs4XBHTe99YwzwK/jojrA7YXrlb3kv+RXWVY
 uNVAby68wh4EnO61R51ahIBXLPNbiYpo/wAWKvvBKRkOcYMVTKIFiP157AnJWObY
 fxnt06I0NFaIX8Va4MEqkrmUYrI4jJcqOJC9FwRBLDhFHcFkLh0Gav3vJJ7M4BCB
 ggPJpuZ4pu43qX9TtSOm8V/GlWWN0RB7PdbvliFBEHYG21hf9MfE8bPPKRlB7CO+
 B5+9ULhpvbjX7yRrkZ7fd4zlQ1siew==
 =Flln
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2020-03-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-03-25

1) Cleanups from Dan Carpenter and wenxu.

2) Paul and Roi, Some minor updates and fixes to E-Switch to address
issues introduced in the previous reg_c0 updates series.

3) Eli Cohen simplifies and improves flow steering matching group searches
and flow table entries version management.

4) Parav Pandit, improves devlink eswitch mode changes thread safety.
By making devlink rely on driver for thread safety and introducing mlx5
eswitch mode change protection.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-26 11:38:48 -07:00
Parav Pandit
8e0aa4bc95 net/mlx5: E-switch, Protect eswitch mode changes
Currently eswitch mode change is occurring from 2 different execution
contexts as below.
1. sriov sysfs enable/disable
2. devlink eswitch set commands

Both of them need to access eswitch related data structures in
synchronized manner.
Without any synchronization below race condition exist.

SR-IOV enable/disable with devlink eswitch mode change:

          cpu-0                         cpu-1
          -----                         -----
mlx5_device_disable_sriov()        mlx5_devlink_eswitch_mode_set()
  mlx5_eswitch_disable()             esw_offloads_stop()
    esw_offloads_disable()             mlx5_eswitch_disable()
                                         esw_offloads_disable()

Hence, they are synchronized using a new mode_lock.
eswitch's state_lock is not used as it can lead to a deadlock scenario
below and state_lock is only for vport and fdb exclusive access.

ip link set vf <param>
  netlink rcv_msg() - Lock A
    rtnl_lock
    vfinfo()
      esw->state_lock() - Lock B
devlink eswitch_set
   devlink_mutex
     esw->state_lock() - Lock B
       attach_netdev()
         register_netdev()
           rtnl_lock - Lock A

Alternatives considered:
1. Acquiring rtnl lock before taking esw->state_lock to follow similar
locking sequence as ip link flow during eswitch mode set.
rtnl lock is not good idea for two reasons.
(a) Holding rtnl lock for several hundred device commands is not good
    idea.
(b) It leads to below and more similar deadlocks.

devlink eswitch_set
   devlink_mutex
     rtnl_lock - Lock A
       esw->state_lock() - Lock B
         eswitch_disable()
           reload()
             ib_register_device()
               ib_cache_setup_one()
                 rtnl_lock()

2. Exporting devlink lock may lead to undesired use of it in vendor
driver(s) in future.

3. Unloading representors outside of the mode_lock requires
serialization with other process trying to enable the eswitch.

4. Differing the representors life cycle to a different workqueue
requires synchronization with func_change_handler workqueue.

Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-25 23:19:25 -07:00
Parav Pandit
ebf77bb83f net/mlx5: E-switch, Extend eswitch enable to handle num_vfs change
Subsequent patch protects eswitch mode changes across sriov and devlink
interfaces. It is desirable for eswitch to provide thread safe eswitch
enable and disable APIs.
Hence, extend eswitch enable API to optionally update num_vfs when
requested.

In subsequent patch, eswitch num_vfs are updated after all the eswitch
users eswitch drops its reference count.

Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-25 23:19:23 -07:00
Parav Pandit
ae24432cbc net/mlx5: Split eswitch mode check to different helper function
In order to check eswitch state under a lock, prepare code to split
capability check and eswitch state check into two helper functions.

Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-25 23:19:21 -07:00
Parav Pandit
f999b706b7 net/mlx5: Simplify mlx5_unload_one() and its callers
mlx5_unload_one() always returns 0.
Simplify callers of mlx5_unload_one() and remove the dead code.

Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-25 23:19:15 -07:00
Parav Pandit
ecd01db871 net/mlx5: Simplify mlx5_register_device to return void
mlx5_register_device() doesn't check for any error and always returns 0.
Simplify mlx5_register_device() to return void and its caller, remove
dead code related to it.

Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-25 23:19:13 -07:00
Eli Cohen
dc638d1122 net/mlx5: Avoid group version scan when not necessary
Group version is used when modifying a rule is allowed
(FLOW_ACT_NO_APPEND is clear) to detect a case where the rule was found
but while the groups where unlocked a new FTE was added. In this case,
the added FTE could be one with identical match value so we need to
attempt again with group lock held.

Change the code so version is retrieved only when FLOW_ACT_NO_APPEND is
cleared. As result, later compare can also be avoided if FLOW_ACT_NO_APPEND
is cleared.

Also improve comments text.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-25 23:19:11 -07:00
Eli Cohen
0aad2a0b42 net/mlx5: Avoid incrementing FTE version
FTE version is not used anywhere in the code so avoid incrementing it.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-25 23:19:09 -07:00
Eli Cohen
454401aeb2 net/mlx5: Fix group version management
When adding a rule to a flow group we need increment the version of the
group. Current code fails to do that and as a result, when trying to add
a rule, we will fail to discover a case where an FTE with the same match
value was added while we scanned the groups of the same match criteria,
thus we may try to add an FTE that was already added.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-25 23:19:06 -07:00
Eli Cohen
b820ce00e0 net/mlx5: Simplify matching group searches
Instead of using two different structs for searching groups with the
same match, use a single struct and thus simplify the code, make it more
readable and smaller size which means less code cache misses.

         text    data     bss     dec     hex
before: 35524    2744       0   38268    957c
after:  35038    2744       0   37782    9396

When testing add 70000 rules, delete all the rules, and repeat three
times taking the average, we get (time in seconds):

Before the change: insert 16.80, delete 11.02
After the change:  insert 16.55, delete 10.95

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-25 23:19:04 -07:00
Roi Dayan
d528d49705 net/mlx5: E-Switch, Use correct type for chain, prio and level values
The correct type is u32.

Fixes: d18296ffd9 ("net/mlx5: E-Switch, Introduce global tables")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-25 23:19:02 -07:00
Roi Dayan
c8508713c7 net/mlx5: E-Switch, free flow_group_in after creating the restore table
We allocate a temporary memory but forget to free it.

Fixes: 11b717d615 ("net/mlx5: E-Switch, Get reg_c0 value on CQE")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-25 23:18:59 -07:00
Paul Blakey
7983a675ba net/mlx5: E-Switch, Enable chains only if regs loopback is enabled
Register c0 loopback is needed to fully support chains and prios.

Enable chains and prio only if loopback (of reg c1 which came together
with c0), is enabled. To be able to check that, move enabling of loopback
before eswitch chains init.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-25 23:18:57 -07:00
Paul Blakey
60acc105cb net/mlx5: E-Switch, Enable restore table only if reg_c1 is supported
Reg c0/c1 matching, rewrite of regs c0/c1, and copy header of regs c1,B
is needed for the restore table to function, might not be supported by
firmware, and creation of the restore table or the copy header will
fail.

Check reg_c1 loopback support, as firmware which supports this,
should have all of the above.

Fixes: 11b717d615 ("net/mlx5: E-Switch, Get reg_c0 value on CQE")
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-25 23:18:54 -07:00
wenxu
046826c878 net/mlx5e: remove duplicated check chain_index in mlx5e_rep_setup_ft_cb
The function mlx5e_rep_setup_ft_cb check chain_index is zero twice.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-25 23:18:52 -07:00
Dan Carpenter
49397b8012 net/mlx5e: Fix actions_match_supported() return
The actions_match_supported() function returns a bool, true for success
and false for failure.  This error path is returning a negative which
is cast to true but it should return false.

Fixes: 4c3844d9e9 ("net/mlx5e: CT: Introduce connection tracking")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-25 23:18:49 -07:00
David S. Miller
9fb16955fb Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Overlapping header include additions in macsec.c

A bug fix in 'net' overlapping with the removal of 'version'
string in ena_netdev.c

Overlapping test additions in selftests Makefile

Overlapping PCI ID table adjustments in iwlwifi driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25 18:58:11 -07:00
Aya Levin
187a9830c9 net/mlx5e: Do not recover from a non-fatal syndrome
For non-fatal syndromes like LOCAL_LENGTH_ERR, recovery shouldn't be
triggered. In these scenarios, the RQ is not actually in ERR state.
This misleads the recovery flow which assumes that the RQ is really in
error state and no more completions arrive, causing crashes on bad page
state.

Fixes: 8276ea1353 ("net/mlx5e: Report and recover from CQE with error on RQ")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-24 14:43:07 -07:00
Aya Levin
e239c6d686 net/mlx5e: Fix ICOSQ recovery flow with Striding RQ
In striding RQ mode, the buffers of an RX WQE are first
prepared and posted to the HW using a UMR WQEs via the ICOSQ.
We maintain the state of these in-progress WQEs in the RQ
SW struct.

In the flow of ICOSQ recovery, the corresponding RQ is not
in error state, hence:

- The buffers of the in-progress WQEs must be released
  and the RQ metadata should reflect it.
- Existing RX WQEs in the RQ should not be affected.

For this, wrap the dealloc of the in-progress WQEs in
a function, and use it in the ICOSQ recovery flow
instead of mlx5e_free_rx_descs().

Fixes: be5323c837 ("net/mlx5e: Report and recover from CQE error on ICOSQ")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-24 14:43:05 -07:00
Aya Levin
39369fd536 net/mlx5e: Fix missing reset of SW metadata in Striding RQ reset
When resetting the RQ (moving RQ state from RST to RDY), the driver
resets the WQ's SW metadata.
In striding RQ mode, we maintain a field that reflects the actual
expected WQ head (including in progress WQEs posted to the ICOSQ).
It was mistakenly not reset together with the WQ. Fix this here.

Fixes: 8276ea1353 ("net/mlx5e: Report and recover from CQE with error on RQ")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-24 14:43:02 -07:00
Aya Levin
1de0306c3a net/mlx5e: Enhance ICOSQ WQE info fields
Add number of WQEBBs (WQE's Basic Block) to WQE info struct. Set the
number of WQEBBs on WQE post, and increment the consumer counter (cc)
on completion.

In case of error completions, the cc was mistakenly not incremented,
keeping a gap between cc and pc (producer counter). This failed the
recovery flow on the ICOSQ from a CQE error which timed-out waiting for
the cc and pc to meet.

Fixes: be5323c837 ("net/mlx5e: Report and recover from CQE error on ICOSQ")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-24 14:43:00 -07:00
Leon Romanovsky
306f354c67 net/mlx5_core: Set IB capability mask1 to fix ib_srpt connection failure
The cap_mask1 isn't protected by field_select and not listed among RW
fields, but it is required to be written to properly initialize ports
in IB virtualization mode.

Link: https://lore.kernel.org/linux-rdma/88bab94d2fd72f3145835b4518bc63dda587add6.camel@redhat.com
Fixes: ab118da4c1 ("net/mlx5: Don't write read-only fields in MODIFY_HCA_VPORT_CONTEXT command")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-24 14:42:58 -07:00
Ido Schimmel
107f167894 devlink: Only pass packet trap group identifier in trap structure
Packet trap groups are now explicitly registered by drivers and not
implicitly registered when the packet traps are registered. Therefore,
there is no need to encode entire group structure the trap is associated
with inside the trap structure.

Instead, only pass the group identifier. Refer to it as initial group
identifier, as future patches will allow user space to move traps
between groups.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23 21:40:40 -07:00
Ido Schimmel
8cd999e4ef mlxsw: spectrum_trap: Explicitly register packet trap groups
Use the previously added API to explicitly register / unregister
supported packet trap groups. This is in preparation for future patches
that will enable drivers to pass additional group attributes, such as
associated policer identifier.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23 21:40:40 -07:00
Nathan Chancellor
c31f0ea737 mlxsw: spectrum_cnt: Fix 64-bit division in mlxsw_sp_counter_resources_register
When building arm32 allyesconfig:

ld.lld: error: undefined symbol: __aeabi_uldivmod
>>> referenced by spectrum_cnt.c
>>>               net/ethernet/mellanox/mlxsw/spectrum_cnt.o:(mlxsw_sp_counter_resources_register) in archive drivers/built-in.a
>>> did you mean: __aeabi_uidivmod
>>> defined in: arch/arm/lib/lib.a(lib1funcs.o)

pool_size and bank_size are u64; use div64_u64 so that 32-bit platforms
do not error.

Fixes: ab8c4cc604 ("mlxsw: spectrum_cnt: Move config validation along with resource register")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23 20:57:16 -07:00
Jakub Kicinski
0dfb2d82af net: sched: rename more stats_types
Commit 53eca1f347 ("net: rename flow_action_hw_stats_types* ->
flow_action_hw_stats*") renamed just the flow action types and
helpers. For consistency rename variables, enums, struct members
and UAPI too (note that this UAPI was not in any official release,
yet).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23 20:54:23 -07:00
David S. Miller
684ac83e36 mlx5-fixes-2020-03-05
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl5hh6wACgkQSD+KveBX
 +j6qvQf9HQsiQ+cE1UIbM/IzyTWXeBMzjljCWFgQfvyKQjSFnoATeVl6GMQJNk7M
 ovQ7XHOlN36E/tQW/ypnwbX+btjl/mDEJsxEcvVf4gnw/QH1AwUjo291vPfrE5md
 DcWWe9Jrq2MkHeZAgIt/Tnw4GYIwQOBVmYgky3A0+azzuvxK+nXX6JJG5JqRR8Wd
 tBsN9UKok1nOt73d+TyHjGnzQHWzGBdS0vlxl0MYcD1QD66UzA0Atgz5aUQLGvTK
 Nk/IcHr3SF+0kvbtpjRxlrpi4ywD2gBNHXMJX1DDnCkqg9nWjJ5DByYDASo+I6uL
 0fMNsimp/gZG8HD0oFEVTFgaqPACUA==
 =DEqe
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2020-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2020-03-05

This series introduces some fixes to mlx5 driver.

Please pull and let me know if there is any problem.

For -stable v5.4
 ('net/mlx5: DR, Fix postsend actions write length')

For -stable v5.5
 ('net/mlx5e: kTLS, Fix TCP seq off-by-1 issue in TX resync flow')
 ('net/mlx5e: Fix endianness handling in pedit mask')
====================

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23 18:38:58 -07:00
Petr Machata
463957e3fb mlxsw: spectrum_flower: Offload FLOW_ACTION_PRIORITY
Offload action skbedit priority when keyed to a flower classifier. The
skb->priority field in Linux is very generic, so only allow setting the
bottom 8 priorities and bounce anything else.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19 21:09:20 -07:00
Petr Machata
4d745f8cf5 mlxsw: core: Add QOS_ACTION
The QOS_ACTION is used for manipulating the QoS attributes of a packet.
Add the corresponding defines and helpers, in particular for the
switch_priority override.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19 21:09:20 -07:00
Ido Schimmel
6002059d78 mlxsw: pci: Only issue reset when system is ready
During initialization the driver issues a software reset command and
then waits for the system status to change back to "ready" state.

However, before issuing the reset command the driver does not check that
the system is actually in "ready" state. On Spectrum-{1,2} systems this
was always the case as the hardware initialization time is very short.
On Spectrum-3 systems this is no longer the case. This results in the
software reset command timing-out and the driver failing to load:

[ 6.347591] mlxsw_spectrum3 0000:06:00.0: Cmd exec timed-out (opcode=40(ACCESS_REG),opcode_mod=0,in_mod=0)
[ 6.358382] mlxsw_spectrum3 0000:06:00.0: Reg cmd access failed (reg_id=9023(mrsr),type=write)
[ 6.368028] mlxsw_spectrum3 0000:06:00.0: cannot register bus device
[ 6.375274] mlxsw_spectrum3: probe of 0000:06:00.0 failed with error -110

Fix this by waiting for the system to become ready both before issuing
the reset command and afterwards. In case of failure, print the last
system status to aid in debugging.

Fixes: da382875c6 ("mlxsw: spectrum: Extend to support Spectrum-3 ASIC")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-19 20:23:27 -07:00
David S. Miller
79e28519ac mlx5-updates-2020-03-17
1) Compiler warnings and cleanup for the connection tracking series
 2) Bug fixes for the connection tracking series
 3) Fix devlink port register sequence
 4) Last five patches in the series, By Eli cohen
    Add the support for forwarding traffic between two eswitch uplink
    representors (Hairpin for eswitch), using mlx5 termination tables
    to change the direction of a packet in hw from RX to TX pipeline.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl5ximgACgkQSD+KveBX
 +j4iGwf9FrTxtGjgVXuwmc5LmSU6tak5SjK+dW5PdCw4mNorN2hSJeV/f9evLrf7
 7Cxfm4OH8/ivOSpVQz6XZEF0aYTq9T3JakGzAWESWwo/s+i7iwA+lVPYKhcvHXeg
 C9ImWbnyDCZkPZM6jz4KNpSMRWkyB7sEtQ51hYF0bdiSzcLSDaLCoKPEljp7sNKb
 f1456/yDuOIZ3sb6rYPH6e8EqqfUMiyYAyY3bBu09sl3deXopyueYVqPSPgjOoC7
 SfM5K+9nnuQJdvSqwUJLexxDZo1Z7fizz73LwUp0SBLk5zvZdn2bhxbt4wPS/xH/
 CSjsWFAs1eu2rDqRH48G3jKqTZeq2w==
 =LkML
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2020-03-17' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-03-17

1) Compiler warnings and cleanup for the connection tracking series
2) Bug fixes for the connection tracking series
3) Fix devlink port register sequence
4) Last five patches in the series, By Eli cohen
   Add the support for forwarding traffic between two eswitch uplink
   representors (Hairpin for eswitch), using mlx5 termination tables
   to change the direction of a packet in hw from RX to TX pipeline.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 19:13:37 -07:00
Jiri Pirko
4e145fc6eb mlxsw: spectrum_cnt: Expose devlink resource occupancy for counters
Implement occupancy counting for counters and expose over devlink
resource API.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:46:19 -07:00
Jiri Pirko
53d9636694 mlxsw: spectrum_cnt: Consolidate subpools initialization
Put all init operations related to subpools into
mlxsw_sp_counter_sub_pools_init().

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:46:19 -07:00
Jiri Pirko
ab8c4cc604 mlxsw: spectrum_cnt: Move config validation along with resource register
Move the validation of subpools configuration, to avoid possible over
commitment to resource registration. Add WARN_ON to indicate bug
in the code.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:46:19 -07:00
Jiri Pirko
d53cdbb889 mlxsw: spectrum_cnt: Expose subpool sizes over devlink resources
Implement devlink resources support for counter pools. Move the subpool
sizes calculations into the new resources register function.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:46:19 -07:00
Jiri Pirko
b2d3e33c77 mlxsw: spectrum_cnt: Add entry_size_res_id for each subpool and use it to query entry size
Add new field to subpool struct that would indicate which
resource id should be used to query the entry size for
the subpool from the device.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:46:19 -07:00
Jiri Pirko
c33fbe949f mlxsw: spectrum_cnt: Move sub_pools under per-instance pool struct
Currently, the global static array of subpools is used. Make it
per-instance as multiple instances of the mlxsw driver can have
different values.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:46:19 -07:00
Jiri Pirko
ac5de9a20f mlxsw: spectrum_cnt: Query bank size from FW resources
The bank size is different between Spectrum versions. Also it is
a resource that can be queried. So instead of hard coding the value in
code, query it from the firmware.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-18 16:46:19 -07:00
Jakub Kicinski
53eca1f347 net: rename flow_action_hw_stats_types* -> flow_action_hw_stats*
flow_action_hw_stats_types_check() helper takes one of the
FLOW_ACTION_HW_STATS_*_BIT values as input. If we align
the arguments to the opening bracket of the helper there
is no way to call this helper and stay under 80 characters.

Remove the "types" part from the new flow_action helpers
and enum values.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-17 21:12:39 -07:00
Eli Cohen
87b51810f4 net/mlx5: Avoid forwarding to other eswitch uplink
Do not allow forwarding of encapsulated traffic received from one eswtich's
uplink to another eswtich's uplink.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-17 19:41:41 -07:00
Eli Cohen
613f53fe09 net/mlx5: Eswitch, enable forwarding back to uplink port
Add dependencny on cap termination_table_raw_traffic to allow non
encapsulated packets received from uplink to be forwarded back to the
received uplink port.

Refactor the conditions into a separate function.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-17 19:41:39 -07:00
Eli Cohen
249ccc3c95 net/mlx5e: Add support for offloading traffic from uplink to uplink
Termination tables change the direction of a packet in hw from RX to SX
pipeline. Use that to offload hairpin flows received from uplink and
sent back to uplink.

Currently termination tables are used for pushing VLAN to packets
received from uplink and targeting a VF. Extend the implementation to
allow forwarding packets to uplink. These packets can either be
encapsulated or not.

In case encapsulation is needed before forwarding, move the reformat
object to the termination table as required.

Extend the hash table key to include tunnel information for the sake of
reusing reformat objects.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-17 19:41:37 -07:00
Eli Cohen
d8a2034f15 net/mlx5: Don't use termination tables in slow path
Don't use termination tables for packets that are steered to the slow path,
as a pre-step for supporting packet encap (packet reformat) action on
termination tables. Packet encap (reformat action) actions steer the packet
to the slow path until outer arp entries are resolved.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-17 19:41:35 -07:00
Eli Cohen
b5f814cc73 net/mlx5: Avoid configuring eswitch QoS if not supported
Check if QoS is enabled for the eswitch before attempting to configure
QoS parameters and emit a netlink error if not supported.

Introduce an API to check if QoS is supported for the eswitch.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-17 19:41:32 -07:00
Vladyslav Tarasiuk
31e87b39ba net/mlx5e: Fix devlink port register sequence
If udevd is configured to rename interfaces according to persistent
naming rules and if a network interface has phys_port_name in sysfs,
its contents will be appended to the interface name.
However, register_netdev creates device in sysfs and if
devlink_port_register is called after that, there is a timeframe in
which udevd may read an empty phys_port_name value. The consequence is
that the interface will lose this suffix and its name will not be
really persistent.

The solution is to register the port before registering a netdev.

Fixes: c6acd629ee ("net/mlx5e: Add support for devlink-port in non-representors mode")
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-17 19:41:30 -07:00
Roi Dayan
d0645b3780 net/mlx5e: Fix rejecting all egress rules not on vlan
The original condition rejected all egress rules that
are not on tunnel device.
Also, the whole point of this egress reject was to disallow bad
rules because of egdev which doesn't exists today, so remove
this check entirely.

Fixes: 0a7fcb78cc ("net/mlx5e: Support inner header rewrite with goto action")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-17 19:41:28 -07:00
Paul Blakey
636bb96852 net/mlx5e: en_tc: Rely just on register loopback for tunnel restoration
Register loopback which is needed for tunnel restoration, is now always
enabled if supported and not just with metadata enabled, check for
that instead.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-17 19:41:26 -07:00
Saeed Mahameed
aded104d39 net/mlx5e: CT: Fix stack usage compiler warning
Fix the following warnings: [-Werror=frame-larger-than=]

In function ‘mlx5_tc_ct_entry_add_rule’:
drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c:541:1:
error: the frame size of 1136 bytes is larger than 1024 bytes

In function ‘__mlx5_tc_ct_flow_offload’:
drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c:1049:1:
error: the frame size of 1168 bytes is larger than 1024 bytes

Fixes: 4c3844d9e9 ("net/mlx5e: CT: Introduce connection tracking")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
2020-03-17 19:41:24 -07:00
Paul Blakey
3cfc4332ed net/mlx5e: CT: Fix insert rules when TC_CT config isn't enabled
If CONFIG_MLX5_TC_CT isn't enabled, all offloading of eswitch tc rules
fails on parsing ct match, even if there is no ct match.

Return success if there is no ct match, regardless of config.

Fixes: 4c3844d9e9 ("net/mlx5e: CT: Introduce connection tracking")
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-17 19:41:21 -07:00
YueHaibing
35e725e1b9 net/mlx5e: CT: remove set but not used variable 'unnew'
drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c:
 In function mlx5_tc_ct_parse_match:
drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c:699:36: warning:
 variable unnew set but not used [-Wunused-but-set-variable]

Fixes: 4c3844d9e9 ("net/mlx5e: CT: Introduce connection tracking")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-17 19:41:19 -07:00
Paul Blakey
e0cb8afdbb net/mlx5: E-Switch, Skip restore modify header between prios of same chain
Restore modify header writes the chain mapping on the packet.
This modify header and action is added on all prios connections,
and gets overwritten with the same value consecutively in prios
of the same chain.

Use the chain's modify header only for the last prio of a given tc
chain.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-17 19:41:17 -07:00
Paul Blakey
0b3a8b6b53 net/mlx5: E-Switch: Fix using fwd and modify when firmware doesn't support it
Currently, if firmware doesn't support fwd and modify, driver fails
initializing eswitch chains while entering switchdev mode.

Instead, on such cases, disable the chains and prio feature (as we can't
restore the chain on miss) and the usage of fwd and modify.

Fixes: 8f1e0b97cc ("net/mlx5: E-Switch, Mark miss packets with new chain id mapping")
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-17 19:41:15 -07:00
Nathan Chancellor
9d3faa51be net/mlx5: Add missing inline to stub esw_add_restore_rule
When CONFIG_MLX5_ESWITCH is unset, clang warns:

In file included from drivers/net/ethernet/mellanox/mlx5/core/main.c:58:
drivers/net/ethernet/mellanox/mlx5/core/eswitch.h:670:1: warning: unused
function 'esw_add_restore_rule' [-Wunused-function]
esw_add_restore_rule(struct mlx5_eswitch *esw, u32 tag)
^
1 warning generated.

This stub function is missing inline; add it to suppress the warning.

Fixes: 11b717d615 ("net/mlx5: E-Switch, Get reg_c0 value on CQE")
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-17 19:41:13 -07:00
Nathan Chancellor
826096d84f mlx5: Remove uninitialized use of key in mlx5_core_create_mkey
Clang warns:

../drivers/net/ethernet/mellanox/mlx5/core/mr.c:63:21: warning: variable
'key' is uninitialized when used here [-Wuninitialized]
                      mkey_index, key, mkey->key);
                                  ^~~
../drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h:54:6: note:
expanded from macro 'mlx5_core_dbg'
                 ##__VA_ARGS__)
                   ^~~~~~~~~~~
../include/linux/dev_printk.h:114:39: note: expanded from macro
'dev_dbg'
        dynamic_dev_dbg(dev, dev_fmt(fmt), ##__VA_ARGS__)
                                             ^~~~~~~~~~~
../include/linux/dynamic_debug.h:158:19: note: expanded from macro
'dynamic_dev_dbg'
                           dev, fmt, ##__VA_ARGS__)
                                       ^~~~~~~~~~~
../include/linux/dynamic_debug.h:143:56: note: expanded from macro
'_dynamic_func_call'
        __dynamic_func_call(__UNIQUE_ID(ddebug), fmt, func, ##__VA_ARGS__)
                                                              ^~~~~~~~~~~
../include/linux/dynamic_debug.h:125:15: note: expanded from macro
'__dynamic_func_call'
                func(&id, ##__VA_ARGS__);               \
                            ^~~~~~~~~~~
../drivers/net/ethernet/mellanox/mlx5/core/mr.c:47:8: note: initialize
the variable 'key' to silence this warning
        u8 key;
              ^
               = '\0'
1 warning generated.

key's initialization was removed in commit fc6a9f86f0 ("{IB,net}/mlx5:
Assign mkey variant in mlx5_ib only") but its use was not fully removed.
Remove it now so that there is no more warning.

Fixes: fc6a9f86f0 ("{IB,net}/mlx5: Assign mkey variant in mlx5_ib only")
Link: https://github.com/ClangBuiltLinux/linux/issues/932
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-16 23:30:32 -07:00
Takashi Iwai
4a348601eb net: mlx4: Use scnprintf() for avoiding potential buffer overflow
Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit.  Fix it by replacing with scnprintf().

Cc: "David S . Miller" <davem@davemloft.net>
Cc: Tariq Toukan <tariqt@mellanox.com>
To: netdev@vger.kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 17:06:22 -07:00
Ido Schimmel
cb851c01b5 mlxsw: reg: Increase register field length to 31 bits
The cited commit set a value of 2^31-1 in order to "disable" the shaper
on a given a port. However, the length of the maximum shaper rate field
was not updated from 28 bits to 31 bits, which means ports are still
limited to ~268Gbps despite supporting speeds of 400Gbps.

Fix this by increasing the field's length.

Fixes: 92afbfedb7 ("mlxsw: reg: Increase MLXSW_REG_QEEC_MAS_DIS")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 17:04:16 -07:00
Petr Machata
8040c96b4f mlxsw: spectrum_qdisc: Offload RED ECN nodrop mode
RED ECN nodrop mode means that non-ECT traffic should not be early-dropped,
but enqueued normally instead. In Spectrum systems, this is achieved by
disabling CWTPM.ew (enable WRED) for a given traffic class.

So far CWTPM.ew was unconditionally enabled. Instead disable it when the
RED qdisc is in nodrop mode.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-14 21:03:46 -07:00
Alex Vesker
bc1a02884a net/mlx5: DR, Remove unneeded functions deceleration
Remove dummy functions declaration, the dummy functions are not needed
since fs_dr is the only one to call mlx5dr and both fs_dr and dr files
depend on the same config flag (MLX5_SW_STEERING).

Fixes: 70605ea545 ("net/mlx5: DR, Expose APIs for direct rule managing")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-13 16:26:28 -07:00
Alex Vesker
de346f401a net/mlx5: DR, Add support for flow table id destination action
This action allows to go to a flow table based on the table id.
Goto flow table id is required for supporting user space SW.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-13 16:26:26 -07:00
Parav Pandit
0e6fa491e8 net/mlx5: Avoid deriving mlx5_core_dev second time
All callers needs to work on mlx5_core_dev and it is already derived
before calling mlx5_devlink_eswitch_check().
Hence, accept mlx5_core_dev in mlx5_devlink_eswitch_check().

Given that it works on mlx5_core_dev change helper function name to
drop devlink prefix.

Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-13 16:26:24 -07:00
Parav Pandit
d6c8022dfb net/mlx5: E-switch, Annotate esw state_lock mutex destroy
Invoke mutex_destroy() to catch any esw state_lock errors.

Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-13 16:26:22 -07:00
Parav Pandit
2bb72e7e2a net/mlx5: E-switch, Annotate termtbl_mutex mutex destroy
Annotate mutex destroy to keep it symmetric to init sequence.
It should be destroyed after its users (representor netdevices) are
destroyed in below flow.

esw_offloads_disable()
  esw_offloads_unload_rep()

Hence, initialize the mutex before creating the representors which uses
it.

Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-13 16:26:19 -07:00
Mark Bloch
5c2aa8ae3a net/mlx5: Accept flow rules without match
Allow passing NULL spec when creating a flow rule. Such rules will act
as "catch all" flow rules.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-13 16:26:17 -07:00
Bodong Wang
4110fc59ea net/mlx5: E-Switch, Refactor unload all reps per rep type
Following introduction of per vport configuration of vport and rep,
unload all reps per rep type is still needed as IB reps can be
unloaded individually. However, a few internal functions exist purely
for this purpose, merge them to a single function.

This patch doesn't change any existing functionality.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-13 16:26:15 -07:00
Bodong Wang
23bb50cf73 net/mlx5: E-Switch, Update VF vports config when num of VFs changed
Currently, ECPF eswitch manager does one-time only configuration for
VF vports when device switches to offloads mode. However, when num of
VFs changed from host side, driver doesn't update VF vports
configurations.

Hence, perform VFs vport configuration update whenever num_vfs change
event occurs.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-13 16:26:12 -07:00
Bodong Wang
c2d7712ca3 net/mlx5: E-Switch, Introduce per vport configuration for eswitch modes
Both legacy and offload modes require vport setup, only offload mode
requires rep setup. Before this patch, vport and rep operations are
separated applied to all relevant vports in different stages.

Change to use per vport configuration, so that vport and rep operations
are modularized per vport.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-13 16:26:10 -07:00
Bodong Wang
d7c92cb56f net/mlx5: E-switch, Make vport setup/cleanup sequence symmetric
Vport enable and disable sequence is incorrect. It should be:
  enable()
  esw_vport_setup_acl,
  esw_vport_setup,
  esw_vport_enable_qos.

  disable()
  esw_vport_disable_qos,
  esw_vport_cleanup,
  esw_vport_cleanup_acl.

Instead of having two setup functions for port and acl, merge
acl setup to port setup function.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-13 16:26:08 -07:00
Bodong Wang
878a73318a net/mlx5: E-Switch, Prepare for vport enable/disable refactor
Rename esw_apply_vport_config() to esw_vport_setup(), and add new
helper function esw_vport_cleanup() to make them symmetric.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-13 16:26:05 -07:00
Bodong Wang
a9814d7fde net/mlx5: E-Switch, Remove redundant warning when QoS enable failed
esw_vport_enable_qos can return error in cases below:
1. QoS is already enabled. Warnning is useless in this case.
2. Create scheduling element cmd failed. There is already a warning.

Remove the redundant warnning if esw_vport_enable_qos returns err.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-13 16:26:03 -07:00
Bodong Wang
14c844cbf3 net/mlx5: E-Switch, Hold mutex when querying drop counter in legacy mode
Consider scenario below, CPU 1 is at risk to query already destroyed
drop counters. Need to apply the same state mutex when disabling vport.

+-------------------------------+-------------------------------------+
| CPU 0                         | CPU 1                               |
+-------------------------------+-------------------------------------+
| mlx5_device_disable_sriov     | mlx5e_get_vf_stats                  |
| mlx5_eswitch_disable          | mlx5_eswitch_get_vport_stats        |
| esw_disable_vport             | mlx5_eswitch_query_vport_drop_stats |
| mlx5_fc_destroy(drop_counter) | mlx5_fc_query(drop_counter)         |
+-------------------------------+-------------------------------------+

Fixes: b8a0dbe3a9 ("net/mlx5e: E-switch, Add steering drop counters")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-13 16:26:01 -07:00
Bodong Wang
86f9453c5f net/mlx5: E-Switch, Remove redundant check of eswitch manager cap
esw_vport_create_legacy_acl_tables bails out immediately for eswitch
manager, hence remove all the check of esw manager cap after.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-13 16:25:59 -07:00
Jason Gunthorpe
d613bd64c6 Merge branch 'mlx5_mr_cache' into rdma.git for-next
Leon Romanovsky says:

====================
This series fixes various corner cases in the mlx5_ib MR cache
implementation, see specific commit messages for more information.
====================

Based on the mlx5-next branch at
 git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies

* branch 'mlx5_mr-cache':
  RDMA/mlx5: Allow MRs to be created in the cache synchronously
  RDMA/mlx5: Revise how the hysteresis scheme works for cache filling
  RDMA/mlx5: Fix locking in MR cache work queue
  RDMA/mlx5: Lock access to ent->available_mrs/limit when doing queue_work
  RDMA/mlx5: Fix MR cache size and limit debugfs
  RDMA/mlx5: Always remove MRs from the cache before destroying them
  RDMA/mlx5: Simplify how the MR cache bucket is located
  RDMA/mlx5: Rename the tracking variables for the MR cache
  RDMA/mlx5: Replace spinlock protected write with atomic var
  {IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
  {IB,net}/mlx5: Assign mkey variant in mlx5_ib only
  {IB,net}/mlx5: Setup mkey variant before mr create command invocation
2020-03-13 11:11:07 -03:00
Michael Guralnik
a3cfdd3928 {IB,net}/mlx5: Move asynchronous mkey creation to mlx5_ib
As mlx5_ib is the only user of the mlx5_core_create_mkey_cb, move the
logic inside mlx5_ib and cleanup the code in mlx5_core.

Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-03-13 15:48:10 +02:00
Saeed Mahameed
fc6a9f86f0 {IB,net}/mlx5: Assign mkey variant in mlx5_ib only
mkey variant is not required for mlx5_core use, move the mkey variant
counter to mlx5_ib.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-03-13 15:48:04 +02:00
Saeed Mahameed
54c62e13ad {IB,net}/mlx5: Setup mkey variant before mr create command invocation
On reg_mr_callback() mlx5_ib is recalculating the mkey variant which is
wrong and will lead to using a different key variant than the one
submitted to firmware on create mkey command invocation.

To fix this, we store the mkey variant before invoking the firmware
command and use it later on completion (reg_mr_callback).

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2020-03-13 15:48:00 +02:00
Paul Blakey
1ef3018f5a net/mlx5e: CT: Support clear action
Clear action, as with software, removes all ct metadata from
the packet.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:00:39 -07:00
Paul Blakey
5c6b946047 net/mlx5e: CT: Handle misses after executing CT action
Mark packets with a unique tupleid, and on miss use that id to get
the act ct restore_cookie. Using that restore cookie, we ask CT to
restore the relevant info on the SKB.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:00:39 -07:00
Paul Blakey
ac991b48d4 net/mlx5e: CT: Offload established flows
Register driver callbacks with the nf flow table platform.
FT add/delete events will create/delete FTE in the CT/CT_NAT tables.

Restoring the CT state on miss will be added in the following patch.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:00:39 -07:00
Paul Blakey
4c3844d9e9 net/mlx5e: CT: Introduce connection tracking
Add support for offloading tc ct action and ct matches.
We translate the tc filter with CT action the following HW model:

+-------------------+      +--------------------+    +--------------+
+ pre_ct (tc chain) +----->+ CT (nat or no nat) +--->+ post_ct      +----->
+ original match    +  |   + tuple + zone match + |  + fte_id match +  |
+-------------------+  |   +--------------------+ |  +--------------+  |
                       v                          v                    v
                      set chain miss mapping  set mark             original
                      set fte_id              set label            filter
                      set zone                set established      actions
                      set tunnel_id           do nat (if needed)
                      do decap

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:00:39 -07:00
Paul Blakey
43435e9139 net/mlx5: E-Switch, Support getting chain mapping
Currently, we write chain register mapping on miss from the the last
prio of a chain. It is used to restore the chain in software.

To support re-using the chain register mapping from global tables (such
as CT tuple table) misses, export the chain mapping.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:00:39 -07:00
Paul Blakey
6fb0701a9c net/mlx5: E-Switch, Add support for offloading rules with no in_port
FTEs in global tables may match on packets from multiple in_ports.
Provide the capability to omit the in_port match condition.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:00:39 -07:00
Paul Blakey
d18296ffd9 net/mlx5: E-Switch, Introduce global tables
Currently, flow tables are automatically connected according to their
<chain,prio,level> tuple.

Introduce global tables which are flow tables that are detached from the
eswitch chains processing, and will be connected by explicitly referencing
them from multiple chains.

Add this new table type, and allow connecting them by refenece.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:00:39 -07:00
Paul Blakey
c6fe5729dc net/mlx5e: en_rep: Create uplink rep root table after eswitch offloads table
The eswitch offloads table, which has the reps (vport) rx miss rules,
was moved from OFFLOADS namespace [0,0] (prio, level), to [1,0], so
the restore table (the new [0,0]) can come before it. The destinations
of these miss rules is the rep root ft (ttc for non uplink reps).

Uplink rep root ft is created as OFFLOADS namespace [0,1], and is used
as a hook to next RX prio (either ethtool or ttc), but this fails to
pass fs_core level's check.

Move uplink rep root ft to OFFLOADS prio 1, level 1 ([1,1]), so it
will keep the same relative position after the restore table
change.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:00:38 -07:00
Paul Blakey
5b7cb74515 net/mlx5: E-Switch, Enable reg c1 loopback when possible
Enable reg c1 loopback if firmware reports it's supported,
as this is needed for restoring packet metadata (e.g chain).

Also define helper to query if it is enabled.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-12 15:00:38 -07:00
David S. Miller
bf3347c4d1 Merge branch 'ct-offload' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux 2020-03-12 12:34:23 -07:00
Jakub Kicinski
bd4be35b4a net: mlx4: reject unsupported coalescing params
Set ethtool_ops->supported_coalesce_params to let
the core reject unsupported coalescing parameters.

This driver did not previously reject unsupported parameters.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:28:53 -07:00
Jiri Pirko
a16fa28984 flow_offload: restrict driver to pass one allowed bit to flow_action_hw_stats_types_check()
The intention of this helper was to allow driver to specify one type
that it supports, so not only "any" value would pass. So make the API
more strict and allow driver to pass only 1 bit that is going
to be checked.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-10 16:04:19 -07:00
Jason Gunthorpe
3e3cf2e82c Merge branch 'mlx5_packet_pacing' into rdma.git for-next
Yishai Hadas Says:

====================
Expose raw packet pacing APIs to be used by DEVX based applications.  The
existing code was refactored to have a single flow with the new raw APIs.
====================

Based on the mlx5-next branch at
 git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Due to dependencies

* branch 'mlx5_packet_pacing':
  IB/mlx5: Introduce UAPIs to manage packet pacing
  net/mlx5: Expose raw packet pacing APIs
2020-03-10 11:54:17 -03:00
Vlad Buslov
b63293e759 net/mlx5e: Show/set Rx network flow classification rules on ul rep
Reuse infrastructure that already exists for pf in legacy mode to show/set
Rx network flow classification rules for uplink representors.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
2020-03-09 16:58:49 -07:00
Vlad Buslov
6783e8b29f net/mlx5e: Init ethtool steering for representors
During transition to uplink representors the code responsible for
initializing ethtool steering functionality wasn't added to representor
init rx routine. This causes NULL pointer dereference during configuration
of network flow classification rule with ethtool (only possible to
reproduce with next commit in this series which registers necessary ethtool
callbacks).

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
2020-03-09 16:58:48 -07:00
Vlad Buslov
01013ad355 net/mlx5e: Show/set Rx flow indir table and RSS hash key on ul rep
Reuse infrastructure that already exists for pf in legacy mode to show/set
Rx flow hash indirection table and RSS hash key for uplink representors.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
2020-03-09 16:58:46 -07:00
Saeed Mahameed
20f7b37ffc net/mlx5e: Introduce root ft concept for representors netdevs
Uplink representor traffic will be redirected to an empty root ft rather
than directly to a direct tir or ttc table, this root ft will be empty and
will be used as a link for auto-chaining with ttc table or ethtool tables
in downstream patches.

On load, fs core will connect uplink rep root_ft with ttc table.  In case
ethtool steering will be used, fs core will auto connect root_ft with
the ethtool bypass tables, which will be connected with the ttc table.

vport_rx_rule[uplink_rep]->root_ft->ethtool->ttc.

For non-uplink representors, for simplicity root_ft will always point at
ttc table, hence the replace vport_rx rule logic is removed.

vport_rx_rule[non_uplink_rep]->root_ft(ttc).

For now ethtool steering support can only be available on uplink rep.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
2020-03-09 16:58:44 -07:00
Parav Pandit
cc617ceda0 net/mlx5: E-switch, make query inline mode a static function
mlx5_eswitch_inline_mode_get() is used only in eswitch_offloads.c.
Hence, make it static and adjacent to its caller function.

Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-09 16:58:42 -07:00
Paul Blakey
891b8f3321 net/mlx5: Allocate smaller size tables for ft offload
Instead of giving ft tables one of the largest tables available - 4M,
give it a more reasonable size - 64k. Especially since it will
always be created as a miss hook in the following patch.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-09 16:58:40 -07:00
Dan Carpenter
d9fb932fde net/mlx5e: Fix an IS_ERR() vs NULL check
The esw_vport_tbl_get() function returns error pointers on error.

Fixes: 96e326878f ("net/mlx5e: Eswitch, Use per vport tables for mirroring")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-09 16:58:38 -07:00
Eli Cohen
2fbbc30da0 net/mlx5: Verify goto chain offload support
According to PRM, forward to flow table along with either packet
reformat or decap is supported only if reformat_and_fwd_to_table
capability is set for the flow table.

Add dependency on the capability and pack all the conditions for "goto
chain" in a single function.

Fix language in error message in case of not supporting forward to a
lower numbered flow table.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-09 16:58:36 -07:00
Majd Dibbiny
1e62e222db net/mlx5: E-Switch, Use vport metadata matching only when mandatory
Multi-port RoCE mode requires tagging traffic that passes through the
vport.
This matching can cause performance degradation, therefore disable it
and use the legacy matching on vhca_id and source_port when possible.

Fixes: 92ab1eb392 ("net/mlx5: E-Switch, Enable vport metadata matching if firmware supports it")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-09 16:58:35 -07:00
Mark Bloch
2f5438ca0e net/mlx5: Tidy up and fix reverse christmas ordring
Use reverse chirstmas tree inside mlx5e_ethtool_get_link_ksettings.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-09 16:58:33 -07:00
Mark Bloch
c268ca6087 net/mlx5: Expose port speed when possible
When port speed can't be reported based on ext_eth_proto_capability
or eth_proto_capability instead of reporting speed as unknown check
if the port's speed can be inferred based on the data_rate_oper field.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-09 16:58:31 -07:00
Saeed Mahameed
a70ed9d8ec Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
This series adds some HW bits and definitions for mlx5 driver, to be
used by downstream features in both rdma and netdev branches.

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: HW bit for goto chain offload support
  net/mlx5: Expose link speed directly
  net/mlx5: Introduce TLS and IPSec objects enums
  net/mlx5: Introduce egress acl forward-to-vport capability
  net/mlx5: Expose raw packet pacing APIs
  net/mlx5e: Replace zero-length array with flexible-array member
  net/mlx5: fix spelling mistake "reserverd" -> "reserved"

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-09 16:58:26 -07:00
Jiri Pirko
d7cb1e3ba1 flow_offload: introduce "disabled" HW stats type and allow it in mlxsw
Introduce new type for disabled HW stats and allow the value in
mlxsw offload.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-08 21:07:48 -07:00
Jiri Pirko
f16e7f64e4 mlxsw: spectrum_acl: Ask device for rule stats only if counter was created
Set a flag in case rule counter was created. Only query the device for
stats of a rule, which has the valid counter assigned.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-08 21:07:48 -07:00
Jiri Pirko
4885547951 flow_offload: introduce "delayed" HW stats type and allow it in mlx5
Introduce new type for delayed HW stats and allow the value in
mlx5 offload.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-08 21:07:48 -07:00
Jiri Pirko
d60d7ed4c8 flow_offload: introduce "immediate" HW stats type and allow it in mlxsw
Introduce new type for immediate HW stats and allow the value in
mlxsw offload.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-08 21:07:48 -07:00
Jiri Pirko
c4afd0c816 mlxsw: restrict supported HW stats type to "any"
Currently don't allow actions with any other type to be inserted.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-08 21:07:48 -07:00
Jiri Pirko
3632f6d390 mlxsw: spectrum_flower: Do not allow mixing HW stats types for actions
As there is one set of counters for the whole action chain, forbid to
mix the HW stats types.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-08 21:07:48 -07:00
Jiri Pirko
319a1d1947 flow_offload: check for basic action hw stats type
Introduce flow_action_basic_hw_stats_types_check() helper and use it
in drivers. That sanitizes the drivers which do not have support
for action HW stats types.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-08 21:07:48 -07:00
Saeed Mahameed
bd673da6d9 net/mlx5: Introduce TLS and IPSec objects enums
Expose the TLS encryption key general object type enum correctly,
and add the IPSec encryption key general object type enum.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-07 13:19:25 -08:00
Eli Cohen
0b13645474 net/mlx5: Clear LAG notifier pointer after unregister
After returning from unregister_netdevice_notifier_dev_net(), set the
notifier_call field to NULL so successive call to mlx5_lag_add() will
function as expected.

Fixes: 7907f23adc ("net/mlx5: Implement RoCE LAG feature")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-05 15:13:45 -08:00
Sebastian Hense
404402abd5 net/mlx5e: Fix endianness handling in pedit mask
The mask value is provided as 64 bit and has to be casted in
either 32 or 16 bit. On big endian systems the wrong half was
casted which resulted in an all zero mask.

Fixes: 2b64beba02 ("net/mlx5e: Support header re-write of partial fields in TC pedit offload")
Signed-off-by: Sebastian Hense <sebastian.hense1@ibm.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-05 15:13:42 -08:00
Tariq Toukan
f28ca65efa net/mlx5e: kTLS, Fix wrong value in record tracker enum
Fix to match the HW spec: TRACKING state is 1, SEARCHING is 2.
No real issue for now, as these values are not currently used.

Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-05 15:13:39 -08:00