Commit Graph

4159 Commits

Author SHA1 Message Date
Jiri Pirko
e2f2a1fd5b mlxsw: spectrum: Implement chain template hinting
Since cld_flower provides information about the filter template for
specific chain, use this information in order to prepare a region.
Use the template to find out what elements are going to be used
and pass that down to mlxsw_sp_acl_tcam_group_add(). Later on, when the
first filter is inserted, the mlxsw_sp_acl_tcam_group_use_patterns()
function would use this element usage information instead of looking
up a pattern.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-23 20:44:13 -07:00
Boris Pismenny
3f44899ef2 net/mlx5e: Use PARTIAL_GSO for UDP segmentation
This patch removes the splitting of UDP_GSO_L4 packets in the driver,
and exposes UDP_GSO_L4 as a PARTIAL_GSO feature. Thus, the network stack
is not responsible for splitting the packet into two.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Jianbo Liu
cc495188a8 net/mlx5e: Support offloading double vlan push/pop tc actions
As we can configure two push/pop actions in one flow table entry,
add support to offload those double vlan actions in a rule to HW.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Jianbo Liu
1482bd3d50 net/mlx5e: Refactor tc vlan push/pop actions offloading
Extract actions offloading code to a new function, and also extend data
structures for double vlan actions.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Jianbo Liu
699e96ddf4 net/mlx5e: Support offloading tc double vlan headers match
We can match on both outer and inner vlan tags, add support for
offloading that.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Roi Dayan
c7f7ba8df8 net/mlx5e: Remove redundant WARN when we cannot find neigh entry
It is possible for neigh entry not to exist if it was cleaned already.
When we bring down an interface the neigh gets deleted but it could be
that our listener for neigh event to clear the encap valid bit didn't
start yet and the neigh update last used work is started first.
In this scenario the encap entry has valid bit set but the neigh entry
doesn't exist.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Saeed Mahameed
3101d1fc6b net/mlx5: FW tracer, Add debug prints
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Feras Daoud
244069532f net/mlx5: FW tracer, Enable tracing
Add the tracer file to the makefile and add the init
function to the load one flow.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Feras Daoud
70dd6fdb89 net/mlx5: FW tracer, parse traces and kernel tracing support
For each message the driver should do the following:
1- Find the message string in the strings database
2- Count the param number of each message
3- Wait for the param events and accumulate them
4- Calculate the event timestamp using the local event timestamp
and the first timestamp event following it.
5- Print message to trace log

Enable the tracing by:
echo 1 > /sys/kernel/debug/tracing/events/mlx5/mlx5_fw/enable

Read traces by:
cat /sys/kernel/debug/tracing/trace

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Feras Daoud
c71ad41ccb net/mlx5: FW tracer, events handling
The tracer has one event, event 0x26, with two subtypes:
- Subtype 0: Ownership change
- Subtype 1: Traces available

An ownership change occurs in the following cases:
1- Owner releases his ownership, in this case, an event will be
sent to inform others to reattempt acquire ownership.
2- Ownership was taken by a higher priority tool, in this case
the owner should understand that it lost ownership, and go through
tear down flow.

The second subtype indicates that there are traces in the trace buffer,
in this case, the driver polls the tracer buffer for new traces, parse
them and prepares the messages for printing.

The HW starts tracing from the first address in the tracer buffer.
Driver receives an event notifying that new trace block exists.
HW posts a timestamp event at the last 8B of every 256B block.
Comparing the timestamp to the last handled timestamp would indicate
that this is a new trace block. Once the new timestamp is detected,
the entire block is considered valid.

Block validation and parsing, should be done after copying the current
block to a different location, in order to avoid block overwritten
during processing.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Saeed Mahameed
e9cad2cea7 net/mlx5: FW tracer, register log buffer memory key
Create a memory key and protection domain for the tracer log buffer.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Feras Daoud
48967ffdeb net/mlx5: FW tracer, create trace buffer and copy strings database
For each PF do the following:
1- Allocate memory for the tracer strings database and read the
strings from the FW to the SW. These strings will be used later for
parsing traces.
2- Allocate and dma map tracer buffers.

Traces that will be written into the buffer will be parsed as a group
of one or more traces, referred to as trace message. The trace message
represents a C-like printf string.
First trace of a message holds the pointer to the correct string in
strings database. The following traces holds the variables of the
message.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Feras Daoud
f53aaa31cc net/mlx5: FW tracer, implement tracer logic
Implement FW tracer logic and registers access, initialization and
cleanup flows.

Initializing the tracer will be part of load one flow, as multiple
PFs will try to acquire ownership but only one will succeed and will
be the tracer owner.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 15:01:11 -07:00
Saeed Mahameed
7854ac44fe Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
mlx5 core infrastructure updates and fixes.

From Eran:
 - Add MPEGC (Management PCIe General Configuration) registers and btis
 - Fix tristate and description for MLX5 module

rom Feras:
 - Add hardware structures for the firmware tracer

From Jainbo:
 - Core support for double vlan push/pop steering action

From Max:
 - Add XRQ commands definitions

From Noa:
 - Add missing SET_DRIVER_VERSION command translation

From Roi:
 - Use ERR_CAST() instead of coding it

From Tariq:
 - Better return types for CQE API

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-23 14:58:46 -07:00
David S. Miller
c4c5551df1 Merge ra.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux
All conflicts were trivial overlapping changes, so reasonably
easy to resolve.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-20 21:17:12 -07:00
Roi Dayan
7e29392eee net/mlx5e: Only allow offloading decap egress (egdev) flows
We get egress rules through the egdev mechanism when the ingress device
is not supporting offload, with the expected use-case of tunnel decap
ingress rule set on shared tunnel device.

Make sure to offload egress/egdev rules only if decap action (tunnel key
unset) exists there and err otherwise.

Fixes: 717503b9cf ("net: sched: convert cls_flower->egress_dev users to tc_setup_cb_egdev infra")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-18 18:16:59 -07:00
Tariq Toukan
d7037ad73d net/mlx5: Fix QP fragmented buffer allocation
Fix bad alignment of SQ buffer in fragmented QP allocation.
It should start directly after RQ buffer ends.

Take special care of the end case where the RQ buffer does not occupy
a whole page. RQ size is a power of two, so would be the case only for
small RQ sizes (RQ size < PAGE_SIZE).

Fix wrong assignments for sqb->size (mistakenly assigned RQ size),
and for npages value of RQ and SQ.

Fixes: 3a2f703312 ("net/mlx5: Use order-0 allocations for all WQ types")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-18 18:16:58 -07:00
Raed Salem
8c49f54a3f net/mlx5: Fix 'DON'T_TRAP' functionality
The flow counters binding support commit introduced a code change where
none NULL 'rule_dest' is always passed to mlx5_add_flow_rules, this breaks
'DON'T_TRAP' rules insertion.

The fix uses the equivalent 'dest_num' value instead of dest pointer
at the failed check.

fixes: 3b3233fbf0 ('IB/mlx5: Add flow counters binding support')
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-18 18:16:57 -07:00
Saeed Mahameed
443a858158 net/mlx5: E-Switch, UBSAN fix undefined behavior in mlx5_eswitch_mode
With debug kernel UBSAN detects the following issue, which might happen
when eswitch instance is not created, fix this by testing the eswitch
pointer before returning the eswitch mode, if not set return mode =
SRIOV_NONE.

[   32.528951] UBSAN: Undefined behaviour in drivers/net/ethernet/mellanox/mlx5/core/eswitch.c:2219:12
[   32.528951] member access within null pointer of type 'struct mlx5_eswitch'
[   32.528951] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.0-rc3-dirty #181
[   32.528951] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
[   32.528951] Call Trace:
[   32.528951]  dump_stack+0xc7/0x13b
[   32.528951]  ? show_regs_print_info+0x5/0x5
[   32.528951]  ? __pm_runtime_use_autosuspend+0x140/0x140
[   32.528951]  ubsan_epilogue+0x9/0x49
[   32.528951]  ubsan_type_mismatch_common+0x1f9/0x2c0
[   32.528951]  ? ucs2_as_utf8+0x310/0x310
[   32.528951]  ? device_initialize+0x229/0x2e0
[   32.528951]  __ubsan_handle_type_mismatch+0x9f/0xc9
[   32.528951]  ? __ubsan_handle_divrem_overflow+0x19b/0x19b
[   32.578008]  ? ib_device_get_by_index+0xf0/0xf0
[   32.578008]  mlx5_eswitch_mode+0x30/0x40
[   32.578008]  mlx5_ib_add+0x1e0/0x4a0

Fixes: 57cbd893c4 ("net/mlx5: E-Switch, Move representors definition to a global scope")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-18 18:16:57 -07:00
Eran Ben Elisha
d2e1c57bcf net/mlx5e: Don't allow aRFS for encapsulated packets
Driver is yet to support aRFS for encapsulated packets, return early
error in such case.

Fixes: 18c908e477 ("net/mlx5e: Add accelerated RFS support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-18 18:16:56 -07:00
Eran Ben Elisha
2630bae801 net/mlx5e: Fix quota counting in aRFS expire flow
Quota should follow the amount of rules which do expire, and not the
number of rules that were examined, fixed that.

Fixes: 18c908e477 ("net/mlx5e: Add accelerated RFS support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-18 18:16:55 -07:00
Ariel Levkovich
33180bee86 net/mlx5: Adjust clock overflow work period
When driver converts HW timestamp to wall clock time it subtracts
the last saved cycle counter from the HW timestamp and converts the
difference to nanoseconds.
The conversion is done by multiplying the cycles difference with the
clock multiplier value as a first step and therefore the cycles
difference should be small enough so that the multiplication product
doesn't exceed 64bit.

The overflow handling routine is in charge of updating the last saved
cycle counter in driver and it is called periodically using kernel
delayed workqueue.

The delay period for this work is calculated using the max HW cycle
counter value (a 41 bit mask) as a base which doesn't take the 64bit
limit into account so the delay period may be incorrect and too
long to prevent a large difference between the HW counter and the last
saved counter in SW.

This change adjusts the work period for the HW clock overflow work by
taking the minimum between the previous value and the quotient of max
u64 value and the clock multiplier value.

Fixes: ef9814deaf ("net/mlx5e: Add HW timestamping (TS) support")
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-18 18:16:55 -07:00
Shay Agroskin
e279d634f3 net/mlx5e: Refine ets validation function
Removed an error message received when configuring ETS total
bandwidth to be zero.
Our hardware doesn't support such configuration, so we shall
reject it in the driver. Nevertheless, we removed the error message
in order to eliminate error messages caused by old userspace tools
who try to pass such configuration.

Fixes: ff0891915c ("net/mlx5e: Fix ETS BW check")
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-18 18:16:54 -07:00
Eran Ben Elisha
048f31437a net/mlx5: Fix tristate and description for MLX5 module
Current description did not include new devices. Fix that by proving the
correct generic description.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-18 14:33:25 -07:00
Roi Dayan
d34c6efc59 net/mlx5: Use ERR_CAST() instead of coding it
This makes it more readable that rule is being used to return an err.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-18 14:33:25 -07:00
Noa Osherovich
0f4039104e net/mlx5: Add missing SET_DRIVER_VERSION command translation
When translating command opcodes to a string, SET_DRIVER_VERSION
command was missing.

Fixes: 42ca502e17 ('net/mlx5_core: Use a macro in mlx5_command_str()')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-18 14:33:25 -07:00
Max Gurtovoy
a451733909 net/mlx5: Add XRQ commands definitions
Update mlx5 command list and error return function to handle XRQ
commands.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-18 14:33:25 -07:00
Jianbo Liu
8da6fe2a18 net/mlx5: Add core support for double vlan push/pop steering action
As newer firmware supports double push/pop in a single FTE, we add
core bits and extend vlan action logic for it.

Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-18 14:33:25 -07:00
Arnd Bergmann
6060d9d24e net/mlx5: fix an unused-function warning
These dummy helpers are all intended to be inline functions,
but one of them by accident came without the 'inline' keyword,
causing a harmless warning:

In file included from drivers/net/ethernet/mellanox/mlx5/core/main.c:63:
drivers/net/ethernet/mellanox/mlx5/core/accel/tls.h:79:1: error: 'mlx5_accel_tls_add_flow' defined but not used [-Werror=unused-function]
 mlx5_accel_tls_add_flow(struct mlx5_core_dev *mdev, void *flow,

Fixes: ab412e1dd7 ("net/mlx5: Accel, add TLS rx offload routines")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-18 13:38:54 -07:00
Jiri Pirko
c3ab435466 mlxsw: spectrum: Extend to support Spectrum-2 ASIC
Extend existing driver for Spectrum ASIC to support Spectrum-2 ASIC.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19 02:13:14 +09:00
Jiri Pirko
9912e6b8c2 mlxsw: spectrum_acl: Add initial Spectrum-2 ACL implementation
Utilize only C-TCAM for now. Do very minimal A-TCAM initialization in
order to make C-TCAM work.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19 02:13:14 +09:00
Ido Schimmel
a6b9c87daf mlxsw: spectrum_acl: Add region association callback
In Spectrum-2, ACL regions that use 8 or 12 key blocks require several
consecutive hardware regions.

In order to allow defragmentation, the device stores a mapping from a
logical region ID to an hardware region ID, which is similar to the page
table that is used to translate virtual addresses to physical addresses.

Add the region association callback to the region create sequence and
implement it as a NOP in Spectrum which does not require it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19 02:13:14 +09:00
Ido Schimmel
7a921a1e58 mlxsw: spectrum_acl: Add support for Spectrum-2 block encoding
Encode each flexible key block in the general block scheme according its
block index.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19 02:13:14 +09:00
Ido Schimmel
a6d70a878e mlxsw: spectrum_acl: Prepare for Spectrum-2 block encoding
In Spectrum the key (and mask) block layout is very straight forward and
every block is 16 bytes aligned.

However, in Spectrum-2 the blocks are not even byte aligned, which makes
it difficult to encode them using current method.

Instead, first encode each block and then encode the block in the
general blocks layout.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19 02:13:14 +09:00
Ido Schimmel
7050f439ef mlxsw: reg: Add Policy-Engine General Configuration Register
The PGCR register configures general Policy-Engine settings.

Specifically, we are going to use it in order to set the default action
base pointer, which determines where the default action (when there is
no hit) is located for each region.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19 02:13:14 +09:00
Ido Schimmel
f1c7d9cce2 mlxsw: reg: Add Policy-Engine Region eRP Register
The PERERP register configures the region eRPs. It can be used, for
example, to enable lookup in the C-TCAM in addition to the A-TCAM.

To be able to perform a lookup in the C-TCAM we need to "use" the eRP
table. This is done by marking the pointer as valid, but zeroing the eRP
table vector.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19 02:13:14 +09:00
Ido Schimmel
481662a8a3 mlxsw: reg: Add Policy-Engine Region Configuration Register
The PERCR register configures the region parameters such as whether to
consult the bloom filter before performing a lookup using a specific
eRP.

For C-TCAM only usage we don't need to accurately set the master mask.
Instead, we can set all of its bits to make sure all the extracted keys
are actually used.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19 02:13:13 +09:00
Jiri Pirko
3390787b61 mlxsw: reg: Add Policy-Engine Region Association Register
The PERAR register is used to associate a hw region for region_id's.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19 02:13:13 +09:00
Jiri Pirko
0f27e80aea mlxsw: acl: Introduce activity get operation for action block/set
In Spectrum-2, activity cannot be find out by TCAM rule (PTCEv2 register),
but rather by associated action set. For that purpose, extend action ops
to allow query activity from PEFA register. Block activity is decided
according to activity of the first set.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19 02:13:13 +09:00
Jiri Pirko
2d186ed4dd mlxsw: reg: Add support for activity information from PEFA register
In Spectrum-2, the PEFA register is extend to report if the action set
was hit during processing of packets. Introduce this extension and
adjust the code around this accordingly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19 02:13:13 +09:00
Jiri Pirko
dcdf01028e mlxsw: spectrum: Introduce flex key blocks for Spectrum-2
Introduce key blocks for Spectrum-2 that contains the same elements used
already for Spectrum1. Along with that, introduce encoder stub.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19 02:13:13 +09:00
Jiri Pirko
d55ece4b6e mlxsw: spectrum: Add Spectrum-2 variant of flex actions ops
In Spectrum-2, no action set is stored directly in TCAM, all are located
in KVD linear. So ask core to treat the first set as dummy empty one,
to be just used for PTCEV2 purposes.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19 02:13:13 +09:00
Jiri Pirko
18ce0e4e66 mlxsw: spectrum_mr_tcam: Add Spectrum-2 stubs
Add dummy ops for now. The ops are going to be implemented later on.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19 02:13:13 +09:00
Jiri Pirko
742f75a600 mlxsw: spectrum: Add KVDL manager implementation for Spectrum-2
In Spectrum-2, KVD linear indexes are hashed into KVD hash. Therefore it
is possible for multiple resource types to use same indexes. There are
multiple index spaces. Also, the index space is bigger than the actual
KVD hash area, which allows to have holes in the index space without any
penalization. The HW has to be told in case the index for particular
resource type is no longer used so it can be freed from KVD hash. IEDR
register is used for that.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19 02:13:13 +09:00
Jiri Pirko
c33d0cb192 mlxsw: reg: Add Infrastructure Entry Delete Register
The IEDR register is used for deleting entries from the entry tables.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-19 02:13:13 +09:00
Saeed Mahameed
432e629e56 net/mlx4_en: Don't reuse RX page when XDP is set
When a new rx packet arrives, the rx path will decide whether to reuse
the remainder of the page or not according to one of the below conditions:
1. frag_info->frag_stride == PAGE_SIZE / 2
2. frags->page_offset + frag_info->frag_size > PAGE_SIZE;

The first condition is no met for when XDP is set.
For XDP, page_offset is always set to priv->rx_headroom which is
XDP_PACKET_HEADROOM and frag_info->frag_size is around mtu size + some
padding, still the 2nd release condition will hold since
XDP_PACKET_HEADROOM + 1536 < PAGE_SIZE, as a result the page will not
be released and will be _wrongly_ reused for next free rx descriptor.

In XDP there is an assumption to have a page per packet and reuse can
break such assumption and might cause packet data corruptions.

Fix this by adding an extra condition (!priv->rx_headroom) to the 2nd
case to avoid page reuse when XDP is set, since rx_headroom is set to 0
for non XDP setup and set to XDP_PACKET_HEADROOM for XDP setup.

No additional cache line is required for the new condition.

Fixes: 34db548bfb ("mlx4: add page recycling in receive path")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Suggested-by: Martin KaFai Lau <kafai@fb.com>
CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16 14:05:25 -07:00
Jiri Pirko
1222d15a01 mlxsw: spectrum: Expose counters for various packet sizes
Expose counters ASIC has in the group of RFC 2819 counters that count
number of packets within specific size range.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16 14:04:42 -07:00
Boris Pismenny
b3ccf97813 net/mlx5e: IPsec, fix byte count in CQE
This patch fixes the byte count indication in CQE for processed IPsec
packets that contain a metadata header.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16 00:13:28 -07:00
Boris Pismenny
10e71acca2 net/mlx5: Accel, add common metadata functions
This patch adds common functions to handle mellanox metadata headers.
These functions are used by IPsec and TLS to process FPGA metadata.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16 00:13:11 -07:00
Boris Pismenny
790af90c00 net/mlx5e: TLS, build TLS netdev from capabilities
This patch enables TLS Rx based on available HW capabilities.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16 00:13:11 -07:00
Boris Pismenny
afd3baaa93 net/mlx5e: TLS, add software statistics
This patch adds software statistics for TLS to count important
events.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16 00:13:11 -07:00
Boris Pismenny
00aebab27c net/mlx5e: TLS, add Innova TLS rx data path
Implement the TLS rx offload data path according to the
requirements of the TLS generic NIC offload infrastructure.

Special metadata ethertype is used to pass information to
the hardware.

When hardware loses synchronization a special resync request
metadata message is used to request resync.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16 00:13:11 -07:00
Boris Pismenny
ca942c78f3 net/mlx5e: TLS, add innova rx support
Add the mlx5 implementation of the TLS Rx routines to add/del TLS
contexts, also add the tls_dev_resync_rx routine
to work with the TLS inline Rx crypto offload infrastructure.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16 00:13:11 -07:00
Boris Pismenny
ab412e1dd7 net/mlx5: Accel, add TLS rx offload routines
In Innova TLS, TLS contexts are added or deleted
via a command message over the SBU connection.
The HW then sends a response message over the same connection.

Complete the implementation for Innova TLS (FPGA-based) hardware by
adding support for rx inline crypto offload.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16 00:13:11 -07:00
Boris Pismenny
0aadb2fc09 net/mlx5e: TLS, refactor variable names
For symmetry, we rename mlx5e_tls_offload_context to
mlx5e_tls_offload_context_tx before we add mlx5e_tls_offload_context_rx.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16 00:13:11 -07:00
Boris Pismenny
d80a1b9d18 tls: Refactor tls_offload variable names
For symmetry, we rename tls_offload_context to
tls_offload_context_tx before we add tls_offload_context_rx.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-16 00:12:09 -07:00
David S. Miller
2aa4a3378a Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-07-15

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Various different arm32 JIT improvements in order to optimize code emission
   and make the JIT code itself more robust, from Russell.

2) Support simultaneous driver and offloaded XDP in order to allow for advanced
   use-cases where some work is offloaded to the NIC and some to the host. Also
   add ability for bpftool to load programs and maps beyond just the cgroup case,
   from Jakub.

3) Add BPF JIT support in nfp for multiplication as well as division. For the
   latter in particular, it uses the reciprocal algorithm to emulate it, from Jiong.

4) Add BTF pretty print functionality to bpftool in plain and JSON output
   format, from Okash.

5) Add build and installation to the BPF helper man page into bpftool, from Quentin.

6) Add a TCP BPF callback for listening sockets which is triggered right after
   the socket transitions to TCP_LISTEN state, from Andrey.

7) Add a new cgroup tree command to bpftool which iterates over the whole cgroup
   tree and prints all attached programs, from Roman.

8) Improve xdp_redirect_cpu sample to support parsing of double VLAN tagged
   packets, from Jesper.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-14 18:47:44 -07:00
Ido Schimmel
c3a495409a mlxsw: spectrum_router: Optimize processing of VRRP MACs
Hosts using a VRRP router send their packets with a destination MAC of
the VRRP router which is of the following form [1]:

IPv4 - 00-00-5E-00-01-{VRID}
IPv6 - 00-00-5E-00-02-{VRID}

Where VRID is the ID of the virtual router. Such packets are directed to
the router block in the ASIC by an FDB entry that was added in the
previous patch.

However, in certain cases it is possible to skip this FDB lookup and
send such packets directly to the router. This is accomplished by adding
these special MAC addresses to the RIF cache. If the cache is hit, the
packet will skip the L2 lookup and ingress the router with the RIF
specified in the cache entry.

1. https://tools.ietf.org/html/rfc5798#section-7.3

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-14 11:23:26 -07:00
Ido Schimmel
11566d34f8 mlxsw: spectrum: Add VRRP traps
Virtual Router Redundancy Protocol packets are used to communicate the
state of the Master router associated with the virtual router ID (VRID).

These are link-local multicast packets sent with IP protocol 112 that
are trapped in the router block in the ASIC.

Add a trap for these packets and mark the trapped packets to prevent
them from potentially being re-flooded by the bridge driver.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-14 11:23:26 -07:00
Ido Schimmel
2db9937804 mlxsw: spectrum_router: Direct macvlans' MACs to router
An IP packet received on a netdev with a macvlan upper whose MAC matches
the packet's destination MAC will be re-injected to the Rx path as if it
was received by the macvlan, and perform an L3 lookup.

Reflect this functionality to the ASIC by programming FDB entries that
will direct MACs of macvlan uppers to the router.

In a similar fashion to router interfaces (RIFs) that are programmed
upon the addition of the first IP address on an interface and destroyed
upon the removal of the last IP address, the FDB entries for the macvlan
are added and destroyed based on the addition of the first and removal
of the last IP address on the macvlan.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-14 11:23:26 -07:00
Ido Schimmel
c55161852f mlxsw: spectrum: Enable macvlan upper devices
In order to allow more unicast MAC addresses (e.g., VRRP virtual MAC) to
be directed to the router we need to enable macvlan uppers on top of
mlxsw netdevs.

Allow macvlan upper devices on top of mlxsw netdevs and sanitize
configurations that can't work. For example, a macvlan can't be enslaved
to a bridge as without ACLs the device doesn't take the destination MAC
into account when classifying a packet to a bridge instance (i.e., a
FID).

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-14 11:23:25 -07:00
Jakub Kicinski
6b86758973 xdp: don't make drivers report attachment mode
prog_attached of struct netdev_bpf should have been superseded
by simply setting prog_id long time ago, but we kept it around
to allow offloading drivers to communicate attachment mode (drv
vs hw).  Subsequently drivers were also allowed to report back
attachment flags (prog_flags), and since nowadays only programs
attached will XDP_FLAGS_HW_MODE can get offloaded, we can tell
the attachment mode from the flags driver reports.  Remove
prog_attached member.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-07-13 20:26:35 +02:00
Alex Vesker
3c641ba4a8 net/mlx4_core: Use devlink region_snapshot parameter
This parameter enables capturing region snapshot of the crspace
during critical errors. The default value of this parameter is
disabled, it can be enabled using devlink param commands.
It is possible to configure during runtime and also driver init.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 17:37:13 -07:00
Alex Vesker
bedc989b0c net/mlx4_core: Add Crdump FW snapshot support
Crdump allows the driver to create a snapshot of the FW PCI
crspace and health buffer during a critical FW issue.
In case of a FW command timeout, FW getting stuck or a non zero
value on the catastrophic buffer, a snapshot will be taken.

The snapshot is exposed using devlink, cr-space, fw-health
address regions are registered on init and snapshots are attached
once a new snapshot is collected by the driver.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 17:37:13 -07:00
Alex Vesker
523f9eb1ef net/mlx4_core: Add health buffer address capability
Health buffer address is a 32 bit PCI address offset provided by
the FW. This offset is used for reading FW health debug data
located on the shared CR space. Cr space is accessible in both
driver and FW and allows for different queries and configurations.
Health buffer size is always 64B of readable data followed by a
lock which is used to block volatile CR space access.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-12 17:37:13 -07:00
Petr Machata
b5de82f3df mlxsw: spectrum_span: Change LAG lower selection
When offloading mirror-to-gretap, mlxsw needs to preroute the path that
the encapsulated packet will take. That path may include a LAG device
above a front panel port. So far, mlxsw resolved the path to the first
up front panel slave of the LAG interface, but that only reflects
administrative state of the port. It neglects to consider whether the
port actually has a carrier, and what the LACP state is.

So instead of checking upness of the device, check carrier state and
txability.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11 23:10:19 -07:00
David S. Miller
e32f55f373 Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
L2 Fwd Offload & 10GbE Intel Driver Updates 2018-07-09

This patch series is meant to allow support for the L2 forward offload, aka
MACVLAN offload without the need for using ndo_select_queue.

The existing solution currently requires that we use ndo_select_queue in
the transmit path if we want to associate specific Tx queues with a given
MACVLAN interface. In order to get away from this we need to repurpose the
tc_to_txq array and XPS pointer for the MACVLAN interface and use those as
a means of accessing the queues on the lower device. As a result we cannot
offload a device that is configured as multiqueue, however it doesn't
really make sense to configure a macvlan interfaced as being multiqueue
anyway since it doesn't really have a qdisc of its own in the first place.

The big changes in this set are:
  Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN
  Disable XPS for single queue devices
  Replace accel_priv with sb_dev in ndo_select_queue
  Add sb_dev parameter to fallback function for ndo_select_queue
  Consolidated ndo_select_queue functions that appeared to be duplicates
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-11 23:03:32 -07:00
Jann Horn
31e33a5b41 net/mlx5: fix uaccess beyond "count" in debugfs read/write handlers
In general, accessing userspace memory beyond the length of the supplied
buffer in VFS read/write handlers can lead to both kernel memory corruption
(via kernel_read()/kernel_write(), which can e.g. be triggered via
sys_splice()) and privilege escalation inside userspace.

In this case, the affected files are in debugfs (and should therefore only
be accessible to root) and check that *pos is zero (which prevents the
sys_splice() trick). Therefore, this is not a security fix, but rather a
small cleanup.

For the read handlers, fix it by using simple_read_from_buffer() instead of
custom logic.
For the write handler, add a check.

changed in v2:
 - also fix dbg_write()

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-07-11 12:08:57 -07:00
Jiri Pirko
a8b9f232ec mlxsw: resources: Add couple of Spectrum-2 KVD resources
These resources are needed for Spectrum-2 KVD linear management
implementation.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-09 16:24:18 -07:00
Jiri Pirko
abfd61825b mlxsw: spectrum: Prepare for multiple FW versions for Spectrum and Spectrum-2
Prepare for Spectrum-2 FW version checking and
make mlxsw_sp_fw_rev_validate() per-ASIC as well as required FW revision
and FW filename.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-09 16:24:17 -07:00
Jiri Pirko
ea8b2e28aa mlxsw: spectrum_acl: Implement priority setting for rules inserted to TCAM
For Spectrum-2, we need to insert priority to C-TCAM because HW
needs that info in order to correctly process scenarios where rules
are in both C-TCAM and A-TCAM.

So extend the mlxsw_sp_acl_ctcam_entry_add() args to accept indication
if priority needs to be filled up and implement the priority
computation and fill-up.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-09 16:24:17 -07:00
Jiri Pirko
42df8358c3 mlxsw: reg: Add priority field for PTCEV2 register
This is going to be needed for Spectrum-2 C-TCAM implementation.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-09 16:24:17 -07:00
Jiri Pirko
a5995cc801 mlxsw: spectrum_acl: Move block items encoding into Spectrum op
Since Spectrum-2 encodes blocks into different HW layout, push this
code into Spectrum-specific op.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-09 16:24:17 -07:00
Jiri Pirko
c17d20838e mlxsw: spectrum_acl: Convert mlxsw_afk_create args to ops
Since the flex keys for Spectrum-2 differ not only in blocks definitions
but also in encoding layout, prepare for the implementation and pass
Spectrum/Spectrum-2 specific ops down to mlxsw_afk_create.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-09 16:24:17 -07:00
Jiri Pirko
bab5c1cfb7 mlxsw: spectrum_acl: Add tcam init/fini ops
Add ops to be called on driver instance init and fini.
This is needed in order to be possible to do Spectrum-2 specific init
and fini work.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-09 16:24:17 -07:00
Jiri Pirko
64eccd0066 mlxsw: spectrum_acl: Split TCAM handling 3 ways
To allow easy and clean Spectrum-2 implementation for things that differ
from Spectrum, split the existing ACL TCAM code 3 ways:
1) common code that calls Spectrum/Spectrum-2 specific ops
2) Spectrum ops implementations
3) common C-TCAM code that is going to be shared between Spectrum and
   Spectrum-2 implementations

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-09 16:24:17 -07:00
Jiri Pirko
8fae4392d4 mlxsw: spectrum_mr_tcam: Push Spectrum-specific operations into a separate file
Since Spectrum-2 has different handling of TCAM, push Spectrum MR TCAM
bits to a separate file accessible by ops which allows to implement
Spectrum-2 specific ops.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-09 16:24:17 -07:00
Jiri Pirko
0304c00546 mlxsw: spectrum_kvdl: Pass entry_count to free function
For the Spectrum-2 KVD linear manager implementation, entry_count will be
needed even for the free function. So pass it down.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-09 16:24:16 -07:00
Jiri Pirko
4b6b18692a mlxsw: spectrum_kvdl: Pass entry type to alloc/free
Future Spectrum-2 KVD linear manager implementation needs to know type
of the entry to alloc and free. So define the types in an enum and
pass it down to alloc and free functions. Once the entry type
is passed down, KVDL common part knows sizes of each entry types,
so replace size function arg with entry count.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-09 16:24:16 -07:00
Jiri Pirko
ebcff74386 mlxsw: spectrum_kvdl: Push out KVD linear management into ops
In Spectrum-2 there is a different implementation of KVD linear
management. Unlike in Spectrum where there is a single index space,
in Spectrum-2 the indexes are per-resource. Also there is need to
explicitly tell HW that an entry is no longer used.
So push out the existing implementation into spectrum1_kvdl.c and
prepare ops infrastructure to allow new implementation in a follow-up.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-09 16:24:16 -07:00
Kees Cook
eec4edc9ee net/mlx5: Use 2-factor allocator calls
This restores the use of 2-factor allocation helpers that were already
fixed treewide. Please do not use open-coded multiplication; prefer,
instead, using 2-factor allocation helpers.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-09 16:00:07 -07:00
Alexander Duyck
8ec56fc3c5 net: allow fallback function to pass netdev
For most of these calls we can just pass NULL through to the fallback
function as the sb_dev. The only cases where we cannot are the cases where
we might be dealing with either an upper device or a driver that would
have configured things to support an sb_dev itself.

The only driver that has any significant change in this patch set should be
ixgbe as we can drop the redundant functionality that existed in both the
ndo_select_queue function and the fallback function that was passed through
to us.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-07-09 13:57:25 -07:00
Alexander Duyck
4f49dec907 net: allow ndo_select_queue to pass netdev
This patch makes it so that instead of passing a void pointer as the
accel_priv we instead pass a net_device pointer as sb_dev. Making this
change allows us to pass the subordinate device through to the fallback
function eventually so that we can keep the actual code in the
ndo_select_queue call as focused on possible on the exception cases.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-07-09 13:41:34 -07:00
Jiri Pirko
0317a6f4eb mlxsw: core_acl_flex_actions: Fix helper to get the first KVD linear index
The helper should return always KVD linear index of the second set.
It is unused now, but going to be used soon.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-08 17:05:19 +09:00
Jiri Pirko
5b9488fd5f mlxsw: core_acl_flex_actions: Allow the first set to be dummy
In Spectrum-2, the real action sets are always in KVD linear. The first
set is always empty and contains only pointer to the first real set in
KVD linear. So provide possibility to specify the first set is the dummy
one.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-08 17:05:19 +09:00
Jiri Pirko
9dbab6f588 mlxsw: spectrum: Put pointer to flex action ops to mlxsw_sp
Spectrum-2 need a slightly different handling of flexible actions. So
put an ops pointer in mlxsw_sp struct and rename it.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-08 17:05:19 +09:00
Jiri Pirko
82b63bcf8c mlxsw: core_acl_flex_keys: Change SRC_SYS_PORT flex key element size
The SRC_SYS_PORT is passed as 8 bit value down to hw anyway, so cap it
in the driver as well. Also, in Spectrum-2 the FW iface for SRC_SYS_PORT
is only 8 bits, so prepare for it.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-08 17:05:19 +09:00
Jiri Pirko
c43ea06dbd mlxsw: core_acl_flex_keys: Split MAC and IP address flex key elements
Since in Spectrum-2, MACs are split and IP addresses are split as well,
in order to use the same elements for Spectrum and Spectrum-2 split them
now.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-08 17:05:19 +09:00
Jiri Pirko
2139469b04 mlxsw: spectrum_acl: Ignore always-zeroed bits in tp->prio
The lowest 16 bits of tp->prio are always zero, so ignore them with a
shift.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-08 17:05:19 +09:00
Jiri Pirko
45e0620d5e mlxsw: reg: Introduce Flex2 key type for PTAR register
Introduce Flex2 key type for PTAR register which is used in Spectrum-2.
Also, extend mlxsw_reg_ptar_pack() to set the value according to the
caller.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-08 17:05:19 +09:00
Jiri Pirko
d4b0d20fec mlxsw: spectrum: Change name of mlxsw_sp_afk_blocks to mlxsw_sp1_afk_blocks
This is specific for Spectrum as Spectrum-2 has completely different key
blocks.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-08 17:05:19 +09:00
Arnd Bergmann
be9c64b19b mlxsw: spectrum_router: avoid uninitialized variable access
When CONFIG_BRIDGE_VLAN_FILTERING is disabled, gcc correctly points out
that the 'vid' variable is uninitialized whenever br_vlan_get_pvid
returns an error:

drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c: In function 'mlxsw_sp_rif_vlan_fid_get':
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:6881:6: error: 'vid' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This changes the condition check to always return -EINVAL here,
which I guess is what the author intended here.

Fixes: e6f1960ae6 ("mlxsw: spectrum_router: Allocate FID according to PVID")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-07 20:06:08 +09:00
Moshe Shemesh
dfb3c0821a mlx4: Add support for devlink reload and load driverinit values
Add mlx4_devlink_reload() to support devlink reload operation.
Add mlx4_devlink_param_load_driverinit_values() to load values which
were set using driverinit configuration mode.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05 19:58:35 +09:00
Moshe Shemesh
bd1b51dc66 mlx4: Add mlx4 initial parameters table and register it
Create initial parameters table for mlx4.
The table consists of two generic parameters and two driver-specific
parameters.
Generic:
  internal_err_reset - Enable reset device on internal errors. This
  parameter can be configured on mlx4 either on runtime or during driver
  initialization.
  max_macs - Max number of MACs per ETH port. For mlx4 this parameter
  value range is between 1 and 128. This parameter can be configured on
  mlx4 only during driver initialization.
Driver specific:
  enable_64b_cqe_eqe - Enable 64 byte CQEs/EQEs when the FW supports it.
  This parameter can be configured on mlx4 only during driver
  initialization.
  enable_4k_uar - Enable using 4K UAR. This parameter can be configured on
  mlx4 only during driver initialization.

Register the parameters table on mlx4_init_one() and unregister on
mlx4_remove_one().

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05 19:58:35 +09:00
Jason Gunthorpe
8193abb6a8 Merge branch 'mlx5-dump-fill-mkey' into rdma.git for-next
For dependencies, branch based on 'mellanox/mlx5-next' of
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git

Pull Dump and fill MKEY from Leon Romanovsky:

====================
MLX5 IB HCA offers the memory key, dump_fill_mkey to increase performance,
when used in a send or receive operations.

It is used to force local HCA operations to skip the PCI bus access, while
keeping track of the processed length in the ibv_sge handling.

In this three patch series, we expose various bits in our HW spec
file (mlx5_ifc.h), move unneeded for mlx5_core FW command and export such
memory key to user space thought our mlx5-abi header file.
====================

Botched auto-merge in mlx5_ib_alloc_ucontext() resolved by hand.

* branch 'mlx5-dump-fill-mkey':
  IB/mlx5: Expose dump and fill memory key
  net/mlx5: Add hardware definitions for dump_fill_mkey
  net/mlx5: Limit scope of dump_fill_mkey function
  net/mlx5: Rate limit errors in command interface

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-07-04 13:23:46 -06:00
Yonatan Cohen
4d4fb5dc98 net/mlx5: Limit scope of dump_fill_mkey function
mlx5_core_dump_fill_mkey() is going to be used in next
patch in IB and doesn't need to be visible to whole
mlx5_core. Move that command to mlx5_ib.

Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-07-04 21:51:07 +03:00
David Ahern
33bd5ac54d net/ipv6: Revert attempt to simplify route replace and append
NetworkManager likes to manage linklocal prefix routes and does so with
the NLM_F_APPEND flag, breaking attempts to simplify the IPv6 route
code and by extension enable multipath routes with device only nexthops.

Revert f34436a430 and these followup patches:
6eba08c362 ("ipv6: Only emit append events for appended routes").
ce45bded64 ("mlxsw: spectrum_router: Align with new route replace logic")
53b562df8c ("mlxsw: spectrum_router: Allow appending to dev-only routes")

Update the fib_tests cases to reflect the old behavior.

Fixes: f34436a430 ("net/ipv6: Simplify route replace and appending into multipath route")
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-04 15:22:13 +09:00
David S. Miller
5cd3da4ba2 Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net
Simple overlapping changes in stmmac driver.

Adjust skb_gro_flush_final_remcsum function signature to make GRO list
changes in net-next, as per Stephen Rothwell's example merge
resolution.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-03 10:29:26 +09:00
Saeed Mahameed
ed56c5193a net/mlx5e: Update NIC HW stats on demand only
Disable periodic stats update background thread and update stats in
background on demand when ndo_get_stats is called.

Having a background thread running in the driver all the time is bad for
power consumption and normally a user space daemon will query the stats
once every specific interval, so ideally the background thread and its
interval can be done in user space..

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
2018-06-28 14:44:38 -07:00
Tariq Toukan
2ad9ecdbe7 net/mlx5e: Add counter for total num of NOP operations
A per-ring counter for NOP operations already exists.
Here I add a counter that sums them up.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-28 14:44:20 -07:00
Tariq Toukan
b71ba6b46f net/mlx5e: Add counter for MPWQE filler strides
Add ethtool counter to indicate the number of strides consumed
by filler CQEs.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-28 14:44:19 -07:00
Tariq Toukan
a1bf74dc6e net/mlx5e: Add channel events counter
Add per-channel and global ethtool counters for channel events.
Each event indicates an interrupt on one of the channel's
completion queues.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-28 14:44:19 -07:00
Tariq Toukan
dc983f0e2b net/mlx5e: Add a counter for congested UMRs
Add per-ring and global ethtool counters for congested UMR requests.
These events indicate congestion in UMR handlers in HW.

Such event is concluded when there's an outstanding UMR post,
yet the SW consumed at least two additional MPWQEs in the meanwhile.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-28 14:44:18 -07:00
Tariq Toukan
2d7103c800 net/mlx5e: Add NAPI statistics
Add per-channel and global ethtool counters for NAPI.
This helps us monitor and analyze performance in general.

- ch[i]_poll:
  the number of times the channel's NAPI poll was invoked.

- ch[i]_arm:
  the number of times the channel's NAPI poll completed
  and armed the completion queues.

- ch[i]_aff_change:
  the number of times the channel's NAPI poll explicitly
  stopped execution on a cpu due to a change in affinity.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-28 14:44:17 -07:00
Tariq Toukan
cbe73aaeec net/mlx5e: Add XDP_TX completions statistics
Add per-ring and global ethtool counters for XDP_TX completions.
This helps us monitor and analyze XDP_TX flow performance.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-28 14:44:17 -07:00
Tariq Toukan
8615565696 net/mlx5e: Add TX completions statistics
Add per-ring and global ethtool counters for TX completions.
This helps us monitor and analyze TX flow performance.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-28 14:44:16 -07:00
Tariq Toukan
c400028371 net/mlx5e: RX, Use existing WQ local variable
Local variable 'wq' already points to &sq->wq, use it.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-28 14:44:16 -07:00
Tariq Toukan
ca11b79899 net/mlx5e: Convert large order kzalloc allocations to kvzalloc
Replace calls to kzalloc_node with kvzalloc_node, as it fallsback
to lower-order pages if the higher-order trials fail.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-28 14:44:15 -07:00
Boris Pismenny
bc5a7ccd9a net/mlx5e: Add UDP GSO remaining counter
This patch adds a counter for tx UDP GSO packets that contain a segment
that is not aligned to MSS - remaining segment.

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-28 14:44:14 -07:00
Boris Pismenny
689adf0d48 net/mlx5e: Add UDP GSO support
This patch enables UDP GSO support. We enable this by using two WQEs
the first is a UDP LSO WQE for all segments with equal length, and the
second is for the last segment in case it has different length.
Due to HW limitation, before sending, we must adjust the packet length fields.

We measure performance between two Intel(R) Xeon(R) CPU E5-2643 v2 @3.50GHz
machines connected back-to-back with Connectx4-Lx (40Gbps) NICs.
We compare single stream UDP, UDP GSO and UDP GSO with offload.
Performance:
		| MSS (bytes)	| Throughput (Gbps)	| CPU utilization (%)
UDP GSO offload	| 1472		| 35.6			| 8%
UDP GSO 	| 1472		| 25.5			| 17%
UDP 		| 1472		| 10.2			| 17%
UDP GSO offload	| 1024		| 35.6			| 8%
UDP GSO		| 1024		| 19.2			| 17%
UDP 		| 1024		| 5.7			| 17%
UDP GSO offload	| 512		| 33.8			| 16%
UDP GSO		| 512		| 10.4			| 17%
UDP 		| 512		| 3.5			| 17%

Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-28 14:44:14 -07:00
David S. Miller
04c6faa175 mlx5-fixes-2018-06-26
Fixes for mlx5 core and netdev driver:
 
 Two fixes from Alex Vesker to address command interface issues
  - Race in command interface polling mode
  - Incorrect raw command length parsing
 
 From Shay Agroskin, Fix wrong size allocation for QoS ETC TC regitster.
 
 From Or Gerlitz and Eli Cohin, Address backward compatability issues for when
 Eswitch capability is not advertised for the PF host driver
     - Fix required capability for manipulating MPFS
     - E-Switch, Disallow vlan/spoofcheck setup if not being esw manager
     - Avoid dealing with vport IB/eth representors if not being e-switch manager
     - E-Switch, Avoid setup attempt if not being e-switch manager
     - Don't attempt to dereference the ppriv struct if not being eswitch manager
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbMtMEAAoJEEg/ir3gV/o+NjAIAMGrerpwg8ADBj+b9tSWm4WV
 2yAJ561kBObwhA+uDJtH7mGUO3+AnkcWz9vynGqFdmkOikUcbpPkBb9D+rmFbkX2
 E585pwR3pH7lEzYEG4xO6SwuQcQ4OytFNxz94AT6CgNEXqrmbrD7A5Vsgk265yZq
 pJzL1OVfkXKOtb2x5PpCOh19/28OxAzyMQfoklsE2Wn7j8/2RWX0UUDxuF8jS+He
 9loaurT4Fsfo5JYE+o+k38knHFBkdTUZBD9/bZrtaMcrD68bZdJTpZm6eYwRXW3S
 7J88SmH/xTy74f1KY4qf0JOTxnaWtm/r4YaCXf1QD05W2/U9FQpIW1ipMKH51vk=
 =te2H
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2018-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-fixes-2018-06-26

Fixes for mlx5 core and netdev driver:

Two fixes from Alex Vesker to address command interface issues
 - Race in command interface polling mode
 - Incorrect raw command length parsing

From Shay Agroskin, Fix wrong size allocation for QoS ETC TC regitster.

From Or Gerlitz and Eli Cohin, Address backward compatability issues for when
Eswitch capability is not advertised for the PF host driver
    - Fix required capability for manipulating MPFS
    - E-Switch, Disallow vlan/spoofcheck setup if not being esw manager
    - Avoid dealing with vport IB/eth representors if not being e-switch manager
    - E-Switch, Avoid setup attempt if not being e-switch manager
    - Don't attempt to dereference the ppriv struct if not being eswitch manager
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28 16:21:35 +09:00
Leon Romanovsky
b30408d7a3 net/mlx5: Rate limit errors in command interface
Any error status returned by FW will trigger a print similar to the
following error message in the dmesg.

[   55.884355] mlx5_core 0000:00:04.0: mlx5_cmd_check:712:(pid 555):
ALLOC_UAR(0x802) op_mod(0x0) failed, status limits exceeded(0x8),
syndrome (0x0)

Those prints are extremely valuable to diagnose issues with running system
and it is important to keep them. However, not-so-careful user can trigger
endless number of such prints by depleting HW resources and will spam
dmesg.

Rate limiting of such messages solves this issue.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-27 08:43:55 +03:00
Alex Vesker
d412c31dae net/mlx5: Fix command interface race in polling mode
The command interface can work in two modes: Events and Polling.
In the general case, each time we invoke a command, a work is
queued to handle it.

When working in events, the interrupt handler completes the
command execution. On the other hand, when working in polling
mode, the work itself completes it.

Due to a bug in the work handler, a command could have been
completed by the interrupt handler, while the work handler
hasn't finished yet, causing the it to complete once again
if the command interface mode was changed from Events to
polling after the interrupt handler was called.

mlx5_unload_one()
        mlx5_stop_eqs()
                // Destroy the EQ before cmd EQ
                ...cmd_work_handler()
                        write_doorbell()
                        --> EVENT_TYPE_CMD
                                mlx5_cmd_comp_handler() // First free
                                        free_ent(cmd, ent->idx)
                                        complete(&ent->done)

        <-- mlx5_stop_eqs //cmd was complete
                // move to polling before destroying the last cmd EQ
                mlx5_cmd_use_polling()
                        cmd->mode = POLL;

                --> cmd_work_handler (continues)
                        if (cmd->mode == POLL)
                                mlx5_cmd_comp_handler() // Double free

The solution is to store the cmd->mode before writing the doorbell.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-26 15:26:34 -07:00
Alex Vesker
603b7bcff8 net/mlx5: Fix incorrect raw command length parsing
The NULL character was not set correctly for the string containing
the command length, this caused failures reading the output of the
command due to a random length. The fix is to initialize the output
length string.

Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-26 15:26:34 -07:00
Shay Agroskin
d14fcb8d87 net/mlx5: Fix wrong size allocation for QoS ETC TC regitster
The driver allocates wrong size (due to wrong struct name) when issuing
a query/set request to NIC's register.

Fixes: d8880795da ("net/mlx5e: Implement DCBNL IEEE max rate")
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-26 15:26:33 -07:00
Eli Cohen
f811980444 net/mlx5: Fix required capability for manipulating MPFS
Manipulating of the MPFS requires eswitch manager capabilities.

Fixes: eeb66cdb68 ('net/mlx5: Separate between E-Switch and MPFS')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-26 15:26:32 -07:00
Eli Cohen
a8d70a054a net/mlx5: E-Switch, Disallow vlan/spoofcheck setup if not being esw manager
In smartnic env, if the host (PF) driver is not an e-switch manager, we
are not allowed to apply eswitch ports setups such as vlan (VST),
spoof-checks, min/max rate or state.

Make sure we are eswitch manager when coming to issue these callbacks
and err otherwise.

Also fix the definition of ESW_ALLOWED to rely on eswitch_manager
capability and on the vport_group_manger.

Operations on the VF nic vport context, such as setting a mac or reading
the vport counters are allowed to the PF in this scheme.

The modify nic vport guid code was modified to omit checking the
nic_vport_node_guid_modify eswitch capability.
The reason for doing so is that modifying node guid requires vport group
manager capability, and there's no need to check further capabilities.

1. set_vf_vlan     - disallowed
2. set_vf_spoofchk - disallowed
3. set_vf_mac      - allowed
4. get_vf_config   - allowed
5. set_vf_trust    - disallowed
6. set_vf_rate     - disallowed
7. get_vf_stat     - allowed
8. set_vf_link_state - disallowed

Fixes: f942380c12 ('net/mlx5: E-Switch, Vport ingress/egress ACLs rules for spoofchk')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Tested-by: Or Gerlitz <ogerlitz@mellanox.com>
2018-06-26 15:26:31 -07:00
Or Gerlitz
733d3e5497 net/mlx5e: Avoid dealing with vport representors if not being e-switch manager
In smartnic env, the host (PF) driver might not be an e-switch
manager, hence the switchdev mode representors are running on
the embedded cpu (EC) and not at the host.

As such, we should avoid dealing with vport representors if
not being esw manager.

While here, make sure to disallow eswitch switchdev related
setups through devlink if we are not esw managers.

Fixes: cb67b83292 ('net/mlx5e: Introduce SRIOV VF representors')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-26 15:26:30 -07:00
Or Gerlitz
0efc856249 net/mlx5: E-Switch, Avoid setup attempt if not being e-switch manager
In smartnic env, the host (PF) driver might not be an e-switch
manager, hence the FW will err on driver attempts to deal with
setting/unsetting the eswitch and as a result the overall setup
of sriov will fail.

Fix that by avoiding the operation if e-switch management is not
allowed for this driver instance. While here, move to use the
correct name for the esw manager capability name.

Fixes: 81848731ff ('net/mlx5: E-Switch, Add SR-IOV (FDB) support')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Guy Kushnir <guyk@mellanox.com>
Reviewed-by: Eli Cohen <eli@melloanox.com>
Tested-by: Eli Cohen <eli@melloanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-26 15:26:29 -07:00
Or Gerlitz
8ffd569aaa net/mlx5e: Don't attempt to dereference the ppriv struct if not being eswitch manager
The check for cpu hit statistics was not returning immediate false for
any non vport rep netdev and hence we crashed (say on mlx5 probed VFs) if
user-space tool was calling into any possible netdev in the system.

Fix that by doing a proper check before dereferencing.

Fixes: 1d447a3914 ('net/mlx5e: Extendable vport representor netdev private data')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Eli Cohen <eli@melloanox.com>
Reviewed-by: Eli Cohen <eli@melloanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-26 15:26:28 -07:00
John Hurley
326367427c net: sched: call reoffload op on block callback reg
Call the reoffload tcf_proto_op on all tcf_proto nodes in all chains of a
block when a callback tries to register to a block that already has
offloaded rules. If all existing rules cannot be offloaded then the
registration is rejected. This replaces the previous policy of rejecting
such callback registration outright.

On unregistration of a callback, the rules are flushed for that given cb.
The implementation of block sharing in the NFP driver, for example,
duplicates shared rules to all devs bound to a block. This meant that
rules could still exist in hw even after a device is unbound from a block
(assuming the block still remains active).

Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 23:21:33 +09:00
John Hurley
60513bd82c net: sched: pass extack pointer to block binds and cb registration
Pass the extact struct from a tc qdisc add to the block bind function and,
in turn, to the setup_tc ndo of binding device via the tc_block_offload
struct. Pass this back to any block callback registrations to allow
netlink logging of fails in the bind process.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 23:21:32 +09:00
Petr Machata
567ad1a2bb mlxsw: spectrum_switchdev: Ban PVID change if bridge has a RIF
When traffic passes through a router port, it needs to be assigned a FID
for ASIC to forward correctly. For bridges, this FID used to be the one
corresponding to VLAN 1. In a previous patch, this was changed to
instead use the PVID at the time that the RIF is created. This patch
guards PVID changes after the RIF was introduced.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 18:05:22 +09:00
Petr Machata
a28b1ebef7 mlxsw: spectrum_router: Add mlxsw_sp_rif_fid()
In order to allow querying of the VID for which a RIF was created, add
a new function that returns a FID for a given RIF.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 18:05:22 +09:00
Petr Machata
0c41292bc5 mlxsw: spectrum_router: Publish mlxsw_sp_rif_find_by_dev()
In order to guard against removal of a PVID for which a FID was
allocated, spectrum_switchdev needs to first determine whether there is
a RIF associated with a given bridge. To that end, publish a preexisting
function mlxsw_sp_rif_find_by_dev().

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 18:05:22 +09:00
Petr Machata
e6f1960ae6 mlxsw: spectrum_router: Allocate FID according to PVID
For bridge netdevices, instead of assuming that the router traffic is on
VLAN 1, look at the bridge PVID.

This patch assumes that the PVID doesn't change after the router
interface is created (i.e. after the IP address is assigned).

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 18:05:21 +09:00
Petr Machata
5f15e257e1 mlxsw: spectrum_router: Propagate extack to .fid_get()
In the follow-up patch, mlxsw_sp_rif_vlan_fid_get() will be changed in a
way that could fail. Give that function a possibility to explain the
failure through extack.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 18:05:21 +09:00
Geert Uytterhoeven
e020797b7d net: Remove depends on HAS_DMA in case of platform dependency
Remove dependencies on HAS_DMA where a Kconfig symbol depends on another
symbol that implies HAS_DMA, and, optionally, on "|| COMPILE_TEST".
In most cases this other symbol is an architecture or platform specific
symbol, or PCI.

Generic symbols and drivers without platform dependencies keep their
dependencies on HAS_DMA, to prevent compiling subsystems or drivers that
cannot work anyway.

This simplifies the dependencies, and allows to improve compile-testing.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-23 10:44:30 +09:00
Yishai Hadas
e662e14d80 IB/mlx5: Add DEVX support for modify and query commands
Add support in DEVX for modify and query commands, the required lock is
taken (i.e. READ/WRITE) by the KABI infrastructure accordingly.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-19 10:53:02 -06:00
Yishai Hadas
38b7ca927d net/mlx5: Expose DEVX specification
This patch updates the mlx5_ifc structures and
command interface to support DEVX.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-19 19:31:31 +03:00
Yishai Hadas
fc6c391a7a net/mlx5: Prevent warns in dmesg upon firmware commands
When DEVX is used application builds by itself the command mail box,
this patch prevents warns upon firmware commands as of invalid user
space usage.

In addition,
A failure in destroy_mkey command was changed to be printed only under
debug mode.

This prevents a redundant warn when a memory window was used
with rereg_mr and finally was some kernel cleanup as of reset
flow/process termination.

In that case this command might temporarily fails as part of the cleanup
but finally it expects to succeed.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
2018-06-19 19:17:39 +03:00
Linus Torvalds
9215310cf1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Various netfilter fixlets from Pablo and the netfilter team.

 2) Fix regression in IPVS caused by lack of PMTU exceptions on local
    routes in ipv6, from Julian Anastasov.

 3) Check pskb_trim_rcsum for failure in DSA, from Zhouyang Jia.

 4) Don't crash on poll in TLS, from Daniel Borkmann.

 5) Revert SO_REUSE{ADDR,PORT} change, it regresses various things
    including Avahi mDNS. From Bart Van Assche.

 6) Missing of_node_put in qcom/emac driver, from Yue Haibing.

 7) We lack checking of the TCP checking in one special case during SYN
    receive, from Frank van der Linden.

 8) Fix module init error paths of mac80211 hwsim, from Johannes Berg.

 9) Handle 802.1ad properly in stmmac driver, from Elad Nachman.

10) Must grab HW caps before doing quirk checks in stmmac driver, from
    Jose Abreu.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits)
  net: stmmac: Run HWIF Quirks after getting HW caps
  neighbour: skip NTF_EXT_LEARNED entries during forced gc
  net: cxgb3: add error handling for sysfs_create_group
  tls: fix waitall behavior in tls_sw_recvmsg
  tls: fix use-after-free in tls_push_record
  l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()
  l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels
  mlxsw: spectrum_switchdev: Fix port_vlan refcounting
  mlxsw: spectrum_router: Align with new route replace logic
  mlxsw: spectrum_router: Allow appending to dev-only routes
  ipv6: Only emit append events for appended routes
  stmmac: added support for 802.1ad vlan stripping
  cfg80211: fix rcu in cfg80211_unregister_wdev
  mac80211: Move up init of TXQs
  mac80211_hwsim: fix module init error paths
  cfg80211: initialize sinfo in cfg80211_get_station
  nl80211: fix some kernel doc tag mistakes
  hv_netvsc: Fix the variable sizes in ipsecv2 and rsc offload
  rds: avoid unenecessary cong_update in loop transport
  l2tp: clean up stale tunnel or session in pppol2tp_connect's error path
  ...
2018-06-16 07:39:34 +09:00
Petr Machata
9e25826ffc mlxsw: spectrum_switchdev: Fix port_vlan refcounting
Switchdev notifications for addition of SWITCHDEV_OBJ_ID_PORT_VLAN are
distributed not only on clean addition, but also when flags on an
existing VLAN are changed. mlxsw_sp_bridge_port_vlan_add() calls
mlxsw_sp_port_vlan_get() to get at the port_vlan in question, which
implicitly references the object. This then leads to discrepancies in
reference counting when the VLAN is removed. spectrum.c warns about the
problem when the module is removed:

[13578.493090] WARNING: CPU: 0 PID: 2454 at drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2973 mlxsw_sp_port_remove+0xfd/0x110 [mlxsw_spectrum]
[...]
[13578.627106] Call Trace:
[13578.629617]  mlxsw_sp_fini+0x2a/0xe0 [mlxsw_spectrum]
[13578.634748]  mlxsw_core_bus_device_unregister+0x3e/0x130 [mlxsw_core]
[13578.641290]  mlxsw_pci_remove+0x13/0x40 [mlxsw_pci]
[13578.646238]  pci_device_remove+0x31/0xb0
[13578.650244]  device_release_driver_internal+0x14f/0x220
[13578.655562]  driver_detach+0x32/0x70
[13578.659183]  bus_remove_driver+0x47/0xa0
[13578.663134]  pci_unregister_driver+0x1e/0x80
[13578.667486]  mlxsw_sp_module_exit+0xc/0x3fa [mlxsw_spectrum]
[13578.673207]  __x64_sys_delete_module+0x13b/0x1e0
[13578.677888]  ? exit_to_usermode_loop+0x78/0x80
[13578.682374]  do_syscall_64+0x39/0xe0
[13578.685976]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix by putting the port_vlan when mlxsw_sp_port_vlan_bridge_join()
determines it's a flag-only change.

Fixes: b3529af6bb ("spectrum: Reference count VLAN entries")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15 09:11:16 -07:00
Ido Schimmel
ce45bded64 mlxsw: spectrum_router: Align with new route replace logic
Commit f34436a430 ("net/ipv6: Simplify route replace and appending
into multipath route") changed the IPv6 route replace logic so that the
first matching route (i.e., same metric) is replaced.

Have mlxsw replace the first matching route as well.

Fixes: f34436a430 ("net/ipv6: Simplify route replace and appending into multipath route")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15 09:11:16 -07:00
Ido Schimmel
53b562df8c mlxsw: spectrum_router: Allow appending to dev-only routes
Commit f34436a430 ("net/ipv6: Simplify route replace and appending
into multipath route") changed the IPv6 route append logic so that
dev-only routes can be appended and not only gatewayed routes.

Align mlxsw with the new behaviour.

Fixes: f34436a430 ("net/ipv6: Simplify route replace and appending into multipath route")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15 09:11:16 -07:00
Kees Cook
84ca176bf5 treewide: Use array_size() in kvzalloc_node()
The kvzalloc_node() function has no 2-factor argument form, so
multiplication factors need to be wrapped in array_size(). This patch
replaces cases of:

        kvzalloc_node(a * b, gfp, node)

with:
        kvzalloc_node(array_size(a, b), gfp, node)

as well as handling cases of:

        kvzalloc_node(a * b * c, gfp, node)

with:

        kvzalloc_node(array3_size(a, b, c), gfp, node)

This does, however, attempt to ignore constant size factors like:

        kvzalloc_node(4 * 1024, gfp, node)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kvzalloc_node(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kvzalloc_node(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kvzalloc_node(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc_node(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc_node(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc_node(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc_node(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kvzalloc_node(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kvzalloc_node(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kvzalloc_node(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
  kvzalloc_node(
-	sizeof(TYPE) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  kvzalloc_node(
-	sizeof(TYPE) * COUNT_ID
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  kvzalloc_node(
-	sizeof(TYPE) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  kvzalloc_node(
-	sizeof(TYPE) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  kvzalloc_node(
-	sizeof(THING) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  kvzalloc_node(
-	sizeof(THING) * COUNT_ID
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  kvzalloc_node(
-	sizeof(THING) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
|
  kvzalloc_node(
-	sizeof(THING) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

  kvzalloc_node(
-	SIZE * COUNT
+	array_size(COUNT, SIZE)
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kvzalloc_node(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc_node(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc_node(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc_node(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc_node(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvzalloc_node(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvzalloc_node(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvzalloc_node(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kvzalloc_node(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kvzalloc_node(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kvzalloc_node(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kvzalloc_node(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kvzalloc_node(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kvzalloc_node(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kvzalloc_node(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc_node(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc_node(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc_node(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc_node(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc_node(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc_node(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc_node(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kvzalloc_node(C1 * C2 * C3, ...)
|
  kvzalloc_node(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants.
@@
expression E1, E2;
constant C1, C2;
@@

(
  kvzalloc_node(C1 * C2, ...)
|
  kvzalloc_node(
-	E1 * E2
+	array_size(E1, E2)
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Kees Cook
778e1cdd81 treewide: kvzalloc() -> kvcalloc()
The kvzalloc() function has a 2-factor argument form, kvcalloc(). This
patch replaces cases of:

        kvzalloc(a * b, gfp)

with:
        kvcalloc(a * b, gfp)

as well as handling cases of:

        kvzalloc(a * b * c, gfp)

with:

        kvzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kvcalloc(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kvzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kvzalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kvzalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kvzalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kvzalloc
+ kvcalloc
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kvzalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvzalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvzalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvzalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kvzalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kvzalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kvzalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kvzalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kvzalloc(C1 * C2 * C3, ...)
|
  kvzalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kvzalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kvzalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kvzalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kvzalloc(sizeof(THING) * C2, ...)
|
  kvzalloc(sizeof(TYPE) * C2, ...)
|
  kvzalloc(C1 * C2 * C3, ...)
|
  kvzalloc(C1 * C2, ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Kees Cook
590b5b7d86 treewide: kzalloc_node() -> kcalloc_node()
The kzalloc_node() function has a 2-factor argument form, kcalloc_node(). This
patch replaces cases of:

        kzalloc_node(a * b, gfp, node)

with:
        kcalloc_node(a * b, gfp, node)

as well as handling cases of:

        kzalloc_node(a * b * c, gfp, node)

with:

        kzalloc_node(array3_size(a, b, c), gfp, node)

as it's slightly less ugly than:

        kcalloc_node(array_size(a, b), c, gfp, node)

This does, however, attempt to ignore constant size factors like:

        kzalloc_node(4 * 1024, gfp, node)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kzalloc_node(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kzalloc_node(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kzalloc_node(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc_node(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc_node(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc_node(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc_node(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc_node(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc_node(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kzalloc_node(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kzalloc_node
+ kcalloc_node
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kzalloc_node(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc_node(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc_node(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc_node(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc_node(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc_node(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc_node(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc_node(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kzalloc_node(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc_node(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc_node(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc_node(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc_node(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kzalloc_node(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kzalloc_node(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc_node(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc_node(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc_node(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc_node(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc_node(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc_node(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc_node(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kzalloc_node(C1 * C2 * C3, ...)
|
  kzalloc_node(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc_node(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc_node(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc_node(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kzalloc_node(sizeof(THING) * C2, ...)
|
  kzalloc_node(sizeof(TYPE) * C2, ...)
|
  kzalloc_node(C1 * C2 * C3, ...)
|
  kzalloc_node(C1 * C2, ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kzalloc_node
+ kcalloc_node
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Kees Cook
6396bb2215 treewide: kzalloc() -> kcalloc()
The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:

        kzalloc(a * b, gfp)

with:
        kcalloc(a * b, gfp)

as well as handling cases of:

        kzalloc(a * b * c, gfp)

with:

        kzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kzalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kzalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kzalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kzalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kzalloc
+ kcalloc
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kzalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kzalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kzalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kzalloc(C1 * C2 * C3, ...)
|
  kzalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kzalloc(sizeof(THING) * C2, ...)
|
  kzalloc(sizeof(TYPE) * C2, ...)
|
  kzalloc(C1 * C2 * C3, ...)
|
  kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kzalloc
+ kcalloc
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kzalloc
+ kcalloc
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Kees Cook
6da2ec5605 treewide: kmalloc() -> kmalloc_array()
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
patch replaces cases of:

        kmalloc(a * b, gfp)

with:
        kmalloc_array(a * b, gfp)

as well as handling cases of:

        kmalloc(a * b * c, gfp)

with:

        kmalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kmalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kmalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The tools/ directory was manually excluded, since it has its own
implementation of kmalloc().

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kmalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kmalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kmalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kmalloc
+ kmalloc_array
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kmalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kmalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kmalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kmalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kmalloc(C1 * C2 * C3, ...)
|
  kmalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kmalloc(sizeof(THING) * C2, ...)
|
  kmalloc(sizeof(TYPE) * C2, ...)
|
  kmalloc(C1 * C2 * C3, ...)
|
  kmalloc(C1 * C2, ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Linus Torvalds
a1cdde8c41 4.18 Merge window pull request
This has been a quiet cycle for RDMA, the big bulk is the usual smallish
 driver updates and bug fixes. About four new uAPI related things. Not as much
 Szykaller patches this time, the bugs it finds are getting harder to fix.
 
 - More work cleaning up the RDMA CM code
 - Usual driver bug fixes and cleanups for qedr, qib, hfi1, hns, i40iw, iw_cxgb4, mlx5, rxe
 - Driver specific resource tracking and reporting via netlink
 - Continued work for name space support from Parav
 - MPLS support for the verbs flow steering uAPI
 - A few tricky IPoIB fixes improving robustness
 - HFI1 driver support for the '16B' management packet format
 - Some auditing to not print kernel pointers via %llx or similar
 - Mark the entire 'UCM' user-space interface as BROKEN with the intent to remove it
   entirely. The user space side of this was long ago replaced with RDMA-CM and
   syzkaller is finding bugs in the residual UCM interface nobody wishes to fix because
   nobody uses it.
 - Purge more bogus BUG_ON's from Leon
 - 'flow counters' verbs uAPI
 - T10 fixups for iser/isert, these are Acked by Martin but going through the RDMA
   tree due to dependencies
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCgAGBQJbGEcPAAoJEDht9xV+IJsarBMQAIsAFOizycF0kQfDtvz1yHyV
 YjkT3NA71379DsDsCOezVKqZ6RtXdQncJoqqEG1FuNKiXh/rShR3rk9XmdBwUCTq
 mIY0ySiQggdeSIJclROiBuzLE3F/KIIkY3jwM80DzT9GUEbnVuvAMt4M56X48Xo8
 RpFc13/1tY09ZLBVjInlfmCpRWyNgNccDBDywB/5hF5KCFR/BG/vkp4W0yzksKiU
 7M/rZYyxQbtwSfe/ZXp7NrtwOpkpn7vmhED59YgKRZWhqnHF9KKmV+K1FN+BKdXJ
 V1KKJ2RQINg9bbLJ7H2JPdQ9EipvgAjUJKKBoD+XWnoVJahp6X2PjX351R/h4Lo5
 TH+0XwuCZ2EdjRxhnm3YE+rU10mDY9/UUi1xkJf9vf0r25h6Fgt6sMnN0QBpqkTh
 euRZnPyiFeo1b+hCXJfKqkQ6An+F3zes5zvVf59l0yfVNLVmHdlz0lzKLf/RPk+t
 U+YZKxfmHA+mwNhMXtKx7rKVDrko+uRHjaX2rPTEvZ0PXE7lMzFMdBWYgzP6sx/b
 4c55NiJMDAGTyLCxSc7ziGgdL9Lpo/pRZJtFOHqzkDg8jd7fb07ID7bMPbSa05y0
 BU5VpC8yEOYRpOEFbkJSPtHc0Q8cMCv/q1VcMuuhKXYnfSho3TWvtOSQIjUoU/q0
 8T6TXYi2yF+f+vZBTFlV
 =Mb8m
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This has been a quiet cycle for RDMA, the big bulk is the usual
  smallish driver updates and bug fixes. About four new uAPI related
  things. Not as much Szykaller patches this time, the bugs it finds are
  getting harder to fix.

  Summary:

   - More work cleaning up the RDMA CM code

   - Usual driver bug fixes and cleanups for qedr, qib, hfi1, hns,
     i40iw, iw_cxgb4, mlx5, rxe

   - Driver specific resource tracking and reporting via netlink

   - Continued work for name space support from Parav

   - MPLS support for the verbs flow steering uAPI

   - A few tricky IPoIB fixes improving robustness

   - HFI1 driver support for the '16B' management packet format

   - Some auditing to not print kernel pointers via %llx or similar

   - Mark the entire 'UCM' user-space interface as BROKEN with the
     intent to remove it entirely. The user space side of this was long
     ago replaced with RDMA-CM and syzkaller is finding bugs in the
     residual UCM interface nobody wishes to fix because nobody uses it.

   - Purge more bogus BUG_ON's from Leon

   - 'flow counters' verbs uAPI

   - T10 fixups for iser/isert, these are Acked by Martin but going
     through the RDMA tree due to dependencies"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (138 commits)
  RDMA/mlx5: Update SPDX tags to show proper license
  RDMA/restrack: Change SPDX tag to properly reflect license
  IB/hfi1: Fix comment on default hdr entry size
  IB/hfi1: Rename exp_lock to exp_mutex
  IB/hfi1: Add bypass register defines and replace blind constants
  IB/hfi1: Remove unused variable
  IB/hfi1: Ensure VL index is within bounds
  IB/hfi1: Fix user context tail allocation for DMA_RTAIL
  IB/hns: Use zeroing memory allocator instead of allocator/memset
  infiniband: fix a possible use-after-free bug
  iw_cxgb4: add INFINIBAND_ADDR_TRANS dependency
  IB/isert: use T10-PI check mask definitions from core layer
  IB/iser: use T10-PI check mask definitions from core layer
  RDMA/core: introduce check masks for T10-PI offload
  IB/isert: fix T10-pi check mask setting
  IB/mlx5: Add counters read support
  IB/mlx5: Add flow counters read support
  IB/mlx5: Add flow counters binding support
  IB/mlx5: Add counters create and destroy support
  IB/uverbs: Add support for flow counters
  ...
2018-06-07 13:04:07 -07:00
Linus Torvalds
1c8c5a9d38 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Add Maglev hashing scheduler to IPVS, from Inju Song.

 2) Lots of new TC subsystem tests from Roman Mashak.

 3) Add TCP zero copy receive and fix delayed acks and autotuning with
    SO_RCVLOWAT, from Eric Dumazet.

 4) Add XDP_REDIRECT support to mlx5 driver, from Jesper Dangaard
    Brouer.

 5) Add ttl inherit support to vxlan, from Hangbin Liu.

 6) Properly separate ipv6 routes into their logically independant
    components. fib6_info for the routing table, and fib6_nh for sets of
    nexthops, which thus can be shared. From David Ahern.

 7) Add bpf_xdp_adjust_tail helper, which can be used to generate ICMP
    messages from XDP programs. From Nikita V. Shirokov.

 8) Lots of long overdue cleanups to the r8169 driver, from Heiner
    Kallweit.

 9) Add BTF ("BPF Type Format"), from Martin KaFai Lau.

10) Add traffic condition monitoring to iwlwifi, from Luca Coelho.

11) Plumb extack down into fib_rules, from Roopa Prabhu.

12) Add Flower classifier offload support to igb, from Vinicius Costa
    Gomes.

13) Add UDP GSO support, from Willem de Bruijn.

14) Add documentation for eBPF helpers, from Quentin Monnet.

15) Add TLS tx offload to mlx5, from Ilya Lesokhin.

16) Allow applications to be given the number of bytes available to read
    on a socket via a control message returned from recvmsg(), from
    Soheil Hassas Yeganeh.

17) Add x86_32 eBPF JIT compiler, from Wang YanQing.

18) Add AF_XDP sockets, with zerocopy support infrastructure as well.
    From Björn Töpel.

19) Remove indirect load support from all of the BPF JITs and handle
    these operations in the verifier by translating them into native BPF
    instead. From Daniel Borkmann.

20) Add GRO support to ipv6 gre tunnels, from Eran Ben Elisha.

21) Allow XDP programs to do lookups in the main kernel routing tables
    for forwarding. From David Ahern.

22) Allow drivers to store hardware state into an ELF section of kernel
    dump vmcore files, and use it in cxgb4. From Rahul Lakkireddy.

23) Various RACK and loss detection improvements in TCP, from Yuchung
    Cheng.

24) Add TCP SACK compression, from Eric Dumazet.

25) Add User Mode Helper support and basic bpfilter infrastructure, from
    Alexei Starovoitov.

26) Support ports and protocol values in RTM_GETROUTE, from Roopa
    Prabhu.

27) Support bulking in ->ndo_xdp_xmit() API, from Jesper Dangaard
    Brouer.

28) Add lots of forwarding selftests, from Petr Machata.

29) Add generic network device failover driver, from Sridhar Samudrala.

* ra.kernel.org:/pub/scm/linux/kernel/git/davem/net-next: (1959 commits)
  strparser: Add __strp_unpause and use it in ktls.
  rxrpc: Fix terminal retransmission connection ID to include the channel
  net: hns3: Optimize PF CMDQ interrupt switching process
  net: hns3: Fix for VF mailbox receiving unknown message
  net: hns3: Fix for VF mailbox cannot receiving PF response
  bnx2x: use the right constant
  Revert "net: sched: cls: Fix offloading when ingress dev is vxlan"
  net: dsa: b53: Fix for brcm tag issue in Cygnus SoC
  enic: fix UDP rss bits
  netdev-FAQ: clarify DaveM's position for stable backports
  rtnetlink: validate attributes in do_setlink()
  mlxsw: Add extack messages for port_{un, }split failures
  netdevsim: Add extack error message for devlink reload
  devlink: Add extack to reload and port_{un, }split operations
  net: metrics: add proper netlink validation
  ipmr: fix error path when ipmr_new_table fails
  ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds
  net: hns3: remove unused hclgevf_cfg_func_mta_filter
  netfilter: provide udp*_lib_lookup for nf_tproxy
  qed*: Utilize FW 8.37.2.0
  ...
2018-06-06 18:39:49 -07:00
Linus Torvalds
2857676045 - Introduce arithmetic overflow test helper functions (Rasmus)
- Use overflow helpers in 2-factor allocators (Kees, Rasmus)
 - Introduce overflow test module (Rasmus, Kees)
 - Introduce saturating size helper functions (Matthew, Kees)
 - Treewide use of struct_size() for allocators (Kees)
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlsYJ1gWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJlCTEACwdEeriAd2VwxknnsstojGD/3g
 8TTFA19vSu4Gxa6WiDkjGoSmIlfhXTlZo1Nlmencv16ytSvIVDNLUIB3uDxUIv1J
 2+dyHML9JpXYHHR7zLXXnGFJL0wazqjbsD3NYQgXqmun7EVVYnOsAlBZ7h/Lwiej
 jzEJd8DaHT3TA586uD3uggiFvQU0yVyvkDCDONIytmQx+BdtGdg9TYCzkBJaXuDZ
 YIthyKDvxIw5nh/UaG3L+SKo73tUr371uAWgAfqoaGQQCWe+mxnWL4HkCKsjFzZL
 u9ouxxF/n6pij3E8n6rb0i2fCzlsTDdDF+aqV1rQ4I4hVXCFPpHUZgjDPvBWbj7A
 m6AfRHVNnOgI8HGKqBGOfViV+2kCHlYeQh3pPW33dWzy/4d/uq9NIHKxE63LH+S4
 bY3oO2ela8oxRyvEgXLjqmRYGW1LB/ZU7FS6Rkx2gRzo4k8Rv+8K/KzUHfFVRX61
 jEbiPLzko0xL9D53kcEn0c+BhofK5jgeSWxItdmfuKjLTW4jWhLRlU+bcUXb6kSS
 S3G6aF+L+foSUwoq63AS8QxCuabuhreJSB+BmcGUyjthCbK/0WjXYC6W/IJiRfBa
 3ZTxBC/2vP3uq/AGRNh5YZoxHL8mSxDfn62F+2cqlJTTKR/O+KyDb1cusyvk3H04
 KCDVLYPxwQQqK1Mqig==
 =/3L8
 -----END PGP SIGNATURE-----

Merge tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull overflow updates from Kees Cook:
 "This adds the new overflow checking helpers and adds them to the
  2-factor argument allocators. And this adds the saturating size
  helpers and does a treewide replacement for the struct_size() usage.
  Additionally this adds the overflow testing modules to make sure
  everything works.

  I'm still working on the treewide replacements for allocators with
  "simple" multiplied arguments:

     *alloc(a * b, ...) -> *alloc_array(a, b, ...)

  and

     *zalloc(a * b, ...) -> *calloc(a, b, ...)

  as well as the more complex cases, but that's separable from this
  portion of the series. I expect to have the rest sent before -rc1
  closes; there are a lot of messy cases to clean up.

  Summary:

   - Introduce arithmetic overflow test helper functions (Rasmus)

   - Use overflow helpers in 2-factor allocators (Kees, Rasmus)

   - Introduce overflow test module (Rasmus, Kees)

   - Introduce saturating size helper functions (Matthew, Kees)

   - Treewide use of struct_size() for allocators (Kees)"

* tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  treewide: Use struct_size() for devm_kmalloc() and friends
  treewide: Use struct_size() for vmalloc()-family
  treewide: Use struct_size() for kmalloc()-family
  device: Use overflow helpers for devm_kmalloc()
  mm: Use overflow helpers in kvmalloc()
  mm: Use overflow helpers in kmalloc_array*()
  test_overflow: Add memory allocation overflow tests
  overflow.h: Add allocation size calculation helpers
  test_overflow: Report test failures
  test_overflow: macrofy some more, do more tests for free
  lib: add runtime test of check_*_overflow functions
  compiler.h: enable builtin overflow checkers and add fallback code
2018-06-06 17:27:14 -07:00
Kees Cook
acafe7e302 treewide: Use struct_size() for kmalloc()-family
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    void *entry[];
};

instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);

This patch makes the changes for kmalloc()-family (and kvmalloc()-family)
uses. It was done via automatic conversion with manual review for the
"CHECKME" non-standard cases noted below, using the following Coccinelle
script:

// pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len *
//                      sizeof *pkey_cache->table, GFP_KERNEL);
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
identifier VAR, ELEMENT;
expression COUNT;
@@

- alloc(sizeof(*VAR) + COUNT * sizeof(*VAR->ELEMENT), GFP)
+ alloc(struct_size(VAR, ELEMENT, COUNT), GFP)

// mr = kzalloc(sizeof(*mr) + m * sizeof(mr->map[0]), GFP_KERNEL);
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
identifier VAR, ELEMENT;
expression COUNT;
@@

- alloc(sizeof(*VAR) + COUNT * sizeof(VAR->ELEMENT[0]), GFP)
+ alloc(struct_size(VAR, ELEMENT, COUNT), GFP)

// Same pattern, but can't trivially locate the trailing element name,
// or variable name.
@@
identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc";
expression GFP;
expression SOMETHING, COUNT, ELEMENT;
@@

- alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP)
+ alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-06 11:15:43 -07:00
David Ahern
3fcc773be6 mlxsw: Add extack messages for port_{un, }split failures
Return messages in extack for port split/unsplit errors. e.g.,
    $ devlink port split swp1s1 count 4
    Error: mlxsw_spectrum: Port cannot be split further.
    devlink answers: Invalid argument

    $ devlink port unsplit swp4
    Error: mlxsw_spectrum: Port was not split.
    devlink answers: Invalid argument

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05 12:32:38 -04:00
David Ahern
ac0fc8a1bb devlink: Add extack to reload and port_{un, }split operations
Add extack argument to reload, port_split and port_unsplit operations.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05 12:32:37 -04:00
Wei Yongjun
47a6ca3f97 net/mlx5e: fix error return code in mlx5e_alloc_rq()
Fix to return error code -ENOMEM from the kvzalloc_node() error handling
case instead of 0, as done elsewhere in this function.

Fixes: 069d11465a ("net/mlx5e: RX, Enhance legacy Receive Queue memory scheme")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05 10:20:33 -04:00
Wei Yongjun
6f6027a52b net/mlx5e: Make function mlx5e_change_rep_mtu() static
Fixes the following sparse warning:

drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:903:5: warning:
 symbol 'mlx5e_change_rep_mtu' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05 10:19:17 -04:00
David S. Miller
7d840a6065 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-06-04 22:23:35 -04:00
Petr Machata
1fc68bb7c3 mlxsw: spectrum_span: Suppress VLAN on BRIDGE_VLAN_INFO_UNTAGGED
When offloading mirroring to gretap or ip6gretap netdevices, an 802.1q
bridge is one of the soft devices permissible in the underlay when
resolving the packet path. After the packet path is resolved to a
particular bridge egress device, flags on packet VLAN determine whether
the egressed packet should be tagged.

The current logic however only ever sets the VLAN tag, never suppresses
it. Thus if there's a VLAN netdevice above the bridge that determines
the packet VLAN, that VLAN is never unset, and mirroring is configured
with VLAN tagging.

Fix by setting the packet VLAN on both branches: set to zero (for unset)
when BRIDGE_VLAN_INFO_UNTAGGED, copy the resolved VLAN (e.g. from bridge
PVID) otherwise.

Fixes: 946a11e740 ("mlxsw: spectrum_span: Allow bridge for gretap mirror")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 13:27:57 -04:00
Petr Machata
f07ff01406 mlxsw: spectrum_switchdev: Postpone respin on object deletion
VLAN deletion notifications are emitted before the relevant change is
projected to bridge configuration. Thus, like with VLAN addition,
schedule SPAN respin for later.

Fixes: c520bc6986 ("mlxsw: Respin SPAN on switchdev events")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 13:27:56 -04:00
Jason Gunthorpe
0f45e69d62 Verbs flow counters support
This series comes to allow user space applications to monitor real time
 traffic activity and events of the verbs objects it manages, e.g.:
 ibv_qp, ibv_wq, ibv_flow.
 
 This API enables generic counters creation and define mapping
 to association with a verbs object, current mlx5 driver using
 this API for flow counters.
 
 With this API, an application can monitor the entire life cycle of
 object activity, defined here as a static counters attachment.
 This API also allows dynamic counters monitoring of measurement points
 for a partial period in the verbs object life cycle.
 
 In addition it presents the implementation of the generic counters interface.
 
 This will be achieved by extending flow creation by adding a new flow count
 specification type which allows the user to associate a previously created
 flow counters using the generic verbs counters interface to the created flow,
 once associated the user could read statistics by using the read function of
 the generic counters interface.
 
 The API includes:
 1. create and destroyed API of a new counters objects
 2. read the counters values from HW
 
 Note:
 Attaching API to allow application to define the measurement points per objects
 is a user space only API and this data is passed to kernel when the counted
 object (e.g. flow) is created with the counters object.
 -----BEGIN PGP SIGNATURE-----
 
 iHQEABYIAB0WIQT1m3YD37UfMCUQBNwp8NhrnBAZsQUCWxIiqQAKCRAp8NhrnBAZ
 sWJRAPYl06nEfQjRlW//ZE/pO2oKXbfEevg7nnbpe80ERlxLAQDA2LHAcU7ma/NC
 hS5yxIq1gLSA27N+5qAoFVK8vJ5ZCg==
 =EiAV
 -----END PGP SIGNATURE-----

Merge tag 'verbs_flow_counters' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git into for-next

Pull verbs counters series from Leon Romanovsky:

====================
Verbs flow counters support

This series comes to allow user space applications to monitor real time
traffic activity and events of the verbs objects it manages, e.g.: ibv_qp,
ibv_wq, ibv_flow.

The API enables generic counters creation and define mapping to
association with a verbs object, the current mlx5 driver is using this API
for flow counters.

With this API, an application can monitor the entire life cycle of object
activity, defined here as a static counters attachment.  This API also
allows dynamic counters monitoring of measurement points for a partial
period in the verbs object life cycle.

In addition it presents the implementation of the generic counters
interface.

This will be achieved by extending flow creation by adding a new flow
count specification type which allows the user to associate a previously
created flow counters using the generic verbs counters interface to the
created flow, once associated the user could read statistics by using the
read function of the generic counters interface.

The API includes:
1. create and destroyed API of a new counters objects
2. read the counters values from HW

Note:
Attaching API to allow application to define the measurement points per
objects is a user space only API and this data is passed to kernel when
the counted object (e.g. flow) is created with the counters object.
===================

* tag 'verbs_flow_counters':
  IB/mlx5: Add counters read support
  IB/mlx5: Add flow counters read support
  IB/mlx5: Add flow counters binding support
  IB/mlx5: Add counters create and destroy support
  IB/uverbs: Add support for flow counters
  IB/core: Add support for flow counters
  IB/core: Support passing uhw for create_flow
  IB/uverbs: Add read counters support
  IB/core: Introduce counters read verb
  IB/uverbs: Add create/destroy counters support
  IB/core: Introduce counters object and its create/destroy
  IB/uverbs: Add an ib_uobject getter to ioctl() infrastructure
  net/mlx5: Export flow counter related API
  net/mlx5: Use flow counter pointer as input to the query function
2018-06-04 08:48:11 -06:00
Varsha Rao
cfed0a2c98 net: ethernet: mlx4: Remove unnecessary parentheses
This patch fixes the clang warning of extraneous parentheses, with the
following coccinelle script.

@@
identifier i;
expression e;
statement s;
@@
if (
-(i == e)
+i == e
 )
s

Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-04 10:14:39 -04:00
Eric Dumazet
885892fb37 mlx4_core: restore optimal ICM memory allocation
Commit 1383cb8103 ("mlx4_core: allocate ICM memory in page size chunks")
brought two regressions caught in our regression suite.

The big one is an additional cost of 256 bytes of overhead per 4096 bytes,
or 6.25 % which is unacceptable since ICM can be pretty large.

This comes from having to allocate one struct mlx4_icm_chunk (256 bytes)
per MLX4_TABLE_CHUNK, which the buggy commit shrank to 4KB
(instead of prior 256KB)

Note that mlx4_alloc_icm() is already able to try high order allocations
and fallback to low-order allocations under high memory pressure.

Most of these allocations happen right after boot time, when we get
plenty of non fragmented memory, there is really no point being so
pessimistic and break huge pages into order-0 ones just for fun.

We only have to tweak gfp_mask a bit, to help falling back faster,
without risking OOM killings.

Second regression is an KASAN fault, that will need further investigations.

Fixes: 1383cb8103 ("mlx4_core: allocate ICM memory in page size chunks")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Cc: John Sperbeck <jsperbeck@google.com>
Cc: Tarick Bedeir <tarick@google.com>
Cc: Qing Huang <qing.huang@oracle.com>
Cc: Daniel Jurgens <danielj@mellanox.com>
Cc: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03 11:01:44 -04:00
Wei Yongjun
8cb77149e8 net/mlx5: Make function mlx5_fpga_tls_send_teardown_cmd() static
Fixes the following sparse warning:

drivers/net/ethernet/mellanox/mlx5/core/fpga/tls.c:199:6: warning:
 symbol 'mlx5_fpga_tls_send_teardown_cmd' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03 10:36:31 -04:00
David S. Miller
9c54aeb03a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Filling in the padding slot in the bpf structure as a bug fix in 'ne'
overlapped with actually using that padding area for something in
'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03 09:31:58 -04:00
Raed Salem
5f9bf63ae8 net/mlx5: Export flow counter related API
Exports counters API to be used in both IB and EN.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-02 07:33:53 +03:00
Or Gerlitz
930821e39d net/mlx5: Use flow counter pointer as input to the query function
This allows to un-expose the details of struct mlx5_fc and keep it
internal to the core driver.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-06-02 07:33:52 +03:00
Tariq Toukan
f65a59ffbc net/mlx5e: TX, Separate cachelines of xmit and completion stats
Avoid false sharing of cachelines by separating the cachelines of
TX stats that are dertied in xmit flow and in completion flow.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:15 -07:00
Tariq Toukan
5ffd81943d net/mlx5e: RX, Always prefer Linear SKB configuration
Prefer the linear SKB configuration of Legacy RQ over the
non-linear one of Striding RQ.

This implies that ConnectX-4 LX now uses legacy RQ by default,
as it does not support the linear configuration of Striding RQ.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:15 -07:00
Tariq Toukan
069d11465a net/mlx5e: RX, Enhance legacy Receive Queue memory scheme
Enhance the memory scheme of the legacy RQ, such that
only order-0 pages are used.

Whenever possible, prefer using a linear SKB, and build it
wrapping the WQE buffer.

Otherwise (for example, jumbo frames on x86), use non-linear SKB,
with as many frags as needed. In this case, multiple WQE
scatter entries are used, up to a maximum of 4 frags and 10KB of MTU.

This implied to remove support of HW LRO in legacy RQ, as it would
require large number of page allocations and scatter entries per WQE
on archs with PAGE_SIZE = 4KB, yielding bad performance.

In earlier patches, we guaranteed that all completions are in-order,
and that we use a cyclic WQ.
This creates an oppurtunity for a performance optimization:
The mapping between a "struct mlx5e_dma_info", and the
WQEs (struct mlx5e_wqe_frag_info) pointing to it, is constant
across different cycles of a WQ. This allows initializing
the mapping in the time of RQ creation, and not handle it
in datapath.

A struct mlx5e_dma_info that is shared between different WQEs
is allocated by the first WQE, and freed by the last one.
This implies an important requirement: WQEs that share the same
struct mlx5e_dma_info must be posted within the same NAPI.
Otherwise, upon completion, struct mlx5e_wqe_frag_info would mistakenly
point to the new struct mlx5e_dma_info, not the one that was posted
(and the HW wrote to).
This bulking requirement is actually good also for performance reasons,
hence we extend the bulk beyong the minimal requirement above.

With this memory scheme, the RQs memory footprint is reduce by a
factor of 2 on x86, and by a factor of 32 on PowerPC.
Same factors apply for the number of pages in a GRO session.

Performance tests:
ConnectX-4, single core, single RX ring, default MTU.

x86:
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

Packet rate (early drop in TC): no degradation
TCP streams: ~5% improvement

PowerPC:
CPU: POWER8 (raw), altivec supported

Packet rate (early drop in TC): 20% gain
TCP streams: 25% gain

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:15 -07:00
Tariq Toukan
99cbfa93a6 net/mlx5e: RX, Use cyclic WQ in legacy RQ
Now that LRO is not supported for Legacy RQ, there is no source of
out-of-order completions in the WQ, and we can use a cyclic one.
This has multiple advantages:
- reduces the WQE size (smaller PCI transactions).
- lower overhead in datapath (no handling of 'next' pointers).
- no reserved WQE for the WQ head (was need in linked-list).
- allows using a constant map between frag and dma_info struct, in downstream patch.

Performance tests:
ConnectX-4, single core, single RX ring.
Major gain in packet rate of single ring XDP drop.
Bottleneck is shifted form HW (at 16Mpps) to SW (at 20Mpps).

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:15 -07:00
Tariq Toukan
422d4c401e net/mlx5e: RX, Split WQ objects for different RQ types
Replace the common RQ WQ object with two separate ones for the
different RQ types.
This is in preparation for switching to using a cyclic WQ type
in Legacy RQ.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:15 -07:00
Tariq Toukan
6c3a823e1e net/mlx5e: RX, Remove HW LRO support in legacy RQ
Current LRO implementation in Legacy RQ uses high-order pages.
In downstream patches of this series we complete the transition
to using only order-0 pages in RX datapath (which was already done
in Striding RQ).

Unlike the more advanced Striding RQ, Legacy RQ does not make reuse
of any non-consumed buffers of non-full LRO sessions, and combining
it with order-0 pages has many performance drawbacks.

Hence, here we totally remove LRO support in Legacy RQ.
This guarantees having no out-of-order completions, which allows using
a cyclic work queue (instead of a linked-list) in a downstream patch.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:15 -07:00
Tariq Toukan
386471f16b net/mlx5e: RX, Dedicate a function for copying SKB header
Get the logic of copying the packet header into the SKB linear part
into a generic function. Function does copy length alignment
and dma buffer sync.

It is currently called only within the MPWQE flow.
In a downstream patch, it will be called within the legacy RQ flow
as well.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Tariq Toukan
fa698366b7 net/mlx5e: RX, Generalise function of SKB frag addition
Rename it and pass truesize as an extra argument, as it will be used also
in Legacy RQ in a downstream patch.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Tariq Toukan
75aa889fb9 net/mlx5e: RX, Generalise name of non-linear SKB head size
Make name more generic by dropping MPWRQ from it, as it will be
used also in Legacy RQ in a downstream patch.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Tariq Toukan
5e7d77a9c5 net/mlx5e: TX, Obsolete maintaining local copies of skb->len/data
Instead of maintaining a local copy of skb->len/data and updating
it upon every copy to the WQE inline part, just calculate it once
when needed, using the ihs.

This obsoletes the function mlx5e_tx_skb_pull_inline.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Ilan Tayari
98db16bab5 net/mlx5: FPGA, Handle QP error event
Add handlers for this event to perform graceful teardown of the device.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Adi Nissim
250a42b6a7 net/mlx5e: Support configurable MTU for vport representors
The representor MTU was hard coded to 1500 bytes.
Allow setting arbitrary MTU values up to the max supported by the FW.

Signed-off-by: Adi Nissim <adin@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Maor Gottlieb
93edcb3a75 net/mlx5e: Increase aRFS flow tables size
Increase the aRFS flow table size to 64k so it could contain up to 64k
different streams.

Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Eran Ben Elisha
6c63efe4cf net/mlx5e: Remove redundant active_channels indication
Now, when all channels stats are saved regardless of the channel's state
{open, closed}, we can safely remove this indication and the stats spin
lock which protects it.

Fixes: 76c3810bade3 ("net/mlx5e: Avoid reset netdev stats on configuration changes")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Eran Ben Elisha
8bfaf07f78 net/mlx5e: Present SW stats when state is not opened
The driver can present all SW stats even when the state not opened.
Fixed get strings, count and stats to support it.

In addition, fix tc2txq to hold a static mapping which doesn't depend on
the amount of open channels, and cannot have the same value on two
different cells  while moving between configurations.
Example:
- OOB 16 channels
- Change to 2 channels, 8 TCs
- tc2txq[15][0] == tc2txq[1][7] == 15
This will cause multiple appearances of the same TX index in statistics
output.

Fixes: 76c3810bade3 ("net/mlx5e: Avoid reset netdev stats on configuration changes")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Tariq Toukan
c90262f846 net/mlx5e: IPOIB, Add a missing skb_pull
A call to mlx5e_tx_skb_pull_inline was mistakenly dropped
in the cited patch. Get it back.

Fixes: 043dc78ecf ("net/mlx5e: TX, Use actual WQE size for SQ edge fill")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:14 -07:00
Tariq Toukan
4b3e85a52a net/mlx5e: IPOIB, Fix overflowing SQ WQE memset
IPoIB WQE size is larger than a single WQEBB.  Must not fetch the WQE,
and surely not memset it, until it is guaranteed that there are enough
WQEBBs available before getting to SQ/frag edge.

Fixes: 043dc78ecf ("net/mlx5e: TX, Use actual WQE size for SQ edge fill")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-06-01 16:48:13 -07:00
Ilan Tayari
1f0cf89b09 net/mlx5: Add FPGA QP error event
The FPGA queue pair (QP) event fires whenever a QP on the FPGA
transitions to the error state.

At this stage, this event is unrecoverable, it may become recoverable
in the future.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31 15:35:38 -04:00
Ilan Tayari
1865ea9adb net/mlx5: Add temperature warning event to log
Temperature warning event is sent by FW to indicate high temperature
as detected by one of the sensors on the board.
Add handling of this event by writing the numbers of the alert sensors
to the kernel log.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31 15:35:37 -04:00
Petr Machata
7edcb8ecbe mlxsw: spectrum_switchdev: Schedule respin during trans prepare
Since there's no special support for the bridge events, the driver
returns -EOPNOTSUPP, and thus the commit never happens. Therefore
schedule respin during the prepare stage: there's no real difference one
way or another.

This fixes the problem that mirror-to-gretap offload wouldn't adapt to
changes in bridge vlan configuration right away and another notification
would have to arrive for mlxsw to catch up.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31 14:13:44 -04:00
Petr Machata
ea47217519 mlxsw: spectrum_switchdev: Ignore bridge VLAN events
A follow-up patch enables emitting VLAN notifications for the bridge CPU
port in addition to the existing slave port notifications. These
notifications have orig_dev set to the bridge in question.

Because there's no specific support for these VLANs, just ignore the
notifications to maintain the current behavior.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31 14:13:43 -04:00
Or Gerlitz
01252a2783 net/mlx5e: Get the number of offloaded TC rules from the correct table
As we keep the offloaded TC rules for NIC and e-switch in two different
places, make sure to return the number of offloaded flows according
to the use-case and not blindly from the priv.

Fixes: 655dc3d2b9 ('net/mlx5e: Use shared table for offloaded TC eswitch flows')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-29 17:27:50 -07:00
Ilya Lesokhin
36dd490200 net/mlx5: FPGA, Call DMA unmap with the right size
When mlx5_fpga_conn_unmap_buf is called buf->sg[0].size
should equal the actual buffer size, not the message size.
Otherwise we will trigger the following dma debug warning
"DMA-API: device driver frees DMA memory with different size"

Fixes: 537a505741 ('net/mlx5: FPGA, Add high-speed connection routines')
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-29 17:27:50 -07:00
Ilya Lesokhin
ba869ee0e0 net/mlx5: FPGA, Properly initialize dma direction on fpga conn send
Properly initialize dma direction on fpga conn send.
Do not rely on dma_dir == 0 (DMA_BIDIRECTIONAL).

Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-29 17:27:50 -07:00
Yevgeny Kliteynik
90275d809a net/mlx5: FPGA, Abort FPGA init if the device reports no QP capability
In the case that the reported max number of QPs capability
equals to zero, abort FPGA init.

Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-29 17:27:50 -07:00
Ilan Tayari
98c90f6f10 net/mlx5: FPGA, print SBU identification on init
Add print of the following values on init:
1. ieee vendor id
2. sandbox product id
3. sandbox product version

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-29 17:27:50 -07:00
Ilan Tayari
d1a15b1a7e net/mlx5: FPGA, Add device name
Add device name for Mellanox FPGA devices.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-29 17:27:49 -07:00
Ilan Tayari
d80c9f8831 net/mlx5: FPGA, Add doxygen for access type enum
Add doxygen comments for enum mlx5_fpga_access_type.

Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-29 17:27:49 -07:00
Jiri Pirko
f3a52c6162 mlxsw: pci: Utilize MRSR register to perform FW reset
So far, the PCI BAR0 register is used for triggering FW reset. However,
that is a legacy attitude and it is recommended to use MRSR to perform
reset instead. So do that. Move the reset into init() function as
the cmd interface needs to be used. With that, IRQ initialization needs
to be moved as well. As a side effect, the reset move simplifies
the devlink reload flow.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29 10:10:03 -04:00
Jiri Pirko
2a360bf0f6 mlxsw: cmd: Handle error after reset gracefully
There is an exception in command interface processing in case the MRSR
register is written to. The register triggers FW reset and during the
reset FW returns an error. So handle this by ignoring this error while
writing to MRSR register.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29 10:10:03 -04:00
Jiri Pirko
12b003b2e4 mlxsw: reg: Add Management Reset and Shutdown Register
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29 10:10:03 -04:00
Petr Machata
47bf9df2e8 mlxsw: spectrum: Forbid creation of VLAN 1 over port/LAG
VLAN 1 is internally used for untagged traffic. Prevent creation of
explicit netdevice for that VLAN, because that currently isn't supported
and leads to the NULL pointer dereference cited below.

Fix by preventing creation of VLAN devices with VID of 1 over mlxsw
devices or LAG devices that involve mlxsw devices.

[  327.175816] ================================================================================
[  327.184544] UBSAN: Undefined behaviour in drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c:200:12
[  327.193667] member access within null pointer of type 'const struct mlxsw_sp_fid'
[  327.201226] CPU: 0 PID: 8983 Comm: ip Not tainted 4.17.0-rc4-petrm_net_ip6gre_headroom-custom-140 #11
[  327.210496] Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2F"/"SA000874", BIOS 4.6.5 03/08/2016
[  327.219872] Call Trace:
[  327.222384]  dump_stack+0xc3/0x12b
[  327.234007]  ubsan_epilogue+0x9/0x49
[  327.237638]  ubsan_type_mismatch_common+0x1f9/0x2d0
[  327.255769]  __ubsan_handle_type_mismatch+0x90/0xa7
[  327.264716]  mlxsw_sp_fid_type+0x35/0x50 [mlxsw_spectrum]
[  327.270255]  mlxsw_sp_port_vlan_router_leave+0x46/0xc0 [mlxsw_spectrum]
[  327.277019]  mlxsw_sp_inetaddr_port_vlan_event+0xe1/0x340 [mlxsw_spectrum]
[  327.315031]  mlxsw_sp_netdevice_vrf_event+0xa8/0x100 [mlxsw_spectrum]
[  327.321626]  mlxsw_sp_netdevice_event+0x276/0x430 [mlxsw_spectrum]
[  327.367863]  notifier_call_chain+0x4c/0x150
[  327.372128]  __netdev_upper_dev_link+0x1b3/0x260
[  327.399450]  vrf_add_slave+0xce/0x170 [vrf]
[  327.403703]  do_setlink+0x658/0x1d70
[  327.508998]  rtnl_newlink+0x908/0xf20
[  327.559128]  rtnetlink_rcv_msg+0x50c/0x720
[  327.571720]  netlink_rcv_skb+0x16a/0x1f0
[  327.583450]  netlink_unicast+0x2ca/0x3e0
[  327.599305]  netlink_sendmsg+0x3e2/0x7f0
[  327.616655]  sock_sendmsg+0x76/0xc0
[  327.620207]  ___sys_sendmsg+0x494/0x5d0
[  327.666117]  __sys_sendmsg+0xc2/0x130
[  327.690953]  do_syscall_64+0x66/0x370
[  327.694677]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  327.699782] RIP: 0033:0x7f4c2f3f8037
[  327.703393] RSP: 002b:00007ffe8c389708 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  327.711035] RAX: ffffffffffffffda RBX: 000000005b03f53e RCX: 00007f4c2f3f8037
[  327.718229] RDX: 0000000000000000 RSI: 00007ffe8c389760 RDI: 0000000000000003
[  327.725431] RBP: 00007ffe8c389760 R08: 0000000000000000 R09: 00007f4c2f443630
[  327.732632] R10: 00000000000005eb R11: 0000000000000246 R12: 0000000000000000
[  327.739833] R13: 00000000006774e0 R14: 00007ffe8c3897e8 R15: 0000000000000000
[  327.747096] ================================================================================

Fixes: 9589a7b5d7 ("mlxsw: spectrum: Handle VLAN devices linking / unlinking")
Suggested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29 10:07:34 -04:00
Arnd Bergmann
c3cd281e2b net/mlx5e: fix TLS dependency
With CONFIG_TLS=m and MLX5_CORE_EN=y, we get a link failure:

drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.o: In function `mlx5e_tls_handle_ooo':
tls_rxtx.c:(.text+0x24c): undefined reference to `tls_get_record'
drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls_rxtx.o: In function `mlx5e_tls_handle_tx_skb':
tls_rxtx.c:(.text+0x9a8): undefined reference to `tls_device_sk_destruct'

This narrows down the dependency to only allow the configurations
that will actually work. The existing dependency on TLS_DEVICE is
not sufficient here since MLX5_EN_TLS is a 'bool' symbol.

Fixes: c83294b9ef ("net/mlx5e: TLS, Add Innova TLS TX support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29 10:03:40 -04:00
David S. Miller
874fcf1de6 mlx5e-updates-2018-05-25
This series includes updates for mlx5e netdev driver.
 
 1) Allowr flow based VF vport mirroring under sriov switchdev scheme,
 added support for offloading the TC mirred mirror sub-action, from
 Chris Mi.
 
 =================
 From: Or Gerlitz <ogerlitz@mellanox.com>
 
 The user will typically set the actions order such that the mirror
 port (mirror VF) sees packets as the original port (VF under
 mirroring) sent them or as it will receive them. In the general case,
 it means that packets are potentially sent to the mirror port before
 or after some actions were applied on them.
 
 To properly do that, we follow on the exact action order as set for
 the flow and make sure this will also be the case when we program the
 HW offload.
 
 If all the actions should apply before forwarding to the mirror and dest port,
 mirroring is just multicasting to the two vports. Otherwise, we split
 the TC flow to two HW rules, where the 1st applies only the actions
 needed up to the mirror (if there are such) and the 2nd the rest of
 the actions plus the forwarding to the dest vport.
 =================
 
 2) Move to order-0 only allocations (using fragmented work queues) for all
 work queues used by the driver, RX and TX descriptor rings
 (RQs, SQs and Completion Queues (CQs)), from Tariq Toukan.
 
 3) Avoid resetting netdevice statistics on netdevice
 state changes, from Eran Ben Elisha.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJbCKCcAAoJEEg/ir3gV/o++7sH/1FPmwvpf4qNEsusr714lNnl
 PxzVjnwxKZNYyovIjr6QGSxMM1qDiPejyYnZIAzH00B+XWCq3zn8H3sfJLFbmxN3
 clayd6dGV27HzLZwV2aD9vXfVb7snNhQtTp5zoajsnZY4xO335n3FA3kF5TxLqKO
 bxsnY2xSaSpbrBH2z2UvHc+ib9KnvY1Q+gr2WqdRnN5Tm51Zq+bgaUw0nROefGJK
 XR/aVBub3PsjA8W/0/b3DGfiP1bgPeU6QmLhdjn3IEufFtbEJH+K/4u53l3A4RLR
 fXmJrWKlZn8j7LUBFOD0/G43RU/YzqRgwF/iUwCxbUU2wK/J0cPv23drLDVYzaY=
 =n7Zz
 -----END PGP SIGNATURE-----

Merge tag 'mlx5e-updates-2018-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5e-updates-2018-05-25

This series includes updates for mlx5e netdev driver.

1) Allowr flow based VF vport mirroring under sriov switchdev scheme,
added support for offloading the TC mirred mirror sub-action, from
Chris Mi.

=================
From: Or Gerlitz <ogerlitz@mellanox.com>

The user will typically set the actions order such that the mirror
port (mirror VF) sees packets as the original port (VF under
mirroring) sent them or as it will receive them. In the general case,
it means that packets are potentially sent to the mirror port before
or after some actions were applied on them.

To properly do that, we follow on the exact action order as set for
the flow and make sure this will also be the case when we program the
HW offload.

If all the actions should apply before forwarding to the mirror and dest port,
mirroring is just multicasting to the two vports. Otherwise, we split
the TC flow to two HW rules, where the 1st applies only the actions
needed up to the mirror (if there are such) and the 2nd the rest of
the actions plus the forwarding to the dest vport.
=================

2) Move to order-0 only allocations (using fragmented work queues) for all
work queues used by the driver, RX and TX descriptor rings
(RQs, SQs and Completion Queues (CQs)), from Tariq Toukan.

3) Avoid resetting netdevice statistics on netdevice
state changes, from Eran Ben Elisha.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29 09:45:13 -04:00
David S. Miller
5b79c2af66 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Lots of easy overlapping changes in the confict
resolutions here.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-26 19:46:15 -04:00
Eran Ben Elisha
05909babce net/mlx5e: Avoid reset netdev stats on configuration changes
Move all RQ, SQ and channel counters from the channel objects into the
priv structure.  With this change, counters will not be reset upon
channel configuration changes.

Channel's statistics for SQs which are associated with TCs higher than
zero will be presented in ethtool -S, only for SQs which were opened at
least once since the module was loaded (regardless of their open/close
current status).  This is done in order to decrease the total amount of
statistics presented and calculated for the common out of box use (no
QoS).

mlx5e_channel_stats is a compound of CH,RQ,SQs stats in order to
create locality for the NAPI when handling TX and RX of the same
channel.

Align the new statistics struct per ring to avoid several channels
update to the same cache line at the same time.
Packet rate was tested, no degradation sensed.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
CC: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 16:14:28 -07:00
Shalom Lagziel
868a01a27d net/mlx5e: Introducing new statistics rwlock
Introduce a new read/write lock that will protect statistics gathering from
netdev channels configuration changes.
e.g. when channels are being replaced (increase/decrease number of rings)
prevent statistic gathering (ndo_get_stats64) to read the statistics of
in-active channels (channels that are being closed).

Plus update channels software statistics on the fly when calling
ndo_get_stats64, and remove it from stats periodic work.

Fixes: 9218b44dcc ("net/mlx5e: Statistics handling refactoring")
Signed-off-by: Shalom Lagziel <shaloml@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 14:11:01 -07:00
Saeed Mahameed
6ab75516cf net/mlx5e: Move phy link down events counter out of SW stats
PHY link down events counter belongs to phy_counters group.
although it has special handling, it doesn't mean it can't be there.

Move it to phy_counters_grp handler.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 14:11:01 -07:00
Tariq Toukan
3a2f703312 net/mlx5: Use order-0 allocations for all WQ types
Complete the transition of all WQ types to use fragmented
order-0 coherent memory instead of high-order allocations.

CQ-WQ already uses order-0.
Here we do the same for cyclic and linked-list WQs.

This allows the driver to load cleanly on systems with a highly
fragmented coherent memory.

Performance tests:
ConnectX-5 100Gbps, CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Packet rate of 64B packets, single transmit ring, size 8K.

No degradation is sensed.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 14:11:00 -07:00
Tariq Toukan
549322f2f9 net/mlx5i: Use compilation flag in IPOIB header
If CONFIG_MLX5_CORE_IPOIB is not set, compile-out the
IPOIB related headers.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 14:11:00 -07:00
Tariq Toukan
043dc78ecf net/mlx5e: TX, Use actual WQE size for SQ edge fill
We fill SQ edge with NOPs to avoid WQEs wrap.
Here, instead of doing that in advance for the maximum possible
WQE size, we do it on-demand using the actual WQE size.
We re-order some parts in mlx5e_sq_xmit to finish the calculation
of WQE size (ds_cnt) before doing any writes to the WQE buffer.

When SQ work queue is fragmented (introduced in an downstream patch),
dealing with WQE wraps becomes more frequent. This change would drastically
reduce the overhead in this case.

Performance tests:
ConnectX-5 100Gbps, CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Packet rate of 64B packets, single transmit ring, size 8K.

Before: 14.9 Mpps
After:  15.8 Mpps

Improvement of 6%.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 14:11:00 -07:00
Tariq Toukan
ddf385e31f net/mlx5e: Use WQ API functions instead of direct fields access
Use the WQ API to get the WQ size, and to map a counter
into a WQ entry index.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 14:11:00 -07:00
Chris Mi
e4ad91f23f net/mlx5e: Split offloaded eswitch TC rules for port mirroring
If a TC rule needs to be split for mirroring, create two HW rules,
in the first level and the second level flow tables accordingly.

In the first level flow table, forward the packet to the mirror
port and forward the packet to the second level flow table for
further processing, eg. encap, vlan push or header re-write.

Currently the matching is repeated in both stages.

While here, simplify the setup of the vhca id valid indicator also
in the existing code.

Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 14:11:00 -07:00
Chris Mi
592d365159 net/mlx5e: Parse mirroring action for offloaded TC eswitch flows
Currently, we only support the mirred redirect TC sub-action. In order
to support flow based vport mirroring, add support to parse the mirred
mirror sub-action.

For mirroring, user-space will typically set the action order such that
the mirror port (mirror VF) sees packets as the original port (VF under
mirroring) sent them or as it will receive them.

In the general case, it means that packets are potentially sent to the
mirror port before or after some actions were applied on them. To
properly do that, we should follow on the exact action order as set for
the flow and make sure this will also be the case when we program the HW
offload.

We introduce a counter for the output ports (attr->out_count), which we
increase when parsing each mirred redirect/mirror sub-action and when
dealing with encap.

We introduce a counter (attr->mirror_count) telling us if split is
needed. If no split is needed and mirroring is just multicasting to
vport, the mirror count is zero, all the actions of the TC flow should
apply on that single HW flow.

If split is needed, the mirror count tells where to do the split, all
non-mirred tc actions should apply only after the split.

The mirror count is set while parsing the following actions encap/decap,
header re-write, vlan push/pop.

Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 14:11:00 -07:00
Chris Mi
a842dd04cf net/mlx5: E-switch, Create a second level FDB flow table
If firmware supports the forward action with a destination list
that includes a flow table, create a second level FDB flow table.

This is going to be used for flow based mirroring under the switchdev
offloads mode.

Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 14:11:00 -07:00
Chris Mi
52fff3274b net/mlx5: E-Switch, Reorganize and rename fdb flow tables
We have several fdb flow tables for each of the legacy and switchdev
modes. In the switchdev mode, there are fast path and slow path flow
tables. Towards adding more flow tables in upcoming patches, reorganize
and rename the various existing ones to reflect their functionality.

Signed-off-by: Chris Mi <chrism@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-25 14:11:00 -07:00
Qing Huang
1383cb8103 mlx4_core: allocate ICM memory in page size chunks
When a system is under memory presure (high usage with fragments),
the original 256KB ICM chunk allocations will likely trigger kernel
memory management to enter slow path doing memory compact/migration
ops in order to complete high order memory allocations.

When that happens, user processes calling uverb APIs may get stuck
for more than 120s easily even though there are a lot of free pages
in smaller chunks available in the system.

Syslog:
...
Dec 10 09:04:51 slcc03db02 kernel: [397078.572732] INFO: task
oracle_205573_e:205573 blocked for more than 120 seconds.
...

With 4KB ICM chunk size on x86_64 arch, the above issue is fixed.

However in order to support smaller ICM chunk size, we need to fix
another issue in large size kcalloc allocations.

E.g.
Setting log_num_mtt=30 requires 1G mtt entries. With the 4KB ICM chunk
size, each ICM chunk can only hold 512 mtt entries (8 bytes for each mtt
entry). So we need a 16MB allocation for a table->icm pointer array to
hold 2M pointers which can easily cause kcalloc to fail.

The solution is to use kvzalloc to replace kcalloc which will fall back
to vmalloc automatically if kmalloc fails.

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Acked-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-25 10:22:53 -04:00
Yossi Kuperman
1dcbc01f73 net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands
Sandbox QP Commands are retired in the order they are sent. Outstanding
commands are stored in a linked-list in the order they appear. Once a
response is received and the callback gets called, we pull the first
element off the pending list, assuming they correspond.

Sending a message and adding it to the pending list is not done atomically,
hence there is an opportunity for a race between concurrent requests.

Bind both send and add under a critical section.

Fixes: bebb23e6cb ("net/mlx5: Accel, Add IPSec acceleration interface")
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24 14:40:40 -07:00
Eran Ben Elisha
902a545904 net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
When RXFCS feature is enabled, the HW do not strip the FCS data,
however it is not present in the checksum calculated by the HW.

Fix that by manually calculating the FCS checksum and adding it to the SKB
checksum field.

Add helper function to find the FCS data for all SKB forms (linear,
one fragment or more).

Fixes: 102722fc68 ("net/mlx5e: Add support for RXFCS feature flag")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24 14:40:39 -07:00
Huy Nguyen
ecdf2dadee net/mlx5e: Receive buffer support for DCBX
Add dcbnl's set/get buffer configuration callback that allows user to
set/get buffer size configuration and priority to buffer mapping.

By default, firmware controls receive buffer configuration and priority
of buffer mapping based on the changes in pfc settings. When set buffer
call back is triggered, the buffer configuration changes to manual mode.

The manual mode means mlx5 driver will adjust the buffer configuration
accordingly based on the changes in pfc settings.

ConnectX buffer stride is 128 Bytes. If the buffer size is not multiple
of 128, the buffer size will be rounded down to the nearest multiple of
128.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24 14:23:33 -07:00
Huy Nguyen
0696d60853 net/mlx5e: Receive buffer configuration
Add APIs for buffer configuration based on the changes in
pfc configuration, cable len, buffer size configuration,
and priority to buffer mapping.

Note that the xoff fomula is as below
  xoff = ((301+2.16 * len [m]) * speed [Gbps] + 2.72 MTU [B]
  xoff_threshold = buffer_size - xoff
  xon_threshold = xoff_threshold - MTU

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24 14:23:33 -07:00
Huy Nguyen
50b4a3c236 net/mlx5: PPTB and PBMC register firmware command support
Add firmware command interface to read and write PPTB and PBMC
registers.

PPTB register enables mappings priority to a specific receive buffer.

PBMC registers enables changing the receive buffer's configuration such
as buffer size, xon/xoff thresholds, buffer's lossy property and
buffer's shared property.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24 14:23:33 -07:00
Huy Nguyen
2c81bfd5ae net/mlx5e: Move port speed code from en_ethtool.c to en/port.c
Move four below functions from en_ethtool.c to en/port.c. These
functions are used by both en_ethtool.c and en_main.c. Future code
can use these functions without ethtool link mode dependency.
  u32 mlx5e_port_ptys2speed(u32 eth_proto_oper);
  int mlx5e_port_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
  int mlx5e_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
  u32 mlx5e_port_speed2linkmodes(u32 speed);

Delete the speed field from table mlx5e_build_ptys2ethtool_map. This
table only keeps the mapping between the mlx5e link mode and
ethtool link mode. Add new table mlx5e_link_speed for translation
from mlx5e link mode to actual speed.

Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-24 14:23:33 -07:00
Jason Gunthorpe
c62091bcd9 mlx5-updates-2018-05-17
mlx5 core dirver updates for both net-next and rdma-next branches.
 
 From Christophe JAILLET, first three patche to use kvfree where needed.
 
 From: Or Gerlitz <ogerlitz@mellanox.com>
 
 Next six patches from Roi and Co adds support for merged
 sriov e-switch which comes to serve cases where both PFs, VFs set
 on them and both uplinks are to be used in single v-switch SW model.
 When merged e-switch is supported, the per-port e-switch is logically
 merged into one e-switch that spans both physical ports and all the VFs.
 
 This model allows to offload TC eswitch rules between VFs belonging
 to different PFs (and hence have different eswitch affinity), it also
 sets the some of the foundations needed for uplink LAG support.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJa/fLEAAoJEEg/ir3gV/o+7jUH/3n5/Uw1LLt3TfeKArx6i0F1
 3G4U5B0ha03qiDqXprwhyQ3I6lgYmRBmjcxnqmvcqOAqO4/hSsjtTR+A/mgbEDhJ
 YtdekFNEX+72h/N2GIpZwChIWSE3EcMPaLYnV8TwLUgh9YSust2sCLSBbJCjxOKc
 j78M8ept/bXZwTm/iJhEjtmqw0xl91rl011chCAua0iEpH3wxteDARmKABFHMQxl
 I3N/x/e/astgcSCNgpO4uDf9zEIRkNdzcHPzSMJ6C2Oo5W9XiZEekfw7WKj9nXfa
 G+eGckkAyCOQ/r2lZ9nA0ZUvQ2X6JISvxgohuaCNwTgsz3acTxbLnQK4YWHzQCQ=
 =iHi6
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into for-next

mlx5-updates-2018-05-17

mlx5 core dirver updates for both net-next and rdma-next branches.

From Christophe JAILLET, first three patche to use kvfree where needed.

From: Or Gerlitz <ogerlitz@mellanox.com>

Next six patches from Roi and Co adds support for merged
sriov e-switch which comes to serve cases where both PFs, VFs set
on them and both uplinks are to be used in single v-switch SW model.
When merged e-switch is supported, the per-port e-switch is logically
merged into one e-switch that spans both physical ports and all the VFs.

This model allows to offload TC eswitch rules between VFs belonging
to different PFs (and hence have different eswitch affinity), it also
sets the some of the foundations needed for uplink LAG support.

* tag 'mlx5-updates-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5e: Explicitly set source e-switch in offloaded TC rules
  net/mlx5: Add source e-switch owner
  net/mlx5e: Explicitly set destination e-switch in FDB rules
  net/mlx5: Add destination e-switch owner
  net/mlx5: Properly handle a vport destination when setting FTE
  net/mlx5: Add merged e-switch cap
  IB/mlx5: Use 'kvfree()' for memory allocated by 'kvzalloc()'
  net/mlx5: Eswitch, Use 'kvfree()' for memory allocated by 'kvzalloc()'
  net/mlx5: Vport, Use 'kvfree()' for memory allocated by 'kvzalloc()'
2018-05-24 09:40:43 -06:00
Jack Morgenstein
d546b67cda net/mlx4: Fix irq-unsafe spinlock usage
spin_lock/unlock was used instead of spin_un/lock_irq
in a procedure used in process space, on a spinlock
which can be grabbed in an interrupt.

This caused the stack trace below to be displayed (on kernel
4.17.0-rc1 compiled with Lock Debugging enabled):

[  154.661474] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
[  154.668909] 4.17.0-rc1-rdma_rc_mlx+ #3 Tainted: G          I
[  154.675856] -----------------------------------------------------
[  154.682706] modprobe/10159 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[  154.690254] 00000000f3b0e495 (&(&qp_table->lock)->rlock){+.+.}, at: mlx4_qp_remove+0x20/0x50 [mlx4_core]
[  154.700927]
and this task is already holding:
[  154.707461] 0000000094373b5d (&(&cq->lock)->rlock/1){....}, at: destroy_qp_common+0x111/0x560 [mlx4_ib]
[  154.718028] which would create a new lock dependency:
[  154.723705]  (&(&cq->lock)->rlock/1){....} -> (&(&qp_table->lock)->rlock){+.+.}
[  154.731922]
but this new dependency connects a SOFTIRQ-irq-safe lock:
[  154.740798]  (&(&cq->lock)->rlock){..-.}
[  154.740800]
... which became SOFTIRQ-irq-safe at:
[  154.752163]   _raw_spin_lock_irqsave+0x3e/0x50
[  154.757163]   mlx4_ib_poll_cq+0x36/0x900 [mlx4_ib]
[  154.762554]   ipoib_tx_poll+0x4a/0xf0 [ib_ipoib]
...
to a SOFTIRQ-irq-unsafe lock:
[  154.815603]  (&(&qp_table->lock)->rlock){+.+.}
[  154.815604]
... which became SOFTIRQ-irq-unsafe at:
[  154.827718] ...
[  154.827720]   _raw_spin_lock+0x35/0x50
[  154.833912]   mlx4_qp_lookup+0x1e/0x50 [mlx4_core]
[  154.839302]   mlx4_flow_attach+0x3f/0x3d0 [mlx4_core]

Since mlx4_qp_lookup() is called only in process space, we can
simply replace the spin_un/lock calls with spin_un/lock_irq calls.

Fixes: 6dc06c08be ("net/mlx4: Fix the check in attaching steering rules")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23 15:48:58 -04:00
Colin Ian King
4f7f56b6b1 net/mlx4: fix spelling mistake: "Inrerface" -> "Interface" and rephrase message
Trivial fix to spelling mistake in mlx4_dbg debug message and also
change the phrasing of the message so that is is more readable

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-23 14:58:10 -04:00
David Ahern
5a15a1b07c mlxsw: spectrum_router: Add support for route append
Handle append for gateway based routes. Dev-only multipath routes will
be handled by a follow on patch.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-22 14:44:18 -04:00
David S. Miller
6f6e434aa2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
S390 bpf_jit.S is removed in net-next and had changes in 'net',
since that code isn't used any more take the removal.

TLS data structures split the TX and RX components in 'net-next',
put the new struct members from the bug fix in 'net' into the RX
part.

The 'net-next' tree had some reworking of how the ERSPAN code works in
the GRE tunneling code, overlapping with a one-line headroom
calculation fix in 'net'.

Overlapping changes in __sock_map_ctx_update_elem(), keep the bits
that read the prog members via READ_ONCE() into local variables
before using them.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-21 16:01:54 -04:00
Jiri Pirko
ec932fbda7 mlxsw: use devlink helper to generate physical port name
Since devlink knows the info needed to generate the physical port name
in a generic way for all devlink users, use the helper to do the job.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-19 16:30:39 -04:00
Jiri Pirko
5ec1380a21 devlink: extend attrs_set for setting port flavours
Devlink ports can have specific flavour according to the purpose of use.
This patch extend attrs_set so the driver can say which flavour port
has. Initial flavours are:
physical, cpu, dsa
User can query this to see right away what is the purpose of each port.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-19 16:30:39 -04:00
Jiri Pirko
b9ffcbaf56 devlink: introduce devlink_port_attrs_set
Change existing setter for split port information into more generic
attrs setter. Alongside with that, allow to set port number and subport
number for split ports.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-19 16:30:39 -04:00
Or Gerlitz
a228060a7c net/mlx5e: Add HW vport counters to representor ethtool stats
Currently the representor only report the SW (slow-path) traffic
counters.

Add packet/bytes reporting of the HW counters, which account for the
total amount of traffic that was handled by the vport, both slow and
fast (offloaded) paths. The newly exposed counters are named
vport_rx/tx_packets/bytes.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 17:48:54 -07:00
Or Gerlitz
8f8ae8953f net/mlx5e: Ignore attempts to offload multiple times a TC flow
For VF->VF and uplink->VF rules, the TC core (cls_api) attempts
to offload the same flow multiple times into the driver, b/c we
registered to the egdev callback.

Use the flow cookie to ignore attempts to add such flows, we can't
reject them (return error), b/c this will fail the offload attempt,
so we ignore that. We indentify wrong stat/del calls using the flow
ingress/egress flags, here we do return error to the core.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 17:48:54 -07:00
Or Gerlitz
655dc3d2b9 net/mlx5e: Use shared table for offloaded TC eswitch flows
Currently, each representor netdev use their own hash table to keep
the mapping from TC flow (f->cookie) to the driver offloaded instance.
The table is the one which originally was added for offloading TC NIC
(not eswitch) rules.

This scheme breaks when the core TC code calls us to add the same flow
twice, (e.g under egdev use case) since we don't spot that and offload
a 2nd flow into the HW with the wrong source vport.

As a pre-step to solve that, we move to use a single table which keeps
all offloaded TC eswitch flows. The table is located at the eswitch
uplink representor object.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 17:48:54 -07:00
Or Gerlitz
05866c8236 net/mlx5e: Prepare for shared table to keep TC eswitch flows
This is a refactoring step to be able and store the hash table which
keeps track of offloaded TC flows in a different location for NIC
vs e-switch rules.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 17:48:54 -07:00
Or Gerlitz
60bd4af814 net/mlx5e: Add ingress/egress indication for offloaded TC flows
When an e-switch TC rule is offloaded through the egdev (egress
device) mechanism, we treat this as egress, all other cases (NIC
and e-switch) are considred ingress.

This is preparation step that will allow us to  identify "wrong"
stat/del offload calls made by the TC core on egdev based flows and
ignore them.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 17:48:54 -07:00
Rabie Loulou
b1d90e6bbd net/mlx5e: Offload TC eswitch rules for VFs belonging to different PFs
When the merged eswitch capability is supported, allow offloading rules
between VFs which belong to different PFs (and hence have different
eswitch affinity).

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 17:48:54 -07:00
Saeed Mahameed
260ab7042e mlx5-updates-2018-05-17
mlx5 core dirver updates for both net-next and rdma-next branches.
 
 From Christophe JAILLET, first three patche to use kvfree where needed.
 
 From: Or Gerlitz <ogerlitz@mellanox.com>
 
 Next six patches from Roi and Co adds support for merged
 sriov e-switch which comes to serve cases where both PFs, VFs set
 on them and both uplinks are to be used in single v-switch SW model.
 When merged e-switch is supported, the per-port e-switch is logically
 merged into one e-switch that spans both physical ports and all the VFs.
 
 This model allows to offload TC eswitch rules between VFs belonging
 to different PFs (and hence have different eswitch affinity), it also
 sets the some of the foundations needed for uplink LAG support.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJa/fLEAAoJEEg/ir3gV/o+7jUH/3n5/Uw1LLt3TfeKArx6i0F1
 3G4U5B0ha03qiDqXprwhyQ3I6lgYmRBmjcxnqmvcqOAqO4/hSsjtTR+A/mgbEDhJ
 YtdekFNEX+72h/N2GIpZwChIWSE3EcMPaLYnV8TwLUgh9YSust2sCLSBbJCjxOKc
 j78M8ept/bXZwTm/iJhEjtmqw0xl91rl011chCAua0iEpH3wxteDARmKABFHMQxl
 I3N/x/e/astgcSCNgpO4uDf9zEIRkNdzcHPzSMJ6C2Oo5W9XiZEekfw7WKj9nXfa
 G+eGckkAyCOQ/r2lZ9nA0ZUvQ2X6JISvxgohuaCNwTgsz3acTxbLnQK4YWHzQCQ=
 =iHi6
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2018-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

mlx5-updates-2018-05-17

mlx5 core dirver updates for both net-next and rdma-next branches.

From Christophe JAILLET, first three patches to use kvfree where needed.

From: Or Gerlitz <ogerlitz@mellanox.com>

Next six patches from Roi and Co adds support for merged
sriov e-switch which comes to serve cases where both PFs, VFs set
on them and both uplinks are to be used in single v-switch SW model.
When merged e-switch is supported, the per-port e-switch is logically
merged into one e-switch that spans both physical ports and all the VFs.

This model allows to offload TC eswitch rules between VFs belonging
to different PFs (and hence have different eswitch affinity), it also
sets the some of the foundations needed for uplink LAG support.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 17:47:55 -07:00
Shahar Klein
10ff5359f8 net/mlx5e: Explicitly set source e-switch in offloaded TC rules
Set a specific source e-switch when setting a rule that matches on the
ingress port.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 14:17:35 -07:00
Shahar Klein
3e99df8772 net/mlx5: Add source e-switch owner
The source e-switch owner allows a vport on one e-switch port be associated
with a rule defined on the second port e-switch.

The role of the source eswitch owner valid bit in the flow group is to
allow the firmware fail driver attempts to wild card the source eswitch
match field. If this bit is not set, the firmware ignores the source
eswitch owner field totally.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 14:17:34 -07:00
Rabie Loulou
56e858df9f net/mlx5e: Explicitly set destination e-switch in FDB rules
Set a specific destination e-switch when setting a destination vport.

Signed-off-by: Rabie Loulou <rabiel@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 14:17:34 -07:00
Shahar Klein
b17f7fc10f net/mlx5: Add destination e-switch owner
The destination e-switch owner allows a rule in namespace of one e-switch
owner to point to a vport that is natively associated with another
e-switch owner.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 14:17:34 -07:00
Shahar Klein
65360e5451 net/mlx5: Properly handle a vport destination when setting FTE
When creating FTE, properly distinguish between destination being vport
or tir. The previous code just worked accidentally b/c of both dest being
in the same offset within a union.

Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-17 14:17:34 -07:00
Ariel Levkovich
71c6e8638c IB/mlx5: Add support for MPLS flow specification
This patch introduces support for the MPLS flow spec and
allows the creation of rules that are matching on the
MPLS label.

Applying the rule matching depends on the flow specs order and
the location of the MPLS in the spec list as there are different
configurations to be made in the device in the cases of MPLSoGRE
and MPLSoUDP vs. non-encapsulated MPLS.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-05-16 21:32:55 -06:00
Christophe JAILLET
e574978ae5 net/mlx5: Eswitch, Use 'kvfree()' for memory allocated by 'kvzalloc()'
When 'kvzalloc()' is used to allocate memory, 'kvfree()' must be used to
free it.

Fixes: fed9ce22bf ("net/mlx5: E-Switch, Add API to create vport rx rules")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-16 17:49:01 -07:00
Christophe JAILLET
a5898e97f0 net/mlx5: Vport, Use 'kvfree()' for memory allocated by 'kvzalloc()'
When 'kvzalloc()' is used to allocate memory, 'kvfree()' must be used to
free it.

Fixes: 9efa752545 ("net/mlx5_core: Introduce access functions to query vport RoCE fields")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-16 17:48:17 -07:00
Gal Pressman
0e5c04f6b5 net/mlx5e: Remove MLX5E_TEST_BIT macro
MLX5E_TEST_BIT macro is the same as the already existent test_bit,
remove it and replace all usages.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14 15:10:21 -07:00
Gal Pressman
bfbe205753 net/mlx5e: Use test bit in en accel xmit flow
Replace (mask & bit) check with test_bit.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14 15:10:21 -07:00
Gal Pressman
af5a6c9365 net/mlx5e: Use __set_bit for adaptive-moderation bit in RQ state
Make the code more clear by replacing the existing code with __set_bit.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14 15:10:21 -07:00
Eran Ben Elisha
1e7477ae8b net/mlx5e: Report all channels with min RX WQEs timeout
Report all channels which got timeout on posting the minimal number of
RX WQEs and not only the first one. Avoid busy wait on every channel,
when one of the RQs check got timeout, poll once for the remaining RQs.

In addition, add channel index to log when failed to get min RX WQEs
This info is needed in order to debug in case of dysfunctional channel.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14 15:10:21 -07:00
Or Gerlitz
38aa51c134 net/mlx5e: Support offloaded TC flows with no matches on headers
For example:
    tc filter add dev ens2f0_0 parent ffff: flower skip_sw action drop

Note that for eswitch flows, we still always match on the source port.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14 15:10:21 -07:00
Or Gerlitz
d708f90298 net/mlx5e: Get the required HW match level while parsing TC flow matches
Introduce levels of matching on headers of offloaded flows
(none, L2, L3, L4) that follow the inline mode levels.

This is pre-step for us to offload flows without any
matches on headers.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14 15:10:21 -07:00
Or Gerlitz
547829004c net/mlx5e: Properly order min inline mode setup while parsing TC matches
Set the initial value to none instead of L2, and set to L2
where the previous initial value was assumed. Make sure to
parse L2 matches before L3 matches and L3 before L4.

This is a pre-step to get the match level for more purposes
other than the validating the needed vs. actual inline level.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14 15:10:21 -07:00
Or Gerlitz
1cab1cd74b net/mlx5e: Use local actions var while processing offloaded TC flow actions
Use local actions variable while parsing the actions of offloaded TC flow.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14 15:10:21 -07:00
Or Gerlitz
31c8eba5e8 net/mlx5e: Return success when TC offloaded fdb actions parsed ok
Reaching here, means we didn't err anywhere, so lets just
return success.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14 15:10:21 -07:00
Or Gerlitz
c180f67534 net/mlx5e: Avoid redundant zeroing of offloaded TC flow attributes
This is not needed as the attributes are zeroed out on allocation.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14 15:10:21 -07:00
Or Gerlitz
b3a433de7b net/mlx5e: Clean static checker complaints on TC offload and VF reps code
Clean warning/check complaints made by checkpatch on en_{tc,rep}.c

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14 15:10:21 -07:00
Or Gerlitz
4601266095 net/mlx5e: Remove double defined DMAC header re-write element
The firmware DMAC_47_16 header re-write token was defined twice,
clean it up.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14 15:10:21 -07:00
Tariq Toukan
efb6d7a20c net/mlx5e: Use bool as return type for mlx5e_xdp_handle
Function returns boolean values, use bool instead of int.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14 15:10:21 -07:00
Tariq Toukan
bd206fd52e net/mlx5e: Use u8 instead of int for LRO number of segments
Range of LRO number of segments fits in u8.
Also, bring initialization and declaration together to
save code.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14 15:10:21 -07:00
Roi Dayan
e36d4810f2 net/mlx5e: Skip redundant checks when providing NUD lastuse feedback
It's redundant to continue the loop if we found one flow whose lastuse value
being newer than the last one we reported, since this is enough for us to
trigger a NUD update (neigh_event_send()).

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2018-05-14 15:10:21 -07:00
Saeed Mahameed
b8c931ba3c net/mlx5e: Remove redundant vport context vlan update
In delete vlan flow an extra call to mlx5e_vport_context_update_vlans
was added by mistake, remove it.

Fixes: 86d722ad2c ("net/mlx5: Use flow steering infrastructure for mlx5_en")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>
2018-05-14 15:10:22 -07:00