This counter tracks how many TX MPWQE sessions are started in XDP SQ
in XDP TX/REDIRECT flow. It counts per-channel and global stats.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The XDP redirect flush indication belongs to the receive queue,
not to its XDP send queue.
For this, use a new bit on rq->flags.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Values in enum mlx5e_rq_flag are used as bit indixes.
Intention was to use them with no BIT(i) wrapping.
No functional bug fix here, as the same (shifted)flag bit
is used for all set, test, and clear operations.
Fixes: 121e892754 ("net/mlx5e: Refactor RQ XDP_TX indication")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The buffers mapping of the Multi-Packet WQEs (of Striding RQ)
is done via UMR posts, one UMR WQE per an RX MPWQE.
A single MPWQE is capable of serving many incoming packets,
usually larger than the budget of a single napi cycle.
Hence, posting a single UMR WQE per napi cycle (and handling its
completion in the next cycle) works fine in many common cases,
but not always.
When an XDP program is loaded, every MPWQE is capable of serving less
packets, to satisfy the packet-per-page requirement.
Thus, for the same number of packets more MPWQEs (and UMR posts)
are needed (twice as much for the default MTU), giving less latency
room for the UMR completions.
In this patch, we add support for multiple outstanding UMR posts,
to allow faster gap closure between consuming MPWQEs and reposting
them back into the WQ.
For better SW and HW locality, we combine the UMR posts in bulks of
(at least) two.
This is expected to improve packet rate in high CPU scale.
Performance test:
As expected, huge improvement in large-scale (48 cores).
xdp_redirect_map, 64B UDP multi-stream.
Redirect from ConnectX-5 100Gbps to ConnectX-6 100Gbps.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz.
Before: Unstable, 7 to 30 Mpps
After: Stable, at 70.5 Mpps
No degradation in other tested scenarios.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The default m88e151x LED configuration is 0x1177, used LED[0]
for 1000M link, LED[1] for 100M link, and LED[2] for active.
But for some boards, which use LED[0] for link, and LED[1] for
active, prefer to be 0x1040. To be compatible with this case,
this patch defines a new dev_flag, and set it before connect
phy in HNS3 driver. When phy initializing, using the new
LED configuration if this dev_flag is set.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update all users of eth_get_headlen to pass network device, fetch
network namespace from it and pass it down to the flow dissector.
This commit is a noop until administrator inserts BPF flow dissector
program.
Cc: Maxim Krasnyansky <maxk@qti.qualcomm.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
All we do is write the length/status and address bits to a DMA
descriptor only to write its contents into on-chip registers right
after, eliminate this unnecessary step.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch the CPU port to use the new dedicated egress pool instead the
previously used egress pool which was shared with normal front panel
ports.
Add per-port quotas for the amount of traffic that can be buffered for
the CPU port and also adjust the per-{port, TC} quotas.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The CPU port is used to transmit traffic that is trapped to the host
CPU. It is therefore irrelevant to define ingress quota for it.
Add a 'skip_ingress' argument to the function tasked with configuring
per-port quotas, so that ingress quotas could be skipped in case the
passed local port is the CPU port.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function is used to set the per-port shared buffer quotas.
Currently, these quotas are only set for front panel ports, but a
subsequent patch will configure these quotas for the CPU port as well.
The configuration required for the CPU port is a bit different than that
of the front panel ports, so split the business logic into a separate
function which will be called with different parameters for the CPU
port.
No functional changes intended.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the new ingress pool that was added in the previous patch for
control packets (e.g., STP, LACP) that are trapped to the CPU.
The previous management pool is no longer necessary and therefore its
size is set to 0.
The maximum quota for traffic towards the CPU is increased to 50% of the
free space in the new ingress pool and therefore the reserved space is
reduced by half, to 10KB - in both the shared and headroom buffer. This
allows for more efficient utilization of the shared buffer as reserved
space cannot be used for other purposes.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packets that are trapped to the CPU are transmitted through the CPU port
to the attached host. The CPU port is therefore like any other port and
needs to have shared buffer configuration.
The maximum quotas configured for the CPU are provided using dynamic
threshold and cannot be changed by the user. In order to make sure that
these thresholds are always valid, the configuration of the threshold
type of these pools is forbidden.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code currently assumes that ingress pools have lower indices than
egress pools. This makes it impossible to add more ingress pools
without breaking user configuration that relies on a certain pool index
to correspond to an egress pool.
Remove such assumptions from the code, so that more ingress pools could
be added by subsequent patches.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit e83c045e53 ("mlxsw: spectrum_buffers: Configure MC pool")
configured the threshold of the multicast TCs as infinite so that the
admission of multicast packets is only depended on per-switch priority
threshold.
Forbid the user from changing the thresholds of these multicast TCs and
their binding to a different pool.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Multicast packets have three egress quotas:
* Per egress port
* Per egress port and traffic class
* Per switch priority
The limits on the switch priority are not exposed to the user and
specified as dynamic threshold on the first egress pool.
Forbid changing the threshold type of the first egress pool so that
these limits are always valid.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit e83c045e53 ("mlxsw: spectrum_buffers: Configure MC pool") added
a dedicated pool for multicast traffic. The pool is visible to the user
so that it would be possible to monitor its occupancy, but its
configuration should be forbidden in order to maintain its intended
operation.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subsequent patches are going to need to veto changes in certain TCs'
binding and threshold configurations.
Add fields to the TC's struct that indicate if the TC can be bound to a
different pool and whether its threshold can change and enforce that.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Subsequent patches are going to need to veto changes in certain pools'
size and / or threshold type (mode).
Add two fields to the pool's struct that indicate if either of these
attributes is allowed to change and enforce that.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pool indices are currently hard coded throughout the code, which
makes the code hard to follow and extend.
Overcome this by using defines for the pool indices.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add extack to shared buffer set operations, so that meaningful error
messages could be propagated to the user.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac_check_ether_addr() checks the MAC address and assigns one in
driver open(). In many cases when we create slave netdevice, the dev
addr is inherited from master but the master dev addr maybe NULL at
that time, so move this call to driver probe so that address is
always valid.
Signed-off-by: Xiaofei Shen <xiaofeis@codeaurora.org>
Tested-by: Xiaofei Shen <xiaofeis@codeaurora.org>
Signed-off-by: Sneh Shah <snehshah@codeaurora.org>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAlyOup0eHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGHKoIAIKVuBSyD+m65TaM
pjoAFa56weEc67Mmai2A84EOm0MVy9C6L7EOcOgVsJiLxDCYyWQ7xYwV2kceKJpW
H5xauhb3+TxpxYeaeKdPPPHmBdejRwOPYvGAfnDMCqCCWQTad52sQUPCLI+yhF1t
wgnuMi+SwNBWP9aYCXdFPK4fVhh27AcEAOEsRVCh4tIBH/wkf4GwrDr3IX1MFeMX
jE/R43la4hu1swcWBsjkErWUasVPCgJSSQTfKDo9PQTVnoh0PHFp4fkOInVKLymQ
7AGo+Knc+1he+sFsB2IbZwea0xqtJtjtr1oC+at8gNx66qVG+o7UZNi5LR1uPW4Z
4+dwGBk=
=pyXR
-----END PGP SIGNATURE-----
Merge tag 'v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into mlx5-next
Linux 5.1-rc1
We forgot to reset the branch last merge window thus mlx5-next is outdated
and still based on 5.0-rc2. This merge commit is needed to sync mlx5-next
branch with 5.1-rc1.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The RTF_ADDRCONF flag filters out routes added by RA's in determining
which routes can be appended to an existing one to create a multipath
route. Restore the flag check and add a comment to document the RA piece.
Fixes: 4e54507ab1 ("ipv6: Simplify rt6_qualify_for_ecmp")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit c7a1ce397a ("ipv6: Change addrconf_f6i_alloc to use
ip6_route_info_create"), the gateway is no longer filled in for fib6_nh
structs in a prefix route. Accordingly, the RTF_ADDRCONF flag check can
be dropped from the 'rt6_qualify_for_ecmp'.
Further, RTF_DYNAMIC is only set in rt6_info instances, so it can be
removed from the check as well.
This reduces rt6_qualify_for_ecmp and the mlxsw version to just checking
if the nexthop has a gateway which is the real indication of whether
entries can be coalesced into a multipath route.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, mlxsw does not support policy-based routing (PBR) and
therefore forbids the installation of non-default FIB rules except for
the l3mdev rule which is used for VRFs.
Relax the check to allow the installation of FIB rules that would never
match packets received by the device. Specifically, if the iif is that
of the loopback netdev. This is useful for users that need to redirect
locally generated packets based on FIB rules.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to get a consistent behavior of traffic flows across reboots /
module unload, we need to use the same ECMP/LAG seed.
Calculate the seed by hashing the base MAC of the device. This results
in a seed that is both unique (to avoid polarization) and consistent.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By default VFs are not trusted. Add ndo_set_vf_trust support to toggle
a new per-VF bit. Coupled with FW with this capability allows a
trusted VF to change its MAC even after being administratively set by
the PF. Also populate the trusted field on ndo_get_vf_config. Add the
same ndo to the representors.
Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PF supports all debugfs command, but VF only supports part of
debugfs command. So VF should not show unsupported help information.
This patch adds a check for PF and PF to show the supportable help
information.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates VF's TQP statistic info in the service task,
and adds a limitation to prevent update too frequently.
Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MAC tnl interruptions are different from other type of RAS and MSI-X
errors, because some bits, such as OVF/LR/RF will occur during link up
and down.
The drivers should clear status of all MAC tnl interruption bits but
shouldn't print any message that would mislead the users.
In case that link down and re-up in a short time because of some reasons,
we record when they occurred, and users can query them by debugfs.
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allow users to dump content of NCL_CONFIG by using debugfs
command.
Command format:
echo dump ncl_config <offset> <length> > cmd
It will print as follows:
hns3 0000:7d:00.0: offset | data
hns3 0000:7d:00.0: 0x0000 | 0x00000020
hns3 0000:7d:00.0: 0x0004 | 0x00000400
hns3 0000:7d:00.0: 0x0008 | 0x08020401
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for network interface message level
settings. The message level can be changed by module parameter
or ethtool.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we just print few information when tx timeout happens.
In order to find out the cause of timeout, this patch prints more
information about the packet statistics, tqp registers and
napi state.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In function hns3_get_tx_timeo_queue_info(), it should use
netdev->num_tx_queues, instead of netdve->real_num_tx_queues
as the loop limitation.
Fixes: 424eb834a9 ("net: hns3: Unified HNS3 {VF|PF} Ethernet Driver for hip08 SoC")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In current codes, tx_timeout_cnt is used before increased,
then we can see the tx_timeout_count is still 0 from the
print when tx timeout happens, e.g.
"hns3 0000:7d:00.3 eth3: tx_timeout count: 0, queue id: 0, SW_NTU:
0xa6, SW_NTC: 0xa4, HW_HEAD: 0xa4, HW_TAIL: 0xa6, INT: 0x1"
The tx_timeout_cnt should be updated before used.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When wait for response timeout or response code not match, there
should be more information for debugging.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When received vector0 msix and other event interrupt, it should
print the value of the register for debugging.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds some statistics for VF reset.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds statistics for PF's reset information,
also, provides a debugfs command to dump these statistics.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit f6b19b354d ("net: devlink: select NET_DEVLINK
from drivers") adds implicit select of NET_DEVLINK for
mlxsw, the code does not have to deal with the case
when CONFIG_NET_DEVLINK is not enabled. So remove the ifcase
and adjust Makefile.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Querying EEPROM high pages data for SFP module is currently
not supported by our driver and yet queried, resulting in
invalid FW queries.
Set the EEPROM ethtool data length to 256 for SFP module will
limit the reading for page 0 only and prevent invalid FW queries.
Fixes: bb64143eee ("net/mlx5e: Add ethtool support for dump module EEPROM")
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
MLX5E_XDP_MAX_MTU was calculated incorrectly. It didn't account for
NET_IP_ALIGN and MLX5E_HW2SW_MTU, and it also misused MLX5_SKB_FRAG_SZ.
This commit fixes the calculations and adds a brief explanation for the
formula used.
Fixes: a26a5bdf3e ("net/mlx5e: Restrict the combination of large MTU and XDP")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
xdp_return_frame releases the frame. It leads to releasing the page, so
it's not allowed to access xdpi.xdpf->len after that, because xdpi.xdpf
is at xdp->data_hard_start after convert_to_xdp_frame. This patch moves
the memory access to precede the return of the frame.
Fixes: 58b99ee3e3 ("net/mlx5e: Add support for XDP_REDIRECT in device-out side")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
A recent addition to NFP introduced a function that formats a string with
a size_t variable. This is formatted with %ld which is fine on 64-bit
architectures but produces a compile warning on 32-bit architectures.
Fix this by using the z length modifier.
Fixes: a6156a6ab0f9 ("nfp: flower: handle merge hint messages")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver does not advertize NETIF_F_FRAGLIST, the stack can't
pass skbs with frags lists to the xmit function.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since there are more IOT2040 variants with identical hardware but
different asset tags, the asset tag matching should be adjusted to
support them.
For the board name "SIMATIC IOT2000", currently there are 2 types of
hardware, IOT2020 and IOT2040. The IOT2020 is identified by its unique
asset tag. Match on it first. If we then match on the board name only,
we will catch all IOT2040 variants. In the future there will be no other
devices with the "SIMATIC IOT2000" DMI board name but different
hardware.
Signed-off-by: Su Bao Cheng <baocheng.su@siemens.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If link is down and autoneg is set to on/off, the status in ethtool does
not change.
The reason is when the link is down the function returns with zero
before changing autoneg value.
Move the checking of link state (up/down) to be performed after setting
autoneg value, in order to be sure that autoneg will change in any case.
Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During driver initialization the driver sends a reset to the device and
waits for the firmware to signal that it is ready to continue.
Commit d2f372ba09 ("mlxsw: pci: Increase PCI SW reset timeout")
increased the timeout to 13 seconds due to longer PHY calibration in
Spectrum-2 compared to Spectrum-1.
Recently it became apparent that this timeout is too short and therefore
this patch increases it again to a safer limit that will be reduced in
the future.
Fixes: c3ab435466 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Fixes: d2f372ba09 ("mlxsw: pci: Increase PCI SW reset timeout")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both Spectrum-1 and Spectrum-2 chips are currently configured such that
pairs of TC n (which is used for UC traffic) and TC n+8 (which is used
for MC traffic) are feeding into the same subgroup. Strict
prioritization is configured between the two TCs, and by enabling
MC-aware mode on the switch, the lower-numbered (UC) TCs are favored
over the higher-numbered (MC) TCs.
On Spectrum-2 however, there is an issue in configuration of the
MC-aware mode. As a result, MC traffic is prioritized over UC traffic.
To work around the issue, configure the MC TCs with DWRR mode (while
keeping the UC TCs in strict mode).
With this patch, the multicast-unicast arbitration results in the same
behavior on both Spectrum-1 and Spectrum-2 chips.
Fixes: 7b81953066 ("mlxsw: spectrum: Configure MC-aware mode on mlxsw ports")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently when calculating how much to increment ITR by inside of
ice_update_itr() we do some estimations and intermediate
calculations. Instead of doing estimations, just do the
calculation directly. This allows for a more accurate value and it
makes it easier for the next person to understand and update.
Also, remove the dividing the ITR value by 2 when latency
driven because the ITR values are already so low for 100Gbps
speed. This should help get to the desired ITR value faster.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Update driver version to 0.7.4
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds code to start or stop LLDP and DCBX in firmware through
use of ethtool private flags.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch introduces a new function ice_dcb_rebuild which reinitializes
DCB after a reset.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds a new function ice_update_dcb_stats to get DCB stats
from the hardware and ethtool support for displaying these stats.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch introduces a new function ice_tx_prepare_vlan_flags_dcb to
insert 802.1p priority information into the VLAN header
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds a new function ice_vsi_cfg_dcb_rings which updates a
VSI's rings based on DCB traffic class information.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds support to process LLDP MIB change notifications sent
by the firmware.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When the firmware doesn't support LLDP or DCBX, the driver should switch
to "software LLDP mode". This patch adds support for the same.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds a new function ice_pf_dcb_cfg (and related helpers)
which applies the DCB configuration obtained from the firmware. As
part of this, VSIs/netdevs are updated with traffic class information.
This patch requires a bit of a refactor of existing code.
1. For a MIB change event, the associated VSI is closed and brought up
again. The gap between closing and opening the VSI can cause a race
condition. Fix this by grabbing the rtnl_lock prior to closing the
VSI and then only free it after re-opening the VSI during a MIB
change event.
2. ice_sched_query_elem is used in ice_sched.c and with this patch, in
ice_dcb.c as well. However, ice_dcb.c is not built when CONFIG_DCB is
unset. This results in namespace warnings (ice_sched.o: Externally
defined symbols with no external references) when CONFIG_DCB is unset.
To avoid this move ice_sched_query_elem from ice_sched.c to
ice_common.c.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch introduces a new top level function ice_init_dcb (and
related lower level helper functions) which continues the DCB init
flow.
This function uses ice_get_dcb_cfg to get, parse and store the DCB
configuration. Once this is done, it sets itself up to be notified
by the firmware on LLDP MIB change events.
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch introduces a skeleton for ice_init_pf_dcb, the top level
function for DCB initialization. Subsequent patches will add to this
DCB init flow.
In this patch, ice_init_pf_dcb checks if DCB is a supported capability.
If so, an admin queue call to start the LLDP and DCBx in firmware is
issued. If not, an error is reported. Note that we don't fail the driver
init if DCB init fails.
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bump driver version to 0.7.3
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Capitalize abbreviations and spell out some that aren't obvious.
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes typos in code comments.
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There are a couple of spelling mistakes in NL_SET_ERR_MSG_MOD error
messages. Fix these.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
gcc warn this:
drivers/net/ethernet/stmicro/stmmac/norm_desc.c: In function ndesc_init_rx_desc:
drivers/net/ethernet/stmicro/stmmac/norm_desc.c:138:6: warning: variable 'bfsize1' set but not used [-Wunused-but-set-variable]
Like enh_desc_init_rx_desc, we should use bfsize1
in ndesc_init_rx_desc to calculate 'p->des1'
Fixes: 583e636141 ("net: stmmac: use correct DMA buffer size in the RX descriptor")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2019-04-16
This series contains updates to i40e driver only.
Adam fixes i40e so that queues can be restored to its original value if
configuring queue channels fails. Bumped the maximum API version
supported and added the API version to error messages to clarify
supported firmware API versions. Fixed the problem with the driver
being able to add only 7 multicast MAC address filters instead of 16.
Aleksandr adds support for Dynamic Device Personalization (DDP) which
allows loading profiles that change the way internal parser interprets
processed frames.
Nick fixes an issue where if we modify the VLAN stripping options when a
port VLAN is configured, it will break traffic for the VSI, so prevent
changes from being made.
Jake fixes an issue where a device reset can mess up the clock time
because we reset the clock time based on the kernel time every reset.
This causes us to potentially completely reset the PTP time, and can
cause unexpected behavior in programs like ptp4l.
Piotr fixes an LED blink issue with the 'ethtool -p' command, so that
identification blinking will work on all hardware.
Chinh fixed the error returned to correctly reflect the current state
when LLDP or DCBx is not in an operational state.
Grzegorz cleans up a misleading error message when untrusted VF tries to
exceed addresses beyond the NIC limit.
Carolyn fixes the error return code to correctly reflect the error case.
v2: updated the URL provided in the DDP patch (#2)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
By default Flow Control feature is not being enabled in stmmac.
This is a useful feature that can prevent loss of packets and now that
XGMAC already supports it (along with GMAC and QoS) it makes sense to
activate it.
Switch the module parameter to FLOW_AUTO so that Flow Control is
activated.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Finish the implementation of Flow Control feature. In order for it to
work correctly we need to set EHFC bit and the correct threshold values
for activating and deactivating it.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On platforms that lack a TCAM (like LS1088A), masking of
flow steering keys is not supported. Until now we didn't
offer flow steering capabilities at all on these platforms,
since our driver implementation configured a "comprehensive"
FS key (containing all supported header fields), with masks
used to ignore the fields not present in the rules provided
by the user.
We now allow ethtool rules that share a common key (i.e. have
the same header fields). The FS key is now kept in the driver
private data and initialized when the first rule is added to
an empty table, rather than at probe time. If a rule with a new
composition key is wanted, the user must first manually delete
all previous rules.
When building a FS table entry to pass to firmware, we still use
the old building algorithm, which assumes an all-supported-fields
key, and later collapse the fields which aren't actually needed.
Masked rules are not supported; if provided, the mask value
will be ignored. For firmware versions older than MC10.7.0
(that only offer the legacy ABIs for configuring distribution
keys) flow steering without masking support remains unavailable.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce an internal id bitfield to uniquely identify header fields
supported by the Rx distribution keys. For the hash key, add a
conversion from the RXH_* bitmask provided by ethtool to the internal
ids.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add two macros to simplify reading DPNI options.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the Rx flow classification enable flag only if key config
operation is successful.
Fixes 3f9b5c9 ("dpaa2-eth: Configure Rx flow classification key")
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is preventive cleanup that may save troubles later.
No need to cancel repeateadly queued work if code is properly
refactored.
Don't let the ethtool -s process interfere with the stat workqueue
scheduling.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The nfp_flower_copy_pre_actions function introduces a case statement with
an intentional fallthrough. However, this generates a warning if built
with the -Wimplicit-fallthrough flag.
Remove the warning by adding a fall through comment.
Fixes: 1c6952ca58 ("nfp: flower: generate merge flow rule")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a spelling mistake in a DP_INFO message, fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch adds support for detecting the P2P (peer-to-peer) event packets.
This is required for timestamping the PTP packets in peer delay mode.
Unmask the below bits (set to 0) for device to detect the p2p packets.
NIG_REG_P0/1_LLH_PTP_PARAM_MASK
NIG_REG_P0/1_TLLH_PTP_PARAM_MASK
bit 1 - IPv4 DA 1 of 224.0.0.107.
bit 3 - IPv6 DA 1 of 0xFF02:0:0:0:0:0:0:6B.
bit 9 - MAC DA 1 of 0x01-80-C2-00-00-0E.
NIG_REG_P0/1_LLH_PTP_RULE_MASK
NIG_REG_P0/1_TLLH_PTP_RULE_MASK
bit 2 - {IPv4 DA 1; UDP DP 0}
bit 6 - MAC Ethertype 0 of 0x88F7.
bit 9 - MAC DA 1 of 0x01-80-C2-00-00-0E.
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch performs code cleanup by defining macros for the ptp-timestamp
filters.
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes an error code for an admin queue
head overrun to use I40E_ERR_ADMIN_QUEUE_FULL instead
of I40E_ERR_QUEUE_EMPTY.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes the problem with the driver being able to add only 7
multicast MAC address filters instead of 16. The problem is fixed by
changing the maximum number of MAC address filters to 16+1+1 (two extra
are needed because the driver uses 1 for unicast MAC address and 1 for
broadcast).
Signed-off-by: Adam Ludkiewicz <adam.ludkiewicz@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Defined the advertised link mode field for 40000baseSR4_Full for
use with ethtool.
Signed-off-by: Adam Ludkiewicz <adam.ludkiewicz@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Added the API version in the error message for clarity.
Signed-off-by: Adam Ludkiewicz <adam.ludkiewicz@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
A new FW has been released, which uses API version 1.8.
Signed-off-by: Adam Ludkiewicz <adam.ludkiewicz@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Removed misleading messages when untrusted VF tries to
add more addresses than NIC limit
Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Modify the i40e_init_dcb to return the correct error when LLDP or DCBX
is not in operational state.
Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
On some hardware LEDs would not blink after command 'ethtool -p {eth-port}'
in certain circumstances. Now, function does not care about the activity
of the LED (though still preserves its state) but forcibly executes
identification blinking and then restores the LED state.
Signed-off-by: Piotr Marczak <piotr.marczak@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In the case where PTP is running on the hardware clock, but the kernel
system time is not being synced, a device reset can mess up the clock
time.
This occurs because we reset the clock time based on the kernel time
every reset. This causes us to potentially completely reset the PTP
time, and can cause unexpected behavior in programs like ptp4l.
Avoid this by saving the PTP time prior to device reset, and then
restoring using that time after the reset.
Directly restoring the PTP time we saved isn't perfect, because time
should have continued running, but the clock will essentially be stopped
during the reset. This is still better than the current solution of
assuming that the PTP HW clock is synced to the CLOCK_REALTIME.
We can do even better, by saving the ktime and calculating
a differential, using ktime_get(). This is based on CLOCK_MONOTONIC, and
allows us to get a fairly precise measure of the time difference between
saving and restoring the time.
Using this, we can update the saved PTP time, and use that as the value
to write to the hardware clock registers. This, of course is not perfect.
However, it does help ensure that the PTP time is restored as close as
feasible to the time it should have been if the reset had not occurred.
During device initialization, continue using the system time as the
source for the creation of the PTP clock, since this is the best known
current time source at driver load.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Modifying the VLAN stripping options when a port VLAN is configured
will break traffic for the VSI, and conceptually doesn't make sense,
so don't allow this.
Signed-off-by: Nicholas Nunley <nicholas.d.nunley@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch introduces DDP (Dynamic Device Personalization) which allows
loading profiles that change the way internal parser interprets processed
frames. To load DDP profiles it utilizes ethtool flash feature. The files
with recipes must be located in /var/lib/firmware directory. Afterwards
the recipe can be loaded by invoking:
ethtool -f <if_name> <file_name> 100
ethtool -f <if_name> - 100
See further details of this feature in the i40e documentation, or
visit
https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html
The driver shall verify DDP profile can be loaded in accordance with
the rules:
* Package with Group ID 0 are exclusive and can only be loaded the first.
* Packages with Group ID 0x01-0xFE can only be loaded simultaneously
with the packages from the same group.
* Packages with Group ID 0xFF are compatible with all other packages.
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Added a new local variable in the i40e_setup_tc function named
old_queue_pairs so num_queue_pairs can be restored to the correct
value in case configuring queue channels fails. Additionally, moved
the exit label in the i40e_setup_tc function so the if (need_reset)
block can be executed.
Also, fixed data packing in the i40e_setup_tc function.
Signed-off-by: Adam Ludkiewicz <adam.ludkiewicz@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There is a spelling mistake in a BNX2X_ERR message, fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A merge flow is formed from 2 sub flows. The match fields of the merge are
the same as the first sub flow that has formed it, with the actions being
a combination of the first and second sub flow. Therefore, a merge flow
should replace sub flow 1 when offloaded.
Offload valid merge flows by using a new 'flow mod' message type to
replace an existing offloaded rule. Track the deletion of sub flows that
are linked to a merge flow and revert offloaded merge rules if required.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the merging of 2 sub flows, a new 'merge' flow will be created and
written to FW. The TC layer is unaware that the merge flow exists and will
request stats from the sub flows. Conversely, the FW treats a merge rule
the same as any other rule and sends stats updates to the NFP driver.
Add links between merge flows and their sub flows. Use these links to pass
merge flow stats updates from FW to the underlying sub flows, ensuring TC
stats requests are handled correctly. The updating of sub flow stats is
done on (the less time critcal) TC stats requests rather than on FW stats
update.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When combining 2 sub_flows to a single 'merge flow' (assuming the merge is
valid), the merge flow should contain the same match fields as sub_flow 1
with actions derived from a combination of sub_flows 1 and 2. This action
list should have all actions from sub_flow 1 with the exception of the
output action that triggered the 'implicit recirculation' by sending to
an internal port, followed by all actions of sub_flow 2. Any pre-actions
in either sub_flow should feature at the start of the action list.
Add code to generate a new merge flow and populate the match and actions
fields based on the sub_flows. The offloading of the flow is left to
future patches.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two flows can be merged if the second flow (after recirculation) matches
on bits that are either matched on or explicitly set by the first flow.
This means that if a packet hits flow 1 and recirculates then it is
guaranteed to hit flow 2.
Add a 'can_merge' function that determines if 2 sub_flows in a merge hint
can be validly merged to a single flow.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a merge hint is received containing 2 flows that are matched via an
implicit recirculation (sending to and matching on an internal port), fw
reports that the flows (called sub_flows) may be able to be combined to a
single flow.
Add infastructure to accept and process merge hint messages. The actual
merging of the flows is left as a stub call.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each flow is given a context ID that the fw uses (along with its cookie)
to identity the flow. The flows stats are updated by the fw via this ID
which is a reference to a pre-allocated array entry.
In preparation for flow merge code, enable the nfp_fl_payload structure to
be accessed via this stats context ID. Rather than increasing the memory
requirements of the pre-allocated array, add a new rhashtable to associate
each active stats context ID with its rule payload.
While adding new code to the compile metadata functions, slightly
restructure the existing function to allow for cleaner, easier to read
error handling.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The neighbour table in the FW only accepts next hop entries if the egress
port is an nfp repr. Modify this to allow the next hop to be an internal
port. This means that if a packet is to egress to that port, it will
recirculate back into the system with the internal port becoming its
ingress port.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
FW may receive a packet with its ingress port marked as an internal port.
If a rule does not exist to match on this port, the packet will be sent to
the NFP driver. Modify the flower app to detect packets from such internal
ports and convert the ingress port to the correct kernel space netdev.
At this point, it is assumed that fallback packets from internal ports are
to be sent out said port. Therefore, set the redir_egress bool to true on
detection of these ports.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, it is assumed that fallback packets will be from reprs. Modify
this to allow an app to receive non-repr ports from the fallback channel -
e.g. from an internal port. If such a packet is received, do not update
repr stats.
Change the naming function calls so as not to imply it will always be a
repr netdev returned. Add the option to set a bool value to redirect a
fallback packet out the returned port rather than RXing it. Setting of
this bool in subsequent patches allows the handling of packets falling
back when they are due to egress an internal port.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent FW modifications allow the offloading of non repr ports. These
ports exist internally on the NFP. So if a rule outputs to an 'internal'
port, then the packet will recirculate back into the system but will now
have this internal port as it's incoming port. These ports are indicated
by a specific type field combined with an 8 bit port id.
Add private app data to assign additional port ids for use in offloads.
Provide functions to lookup or create new ids when a rule attempts to
match on an internal netdev - the only internal netdevs currently
supported are of type openvswitch. Have a netdev notifier to release
port ids on netdev unregister.
OvS offloads rules that match on internal ports as TC egress filters.
Ensure that such rules are accepted by the driver.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Write to a FW symbol to indicate that the driver supports flow merging. If
this symbol does not exist then flow merging and recirculation is not
supported on the FW. If support is available, add a stub to deal with FW
to kernel merge hint messages.
Full flow merging requires the firmware to support of flow mods. If it
does not, then do not attempt to 'turn on' flow merging.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When setting vport->bw_limit to hdev->tm_info.pg_info[0].bw_limit
in hclge_tm_vport_tc_info_update, vport->bw_limit can be as big as
HCLGE_ETHER_MAX_RATE (100000), which can not fit into u16 (65535).
So this patch fixes it by using u32 for vport->bw_limit.
Fixes: 848440544b ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The input parameter "proto" in function hclge_set_vlan_filter_hw()
is asked to be __be16, but got u16 when calling it in function
hclge_update_port_base_vlan_cfg().
This patch fixes it by converting it with htons().
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: 21e043cd81 ("net: hns3: fix set port based VLAN for PF")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In a similar fashion to routes and FDB entries, the neighbour table is
reflected to the device.
Set an offload indication on the neighbour in case it was programmed to
the device.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Next patch will add offload indication to neighbours, but the indication
should only be altered in case the neighbour was successfully added to /
deleted from the device.
Propagate neighbour update errors, so that they could be taken into
account by the next patch.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJcrPOfAAoJEEg/ir3gV/o+c1sIAIuVUmF95OK6BxrNxQ31HN7i
0V/OW29V6B5musqyGXVa90nl9wJ9BE2tmtHsg2HPABXdGdiYhNRP7Tm+aq+QYBe3
8kJVk5U+HCLeHvf9k3dpJZokMzAgEhuWAbuAE1YelYUtbOXO9Zrj2uTL1NHJTYyc
SNOg9+gATOMsOAuiUyygN0XMoYESTsUE7UH4tuhyYr44cKR85qOQDPAlcDEHGTfO
uHWwmOznZqFVJUVyfwtEkTojsxNiW+QA2PR5faX/+eI7746qXOAzYq2JSjtNEyTz
4xB9a+t47xpGDw4Svwu51pDw+4Uiiy1Yv0kOKKpBqrCk892bZ8l1gWcHRgjYx/8=
=9wkB
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2019-04-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-04-09
This series provides some fixes to mlx5 driver.
I've cc'ed some of the checksum fixes to Eric Dumazet and i would like to get
his feedback before you pull.
For -stable v4.19
('net/mlx5: FPGA, tls, idr remove on flow delete')
('net/mlx5: FPGA, tls, hold rcu read lock a bit longer')
For -stable v4.20
('net/mlx5e: Rx, Check ip headers sanity')
('Revert "net/mlx5e: Enable reporting checksum unnecessary also for L3 packets"')
('net/mlx5e: Rx, Fixup skb checksum for packets with tail padding')
For -stable v5.0
('net/mlx5e: Switch to Toeplitz RSS hash by default')
('net/mlx5e: Protect against non-uplink representor for encap')
('net/mlx5e: XDP, Avoid checksum complete when XDP prog is loaded')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Separate the overflow handling from the hardware interrupt status analysis.
The interrupt status is a single register and is common for all PFs. The
first PF reading the register is not necessarily the one who overflowed.
All PFs must check their overflow status on every attention.
In this change we clear the sticky indication in the attention handler to
allow doorbells to be processed again as soon as possible, but running
the doorbell recovery is scheduled for the periodic handler to reduce the
time spent in the attention handler.
Checking the need for DORQ flush was changed to "db_bar_no_edpm" because
qed_edpm_enabled()'s result could change dynamically and might have
prevented a needed flush.
Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the DORQ (doorbell block) is overflowed, all PFs get attentions at the
same time. If one PF finished handling the attention before another PF even
started, the second PF might miss the DORQ's attention bit and not handle
the attention at all.
If the DORQ attention is missed and the issue is not resolved, another
attention will not be sent, therefore each attention is treated as a
potential DORQ attention.
As a result, the attention callback is called more frequently so the debug
print was moved to reduce its quantity.
The number of periodic doorbell recovery handler schedules was reduced
because it was the previous way to mitigating the missed attention issue.
Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the condition which verifies that doorbell address is inside the
doorbell bar by checking that the end of the address is within range
as well.
Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DB_REC_DRY_RUN (running doorbell recovery without sending doorbells) is
never used. DB_REC_ONCE (send a single doorbell from the doorbell recovery)
is not needed anymore because by running the periodic handler we make sure
we check the overflow status later instead.
This patch is needed because in the next patches, the only doorbell
recovery type being used is DB_REC_REAL_DEAL, and the fixes are much
cleaner without this enum.
Signed-off-by: Denis Bolotin <dbolotin@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This check isn't really needed and we can simplify the code and save
some CPU cycles by removing it. Only in case of an error none of these
bits are set, and calling the NAPI callback doesn't hurt in this case.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using a function pointer array makes this easier to read and better
maintainable.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using a function pointer array makes this easier to read and better
maintainable. AFAIK function pointer arrays cause some performance
drawback due to Spectre mitigation, but we're not in a hot path here.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes some redundant BH disable when initializing
and uninitializing command queue.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If there is pending skb in RX flow when close the port, and the
pending buffer is not cleaned, the new packet will be added to
the pending skb when the port opens again, and the first new
packet has error data.
This patch cleans the pending skb when clean RX ring.
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For some cases, PHY may not be connected to MDIO bus, then
the driver will initialize fail since MDIO bus initialization
fails.
This patch fixes it by skipping the MDIO bus initialization
when PHY is inexistent.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to hardware description, reset level that should be
triggered are not consistent in a module. For example, in SSU
common errors, the first two bits has no need to do reset,
but the other bits need global reset.
This patch sets separate reset level for all RAS and MSI-X
interrupts by adding a reset_lvel field in struct hclge_hw_error,
and fixes some incorrect reset level.
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently hardware may have not enough buffer to receive packet
when it has used more than two MPS(maximum packet size) of
buffer, but there are still a lot of shared buffer left unused
when TC num is small.
This patch divides shared buffer to be used between TC when
the port supports DCB, and adjusts the waterline and threshold
according to user manual for the port that does not support
DCB.
This patch also change hclge_get_tc_num's return type to u32
to avoid signed-unsigned mix with divide.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently RX shared buffer' threshold size for speific TC is
set to smaller value when the TC's PFC is not enabled, which may
cause performance problem because hardware may not have enough
hardware buffer when PFC is not enabled.
This patch sets the same threshold size for all TC no matter if
the specific TC's PFC is enabled.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a GRO packet is received by driver, the cwr field in the
struct tcphdr needs to be checked to decide whether to set the
SKB_GSO_TCP_ECN for skb_shinfo(skb)->gso_type.
So this patch adds hns3_gro_complete to do that, and adds the
hns3_handle_bdinfo to handle the hns3_gro_complete and
hns3_rx_checksum.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the parameters of hns3_rx_checksum to be more specific to
what is used internally, rather than passing in a pointer to the
whole hns3_desc. Reduces duplicate code and bring this function
inline with the approach used in hns3_set_gro_param.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In original codes, ndo_set_vf_vlan() in hns3 driver was implemented
wrong. It adds or removes VLAN into VLAN filter for VF, but VF is
unaware of it.
This patch fixes it. When VF loads up, it firstly queries the port
based VLAN state from PF. When user change port based VLAN state
from PF, PF firstly checks whether the VF is alive. If the VF is
alive, then PF notifies the VF the modification; otherwise PF
configure the port based VLAN state directly.
Fixes: 46a3df9f97 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In original codes, ndo_set_vf_vlan() in hns3 driver was implemented
wrong. It adds or removes VLAN into VLAN filter for VF, but VF is
unaware of it.
Indeed, ndo_set_vf_vlan() is expected to enable or disable port based
VLAN (hardware inserts a specified VLAN tag to all TX packets for a
specified VF) . When enable port based VLAN, we use port based VLAN id
as VLAN filter entry. When disable port based VLAN, we use VLAN id of
VLAN device.
This patch fixes it for PF, enable/disable port based VLAN when calls
ndo_set_vf_vlan().
Fixes: 46a3df9f97 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, in TX direction, driver implements the TX VLAN offload
by checking the VLAN header in skb, and filling it into TX descriptor.
Usually it works well, but if enable inserting VLAN header based on
port, it may conflict when out_tag field of TX descriptor is already
used, and cause RAS error.
In RX direction, hardware supports stripping max two VLAN headers.
For vlan_tci in skb can only store one VLAN tag, when RX VLAN offload
enabled, driver tells hardware to strip one VLAN header from RX
packet; when RX VLAN offload disabled, driver tells hardware not to
strip VLAN header from RX packet. Now if port based insert VLAN
enabled, all RX packets will have the port based VLAN header. This
header is useless for stack, driver needs to ask hardware to strip
it. Unfortunately, hardware can't drop this VLAN header, and always
fill it into RX descriptor, so driver has to identify and drop it.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Our hardware supports inserting a specified VLAN header for each
function when sending packets. User can enable it with command
"ip link set <devname> vf <vfid> vlan <vlan id>".
For this VLAN header is inserted by hardware, not from stack,
hardware also needs to strip it from received packets before
sending to stack. In this case, driver needs to tell
hardware which VLAN to insert or strip.
The current VLAN initialization doesn't allow inserting
VLAN header by hardware, this patch modifies it, in order be
compatible with VLAN inserted base on port.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BPF's control message handler seems like a good base to built
on for request-reply control messages. Split it out to allow
for reuse.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During probe we clear vNIC configuration in case the device
wasn't closed cleanly by previous driver. Move that code
before netdev init, so netdev init can already try to apply
its config parameters.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Soon we will try to write to the vNIC mailbox without RTNL held.
Add a new mutex to protect access to specific parts of the PCI
control BAR.
Move the mailbox size checking to the mailbox lock() helper, where
it can be more effective (happen prior to potential overwrite of
other data).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the reconfig was a quick update, we could have results available from
firmware within 200us.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The err2 error return path calls qede_ptp_disable that cleans up
on an error and frees ptp. After this, the free'd ptp is dereferenced
when ptp->clock is set to NULL and the code falls-through to error
path err1 that frees ptp again.
Fix this by calling qede_ptp_disable and exiting via an error
return path that does not set ptp->clock or kfree ptp.
Addresses-Coverity: ("Write to pointer after free")
Fixes: 035744975a ("qede: Add support for PTP resource locking.")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently if a pci dma mapping failure is detected a free'd
memblock address is returned rather than a NULL (that indicates
an error). Fix this by ensuring NULL is returned on this error case.
Addresses-Coverity: ("Use after free")
Fixes: 528f727279 ("vxge: code cleanup and reorganization")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some SOC like i.MX6SX clock have some limits:
- ahb clock should be disabled before ipg.
- ahb and ipg clocks are required for MAC MII bus.
So, move the ahb clock to runtime management together with
ipg clock.
Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The thunderx driver forbids to load an eBPF program if the MTU is too high,
but this can be circumvented by loading the eBPF, then raising the MTU.
Fix this by limiting the MTU if an eBPF program is already loaded.
Fixes: 05c773f52b ("net: thunderx: Add basic XDP support")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The thunderx driver splits frames bigger than 1530 bytes to multiple
pages, making impossible to run an eBPF program on it.
This leads to a maximum MTU of 1508 if QinQ is in use.
The thunderx driver forbids to load an eBPF program if the MTU is higher
than 1500 bytes. Raise the limit to 1508 so it is possible to use L2
protocols which need some more headroom.
Fixes: 05c773f52b ("net: thunderx: Add basic XDP support")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While determining offload capabilities of backing hardware during
a device reset, the driver is clobbering current feature settings.
Update hw_features on reset instead of features unless a feature
is enabled that is no longer supported on the current backing device.
Also enable features that were not supported prior to the reset but
were previously enabled or requested by the user.
This can occur if the reset is the result of a carrier change, such
as a device failover or partition migration.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable Generic Receive Offload in the ibmvnic driver.
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In Spectrum-1, when a multicast packet is admitted to the shared buffer
it increases the quotas of all the ports and {port, TC} to which it is
forwarded to.
The above means that multicast packets are accounted multiple times in
the shared buffer and can therefore cause the associated shared buffer
pool to fill up very quickly.
To work around this issue, commit e83c045e53 ("mlxsw:
spectrum_buffers: Configure MC pool") added a dedicated multicast pool
in which multicast packets are accounted.
The issue is not present in Spectrum-2, but in order to be backward
compatible with Spectrum-1, its default behavior is to allow a multicast
packet to increase multiple egress quotas instead of one.
Until the new (non-backward compatible) mode is supported, configure a
dedicated multicast pool as in Spectrum-1.
Fixes: fe099bf682 ("mlxsw: spectrum_buffers: Add Spectrum-2 shared buffer configuration")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 74bc993974 ("mlxsw: spectrum_router: Veto unsupported RIF MAC
addresses") enabled the driver to veto router interface (RIF) MAC
addresses that it cannot support.
This check should only be performed for interfaces for which the driver
actually configures a RIF. A VRF upper is not one of them, so ignore it.
Without this patch it is not possible to set an IP address on the VRF
device and use it as a loopback.
Fixes: 74bc993974 ("mlxsw: spectrum_router: Veto unsupported RIF MAC addresses")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The workqueue is used to periodically update the networking stack about
activity / statistics of various objects such as neighbours and TC
actions.
It should not be called as part of memory reclaim path, so remove the
WQ_MEM_RECLAIM flag.
Fixes: 3d5479e920 ("mlxsw: core: Remove deprecated create_workqueue")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The EMAD workqueue is used to handle retransmission of EMAD packets that
contain configuration data for the device's firmware.
Given the workers need to allocate these packets and that the code is
not called as part of memory reclaim path, remove the WQ_MEM_RECLAIM
flag.
Fixes: d965465b60 ("mlxsw: core: Fix possible deadlock")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create a new prio in the FDB, it will be used when inserting steering rules
into the FDB from the RDMA side. We create a new PRIO so rules from the
net side and rules from the RDMA side won't be inserted to the same PRIO,
each side has it's own sandbox to play in.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
When creating the FDB prios, use the enum values already defined and not
the hardcoded values.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Although XOR hash function can perform very well on some special use
cases, to align with all drivers, mlx5 driver should use Toeplitz hash
by default.
Toeplitz is more stable for the general use case and it is more standard
and reliable.
On top of that, since XOR (MLX5_RX_HASH_FN_INVERTED_XOR8) gives only a
repeated 8 bits pattern. When used for udp tunneling RSS source port
manipulation it results in fixed source port, which will cause bad RSS
spread.
Fixes: 2be6967cdb ("net/mlx5e: Support ETH_RSS_HASH_XOR")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>