Never used, remove it.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PPv2 controller is able to support 2.5G speeds, allowing to use
2.5GBASET in conjunction with PHYs that use 2500BASEX as their MII
interface when using this mode.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is no longer necessary after a5084bb71f ("nfp: Implement
ndo_get_port_parent_id()")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
eth_change_mtu() is not needed any more, the networking subsystem will
call it automatically when this callback is not implemented.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Three conflicts, one of which, for marvell10g.c is non-trivial and
requires some follow-up from Heiner or someone else.
The issue is that Heiner converted the marvell10g driver over to
use the generic c45 code as much as possible.
However, in 'net' a bug fix appeared which makes sure that a new
local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0
is cleared.
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"Hopefully the last pull request for this release. Fingers crossed:
1) Only refcount ESP stats on full sockets, from Martin Willi.
2) Missing barriers in AF_UNIX, from Al Viro.
3) RCU protection fixes in ipv6 route code, from Paolo Abeni.
4) Avoid false positives in untrusted GSO validation, from Willem de
Bruijn.
5) Forwarded mesh packets in mac80211 need more tailroom allocated,
from Felix Fietkau.
6) Use operstate consistently for linkup in team driver, from George
Wilkie.
7) ThunderX bug fixes from Vadim Lomovtsev. Mostly races between VF
and PF code paths.
8) Purge ipv6 exceptions during netdevice removal, from Paolo Abeni.
9) nfp eBPF code gen fixes from Jiong Wang.
10) bnxt_en firmware timeout fix from Michael Chan.
11) Use after free in udp/udpv6 error handlers, from Paolo Abeni.
12) Fix a race in x25_bind triggerable by syzbot, from Eric Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
net: phy: realtek: Dummy IRQ calls for RTL8366RB
tcp: repaired skbs must init their tso_segs
net/x25: fix a race in x25_bind()
net: dsa: Remove documentation for port_fdb_prepare
Revert "bridge: do not add port to router list when receives query with source 0.0.0.0"
selftests: fib_tests: sleep after changing carrier. again.
net: set static variable an initial value in atl2_probe()
net: phy: marvell10g: Fix Multi-G advertisement to only advertise 10G
bpf, doc: add bpf list as secondary entry to maintainers file
udp: fix possible user after free in error handler
udpv6: fix possible user after free in error handler
fou6: fix proto error handler argument type
udpv6: add the required annotation to mib type
mdio_bus: Fix use-after-free on device_register fails
net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255
bnxt_en: Wait longer for the firmware message response to complete.
bnxt_en: Fix typo in firmware message timeout logic.
nfp: bpf: fix ALU32 high bits clearance bug
nfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_K
Documentation: networking: switchdev: Update port parent ID section
...
This series adds some misc updates to mlx5 driver,
1) Eli Britstein, Introduces tunnel entropy control from PCMR register
and fixes GRE key by controlling port tunnel entropy calculation.
2) Eran Ben Elisha, provides some mlx5 fixes to the latest tx devlink health
reporting mechanism.
3) Huy Nguyen, Added the support for ndo bridge_setlink to allow
VEPA/VEB E-Switch legacy mode configurations.
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJccGvRAAoJEEg/ir3gV/o+6kIH/1Y23h0OZrKkkfELujtzoOjx
64BIRhW8Tr+WOXZNKlSvdRkPj0g1yG3B0NO7wChXZDl8dgqBp/Jx5Wxcvb5DKkz/
wK07oKB6KqgcMILzUe5G4iOHw/atMldzSA6sNPh8SDKVKKkXjMKz8ocQZZQRsAaX
1im2D77pX8zE/OhG86Fn23Fed1G6Lg++faPC8GEOxv9LkwaAARa5NfJ6g7AnXOGg
ijmM4R2Hhu0V0YaKXS2qJFb6HQaNO9JKYR1C7wF7z9yqnsn14cdxrAWkE4eA2JnP
zKnDt+ThAKAoX9mn5QHzD8uye56Jeksvo8br4dCH4iDZZSSZT+lo456Dfe+tXpc=
=ef79
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2019-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2019-02-21
This series adds some misc updates to mlx5 driver,
1) Eli Britstein, Introduces tunnel entropy control from PCMR register
and fixes GRE key by controlling port tunnel entropy calculation.
2) Eran Ben Elisha, provides some mlx5 fixes to the latest tx devlink health
reporting mechanism.
3) Huy Nguyen, Added the support for ndo bridge_setlink to allow
VEPA/VEB E-Switch legacy mode configurations.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add Spectrum-2 ASIC support for the following new port types and speeds:
* 50Gbps 1-lane
* 100Gbps 2-lanes
* 200Gbps 4-lanes
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add Spectrum-2 ASIC port type-speed operations.
Since multiple ethtool link modes are represented using a single bit in the
ASIC, the driver forces the user to configure all types per a specific
speed. For example, if the user wants to advertise 100Gbps 4-lanes speed,
he should advertise all the types of 100Gbps 4-lanes speed that are
supported by the ASIC as shown below:
Supported ethtool bits for 100Gbps 4-lanes:
0x1000000000 100000baseKR4 Full
0x2000000000 100000baseSR4 Full
0x4000000000 100000baseCR4 Full
0x8000000000 100000baseLR4_ER4 Full
Command for advertising 100Gbps 4-lanes:
ethtool -s enp3s0np1 advertise 0xF000000000
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PTYS register introduces a new layout for port type-speed fields. These
fields extend the existing ones in order to handle more types and speeds.
For example, the new 200Gbps speed.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename p_eth_proto_adm to p_eth_proto_admin in mlxsw_reg_ptys_eth_unpack
function.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add port type-speed operations in order to have different operations for
different ASICs. For now, both ASICs use the same pointer.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename port speed-type functions to be Spectrum-1 ASIC specific.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of deriving the port connector type from port admin state, query it
from firmware.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove eth_proto_lp_advertise field in PTYS register since it is not
supported by the firmware.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove duplicate port link mode entry from mlxsw_sp_port_link_mode.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cards_found is a static variable, but when it enters atl2_probe(),
cards_found is set to zero, the value is not consistent with last probe,
so next behavior is not our expect.
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide precision hints to snprintf() since we know the destination
buffer size of the RX/TX ring names are IFNAMSIZ + 5 - 1. This fixes the
following warnings:
drivers/net/ethernet/intel/e1000e/netdev.c: In function
'e1000_request_msix':
drivers/net/ethernet/intel/e1000e/netdev.c:2109:13: warning: 'snprintf'
output may be truncated before the last format character
[-Wformat-truncation=]
"%s-rx-0", netdev->name);
^
drivers/net/ethernet/intel/e1000e/netdev.c:2107:3: note: 'snprintf'
output between 6 and 21 bytes into a destination of size 20
snprintf(adapter->rx_ring->name,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sizeof(adapter->rx_ring->name) - 1,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"%s-rx-0", netdev->name);
~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/intel/e1000e/netdev.c:2125:13: warning: 'snprintf'
output may be truncated before the last format character
[-Wformat-truncation=]
"%s-tx-0", netdev->name);
^
drivers/net/ethernet/intel/e1000e/netdev.c:2123:3: note: 'snprintf'
output between 6 and 21 bytes into a destination of size 20
snprintf(adapter->tx_ring->name,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sizeof(adapter->tx_ring->name) - 1,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"%s-tx-0", netdev->name);
~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Give precision identifiers to the two snprintf() formatting the priority
and TC strings to avoid producing these two warnings:
drivers/net/ethernet/mellanox/mlxsw/spectrum.c: In function
'mlxsw_sp_port_get_prio_strings':
drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2132:37: warning: '%d'
directive output may be truncated writing between 1 and 3 bytes into a
region of size between 0 and 31 [-Wformat-truncation=]
snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
^~
drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2132:3: note: 'snprintf'
output between 3 and 36 bytes into a destination of size 32
snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mlxsw_sp_port_hw_prio_stats[i].str, prio);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/ethernet/mellanox/mlxsw/spectrum.c: In function
'mlxsw_sp_port_get_tc_strings':
drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2143:37: warning: '%d'
directive output may be truncated writing between 1 and 11 bytes into a
region of size between 0 and 31 [-Wformat-truncation=]
snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
^~
drivers/net/ethernet/mellanox/mlxsw/spectrum.c:2143:3: note: 'snprintf'
output between 3 and 44 bytes into a destination of size 32
snprintf(*p, ETH_GSTRING_LEN, "%s_%d",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mlxsw_sp_port_hw_tc_stats[i].str, tc);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2019-02-23
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a bug in BPF's LPM deletion logic to match correct prefix
length, from Alban.
2) Fix AF_XDP teardown by not destroying umem prematurely as it
is still needed till all outstanding skbs are freed, from Björn.
3) Fix unkillable BPF_PROG_TEST_RUN under preempt kernel by checking
signal_pending() outside need_resched() condition which is never
triggered there, from Stanislav.
4) Fix two nfp JIT bugs, one in code emission for K-based xor, and
another one to explicitly clear upper bits in alu32, from Jiong.
5) Add bpf list address to maintainers file, from Daniel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The code waits up to 20 usec for the firmware response to complete
once we've seen the valid response header in the buffer. It turns
out that in some scenarios, this wait time is not long enough.
Extend it to 150 usec and use usleep_range() instead of udelay().
Fixes: 9751e8e714 ("bnxt_en: reduce timeout on initial HWRM calls")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The logic that polls for the firmware message response uses a shorter
sleep interval for the first few passes. But there was a typo so it
was using the wrong counter (larger counter) for these short sleep
passes. The result is a slightly shorter timeout period for these
firmware messages than intended. Fix it by using the proper counter.
Fixes: 9751e8e714 ("bnxt_en: reduce timeout on initial HWRM calls")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NFP BPF JIT compiler is doing a couple of small optimizations when jitting
ALU imm instructions, some of these optimizations could save code-gen, for
example:
A & -1 = A
A | 0 = A
A ^ 0 = A
However, for ALU32, high 32-bit of the 64-bit register should still be
cleared according to ISA semantics.
Fixes: cd7df56ed3 ("nfp: add BPF to NFP code translator")
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The intended optimization should be A ^ 0 = A, not A ^ -1 = A.
Fixes: cd7df56ed3 ("nfp: add BPF to NFP code translator")
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Allow enabling VEPA mode on the HCA's port in legacy devlink mode.
Example:
bridge link set dev ens1f0 hwmode vepa
will turn on VEPA mode on the netdev ens1f0.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In Virtual Ethernet Port Aggregator (VEPA) mode, the packet skips
the system internal virtual switch and forwards to external network
switch. In Mellanox HCA case, the virtual switch is the HCA's Eswitch.
To support this, an new FDB flow table are created with level 0 and
linked to the existing FDB flow table in legacy mode. By default,
VEPA is turned off and this FDB flow table is empty. When VEPA is
turned on, two rules are created. One rule to forward on uplink vport
traffic to the legacy FDB. The other rule forward all other traffic
to uplink vport.
Other design alternatives were not chosen as explained below:
1. Create a forward rule in ACL flow table (most efficient design).
This approach is the not chosen because firmware does not support
forward rule to uplink vport (0xffff) for ACL flow table.
2. Add additional source port criteria in all the FDB rules to make the
FDB rules to be received rules only. This approach is not chosen because
it is not efficient as there can many rules in the FDB and VEPA mode
cannot be controlled per vport.
3. Add a highest prioirty flow group in the existing legacy FDB Flow
Table instead of a new flow table. This approoach does not work because the
new flow group has the same match criteria as the promiscuous flow group
and mlx5_add_flow_rules does not allow specifying flow group.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
If reporter is ERR_PTR or NULL, error code shall be returned. At all other
cases it shall return success. Fix that.
Fixes: de8650a820 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In case of lost interrupt recover, we shall return success. Fix that.
Fixes: 7d91126b1a ("net/mlx5e: Add tx timeout support for mlx5e tx reporter")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reported-by: Maria Pasechnik <mariap@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When TX reporter was introduced, it took ownership over TX timeout error
handling. this introduced a regression in case TX reporter is not valid
(NET_DEVLINK is not set, or devlink_health_reporter_create failure).
Fix mlx5e_tx_reporter_timeout function so it can be called at all times.
In addition, remove a warning print that indicates that a TX timeout won't
be handled in case of no valid TX reporter.
Fixes: 7d91126b1a ("net/mlx5e: Add tx timeout support for mlx5e tx reporter")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Print warning message in case of TX reporter creation failure, only if the
return value is ERR_PTR type. NULL pointer return indicates that
NET_DEVLINK is not set, and the warning print can be skipped.
Fixes: de8650a820 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Flow entropy is calculated on the inner packet headers and used for
flow distribution in processing, routing etc. For GRE-type
encapsulations the entropy value is placed in the eight LSB of the key
field in the GRE header as defined in NVGRE RFC 7637. For UDP based
encapsulations the entropy value is placed in the source port of the
UDP header.
The hardware may support entropy calculation specifically for GRE and
for all tunneling protocols. With commit df2ef3bff1 ("net/mlx5e: Add
GRE protocol offloading") GRE is offloaded, but the hardware is
configured by default to calculate flow entropy so packets transmitted
on the wire have a wrong key. To support UDP based tunnels (i.e VXLAN),
GRE (i.e. no flow entropy) and NVGRE (i.e. with flow entropy) the
hardware behaviour must be controlled by the driver.
Ensure port entropy calculation is enabled for offloaded VXLAN tunnels
and disable port entropy calculation in the presence of offloaded GRE
tunnels by monitoring the presence of entropy enabling tunnels (i.e
VXLAN) and entropy disabing tunnels (i.e GRE).
Fixes: df2ef3bff1 ("net/mlx5e: Add GRE protocol offloading")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently changing a PCMR field is done by setting the field in a
zeroed buffer, zeroing other unrelated fields.
Fix this behaviour by modifying only the required field after first
reading the current register values, as a pre-step towards using more
fields in PCMR register.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
With the device tree changes configuring CPSW with a proper PHY driver,
we want to deprecate the old driver to avoid new users for it.
Note that this driver is based on the related dts changes.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEkgNvrZJU/QSQYIcQG9Q+yVyrpXMFAlxwEQ8RHHRvbnlAYXRv
bWlkZS5jb20ACgkQG9Q+yVyrpXOt8Q/7BPvd7UTQy/BK41M1A4NXLX4ojxx+DhEs
vMmGGgRfhfkaVOlN1IWF+A1TgqgUDXfIiAANspZ18hsxNvMihZPklofKO71LM5dm
FjGk4lfLZT2yiFfN03CfgqHJOrph+T2LZ05FOWi8iYwEEvMYVYiCfUjuv/gN2tPT
uMOP8bpibbehNkJSPRDIhpXAlgR1VDFGZU8SKAsxRY69RrgwTy9Pzy4IUHijmKEY
pTX6oF8XR5aXOELRCOX3gBN4YZCz+KmI+KurQjo/pYmVUyAv7uBQrypbevbDNiRW
Qya0Sco3YhGHvkTZDGnrTAZ+ocaG7PzEIiqTUP0YY+AD9MsTrMI6bTq6rp/BU01w
YucV2C0gJdlU9HRhyS0vXAd/n3slNdU9UCtvN7Ezn0zDfhLFpJL0Wf9u4unPWefm
x9vp069fdJOES+Vh3n14FEwzH54do+N918MatX01hj6zkmy67ilJrXR2j8NItxxn
6VxLI7v9Lit5Q6LqEfFdEQ3Pfbfi03HLpg559lFAngzdGbHtLYKunWX05ngcBVo3
WLz+7K/GzsRI3nE/6xjDmwXB6hF5yUJZlcpWSQwzlRWgsPva3TZv5s2yp1rDDbW9
uolMu8fO62PKai1c2l0AyJRwYM7AmWARWd+JFUmcBLdbPNG9mLDYrkqBwRMujqqG
/3dKR8vG5Rw=
=1Hb4
-----END PGP SIGNATURE-----
Merge tag 'omap-for-v5.1/cpsw-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/late
One change to deprecate old CPSW Ethernet PHY mode selection driver
With the device tree changes configuring CPSW with a proper PHY driver,
we want to deprecate the old driver to avoid new users for it.
Note that this driver is based on the related dts changes.
* tag 'omap-for-v5.1/cpsw-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
net: ethernet: ti: cpsw: deprecate cpsw-phy-sel driver
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
After AF_PACKET is fixed to calculate the transport header offset
correctly, trust the value set by the kernel. If the offset wasn't set,
it means there is no transport header in the packet.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb_transport_offset() == 0 is not a special value. The only special
value is when skb->transport_header is ~0U, and it's checked by
skb_transport_header_was_set().
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since link change polling routine was moved to nicvf side,
we don't need anymore polling function at nicpf side along
with link status info for all enabled Vfs as at VF side
this info is already tracked.
This commit is to remove unnecessary code & fields from
nicpf structure.
Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the link change polling task to VF side in order to
prevent races between VF and PF while sending link change
message(s). This commit is to implement link change request
to be initiated by VF.
Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some cases it could happen that nicvf_send_msg_to_pf() could be called
concurrently for the same NIC VF, and thus re-writing mailbox contents and
breaking messaging sequence with PF by re-writing NICVF data.
This commit is to implement mutex for NICVF to protect mailbox registers
and NICVF messaging control data from concurrent access.
Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To communicate to PF each of ThunderX NIC VF uses mailbox which is
pair of 64 bit registers available to both VFn and PF.
This commit is to change the xcast message structure in order to
fit it into 64 bit.
Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rx_set_mode invokes number of messages to be send to PF for receive
mode configuration. In case if there any issues we need to stop sending
messages and release allocated memory.
This commit is to implement check of nicvf_msg_send_to_pf() result.
Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the end of NIC VF initialization VF sends CFG_DONE message to PF without
using nicvf_msg_send_to_pf routine. This potentially could re-write data in
mailbox. This commit is to implement common way of sending CFG_DONE message
by the same way with other configuration messages by using
nicvf_send_msg_to_pf() routine.
Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Having one work queue for receive mode configuration ndo_set_rx_mode()
call for all VFs results in making each of them wait till the
set_rx_mode() call completes for another VF if any of close, set
receive mode and change flags calls being already invoked. Potentially
this could cause device state change before appropriate call of receive
mode configuration completes, so the call itself became meaningless,
corrupt data or break configuration sequence.
We don't need any delays in NIC VF configuration sequence so having delayed
work call with 0 delay has no sense.
This commit is to implement one work queue for each NIC VF for set_rx_mode
task and to let them work independently and replacing delayed_work
with work_struct.
Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Correct STREERING to STEERING at macro name for BGX steering register.
Signed-off-by: Vadim Lomovtsev <vlomovtsev@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Three smallish patches fixing regressions in v5.0:
- Fix cxgb4 to work again with non-4k page sizes
- NULL pointer oops in SRP during sg_reset
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlxvKuwACgkQOG33FX4g
mxrNExAAiiCqJlQ9ZaEnYKQZydRn+JZjkjfoxTumqhNhTRFXgj/1tgLuVf2xNM6N
iNroyZS8zvZMZvGlrhTRZIBBXQizhkUUCwnLV1BIJw4b8VBIyWgF6iQUCBLcM9E6
wG18U3ySGzIUccBcD6fn+yHd1xMST7wEtU5cgqeGFLpGfT4rwS6n1vQia0gokG/i
94qhifxvz4MIt66v95JnFaXM4n/funVL36DHGiDMglqmpE7aujUyMuL4qDQfsRue
HDoxeH8b3eMxkmZ3UQPVZhwbF4SqQw6Zjb7vFepHU4dFtbZgUZMZKNTNYKD9oI3d
P5KGCtOg0PiDKqT+DAYda+7BrykBYHTQZwuUzAoh5Vyv19UiwaPQn34Ub7EX+VvX
yhJcWnh8eDNkjVd0tNMZn9LxYjm48WyTOg4qUWIIo512HfsMua5jCuWMA1xRNZ1u
kh9rY1oaTdtWPA8n1HbK0w6L3d6e6oiXZXlVdLUSnGXEyo8vEoGKckUOXxb1e0Up
h8wAyisbZ2oQi8+d1hfyQ8EFOLzibjHmbR0IhM72/rhrIAEIC2Aj/F0ej3gwi49B
Vv324j3LbdSNnCzeQnVeU/PfQkuc8haFNnyNgakN4srNmZKAfT/o6aoGWrUaLwdF
aYvmYhsBUBGdxUld9qYItjpEudew6g1LaBJ71D5NplUvcz9S5iU=
=JFU9
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Small set of three regression fixing patches, things are looking
pretty good here.
- Fix cxgb4 to work again with non-4k page sizes
- NULL pointer oops in SRP during sg_reset"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
iw_cxgb4: cq/qp mask depends on bar2 pages in a host page
cxgb4: Export sge_host_page_size to ulds
RDMA/srp: Rework SCSI device reset handling
A missing break keyword should have been added after adding support for
PRE_BRIDGE_FLAGS.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 93700458ff ("rocker: Check Handle PORT_PRE_BRIDGE_FLAGS")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the hardware's description, the driver should clear
the command queue's registers when uloading VF driver. Otherwise,
these existing value may lead the IMP get into a wrong state.
Fixes: fedd0c15d2 ("net: hns3: Add HNS3 VF IMP(Integrated Management Proc) cmd interface")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to the hardware's description, the driver should clear
the command queue's registers when uloading driver. Otherwise,
these existing value may lead the IMP get into a wrong state.
Also this patch adds hclge_cmd_uninit() to do the command queue
uninitialization which includes clearing registers and freeing
memory.
Fixes: 68c0a5c706 ("net: hns3: Add HNS3 IMP(Integrated Mgmt Proc) Cmd Interface Support")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Record the vlan tables that the VF sends to the chip.
After the VF exception, the PF actively clears the VF to chip config.
Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Record the unicast and multicast tables that the VF sends to the chip.
After the VF exception, the PF actively clears the VF to chip config.
Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch modify print message of 6th bit of ppp mpf abnormal errors,
there is a extra letter e in it.
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These bits are enabled now and have been test.
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 3rd and 4th of PPU(RCB) PF Abnormal is RAS errors instead of MSI-X
like other bits. This patch adds process of handling and logging this
two bits. Otherwise, this patch modifies print message of 28th and 29th
bit of PPU MPF Abnormal errors, which keep same with other errors now.
Fixes: f69b10b317 ("net: hns3: handle hw errors of PPU(RCB)")
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch add information of specific bit in log to be consistent
with other type of errors, so that we can know which memory of ssu
has occurred a ecc ras errors.
Signed-off-by: Weihang Li <liweihang@hisilicon.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In original codes, for copper port which doesn't connect to phy,
it always returns -EOPNOTSUPP when query port information. This
patch fixes it by return the port information of MAC.
Fixes: 5f373b1585 ("net: hns3: Fix speed/duplex information loss problem when executing ethtool ethx cmd of VF")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The link mode with bits has been up to more than 31 for some MAC
and phy. Convert to using a linkmode bitmap, which can support all
link modes.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In hnae3_register_ae_dev(), ae_algo->ops is assigned to ae_dev->ops
before check that ae_algo->ops is valid.
And in hnae3_register_ae_algo(), missing check for ae_algo->ops.
This patch fixes them.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These functions are exported, add pointer checking at the beginning
can make them more safe.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
cap_max_headroom_size holds maximum headroom size supported.
Overstepping that limit might under certain conditions lead to ASIC
freeze.
Query and store the value, and add mlxsw_sp_sb_max_headroom_cells() for
obtaining the stored value. In __mlxsw_sp_port_headroom_set(), reject
requests where the total port buffer is larger than the advertised
maximum.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recommendation for headroom size for 100Gbps port and 100m cable is
101.6KB, reduced accordingly for split ports. The closest higher number
evenly divisible by cell size for both Spectrum-1 and Spectrum-2, and
such that the number of cells can be further divided by maximum split
factor of 4, is 102528 bytes, or 25632 bytes per lane.
Update mlxsw_sp_port_pb_init() to compute the headroom taking into
account this recommended per-lane value and number of lanes actually
dedicated to a given port.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Customize the tables related to shared buffer configuration to match the
current recommendation for Spectrum-2 systems.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SBMM register configures the shared buffer quota for MC packets
according to Switch-Priority. The default configuration depends on the
chip type. Therefore keep the table and length in struct
mlxsw_sp_sb_vals. Redirect the references from the global definitions to
the fields.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SBCM register configures shared buffer quota according to
port-priority resp. port-TC. The default configuration depends on the
chip type. Therefore keep the tables and their lengths in struct
mlxsw_sp_sb_vals. Redirect the references from the global definitions to
the fields.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SBPR register configures shared buffer pools. The default
configuration depends on the chip type. Therefore keep it in struct
mlxsw_sp_sb_vals. Redirect the one reference from the global array to
the field.
Because the pool descriptor ID is implicit in the ordering of array
members, both this array and the pool descriptor array have the same
length. Therefore reuse mlxsw_sp_sb.pool_dess_len for the purpose of
determining the length of SBPR array.
Drop the now useless MLXSW_SP_SB_PRS_LEN.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SBPM register can be used to configure quotas for packets ingressing
from a certain pool to a certain port, and egressing from a certain pool
to a certain port. The default configuration depends on the chip type.
Therefore keep it in struct mlxsw_sp_sb_vals. Redirect the one reference
from the global array to the field.
Because the pool descriptor ID is implicit in the ordering of array
members, both this array and the pool descriptor array have the same
length. Therefore reuse mlxsw_sp_sb.pool_dess_len for the purpose of
determining the length of SBPM array.
Drop the now useless MLXSW_SP_SB_PMS_LEN.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Keep the table of pool descriptors and its length in struct
mlxsw_sp_sb_vals so that it can be specialized per chip type. Redirect
all users from the global definitions to the mlxsw_sp_sb fields.
Give mlxsw_sp_pool_count() an extra mlxsw_sp parameter so that it can
access the descriptor table.
Drop the now unnecessary MLXSW_SP_SB_POOL_DESS_LEN.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Spectrum-2 will be configured with a different set of pools than
Spectrum-1. The size of prs and pms buffers will therefore depend on the
chip type of the device.
Therefore, instead of reserving an array directly in a structure
definition, allocate the buffer in mlxsw_sp_sb_port{,s}_init().
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Spectrum-2 will be configured with a different shared buffer
configuration than Spectrum-1. Therefore introduce a structure for
keeping the chip-specific default and immutable configuration.
Configuration mutable in runtime will still be kept in struct
mlxsw_sp_sb.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TBU interrupt is a normal interrupt and can be used to trigger the
cleaning of TX path. Lets check if it's active in DMA interrupt handler.
While at it, refactor a little bit the function:
- Don't check if RI is enabled because at function exit we will
only clear the interrupts that are enabled so, no event will
be missed.
In my tests withe XGMAC2 this increased performance.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TBU interrupt is a normal interrupt and can be used to trigger the
cleaning of TX path. Lets check if it's active in DMA interrupt handler.
While at it, refactor a little bit the function:
- Don't check if RI is enabled because at function exit we will
only clear the interrupts that are enabled so, no event will be
missed.
In my tests with GMAC5 this increased performance.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 8fce333170 introduced the concept of NAPI per-channel and
independent cleaning of TX path.
This is currently breaking performance in some cases. The scenario
happens when all packets are being received in Queue 0 but the TX is
performed in Queue != 0.
Fix this by using different NAPI instances per each TX and RX queue, as
suggested by Florian.
Changes from v2:
- Only force restart transmission if there are pending packets
Changes from v1:
- Pass entire ring size to TX clean path (Florian)
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the bridge no longer calling switchdev_port_attr_get() to obtain
the supported bridge port flags from a driver but instead trying to set
the bridge port flags directly and relying on driver to reject
unsupported configurations, we can effectively get rid of
switchdev_port_attr_get() entirely since this was the only place where
it was called.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have converted the bridge code and the drivers to check for
bridge port(s) flags at the time we try to set them, there is no need
for a get() -> set() sequence anymore and
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT therefore becomes unused.
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for getting rid of switchdev_port_attr_get(), have rocker
check for the bridge flags being set through switchdev_port_attr_set()
with the SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS attribute identifier.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for getting rid of switchdev_port_attr_get(), have mlxsw
check for the bridge flags being set through switchdev_port_attr_set()
when the SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS attribute identifier is
used.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
To resolve conflicts with net-next and pick up the first patch.
* branch 'mlx5-next':
net/mlx5: Factor out HCA capabilities functions
IB/mlx5: Add support for 50Gbps per lane link modes
net/mlx5: Add support to ext_* fields introduced in Port Type and Speed register
net/mlx5: Add new fields to Port Type and Speed register
net/mlx5: Refactor queries to speed fields in Port Type and Speed register
net/mlx5: E-Switch, Avoid magic numbers when initializing offloads mode
net/mlx5: Relocate vport macros to the vport header file
net/mlx5: E-Switch, Normalize the name of uplink vport number
net/mlx5: Provide an alternative VF upper bound for ECPF
net/mlx5: Add host params change event
net/mlx5: Add query host params command
net/mlx5: Update enable HCA dependency
net/mlx5: Introduce Mellanox SmartNIC and modify page management logic
IB/mlx5: Use unified register/load function for uplink and VF vports
net/mlx5: Use consistent vport num argument type
net/mlx5: Use void pointer as the type in address_of macro
net/mlx5: Align ODP capability function with netdev coding style
mlx5: use RCU lock in mlx5_eq_cq_get()
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
An issue has been found while testing zero-copy XDP that
causes a reset to be triggered. As it takes some time to
turn the carrier on after setting zc, and we already
start trying to transmit some packets, watchdog considers
this as an erroneous state and triggers a reset.
Don't do any work if netif carrier is not OK.
Fixes: 8221c5eba8 (ixgbe: add AF_XDP zero-copy Tx support)
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When the driver clears the XDP xmit ring due to re-configuration or
teardown, in-progress ndo_xdp_xmit must be taken into consideration.
The ndo_xdp_xmit function is typically called from a NAPI context that
the driver does not control. Therefore, we must be careful not to
clear the XDP ring, while the call is on-going. This patch adds a
synchronize_rcu() to wait for napi(s) (preempt-disable regions and
softirqs), prior clearing the queue. Further, the __I40E_CONFIG_BUSY
flag is checked in the ndo_xdp_xmit implementation to avoid touching
the XDP xmit queue during re-configuration.
Fixes: d9314c474d ("i40e: add support for XDP_REDIRECT")
Fixes: 123cecd427 ("i40e: added queue pair disable/enable functions")
Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When the RX rings are created they are also populated with buffers so
that packets can be received. Usually these are kernel buffers, but
for AF_XDP in zero-copy mode, these are user-space buffers and in this
case the application might not have sent down any buffers to the
driver at this point. And if no buffers are allocated at ring creation
time, no packets can be received and no interrupts will be generated so
the NAPI poll function that allocates buffers to the rings will never
get executed.
To rectify this, we kick the NAPI context of any queue with an
attached AF_XDP zero-copy socket in two places in the code. Once after
an XDP program has loaded and once after the umem is registered. This
take care of both cases: XDP program gets loaded first then AF_XDP
socket is created, and the reverse, AF_XDP socket is created first,
then XDP program is loaded.
Fixes: d0bcacd0a1 ("ixgbe: add AF_XDP zero-copy Rx support")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When the RX rings are created they are also populated with buffers
so that packets can be received. Usually these are kernel buffers,
but for AF_XDP in zero-copy mode, these are user-space buffers and
in this case the application might not have sent down any buffers
to the driver at this point. And if no buffers are allocated at ring
creation time, no packets can be received and no interrupts will be
generated so the NAPI poll function that allocates buffers to the
rings will never get executed.
To rectify this, we kick the NAPI context of any queue with an
attached AF_XDP zero-copy socket in two places in the code. Once
after an XDP program has loaded and once after the umem is registered.
This take care of both cases: XDP program gets loaded first then AF_XDP
socket is created, and the reverse, AF_XDP socket is created first,
then XDP program is loaded.
Fixes: 0a714186d3 ("i40e: add AF_XDP zero-copy Rx support")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Read port count from the shared memory instead of driver deriving this
value. This change simplifies the driver implementation and also avoids
any dependencies for finding the port-count.
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The enabling L3/L4 filtering for transmit switched packets for all
devices caused unforeseen issue on older devices when trying to send UDP
traffic in an ordered sequence. This bit was originally intended for X550
devices, which supported this feature, so limit the scope of this bit to
only X550 devices.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Deprecate cpsw-phy-sel driver as it's been replaced with new
TI phy-gmii-sel PHY driver.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Tony Lindgren <tony@atomide.com>
There are rare cases where a PL_INT_CAUSE bit may end up getting
set when the corresponding PL_INT_ENABLE bit isn't set.
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This series includes misc updates to mlx5 drivers and one ethtool update.
1) From Aya Levin:
- ethtool: Define 50Gbps per lane link modes
- add support for 50Gbps per lane link modes in mlx5 driver
2) From Tariq Toukan,
- Add a helper function to unify mlx5 resource reloading
3) From Vlad Buslov,
- Remove wrong and superfluous tc pedit header type check
4) From Tonghao Zhang,
- Some refactoring in en_tc.c to simplify the mlx5e_tc_add_fdb_flow
5) From Leon Romanovsky & Saeed,
- Compilation warning fixes
6) From Bodong wang,
- E-Switch fixes that are related to the SmarNIC series
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJcbH/oAAoJEEg/ir3gV/o+bJQIAKIbILkQsAIn3B3U1Y7WgE8l
4pXdcYe6A3Mr9ZOi+U2ovJqVEHKPc7ASNfBfK5zsRayxhDQmoNhaloZSlwkgxenN
+guf86OBqTmXq9vzk3AFfPbiceQ+ENkM8ZHsv0yc9q8ZgZX5KR3ghG0j7wONDdgv
UOGuHNbO2QPA2d38V+xdwGkr/M8eep2xjYwDqT8+s7F+7BZeRThF2s9NBhE4rgob
KlPXwYTqkx5fiODtlFE9igryi/aSr9iaVIVCOs9ZBwQrETc/WLttWbfMgrMWFW5z
r4sa/84oVe0GVwZ+KPJKjUpXR4x1C3O+vSIAREsZfBDfjADHvGgXyRCYw+xrnq4=
=9kOH
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2019-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2019-02-19
This series includes misc updates to mlx5 drivers and one ethtool update.
1) From Aya Levin:
- ethtool: Define 50Gbps per lane link modes
- add support for 50Gbps per lane link modes in mlx5 driver
2) From Tariq Toukan,
- Add a helper function to unify mlx5 resource reloading
3) From Vlad Buslov,
- Remove wrong and superfluous tc pedit header type check
4) From Tonghao Zhang,
- Some refactoring in en_tc.c to simplify the mlx5e_tc_add_fdb_flow
5) From Leon Romanovsky & Saeed,
- Compilation warning fixes
6) From Bodong wang,
- E-Switch fixes that are related to the SmarNIC series
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Booting 4.20 on SolidRun Clearfog issues this warning with DMA API
debug enabled:
WARNING: CPU: 0 PID: 555 at kernel/dma/debug.c:1230 check_sync+0x514/0x5bc
mvneta f1070000.ethernet: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000002dd7dc00] [size=240 bytes]
Modules linked in: ahci mv88e6xxx dsa_core xhci_plat_hcd xhci_hcd devlink armada_thermal marvell_cesa des_generic ehci_orion phy_armada38x_comphy mcp3021 spi_orion evbug sfp mdio_i2c ip_tables x_tables
CPU: 0 PID: 555 Comm: bridge-network- Not tainted 4.20.0+ #291
Hardware name: Marvell Armada 380/385 (Device Tree)
[<c0019638>] (unwind_backtrace) from [<c0014888>] (show_stack+0x10/0x14)
[<c0014888>] (show_stack) from [<c07f54e0>] (dump_stack+0x9c/0xd4)
[<c07f54e0>] (dump_stack) from [<c00312bc>] (__warn+0xf8/0x124)
[<c00312bc>] (__warn) from [<c00313b0>] (warn_slowpath_fmt+0x38/0x48)
[<c00313b0>] (warn_slowpath_fmt) from [<c00b0370>] (check_sync+0x514/0x5bc)
[<c00b0370>] (check_sync) from [<c00b04f8>] (debug_dma_sync_single_range_for_cpu+0x6c/0x74)
[<c00b04f8>] (debug_dma_sync_single_range_for_cpu) from [<c051bd14>] (mvneta_poll+0x298/0xf58)
[<c051bd14>] (mvneta_poll) from [<c0656194>] (net_rx_action+0x128/0x424)
[<c0656194>] (net_rx_action) from [<c000a230>] (__do_softirq+0xf0/0x540)
[<c000a230>] (__do_softirq) from [<c00386e0>] (irq_exit+0x124/0x144)
[<c00386e0>] (irq_exit) from [<c009b5e0>] (__handle_domain_irq+0x58/0xb0)
[<c009b5e0>] (__handle_domain_irq) from [<c03a63c4>] (gic_handle_irq+0x48/0x98)
[<c03a63c4>] (gic_handle_irq) from [<c0009a10>] (__irq_svc+0x70/0x98)
...
This appears to be caused by mvneta_rx_hwbm() calling
dma_sync_single_range_for_cpu() with the wrong struct device pointer,
as the buffer manager device pointer is used to map and unmap the
buffer. Fix this.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Another platform requires even longer delay to make the device work
correctly after S3.
So increase the delay to 300ms.
BugLink: https://bugs.launchpad.net/bugs/1798921
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Passing the struct ptp_clock_info caps by parameter is passing over 130 bytes
of data by value on the stack. Optimize this by passing it by reference instead.
Also shinks the object code size:
Before:
text data bss dec hex filename
12596 2160 64 14820 39e4 drivers/ptp/ptp_qoriq.o
After:
text data bss dec hex filename
12567 2160 64 14791 39c7 drivers/ptp/ptp_qoriq.o
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When disabling vport, relevant vport configurations will be cleaned
up. These cleanups should be done to the vports which had these configs
applied at vport enablement. As esw manager vport didn't have such
vport config applied, cleanup should not touch it.
Fixes: de9e6a8136c5 ("net/mlx5: E-Switch, Properly refer to host PF vport as other vport")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reported-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When eswitch gets vport data structure, the index should not be out
of the range of the vport array. Driver mistakenly used vport number
to check the range.
Fixes: 22b8ddc86bf4 ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
fpga_qpn was assigned but never used and compilation with W=1
produced the following warning:
drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c: In function _mlx5_fpga_event_:
drivers/net/ethernet/mellanox/mlx5/core/fpga/core.c:320:6: warning:
variable _fpga_qpn_ set but not used [-Wunused-but-set-variable]
u32 fpga_qpn;
^~~~~~~~
Fixes: 98db16bab5 ("net/mlx5: FPGA, Handle QP error event")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Compilation with W=1 produces following warning:
drivers/net/ethernet/mellanox/mlx5/core/en/monitor_stats.c:69:6:
warning: no previous prototype for _mlx5e_monitor_counter_start_ [-Wmissing-prototypes]
void mlx5e_monitor_counter_start(struct mlx5e_priv *priv)
^~~~~~~~~~~~~~~~~~~~~~~~~~~
Avoid it by declaring mlx5e_monitor_counter_start() as a static function.
Fixes: 5c7e8bbb02 ("net/mlx5e: Use monitor counters for update stats")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch is a little improvement. Simplify the mlx5e_tc_add_fdb_flow().
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
With recent introduction of flow_rule infrastructure drivers no longer
directly include action headers, so it is no longer possible to use
constants defined in them. Instead, one of flow_rule patches substituted
pedit action header constant with hardcoded value '2' in mlx5
set_pedit_val() function conditional which verifies that header type is in
range of values allowed by pedit action. That conditional is now both
wrong (hardcoded value is '2' but __PEDIT_HDR_TYPE_MAX is 6 in current
version) and superfluous (pedit action already verifies that header type is
in allowed range during init). Remove the described check from mlx5 code.
Fixes: 7386788175 ("drivers: net: use flow action infrastructure")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Dmytro Linkin <dmitrolin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Take into a function the common code structure of opening
a side set of channels followed by a call to apply them.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In previous patch, driver added new speed modes: 50Gbps per lane support
for 50G/100G/200G. This patch modifies mlx5e_get_link_ksettings and
mlx5e_set_link_ksettings to set and get these link modes via ethtool.
In order to do so, added mapping of new HW bits to ethtool bitmap and
enforce mutual exclusion between extended link modes and previously
defined link modes.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch is to do code cleanup for ns83820_probe_phy().
It deletes unused variable 'first', commented out code,
and the pointless 'for' loop.
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver returns -ENOSPC when tc_can_offload() check fails. Since that
routine checks for flow parameters that are not supported by the driver,
we should return the more appropriate -EOPNOTSUPP.
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for SIOCGMIIREG and SIOCSMIIREG ioctls to
mdio read/write to external PHY.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Newer firmware understands the concept of a trusted VF, so propagate the
trusted VF attribute set by the PF admin. to the firmware. Also, check
the firmware trusted setting when considering the VF MAC address change
and reporting the trusted setting to the user.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for BCM957504 with device ID 1751
Signed-off-by: Erik Burrows <erik.burrows@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware error recover is the major change in this spec.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Combine all HCA capabilities setters under one function
and compile out the ODP related function in case kernel
was compiled without ODP support.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Recent commit below has introduced a bug in netcp driver that causes
the ethss driver probe failure and thus break the networking function
on K2 SoCs such as K2HK, K2L, K2E etc. This patch fixes the issue to
restore networking on the above SoCs.
Fixes: 21c328dcec ("net: ethernet: Convert to using %pOFn instead of device_node.name")
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the missing device reference release-after-use in
the positive leg of the roce reset API of the HNS DSAF.
Fixes: c969c6e7ab ("net: hns: Fix object reference leaks in hns_dsaf_roce_reset()")
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are saving the status of EEE even before we try to enable it. This
leads to a race with XMIT function that tries to arm EEE timer before we
set it up.
Fix this by only saving the EEE parameters after all operations are
performed with success.
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Fixes: d765955d2a ("stmmac: add the Energy Efficient Ethernet support")
Cc: Joao Pinto <jpinto@synopsys.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ll2 forwards all syn packets to the driver without validating the mac
address. Add validation check in the driver's iWARP listener flow and drop
the packet if it isn't intended for the device.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The assumption that the maximum size of a syn packet is 128 bytes
is wrong. Tunneling headers were not accounted for.
Allocate buffers large enough for mtu.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c: In function 'bnx2x_get_hwinfo':
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:11940:10: warning:
variable 'mfw_vn' set but not used [-Wunused-but-set-variable]
It's never used since introduction.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix following warning:
drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c:1453:35: warning: Using plain integer as NULL pointer
drivers/net/ethernet/cavium/liquidio/lio_main.c:2910:23: warning: Using plain integer as NULL pointer
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
writex() has implicit barriers, that's what makes it different from
writex_relaxed(). Therefore these calls to mmiowb() can be removed.
This patch was recently reverted due to a dependency with another
problematic patch. But because it didn't contribute to the problem
it was rebased and can be resubmitted.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IP2ME packet trap is triggered by packets hitting local routes.
After evaluating current defaults used by the driver it was decided to
reduce the amount of traffic generated by this trap to 1Kpps and
increase the burst size. This is inline with similarly deployed systems.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a spelling mistake in a en_err error message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pasemi driver never set a DMA mask, and given that the powerpc
DMA mapping routines never check it this worked ok so far. But the
generic dma-direct code which I plan to switch on for powerpc checks
the DMA mask and fails unsupported mapping requests, so we need to
make sure the proper 64-bit mask is set.
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Fixes the following sparse warning:
drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:2431:5: warning:
symbol 'hclge_set_all_vf_rst' was not declared. Should it be static?
Fixes: aa5c4f175b ("net: hns3: add reset handling for VF when doing PF reset")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function meth_init_tx_ring() is called from meth_tx_timeout(),
in which spin_lock is held, so we should use GFP_ATOMIC instead.
Fixes: 8d4c28fbc2 ("meth: pass struct device to DMA API functions")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If mv643xx_eth_shared_of_probe() fails, mv643xx_eth_shared_probe()
leaves clk enabled.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
GMAC IP is little-endian and used on several kind of CPU (big or little
endian). Main callbacks functions of the stmmac drivers take care about
it. It was not the case for dwmac4_get_timestamp function.
Fixes: ba1ffd74df ("stmmac: fix PTP support for GMAC4")
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check mask fields of tcp and ip flags when setting the corresponding mask
flag used in hardware.
Fixes: 8f2566225a ("flow_offload: add flow_rule and flow_match")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Devlink now allows updating device flash. Implement this
callback.
Compared to ethtool update we no longer have to release
the networking locks - devlink doesn't take them.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a dedicated thermal zone for each QSFP/SFP module. The current
temperature is obtained from the module's temperature sensor and the
trip points are set based on the warning and critical thresholds
read from the module.
A cooling device (fan) is bound to all the thermal zones. The
thermal zone governor is set to user space in order to avoid
collisions between thermal zones.
For example, one thermal zone might want to increase the speed of
the fan, whereas another one would like to decrease it.
Deferring this decision to user space allows the user to the take
the most suitable decision.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function-local variable "delay" enters the loop interpreted as delay
in bits. However, inside the loop it gets overwritten by the result of
mlxsw_sp_pg_buf_delay_get(), and thus leaves the loop as quantity in
cells. Thus on second and further loop iterations, the headroom for a
given priority is configured with a wrong size.
Fix by introducing a loop-local variable, delay_cells. Rename thres to
thres_cells for consistency.
Fixes: f417f04da5 ("mlxsw: spectrum: Refactor port buffer configuration")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexei Starovoitov says:
====================
pull-request: bpf-next 2019-02-16
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) numerous libbpf API improvements, from Andrii, Andrey, Yonghong.
2) test all bpf progs in alu32 mode, from Jiong.
3) skb->sk access and bpf_sk_fullsock(), bpf_tcp_sock() helpers, from Martin.
4) support for IP encap in lwt bpf progs, from Peter.
5) remove XDP_QUERY_XSK_UMEM dead code, from Jan.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In dwmac4_wrback_get_rx_timestamp_status we looking for a RX timestamp.
For that receive descriptors are handled and so we should use defines
related to receive descriptors. It'll no change the functional behavior
as RDES3_RDES1_VALID=TDES3_RS1V=BIT(26) but it makes code easier to read.
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bodong Wang says,
BlueField device is a multi-core ARM processor in a highly integrated
system on chip coupled with the ConnectX interconnect controller.
BlueField device can be presented in one out of two modes:
- SEPARATED_HOST: ARM processors as a separated and orthogonal host
like any other external host in the multi-host virtualization model.
- EMBEDDED_CPU: ARM processors as Embedded CPU (EC) and part of the
external hosts virtualization model.
While existing driver already supports the device on separated_host
mode, this patch series focus on the functionalities of embedded_cpu
mode.
On embedded_cpu mode, BlueField device exposes regular network
controller PCI function in the BlueField host(e.g, x86). However, a
separate PCI function called Embedded CPU Physical Function(ECPF) is
also added to the ARM host side, where standard Linux distributions is
able to run on the ARM cores. Depends on the NV configuration from
firmware, ECPF can be the e-switch manager and firmware pages supplier.
If ECPF is configured as e-switch manager and page supplier, it will
take over the responsibilities from the PF on BlueField host includes:
- Owns, controls and manages all e-switch parts, and takes e-switch
traffic by default. It also should perform ENABLE_HCA for the host
PF just like a PF does for its VFs.
- Provides and manages the ICM host memory required for the HCA to
store various contexts for itself, the PF and VFs belong the
e-switch it manages.
The PF on BlueField host side is still responsible for:
- Control its own permanent MAC.
- PCI and SRIOV configurations and perform ENABLE_HCA for its VFs.
The ECPF can also retrieve information about the external host it
controls, like host identifier, PCI BDF and number of virtual functions.
As these parameters may be changed dynamically, an event will be triggered
to the driver on ECPF side.
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJcZ2amAAoJEEg/ir3gV/o+lzMIALvLNUoD6pXi41MWsOwvmAHg
07mzg1N80Z66MCcFau40I8T3h9NLiRMzNtFrBNtxx+ruwKFUpAjJHjaU0sms0yQH
WVjr35vk4XsZyDPSCJ4g/hCQVlgCT/1tIUvPO0YM9hjhDuVa9mT4wEpucQRDu8bO
KfeXNXLDFnlWxjokhpSVj369ozh+LTv4Kzy0MBBbji97bG6MktGAT8uCimUy7wG0
7dlYimnZ1+iUD1/DZQadiLCUHUu/rTcnvF2+DcdG/nbSU8ydVLgj6vgtIfCYt4e5
kcQO5hmatnl0iJUA8GNpdHQwGjYytKneoGmIfbMAK4KjvFNF+3/8N4Bytoa5kzk=
=DgTl
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2019-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Support Mellanox BlueField SmartNIC (mlx5-updates-2019-02-15)
Bodong Wang says,
BlueField device is a multi-core ARM processor in a highly integrated
system on chip coupled with the ConnectX interconnect controller.
BlueField device can be presented in one out of two modes:
- SEPARATED_HOST: ARM processors as a separated and orthogonal host
like any other external host in the multi-host virtualization model.
- EMBEDDED_CPU: ARM processors as Embedded CPU (EC) and part of the
external hosts virtualization model.
While existing driver already supports the device on separated_host
mode, this patch series focus on the functionalities of embedded_cpu
mode.
On embedded_cpu mode, BlueField device exposes regular network
controller PCI function in the BlueField host(e.g, x86). However, a
separate PCI function called Embedded CPU Physical Function(ECPF) is
also added to the ARM host side, where standard Linux distributions is
able to run on the ARM cores. Depends on the NV configuration from
firmware, ECPF can be the e-switch manager and firmware pages supplier.
If ECPF is configured as e-switch manager and page supplier, it will
take over the responsibilities from the PF on BlueField host includes:
- Owns, controls and manages all e-switch parts, and takes e-switch
traffic by default. It also should perform ENABLE_HCA for the host
PF just like a PF does for its VFs.
- Provides and manages the ICM host memory required for the HCA to
store various contexts for itself, the PF and VFs belong the
e-switch it manages.
The PF on BlueField host side is still responsible for:
- Control its own permanent MAC.
- PCI and SRIOV configurations and perform ENABLE_HCA for its VFs.
The ECPF can also retrieve information about the external host it
controls, like host identifier, PCI BDF and number of virtual functions.
As these parameters may be changed dynamically, an event will be triggered
to the driver on ECPF side.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
SYSTEMPORT has its RXCHK parser block that attempts to validate the
packet structures, unfortunately setting the L2 header check bit will
cause Bridge PDUs (BPDUs) to be incorrectly rejected because they look
like LLC/SNAP packets with a non-IPv4 or non-IPv6 Ethernet Type.
Fixes: 4e8aedfe78c7 ("net: systemport: Turn on offloads by default")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a spelling mistake in several dev_err messages, fix these.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the e-switch driver requires going to legacy mode before
changing to the offloads mode. This makes sense for regular case as
the legacy mode is done by creating VFs.
However, it's problematic when ECPF is the eswitch manager. In such
case, ECPF will control the vports on peer host including the peer
PF and VFs. But ECPF doesn't need and shall not create VFs as the
VFs are created in the peer PF host.
Grant ECPF the ability to change from none to the offloads mode. Note
that currently the only way to go back to none mode is by unloading
the ECPF driver.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When host PF changes the number of VFs, the ECPF esw driver will get
a FW event. It should query the number of VFs enabled by host PF and
update the VF reps accordingly. Note that host PF can't change the
number of VFs dynamically, it has to reset the number of VFs to 0
before changing to a new positive number.
The host event is registered when driver is moving to switchdev mode,
and it's the last step to do in esw_offloads_init. It's unregistered
and the work queue is flushed when driver quits from switchdev mode.
In this way, the host event and devlink command are serialized.
When driver is enabling switchdev mode, pay attention to the following
two facts:
1. Host PF must not have VF initialized as the flow table in ECPF has
ENCAP enabled as default. Such flow table can't be created with
existing initialized VFs.
2. ECPF doesn't know how many VFs the host PF will enable, ECPF
offloads flow steering shall create the flow table/groups based on
the max number of VFs possibly supported by host PF.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
ECPF connects to the eswitch through vport 0xfffe. ECPF may or may
not be the eswitch manager depending on firmware configuration.
1. If ECPF is eswitch manager: ECPF will take over the eswitch manager
responsibility. A rep of the host PF shall be created at the ECPF
side for the eswitch manager to control.
2. If ECPF is not eswitch manager: host PF will be the eswitch manager,
ECPF acts similar as a VF to the host PF. Host PF will be aware
of the ECPF vport presence and control it's rep.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In offloads mode, the current implementation puts the uplink
representor at index zero of the vport reps array. It is not "natural"
to place it at index 0 since we want to put the representor for vport
0 at index 0 with the introduction of SmartNIC. A separate patch will
handle the case whether a rep is needed for vport 0 (PF vport).
So, we want to have a different placeholder for uplink vport and
representor. It was placed at the end of vport and rep array. Since
vport number can no longer act as an index into the vport or
representors arrays, use functions to map vport numbers to indices
when accessing the vports or representors arrays, and vice versa.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eswitch has two users: IB and ETH. They both register repersentors
when mlx5 interface is added, and unregister the repersentors when
mlx5 interface is removed. Ideally, each driver should only deal with
the entities which are unique to itself. However, current IB and ETH
drivers have to perform the following eswitch operations:
1. When registering, specify how many vports to register. This number
is the same for both drivers which is the total available vport
numbers.
2. When unregistering, specify the number of registered vports to do
unregister. Also, unload the repersentors which are already loaded.
It's unnecessary for eswitch driver to hands out the control of above
operations to individual driver users, as they're not unique to each
driver. Instead, such operations should be centralized to eswitch
driver. This consolidates eswitch control flow, and simplified IB and
ETH driver.
This patch doesn't change any functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently the driver loads and unloads all reps in an unbreakable
group. However, with ECPF, the reps of special vports such as uplink
and host PF should always be loaded in switchdev mode where the reps
for VFs will be loaded on-demand and unloaded on no-demand. This is
a pre-step for that change.
This patch doesn't change any functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently the eswitch vport reps have a valid indicator, which is
set on register and unset on unregister. However, a rep can be loaded
or not loaded when doing unregister, current driver checks if the
vport of that rep is enabled as a flag to imply the rep is loaded.
However, for ECPF, this is not valid as the host PF will enable the
vports for its VFs instead.
Add three states: {unregistered, registered, loaded}, with the
following state changes across different operations:
create: (none) -> unregistered
reg: unregistered -> registered
load: registered -> loaded
unload: loaded -> registered
unreg: registered -> unregistered
Note that the state shall only be updated inside eswitch driver rather
than individual drivers such as ETH or IB.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Suggested-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
With only PF and VF, it is sufficient to have the vport/rep array
index as the vport number. This is because PF and VF vports numbers
are consecutive serial numbers. In downstream patches with
introducing of ECPF and UPLINK vports, it's not consecutive any more.
Use getter to get specific vport/rep, and use iterator to traversal
a list of vport/rep. This hides the translation between array index
and vport number, and provides flexibility of using different
translation mechanism in the future.
This patch doesn't change any functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Suggested-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When driver is entering offloads mode, there are two major tasks to
do: initialize flow steering and create representors. Flow steering
should make sure enough flow table/group spaces are reserved for all
reps. Representors will be created in a group, all or none.
With the introduction of ECPF, flow steering should still reserve the
same spaces. But, the representors are not always loaded/unloaded in a
single piece. Once ECPF is in offloads mode, it will get the number
of VF changing event from host PF. In such scenario, only the VF reps
should be loaded/unloaded, not the reps for special vports (such as
the uplink vport).
Thus, when entering offloads mode, driver should specify the total
number of reps, and the number of VF reps separately. When leaving
offloads mode, the cleanup should use the information self-contained
in eswitch such as number of VFs.
This patch doesn't change any functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
E-switch offloads mode initialize/cleanup multiple steering related
entities (flow table/group). Refactor these operations to internal
helper functions for better block design.
This patch doesn't change any functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Commands referring to vports use the following scheme:
1. When referring to my own vport, put 0 in vport and 0 in other_vport.
2. When referring to another vport, put the vport number of the
referred vport and put 1 in other_vport. It was assumed that driver
is accessing other vport when vport number is greater than 0.
With the above scheme, the case that ECPF eswitch manager is trying
to access host PF vport will fall over with scheme 1 as the vport
number is 0. This is apparently wrong as driver is trying to refer
other vport.
As such usage can only happen in the eswitch context, change relevant
functions to provide other vport input properly.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In SmartNIC mode, the eswitch manager is not necessarily the PF
(vport 0). Use a helper function to get the correct eswitch manager
vport number and cache on the eswitch instance for fast reference.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When bonding is added, driver assumes that it's RoCE LAG if no VF is
enabled. This is not enough for ECPF as the VF is enabled in host PF
side. LAG should only choose RoCE mode when both slave devices meet
conditions below:
1. E-Switch offloads mode is NONE.
2. No VF is enabled.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Merge mlx5-next shared branched into net-next,
From Bodong Wang:
1) Introduction of ECPF (Embedded CPU Physical Function), and low level
bits for mlx5 SmartNic capabilities support.
2) Vport enumeration refactoring that affect mlx5_ib and mlx5_core
From Aya Levin,
3) Add support for 50Gbps per lane link modes in the Port Type and Speed
register (PTYS)
4) Refactor low level query functions for PTYS register
5) Add support for 50Gbps per lane link modes to mlx5_ib
Note: due to a change in API in mlx5/core and a later patch from net-next,
a fixup was squashed with this merge commit that replaces FDB_UPLINK_VPORT
with MLX5_VPORT_UPLINK which exists only in upstream net-next.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The netfilter conflicts were rather simple overlapping
changes.
However, the cls_tcindex.c stuff was a bit more complex.
On the 'net' side, Cong is fixing several races and memory
leaks. Whilst on the 'net-next' side we have Vlad adding
the rtnl-ness support.
What I've decided to do, in order to resolve this, is revert the
conversion over to using a workqueue that Cong did, bringing us back
to pure RCU. I did it this way because I believe that either Cong's
races don't apply with have Vlad did things, or Cong will have to
implement the race fix slightly differently.
Signed-off-by: David S. Miller <davem@davemloft.net>
Export the sge_host_page_size field to ULDs via cxgb4_lld_info, so that
iw_cxgb4 can make use of this in calculating the correct qp/cq mask.
Fixes: 2391b0030e ("cxgb4: Remove SGE_HOST_PAGE_SIZE dependency on page size")
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Commit c9b47cc1fa ("xsk: fix bug when trying to use both copy and
zero-copy on one queue id") moved the umem query code to the AF_XDP
core, and therefore removed the need to query the netdevice for a
umem.
This patch removes XDP_QUERY_XSK_UMEM and all code that implement that
behavior, which is just dead code.
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch exposes new link modes (including 50Gbps per lane), and ext_*
fields which describes the new link modes in Port Type and Speed
register (PTYS).
Access functions, translation functions (speed <-> HW bits) and
link max speed function were modified.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch fascicles queries to speed related fields in Port Type and
Speed register (PTYS) into a single API. I addition, this patch
refactors functions which serves only Ethernet driver: remove the
protocol type as an input parameter, move code from 'core' directory
into 'en' directory and add 'eth' prefix to the function's name. The
patch also encapsulates functions that are not used outside the Ethernet
driver removes redundant include files.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When dealing with the offloads mode initialization, driver refers to
the number of VFs and add magic number one (1) to take account of the
uplink. This is not clear and will make the code less readable after
adding other vports (e.g. host PF). As these are special vports
compared to VF vports, add a helper macro to denote such special
vports and eliminate the use of magic number.
Moreover, when creating offloads flow table and groups, the driver
reserves two more slots for UC and MC miss rules. Replace this magic
number with a helper macro as well.
This patch doesn't change any functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
These are two macros in the driver general header which deal with the
number of total vports and if a vport is vport manager. Such macros
are vport entities, better to place them at the vport header file.
This patch doesn't change any functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Driver used to name uplink vport as FDB_UPLINK_VPORT, it's hard to
comply with the same naming convention along with the introduction of
other vports. Use MLX5_VPORT as the prefix for such vports and
relocate the uplink vport definition to public header file for the
benefits of both net and IB drivers.
This patch doesn't change any functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In Embedded CPU (EC) configurations, the EC driver needs to know when
the number of virtual functions change on the corresponding PF at the
host side. This is required so the EC driver can create or destroy
representor net devices that represent the VFs ports.
Whenever a change in the number of VFs occurs, firmware will generate an
event towards the EC which will trigger a work to complete the rest of
the handling. The specifics of the handling will be introduced in a
downstream patch.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The QUERY_HOST_PARAMS command is used by an Embedded CPU Physical
Function (ECPF) driver to identify and retrieve information about the
PF on the host side. E.g, number of virtual functions and PCI BDF.
The number of VFs can be changed on the fly, a function is added to
query current number of VFs and will be used in downstream patches.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
With the introduction of ECPF, we require that the ECPF driver will
aways call enable/disable HCA for that PF in the same way a PF does
this for its VFs. The PF is still responsible for calling enable and
disable HCA for its VFs.
To distinguish between the ECPF executing enable/disable HCA for
itself or for the PF, it sets the embedded CPU function bit in the
input params struct of these commands. When the bit is cleared and
function ID is zero, it refers to the peer PF.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Mellanox's SmartNIC combines embedded CPU(e.g, ARM) processing power
with advanced network offloads to accelerate a multitude of security,
networking and storage applications.
With the introduction of the SmartNIC, there is a new PCI function
called Embedded CPU Physical Function(ECPF). And it's possible for a
PF to get its ICM pages from the ECPF PCI function. Driver shall
identify if it is running on such a function by reading a bit in
the initialization segment.
When firmware asks for pages, it would issue a page request event
specifying how many pages it requests and for which function. That
driver responds with a manage_pages command providing the requested
pages along with an indication for which function it is providing these
pages.
The encoding before this patch was as follows:
function_id == 0: pages are requested for the function receiving
the EQE.
function_id != 0: pages are requested for VF identified by the
function_id value
A new one bit field in the EQE identifies that pages are requested for
the ECPF.
The notion of page_supplier can be introduced here and to support that,
manage pages and query pages were modified so firmware can distinguish
the following cases:
1. Function provides pages for itself
2. PF provides pages for its VF
3. ECPF provides pages to itself
4. ECPF provides pages for another function
This distinction is possible through the introduction of the bit
"embedded_cpu_function" in query_pages, manage_pages and page request
EQE.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Use u16 for vport number, which matches how hardware refers to this
argument throughout commands.
This patch doesn't change any functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
After failing to allocate a receive buffer the driver may fail to ever
request additional allocations. EF10 NICs require new receive buffers to
be pushed in batches of eight or more. The test for whether a slow fill
should be scheduled failed to take account of this. There is little
downside to *always* requesting a slow fill if we failed to allocate a
buffer, so the condition has been removed completely. The timer that
triggers the request for a refill has also been shortened.
Signed-off-by: Robert Stonehouse <rstonehouse@solarflare.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the ethtool_regs version is set to 0 for FEC devices.
Use this field to store the register dump version exposed by the
kernel. The choosen version 2 corresponds to the kernel compile test:
#if defined(CONFIG_M523x) || defined(CONFIG_M527x)
|| defined(CONFIG_M528x) || defined(CONFIG_M520x)
|| defined(CONFIG_M532x) || defined(CONFIG_ARM)
|| defined(CONFIG_ARM64) || defined(CONFIG_COMPILE_TEST)
and version 1 corresponds to the opposite. Binaries of ethtool unaware
of this version will dump the whole set as usual.
Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called in intr_handler() when skb
xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called when skb xmit done. It makes
drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called in arc_emac_tx_clean() when
skb xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called when skb xmit done. It makes
drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called when skb xmit done. It makes
drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called in i596_interrupt() when skb
xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch gets/sets SGE Doorbell Queue timer ticks via ethtool
Original work by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
T6 introduced a Timer Mechanism in SGE called the
SGE Doorbell Queue Timer. With this we can now configure
TX Queues to get CIDX Updates when:
Time(CIDX == PIDX) >= Timer
Previously we rely on TX Queue Status Page updates by hardware
for DMA completions. This will make Hardware/Firmware actually
deliver the CIDX Updates as Ingress Queue messages with
commensurate Interrupts.
So we now have a new RX Path component for processing CIDX Updates
and reclaiming TX Descriptors faster.
Original work by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The of_find_device_by_node() takes a reference to the underlying device
structure, we should release that reference.
Signed-off-by: Huang Zijiang <huang.zijiang@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
The skb should be freed by dev_consume_skb_any() in efx_tx_tso_fallback()
when skb is still used. The skb will be replaced by segments, so the
original skb should be consumed(not drop).
Signed-off-by: Huang Zijiang <huang.zijiang@zte.com.cn>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The skb should be freed by dev_consume_skb_any() in macb_pad_and_fcs()
when *skb is still used. The *skb is be replaced by nskb, so the
original *skb should be consumed(not drop).
Signed-off-by: Huang Zijiang <huang.zijiang@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called when skb xmit
done.It makes drop profiles more friendly.
Signed-off-by: Huang Zijiang <huang.zijiang@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
modify the code style in order to removing the following warning
when excute the script checkpatch.pl
WARNING: space prohibited between function name and open parenthesis '('
Signed-off-by: Huang Zijiang <huang.zijiang@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called in w90p910_ether_start_xmit()
when skb xmit done. It makes drop profiles(dropwatch, perf) more
friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called when skb xmit done. It makes
drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called in ks8695_tx_irq() when skb
xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called when skb xmit done. It makes
drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called in myri10ge_tx_done() when
skb xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called when skb xmit done. It makes
drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called in intr_handler() when skb
xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Remove a redundant blank line in intr_handler().
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow thermal zone binding to an external cooling device from the
cooling devices white list.
It provides support for Mellanox next generation systems on which
cooling device logic is not controlled through the switch registers.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add label attribute to hwmon object for exposing QSFP module's
temperature sensor name. Modules are labeled as "front panel xxx". The
label is used by utilities such as "sensors":
front panel 001: +0.0C (crit = +0.0C, emerg = +0.0C)
..
front panel 020: +31.0C (crit = +70.0C, emerg = +80.0C)
..
front panel 056: +41.0C (crit = +70.0C, emerg = +80.0C)
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new attributes to hwmon object for exposing QSFP module temperature
input, fault indication, critical and emergency thresholds. Temperature
input and fault indication are read from Management Temperature Bulk
Register. Temperature thresholds are read from Management Cable Info
Access Register.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new fan hwmon attribute for exposing fan faults (fault indication is
read from Fan Out of Range Event Register).
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename cooling device from "Fan" to "mlxsw_fan". Name "Fan" is too
common name, and such name is misleading, while it's interpreted by
user. For example name "Fan" could be used by ACPI.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace thermal hardcoded temperature trip values with defines.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Modify thermal zone trip points setting for better alignment with system
thermal requirement.
Add hysteresis thresholds for thermal trips in order to avoid throttling
around thermal trip point. If hysteresis temperature is not considered,
PWM can have side effect of flip up/down on thermal trip point boundary.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add low frequency bus capability in order to allow core functionality
separation based on bus type. Driver could run over PCIe, which is
considered as high frequency bus or I2C, which is considered as low
frequency bus. In the last case time setting, for example, for thermal
polling interval, should be increased.
Use different thermal monitoring based on bus type. For I2C bus time is
set to 20 seconds, while for PCIe 1 second polling interval is used.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new API to read QSFP module's temperature thresholds - warning and
critical.
New internal API reads the temperature thresholds from the modules,
which are equipped with the thermal sensor. These thresholds will be
exposed via hwmon subsystem.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add FORE (Fan Out of Range Event Register), which is used for fan fault
reading.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add MTBR (Management Temperature Bulk Register), which is used for port
temperature reading in a bulk mode.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move QSFP EEPROM definitions to common location from the spectrum driver
in order to make them available for other mlxsw modules. They are common
for all kind of chips and have relation to SFF specifications 8024,
8436, 8472, 8636, rather than to chip type.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CR4_QSFP 10G Speed technology should be 10000baseKR_Full
And also report available FEC modes.
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are some statements that are indented incorrectly. Fix these.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are some statements in an if-block that are not correctly
indented. Fix these.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called in eth_txdone_irq() when skb
xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called in at91ether_interrupt() when
skb xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called when skb xmit done. It makes
drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called in intr_handler() when skb
xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called in moxart_tx_finished() when
skb xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called in mace_interrupt() when skb
xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called when skb xmit done. It makes
drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called in emac_mac_tx_process() when
skb xmit done. It makes drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_consume_skb_irq() should be called when skb xmit done. It makes
drop profiles(dropwatch, perf) more friendly.
Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently mlx5 driver creates xdp redirect hw queues unconditionally on
netdevice open, This is great until someone starts redirecting XDP traffic
via ndo_xdp_xmit on mlx5 device and changes the device configuration at
the same time, this might cause crashes, since the other device's napi
is not aware of the mlx5 state change (resources un-availability).
To fix this we must synchronize with other devices napi's on the system.
Added a new flag under mlx5e_priv to determine XDP TX resources are
available, set/clear it up when necessary and use synchronize_rcu()
when the flag is turned off, so other napi's are in-sync with it, before
we actually cleanup the hw resources.
The flag is tested prior to committing to transmit on mlx5e_xdp_xmit, and
it is sufficient to determine if it safe to transmit or not. The other
two internal flags (MLX5E_STATE_OPENED and MLX5E_SQ_STATE_ENABLED) become
unnecessary. Thus, they are removed from data path.
Fixes: 58b99ee3e3 ("net/mlx5e: Add support for XDP_REDIRECT in device-out side")
Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eliminate the following compilation warning:
drivers/net/ethernet/mellanox/mlx5/core/events.c: warning: 'error_str'
may be used uninitialized in this function [-Wuninitialized]: => 238:3
Fixes: c2fb3db22d ("net/mlx5: Rework handling of port module events")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Mikhael Goikhman <migo@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When EEH is injected and PCI bus stalls, mlx5's pci error detect
function is called to deactivate the command interface and tear down
the device. The issue is that there can be a thread that already
passed MLX5_DEVICE_STATE_INTERNAL_ERROR check, it will send the command
and stuck in the wait_func.
Solution:
Add function mlx5_cmd_flush to disable command interface and clear all
the pending commands. When device state is set to
MLX5_DEVICE_STATE_INTERNAL_ERROR, call mlx5_cmd_flush to ensure all
pending threads waiting for firmware commands completion are terminated.
Fixes: c1d4d2e92a ("net/mlx5: Avoid calling sleeping function by the health poll thread")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
New channels are applied to the priv channels only after they
are successfully opened. Then, the indirection table should be built
according to the new number of channels.
Currently, such build is preformed independently of whether the
channels opening is successful, and is not reverted on failure.
The bug is caused due to removal of rss params from channels struct
and moving it to priv struct. That change cause to independency between
channels and rss params.
This causes a crash on a later point, when accessing rqn of a non
existing channel.
This patch fixes it by moving the indirection table build right before
switching the priv channels to new channels struct, after the new set of
channels was successfully opened.
Fixes: bbeb53b8b2 ("net/mlx5e: Move RSS params to a dedicated struct")
Signed-off-by: Maria Pasechnik <mariap@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Update newly introduced function to be aligned to netdev coding style.
Fixes: 46861e3e88 ("net/mlx5: Set ODP SRQ support in firmware")
Reported-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
After 1ecb195753 ("mlxsw: spectrum_switchdev: Remove getting
PORT_BRIDGE_FLAGS") we are not accessing any driver private data
structure, so the mlxsw_sp_port and mlxsw_sp variables are unused.
Fixes: 1ecb195753 ("mlxsw: spectrum_switchdev: Remove getting PORT_BRIDGE_FLAGS")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After 610d2b601b ("rocker: Remove getting PORT_BRIDGE_FLAGS") we no
longer have a port_attr_bridge_flags_get member in the rocker_world_ops
structre, fix that.
Fixes: 610d2b601b ("rocker: Remove getting PORT_BRIDGE_FLAGS")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix race condition between ena_update_on_link_change() and
ena_restore_device().
This race can occur if link notification arrives while the driver
is performing a reset sequence. In this case link can be set up,
enabling the device, before it is fully restored. If packets are
sent at this time, the driver might access uninitialized data
structures, causing kernel crash.
Move the clearing of ENA_FLAG_ONGOING_RESET and netif_carrier_on()
after ena_up() to ensure the device is ready when link is set up.
Fixes: d18e4f6834 ("net: ena: fix race condition between device reset and link up setup")
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an ethernet frame is padded to meet the minimum ethernet frame
size, the padding octets are not covered by the hardware checksum.
Fortunately the padding octets are usually zero's, which don't affect
checksum. However, it is not guaranteed. For example, switches might
choose to make other use of these octets.
This repeatedly causes kernel hardware checksum fault.
Prior to the cited commit below, skb checksum was forced to be
CHECKSUM_NONE when padding is detected. After it, we need to keep
skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to
verify and parse IP headers, it does not worth the effort as the packets
are so small that CHECKSUM_COMPLETE has no significant advantage.
Future work: when reporting checksum complete is not an option for
IP non-TCP/UDP packets, we can actually fallback to report checksum
unnecessary, by looking at cqe IPOK bit.
Fixes: 88078d98d1 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to add PTP clock driver for ENETC.
The driver reused QorIQ PTP clock driver.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Strings containing "ptp_qoriq" or "qoriq_ptp" which were used for
structure/function names were complained by users. Let's just use
the unique "ptp_qoriq" to make these names more consistent.
This patch is just to unify the names using "ptp_qoriq". It hasn't
changed any functions.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no code that attempts to get the
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS attribute, remove support for that.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no code that will query the SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS
attribute remove support for that.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are several places which make the decision whether to access the
XLGMAC vs GMAC that only check for PHY_INTERFACE_MODE_10GKR and not its
XAUI variant. Switch these to use the new helper so that we have
consistency through the driver.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a mvpp2_is_xlg() helper to identify whether the interface mode
should be using the XLGMAC rather than the GMAC.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an ethernet frame is padded to meet the minimum ethernet frame
size, the padding octets are not covered by the hardware checksum.
Fortunately the padding octets are usually zero's, which don't affect
checksum. However, it is not guaranteed. For example, switches might
choose to make other use of these octets.
This repeatedly causes kernel hardware checksum fault.
Prior to the cited commit below, skb checksum was forced to be
CHECKSUM_NONE when padding is detected. After it, we need to keep
skb->csum updated. However, fixing up CHECKSUM_COMPLETE requires to
verify and parse IP headers, it does not worth the effort as the packets
are so small that CHECKSUM_COMPLETE has no significant advantage.
Future work: when reporting checksum complete is not an option for
IP non-TCP/UDP packets, we can actually fallback to report checksum
unnecessary, by looking at cqe IPOK bit.
Fixes: 88078d98d1 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit b639583f9e.
As per discussion with Jakub Kicinski and Michal Kubecek,
this will be better addressed by soon-too-come ethtool netlink
API with additional indication that given configuration request
is supposed to be persisted.
Also, remove the parameter support from bnxt_en driver.
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Michal Kubecek <mkubecek@suse.cz>
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DMA API generally relies on a struct device to work properly, and
only barely works without one for legacy reasons. Pass the easily
available struct device from the platform_device to remedy this.
Note that smc911x apparently is a PIO chip with an external DMA
handshake, and we probably use the wrong device here. But at least
it matches the mapping side, which apparently works or at least
worked in the not too distant past.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DMA API generally relies on a struct device to work properly, and
only barely works without one for legacy reasons. Pass the easily
available struct device from the platform_device to remedy this.
Also use GFP_KERNEL instead of GFP_ATOMIC as the gfp_t for the memory
allocation, as we aren't in interrupt context or under a lock.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DMA API generally relies on a struct device to work properly, and
only barely works without one for legacy reasons. Pass the easily
available struct device from the platform_device to remedy this.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DMA API generally relies on a struct device to work properly, and
only barely works without one for legacy reasons. Pass the easily
available struct device from the platform_device to remedy this.
Note that this driver seems to entirely lack dma_map_single error
handling, but that is left for another time.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DMA API generally relies on a struct device to work properly, and
only barely works without one for legacy reasons. Pass the easily
available struct device from the platform_device to remedy this.
Note this driver seems to lack dma_unmap_* calls entirely, but fixing
that is left for another time.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DMA API generally relies on a struct device to work properly, and
only barely works without one for legacy reasons. Pass the easily
available struct device from the platform_device to remedy this.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DMA API generally relies on a struct device to work properly, and
only barely works without one for legacy reasons. Pass the easily
available struct device from the platform_device to remedy this.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver does not support VLAN push and pop, but only VLAN modify.
Fixes: 7386788175 ("drivers: net: use flow action infrastructure")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case the register access failed an error would be logged anyway, so
we can drop the warning.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The LAG port collecting (receive) function was mistakenly set when the
port was registered as a LAG member, while it should be set only when
the port collection state is set to true. Set LAG port to collecting
when it is set to distributing, as described in the IEEE link
aggregation standard coupled control mux machine state diagram.
Signed-off-by: Nir Dotan <nird@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bitmap of found partitions in efx_ef10_mtd_probe was not
initialised, causing partitions to be suppressed based off whatever
value was in the bitmap at the start.
Fixes: 3366463513 ("sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent cls_flower offload rewrite added a double new line.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently bpf_offload_dev does not have any priv pointer, forcing
the drivers to work backwards from the netdev in program metadata.
This is not great given programs are conceptually associated with
the offload device, and it means one or two unnecessary deferences.
Add a priv pointer to bpf_offload_dev.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The manufacturing team requests we include vendor and product
in the serial number field, as the serial number itself is not
unique across manufacturing facilities and products.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vendor may sound ambiguous, let's rename the fab string to
"board.manufacture" (which was just added as a generic identifier).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx5_eq_cq_get() is called in IRQ handler, the spinlock inside
gets a lot of contentions when we test some heavy workload
with 60 RX queues and 80 CPU's, and it is clearly shown in the
flame graph.
In fact, radix_tree_lookup() is perfectly fine with RCU read lock,
we don't have to take a spinlock on this hot path. This is pretty
much similar to commit 291c566a28
("net/mlx4_core: Fix racy CQ (Completion Queue) free"). Slow paths
are still serialized with the spinlock, and with synchronize_irq()
it should be safe to just move the fast path to RCU read lock.
This patch itself reduces the latency by about 50% for our memcached
workload on a 4.14 kernel we test. In upstream, as pointed out by Saeed,
this spinlock gets some rework in commit 02d92f7903
("net/mlx5: CQ Database per EQ"), so the difference could be smaller.
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This reverts commit 2e6eedb481.
Sander reported a regression causing a kernel panic[1],
therefore let's revert this commit.
[1] https://marc.info/?t=154965066400001&r=1&w=2
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit bd7153bd83.
There doesn't seem to be anything wrong with this patch,
it's just reverted to get a stable baseline again.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When mvpp2 configures the flow control modes in mvpp2_xlg_config() for
10G mode, it only ever set the flow control enable bits. There is no
mechanism to clear these bits, which means that userspace is unable to
use standard APIs to disable flow control (the only way is to poke the
register directly.)
Fix the missing bit clearance to allow flow control to be disabled.
This means that, by default, as there is no negotiation in 10G modes
with mvpp2, flow control is now disabled rather than being rx-only.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
phylink already limits which interface modes are able to call the
MACs AN restart function, but in any case, the commentry seems
incorrect: the AN restart bit does not automatically clear when
set. This has been found via manual setting using devmem2, and
we can observe that the AN does indeed restart and complete, yet
the AN restart bit remains set. Explicitly clear the AN restart
bit.
Tested-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
When reading the pause bits in mac_link_state, mvpp2 was reporting
the state of the "active pause" bits, which are set when the MAC is
in pause mode. This is not what phylink wants - we want the
negotiated pause state. Fix the definition so we read the correct
bits.
Tested-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
mac_config() can be called at any point, and the expected behaviour
from MAC drivers is to only reprogram when necessary - and certainly
avoid taking the link down on every call.
Unfortunately, mvpp2 does exactly that - it takes the link down, and
reprograms everything, and then releases the forced-link down.
This is bad, it can cause the link to bounce:
- SFP detects signal, disables LOS indication.
- SFP code calls into phylink, calling phylink_sfp_link_up() which
triggers a resolve.
- phylink_resolve() calls phylink_get_mac_state() and finds the MAC
reporting link up.
- phylink wants to configure the pause mode on the MAC, so calls
phylink_mac_config()
- mvpp2 takes the link down temporarily, generating a MAC link down
event followed by another MAC link event.
- phylink calls mac_link_up() and then processes the MAC link down
event.
- phylink_resolve() gets called again, registers the link down, and
calls mach_link_down() before re-running itself.
- phylink_resolve() starts again at step 3 above. This sequence
repeats.
GMAC versions prior to mvpp2 do not require the link to be taken down
except when certain link properties (eg, switching between SGMII and
1000base-X mode, or enabling/disabling in-band negotiation) are
changed. Implement this for mvpp2.
Tested-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
It appears that the mvpp22 can get stuck with SGMII negotiation. The
symptoms are that in-band negotiation never completes and the partner
(eg, PHY) never reports SGMII link up, or if it supports negotiation
bypass, goes into negotiation bypass mode (which will happen when the
PHY sees that the MAC is alive but gets no response.)
Triggering the PHY end of the link to re-negotiate results in the
bypass bit clearing on the PHY, and then re-setting - indicating that
the problem is at the mvpp22 GMAC end.
Asserting the GMAC reset and de-asserting it resolves the issue.
Arrange to assert the GMAC reset at probe time, and deassert it only
after we have configured the GMAC for the appropriate mode. This
resolves the issue.
Tested-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sven Auhagen reported issues with negotiation on a couple of his
platforms using a mixture of SFP and PHYs in various different
modes. Debugging to root cause proved difficult, but essentially
the problem comes down to the mvpp2 phylink implementation being
slightly at odds with what is expected.
phylink operates in three modes: phy, fixed-link, and in-band mode.
In the first two modes, the expected behaviour from a MAC driver is
that phylink resolves the operating mode and passes the mode to the
MAC driver for it to program, including when the link should be
brought up or taken down. This is basically the same as the libphy
approach. This does not negate the requirement to advertise a correct
control word for interface modes that have control words where that
can be reasonably controlled.
The second mode is in-band mode, where the MAC is expected to use the
in-band control word to determine the operating mode.
The mvneta driver implements the correct pattern required to support
this: configure the port interface type separately from the in-band
mode(s). This is now specified in the phylink documentation patches.
mvpp2 was programming in-band mode for SGMII and the 802.3z modes no
what, and avoided forcing the link up in fixed/phy modes. This caused
a problem with some boards where the PHY is by default programmed to
enter AN bypass mode, the PHY would report that the link was up, but
the mvpp2 never completed the exchange of control word.
Another issue that mvpp2 has is it sets SGMII AN format control word
for both SGMII and 802.3z modes. The format of the control word is
defined by MVPP2_GMAC_INBAND_AN_MASK, which should be set for SGMII
and clear for 802.3z. Available Marvell documentation for earlier
GMAC implementations does not make this clear, but this has been
ascertained via extensive testing on earlier GMAC implementations,
and then confirmed with a Macchiatobin Single Shot connected to a
Clearfog: when MVPP2_GMAC_INBAND_AN_MASK is set, the clearfog does
not receive the advertised pause mode settings.
Lastly, there is no flow control in the in-band control word in Cisco
SGMII, setting the flow control autonegotiation bit even with a PHY
that has the Marvell extension to send this information does not result
in the flow control being enabled at the MAC. We need to do this
manually using the information provided via phylink.
Re-code mvpp2's mac_config() and mac_link_up() to follow this pattern.
This allows Sven Auhagen's board and Macchiatobin to reliably bring
the link up with the 88e1512 PHY with phylink operating in PHY mode
with COMPHY built as a module but the rest of the networking built-in,
and u-boot having brought up the interface. in-band mode requires an
additional patch to resolve another problem.
Tested-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
struct boo entry[];
};
size = sizeof(struct foo) + count * sizeof(struct boo);
instance = kzalloc(size, GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);
Notice that, in this case, variable size is not necessary, hence
it is removed.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
struct boo entry[];
};
size = sizeof(struct foo) + count * sizeof(struct boo);
instance = kzalloc(size, GFP_KERNEL)
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL)
Notice that, in this case, variable size is not necessary, hence
it is removed.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
struct boo entry[];
};
size = sizeof(struct foo) + count * sizeof(struct boo);
instance = alloc(size, GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
size = struct_size(instance, entry, count);
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
struct boo entry[];
};
size = sizeof(struct foo) + count * sizeof(struct boo);
instance = kzalloc(size, GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);
Notice that, in this case, variable size is not necessary, hence
it is removed.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
void *entry[];
};
size = sizeof(struct foo) + count * sizeof(void *);
instance = alloc(size, GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
instance = alloc(struct_size(instance, entry, count), GFP_KERNEL);
Notice that, in this case, variable size is not necessary, hence
it is removed.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>