Commit Graph

936660 Commits

Author SHA1 Message Date
Dave Ertman
7d9c9b791f ice: Implement LFC workaround
There is a bug where the LFC settings are not being preserved
through a link event.  The registers in question are the ones
that are touched (and restored) when a set_local_mib AQ command
is performed.

On a link-up event, make sure that a set_local_mib is being
performed.

Move the function ice_aq_set_lldp_mib() from the DCB specific
ice_dcb.c to ice_common.c so that the driver always has access
to this AQ command.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-07-29 08:38:54 -07:00
David S. Miller
490ed0b908 Merge branch 'net-stmmac-improve-WOL'
Jisheng Zhang says:

====================
net: stmmac: improve WOL

Currently, stmmac driver relies on the HW PMT to support WOL. We want
to support phy based WOL.

patch1 is a small improvement to disable WAKE_MAGIC for PMT case if
no pmt_magic_frame.
patch2 and patch3 are two prepation patches.
patch4 implement the phy based WOL
patch5 tries to save a bit energy if WOL is enabled.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:48:20 -07:00
Jisheng Zhang
77b2898394 net: stmmac: Speed down the PHY if WoL to save energy
When WoL is enabled and the machine is powered off, the PHY remains
waiting for wakeup events at max speed, which is a waste of energy.

Slow down the PHY speed before stopping the ethernet if WoL is enabled,

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:48:20 -07:00
Jisheng Zhang
1d8e5b0f3f net: stmmac: Support WOL with phy
Currently, the stmmac driver WOL implementation relies on MAC's PMT
feature. We have a case: the MAC HW doesn't enable PMT, instead, we
rely on the phy to support WOL. Implement the support for this case.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:48:20 -07:00
Jisheng Zhang
e8377e7a29 net: stmmac: only call pmt() during suspend/resume if HW enables PMT
This is to prepare WOL support with phy. Compared with WOL
implementation which relies on the MAC's PMT features, in phy
supported WOL case, device_may_wakeup() may also be true, but we
should not call mac's pmt() function if HW doesn't enable PMT.

And during resume, we should call phylink_start() if PMT is disabled.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:48:20 -07:00
Jisheng Zhang
2f45f7a13e net: stmmac: Move device_can_wakeup() check earlier in set_wol
If !device_can_wakeup(), there's no need to futher check. And return
-EOPNOTSUPP rather than -EINVAL if !device_can_wakeup().

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:48:20 -07:00
Jisheng Zhang
1057d685c6 net: stmmac: Remove WAKE_MAGIC if HW shows no pmt_magic_frame
Remove WAKE_MAGIC from supported modes if the HW capability register
shows no support for pmt_magic_frame.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:48:19 -07:00
David S. Miller
f11df0454f Merge branch 'RTL8366-VLAN-callback-fixes'
Linus Walleij says:

====================
RTL8366 VLAN callback fixes

While we are pondering how to make the core set up the VLANs
the right way, let's merge the uncontroversial fixes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:44:23 -07:00
Linus Walleij
788abc6d9d net: dsa: rtl8366: Fix VLAN set-up
Alter the rtl8366_vlan_add() to call rtl8366_set_vlan()
inside the loop that goes over all VIDs since we now
properly support calling that function more than once.
Augment the loop to postincrement as this is more
intuitive.

The loop moved past the last VID but called
rtl8366_set_vlan() with the port number instead of
the VID, assuming a 1-to-1 correspondence between
ports and VIDs. This was also a bug.

Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Mauri Sandberg <sandberg@mailfence.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Fixes: d8652956cf ("net: dsa: realtek-smi: Add Realtek SMI driver")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:44:23 -07:00
Linus Walleij
15ab7906cc net: dsa: rtl8366: Fix VLAN semantics
The RTL8366 would not handle adding new members (ports) to
a VLAN: the code assumed that ->port_vlan_add() was only
called once for a single port. When intializing the
switch with .configure_vlan_while_not_filtering set to
true, the function is called numerous times for adding
all ports to VLAN1, which was something the code could
not handle.

Alter rtl8366_set_vlan() to just |= new members and
untagged flags to 4k and MC VLAN table entries alike.
This makes it possible to just add new ports to a
VLAN.

Put in some helpful debug code that can be used to find
any further bugs here.

Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Mauri Sandberg <sandberg@mailfence.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Fixes: d8652956cf ("net: dsa: realtek-smi: Add Realtek SMI driver")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:44:23 -07:00
Brian Vazquez
b9aaec8f0b fib: use indirect call wrappers in the most common fib_rules_ops
This avoids another inderect call per RX packet which save us around
20-40 ns.

Changelog:

v1 -> v2:
- Move declaraions to fib_rules.h to remove warnings

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:42:31 -07:00
Cong Wang
608b4adab1 net_sched: initialize timer earlier in red_init()
When red_init() fails, red_destroy() is called to clean up.
If the timer is not initialized yet, del_timer_sync() will
complain. So we have to move timer_setup() before any failure.

Reported-and-tested-by: syzbot+6e95a4fabf88dc217145@syzkaller.appspotmail.com
Fixes: aee9caa03f ("net: sched: sch_red: Add qevents "early_drop" and "mark"")
Cc: Petr Machata <petrm@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:41:23 -07:00
David S. Miller
1255457429 Merge branch 'hinic-add-some-error-messages-for-debug'
Luo bin says:

====================
hinic: add some error messages for debug

patch #1: support to handle hw abnormal event
patch #2: improve the error messages when functions return failure and
	  dump relevant registers in some exception handling processes
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:22:03 -07:00
Luo bin
90f86b8a36 hinic: add log in exception handling processes
improve the error message when functions return failure and dump
relevant registers in some exception handling processes

Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:22:03 -07:00
Luo bin
c15850c709 hinic: add support to handle hw abnormal event
add support to handle hw abnormal event such as hardware failure,
cable unplugged,link error

Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:22:02 -07:00
David S. Miller
aff7543126 Merge branch 'introduce-PLDM-firmware-update-library'
Jacob Keller says:

====================
introduce PLDM firmware update library

This series goal is to enable support for updating the ice hardware flash
using the devlink flash command.

The ice firmware update files are distributed using the file format
described by the "PLDM for Firmware Update" standard:

https://www.dmtf.org/documents/pmci/pldm-firmware-update-specification-100

Because this file format is standard, this series introduces a new library
that handles the generic logic for parsing the PLDM file header. The library
uses a design that is very similar to the Mellanox mlxfw module. That is, a
simple ops table is setup and device drivers instantiate an instance of the
pldmfw library with the device specific operations.

Doing so allows for each device to implement the low level behavior for how
to interact with its firmware.

This series includes the library and an implementation for the ice hardware.
I've removed all of the parameters, and the proposed changes to support
overwrite mode. I'll be working on the overwrite mask suggestion from Jakub
as a follow-up series.

Because the PLDM file format is a standard and not something that is
specific to the Intel hardware, I opted to place this update library in
lib/pldmfw. I should note that while I tried to make the library generic, it
does not attempt to mimic the complete "Update Agent" as defined in the
standard. This is mostly due to the fact that the actual interfaces exposed
to software for the ice hardware would not allow this.

This series depends on some work just recently and is based on top of the
patch series sent by Tony published at:

https://lore.kernel.org/netdev/20200723234720.1547308-1-anthony.l.nguyen@intel.com/T/#t

Changes since v2 RFC
* Removed overwrite mode patches, as this can become a follow up series with
  a separate discussion
* Fixed a minor bug in the pldm_timestamp structure not being packed.
* Dropped Cc for other driver maintainers, as this series no longer includes
  changes to the core flash update command.
* Re-ordered patches slightly.
Changes since v1 RFC
* Removed the "allow_downgrade_on_flash_update" parameter. Instead, the
  driver will always attempt to flash the device, even when firmware
  indicates that it would be a downgrade. A dev_warn is used to indicate
  when this occurs.
* Removed the "ignore_pending_flash_update". Instead, the driver will always
  check for and cancel any previous pending update. A devlink flash status
  message will be sent when this cancellation occurs.
* Removed the "reset_after_flash_update" parameter. This will instead be
  implemented as part of a devlink reset interface, work left for a future
  change.
* Replaced the "flash_update_preservation_level" parameter with a new
  "overwrite" mode attribute on the flash update command. For ice, this mode
  will select the preservation level. For all other drivers, I modified them
  to check that the mode is "OVERWRITE_NOTHING", and have Cc'd the
  maintainers to get their input.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:07:06 -07:00
Jacob Keller
d69ea414c9 ice: implement device flash update via devlink
Use the newly added pldmfw library to implement device flash update for
the Intel ice networking device driver. This support uses the devlink
flash update interface.

The main parts of the flash include the Option ROM, the netlist module,
and the main NVM data. The PLDM firmware file contains modules for each
of these components.

Using the pldmfw library, the provided firmware file will be scanned for
the three major components, "fw.undi" for the Option ROM, "fw.mgmt" for
the main NVM module containing the primary device firmware, and
"fw.netlist" containing the netlist module.

The flash is separated into two banks, the active bank containing the
running firmware, and the inactive bank which we use for update. Each
module is updated in a staged process. First, the inactive bank is
erased, preparing the device for update. Second, the contents of the
component are copied to the inactive portion of the flash. After all
components are updated, the driver signals the device to switch the
active bank during the next EMP reset (which would usually occur during
the next reboot).

Although the firmware AdminQ interface does report an immediate status
for each command, the NVM erase and NVM write commands receive status
asynchronously. The driver must not continue writing until previous
erase and write commands have finished. The real status of the NVM
commands is returned over the receive AdminQ. Implement a simple
interface that uses a wait queue so that the main update thread can
sleep until the completion status is reported by firmware. For erasing
the inactive banks, this can take quite a while in practice.

To help visualize the process to the devlink application and other
applications based on the devlink netlink interface, status is reported
via the devlink_flash_update_status_notify. While we do report status
after each 4k block when writing, there is no real status we can report
during erasing. We simply must wait for the complete module erasure to
finish.

With this implementation, basic flash update for the ice hardware is
supported.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:07:06 -07:00
Jacob Keller
2ab560a78e ice: add flags indicating pending update of firmware module
After a flash update, the pending status of the update can be determined
from the device capabilities.

Read the appropriate device capability and store whether there is
a pending update awaiting a reboot.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:07:06 -07:00
Cudzilo, Szymon T
544cd2ac13 ice: Add AdminQ commands for FW update
Add structures, identifiers, and helper functions for several AdminQ
commands related to performing a firmware update for the ice hardware.
These will be used in future code for implementing the devlink
.flash_update handler.

Signed-off-by: Cudzilo, Szymon T <szymon.t.cudzilo@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:07:06 -07:00
Jacek Naczyk
de9b277ee0 ice: Add support for unified NVM update flow capability
Extends function parsing response from Discover Device
Capability AQC to check if the device supports unified NVM update flow.

Signed-off-by: Jacek Naczyk <jacek.naczyk@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:07:06 -07:00
Jacob Keller
b8265621f4 Add pldmfw library for PLDM firmware update
The pldmfw library is used to implement common logic needed to flash
devices based on firmware files using the format described by the PLDM
for Firmware Update standard.

This library consists of logic to parse the PLDM file format from
a firmware file object, as well as common logic for sending the relevant
PLDM header data to the device firmware.

A simple ops table is provided so that device drivers can implement
device specific hardware interactions while keeping the common logic to
the pldmfw library.

This library will be used by the Intel ice networking driver as part of
implementing device flash update via devlink. The library aims to be
vendor and device agnostic. For this reason, it has been placed in
lib/pldmfw, in the hopes that other devices which use the PLDM firmware
file format may benefit from it in the future. However, do note that not
all features defined in the PLDM standard have been implemented.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:07:06 -07:00
David S. Miller
323410ef74 Merge branch 'mptcp-Exchange-MPTCP-DATA_FIN-DATA_ACK-before-TCP-FIN'
Mat Martineau says:

====================
mptcp: Exchange MPTCP DATA_FIN/DATA_ACK before TCP FIN

This series allows the MPTCP-level connection to be closed with the
peers exchanging DATA_FIN and DATA_ACK according to the state machine in
appendix D of RFC 8684. The process is very similar to the TCP
disconnect state machine.

The prior code sends DATA_FIN only when TCP FIN packets are sent, and
does not allow for the MPTCP-level connection to be half-closed.

Patch 8 ("mptcp: Use full MPTCP-level disconnect state machine") is the
core of the series. Earlier patches in the series have some small fixes
and helpers in preparation, and the final four small patches do some
cleanup.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:02:42 -07:00
Mat Martineau
721e908990 mptcp: Safely store sequence number when sending data
The MPTCP socket's write_seq member can be read without the msk lock
held, so use WRITE_ONCE() to store it.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:02:42 -07:00
Mat Martineau
c75293925f mptcp: Safely read sequence number when lock isn't held
The MPTCP socket's write_seq member should be read with READ_ONCE() when
the msk lock is not held.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:02:42 -07:00
Mat Martineau
06827b348b mptcp: Skip unnecessary skb extension allocation for bare acks
Bare TCP ack skbs are freed right after MPTCP sees them, so the work to
allocate, zero, and populate the MPTCP skb extension is wasted. Detect
these skbs and do not add skb extensions to them.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:02:42 -07:00
Mat Martineau
067a0b3dc5 mptcp: Only use subflow EOF signaling on fallback connections
The MPTCP state machine handles disconnections on non-fallback connections,
but the mptcp_sock still needs to get notified when fallback subflows
disconnect.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:02:42 -07:00
Mat Martineau
43b54c6ee3 mptcp: Use full MPTCP-level disconnect state machine
RFC 8684 appendix D describes the connection state machine for
MPTCP. This patch implements the DATA_FIN / DATA_ACK exchanges and
MPTCP-level socket state changes described in that appendix, rather than
simply sending DATA_FIN along with TCP FIN when disconnecting subflows.

DATA_FIN is now sent and acknowledged before shutting down the
subflows. Received DATA_FIN information (if not part of a data packet)
is written to the MPTCP socket when the incoming DSS option is parsed by
the subflow, and the MPTCP worker is scheduled to process the
flag. DATA_FIN received as part of a full DSS mapping will be handled
when the mapping is processed.

The DATA_FIN is acknowledged by the worker if the reader is caught
up. If there is still data to be moved to the MPTCP-level queue, ack_seq
will be incremented to account for the DATA_FIN when it reaches the end
of the stream and a DATA_ACK will be sent to the peer.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:02:42 -07:00
Mat Martineau
16a9a9da17 mptcp: Add helper to process acks of DATA_FIN
After DATA_FIN has been sent, the peer will acknowledge it. An ack of
the relevant MPTCP-level sequence number will update the MPTCP
connection state appropriately.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:02:42 -07:00
Mat Martineau
6920b85158 mptcp: Add mptcp_close_state() helper
This will be used to transition to the appropriate state on close and
determine if a DATA_FIN needs to be sent for that state transition.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:02:42 -07:00
Mat Martineau
3721b9b646 mptcp: Track received DATA_FIN sequence number and add related helpers
Incoming DATA_FIN headers need to propagate the presence of the DATA_FIN
bit and the associated sequence number to the MPTCP layer, even when
arriving on a bare ACK that does not get added to the receive queue. Add
structure members to store the DATA_FIN information and helpers to set
and check those values.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:02:42 -07:00
Mat Martineau
7279da6145 mptcp: Use MPTCP-level flag for sending DATA_FIN
Since DATA_FIN information is the same for every subflow, store it only
in the mptcp_sock.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:02:41 -07:00
Mat Martineau
242e63f651 mptcp: Remove outdated and incorrect comment
mptcp_close() acquires the msk lock, so it clearly should not be held
before the function is called.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:02:41 -07:00
Mat Martineau
57baaf2875 mptcp: Return EPIPE if sending is shut down during a sendmsg
A MPTCP socket where sending has been shut down should not attempt to
send additional data, since DATA_FIN has already been sent.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:02:41 -07:00
Mat Martineau
0bac966a1f mptcp: Allow DATA_FIN in headers without TCP FIN
RFC 8684-compliant DATA_FIN needs to be sent and ack'd before subflows
are closed with TCP FIN, so write DATA_FIN DSS headers whenever their
transmission has been enabled by the MPTCP connection-level socket.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 17:02:41 -07:00
David S. Miller
0003041e7a Merge branch 'sockptr_t-fixes-v2'
Christoph Hellwig says:

====================
sockptr_t fixes v2

a bunch of fixes for the sockptr_t conversion

Changes since v1:
 - fix a user pointer dereference braino in bpfilter
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 13:43:40 -07:00
Christoph Hellwig
a31edb2059 net: improve the user pointer check in init_user_sockptr
Make sure not just the pointer itself but the whole range lies in
the user address space.  For that pass the length and then use
the access_ok helper to do the check.

Fixes: 6d04fe15f7 ("net: optimize the sockptr_t for unified kernel/user address spaces")
Reported-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 13:43:40 -07:00
Christoph Hellwig
d3c4815151 net: remove sockptr_advance
sockptr_advance never properly worked.  Replace it with _offset variants
of copy_from_sockptr and copy_to_sockptr.

Fixes: ba423fdaa5 ("net: add a new sockptr_t type")
Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 13:43:40 -07:00
Christoph Hellwig
035bfd051e net: make sockptr_is_null strict aliasing safe
While the kernel in general is not strict aliasing safe we can trivially
do that in sockptr_is_null without affecting code generation, so always
check the actually assigned union member.

Reported-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 13:43:40 -07:00
Christoph Hellwig
a3ad434ad7 netfilter: arp_tables: restore a SPDX identifier
This was accidentally removed in an unrelated commit.

Fixes: c2f12630c6 ("netfilter: switch nf_setsockopt to sockptr_t")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 13:43:40 -07:00
David S. Miller
21db923eae Merge branch 'mlxsw-Add-support-for-QSFP-DD-transceiver-type'
Ido Schimmel says:

====================
mlxsw: Add support for QSFP-DD transceiver type

This patch set from Vadim adds support for Quad Small Form Factor
Pluggable Double Density (QSFP-DD) modules in mlxsw.

Patch #1 enables dumping of QSFP-DD module information through ethtool.

Patch #2 enables reading of temperature thresholds from QSFP-DD modules
for hwmon and thermal zone purposes.

Changes since v1 [1]:

Only rebase on top of net-next. After discussing with Andrew and Adrian
we agreed that current approach is OK and that in the future we can
follow Andrew's suggestion to "make a new API where user space can
request any pages it want, and specify the size of the page". This
should allow us "to work around known issues when manufactures get their
EEPROM wrong".

[1] https://lore.kernel.org/netdev/20200626144724.224372-1-idosch@idosch.org/#t
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 13:28:02 -07:00
Vadim Pasternak
f152b41ba6 mlxsw: core: Add support for temperature thresholds reading for QSFP-DD transceivers
Allow QSFP-DD transceivers temperature thresholds reading for hardware
monitoring and thermal control.

For this type, the thresholds are located in page 02h according to the
"Module and Lane Thresholds" description from Common Management
Interface Specification.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 13:28:02 -07:00
Vadim Pasternak
6af496adcb mlxsw: core: Add ethtool support for QSFP-DD transceivers
The Quad Small Form Factor Pluggable Double Density (QSFP-DD) hardware
specification defines a form factor that supports up to 400 Gbps in
aggregate over an 8x50-Gbps electrical interface. The QSFP-DD supports
both optical and copper interfaces.

Implementation is based on Common Management Interface Specification;
Rev 4.0 May 8, 2019. Table 8-2 "Identifier and Status Summary (Lower
Page)" from this spec defines "Id and Status" fields located at offsets
00h - 02h. Bit 2 at offset 02h ("Flat_mem") specifies QSFP EEPROM memory
mode, which could be "upper memory flat" or "paged". Flat memory mode is
coded "1", and indicates that only page 00h is implemented in EEPROM.
Paged memory is coded "0" and indicates that pages 00h, 01h, 02h, 10h
and 11h are implemented. Pages 10h and 11h are currently not supported
by the driver.

"Flat" memory mode is used for the passive copper transceivers. For this
type only page 00h (256 bytes) is available. "Paged" memory is used for
the optical transceivers. For this type pages 00h (256 bytes), 01h (128
bytes) and 02h (128 bytes) are available. Upper page 01h contains static
advertising field, while upper page 02h contains the module-defined
thresholds and lane-specific monitors.

Extend enumerator 'mlxsw_reg_mcia_eeprom_module_info_id' with additional
field 'MLXSW_REG_MCIA_EEPROM_MODULE_INFO_TYPE_ID'. This field is used to
indicate for QSFP-DD transceiver type which memory mode is to be used.

Expose 256 bytes buffer for QSFP-DD passive copper transceiver and
512 bytes buffer for optical.

Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 13:28:02 -07:00
David S. Miller
0082dd8ae1 mlx5-updates-2020-07-28
Misc and small update to mlx5 driver:
 
 1) Aya adds PCIe relaxed ordering support for mlx5 netdev queues.
 2) Eran Refactors pages data base to be per vf/function to speedup
    unload time.
 3) Parav changes eswitch steering initialization to account for
    tota_vports rather than for only active vports and
    Link non uplink representors to PCI device, for uniform naming scheme.
 
 4) Tariq, trivial RX code improvements and missing inidirect calls
    wrappers.
 
 5) Small cleanup patches
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl8f8fgACgkQSD+KveBX
 +j5pgggAtjGZ3dWt43WNIMEYum/OZYrIhAVIaB4IgPY2Q90i26A7VMpBUOpQxTzk
 Z+PYiSuiwplWeQ0k9GW2yMCAEACsuiSQXQG/eUA4jKRKf9hQEXi5guuVEIYJRm68
 uL4aI4m4Aygx/n2UTdajARB69RVZ02ZhoNucMGETw4gzvqLdSgVCyR2tiDnPiKKW
 3+y9nViglik7XqvDrYT9H+7vo8geCg5zBWEABy/Ls0VUCWN2xCjPNbGL9VCMG8Q+
 A6n5SdhaOMIc9u5vkcE4CVgOnUPwNPQeu+GVgEodZ7GhFwX897A8dvFwljfML/UU
 rBRPgC2kUTtGDZxVcSBMlBjrXl6zMQ==
 =iwh/
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2020-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-07-28

Misc and small update to mlx5 driver:

1) Aya adds PCIe relaxed ordering support for mlx5 netdev queues.
2) Eran Refactors pages data base to be per vf/function to speedup
   unload time.
3) Parav changes eswitch steering initialization to account for
   tota_vports rather than for only active vports and
   Link non uplink representors to PCI device, for uniform naming scheme.

4) Tariq, trivial RX code improvements and missing inidirect calls
   wrappers.

5) Small cleanup patches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 13:23:31 -07:00
Vaibhav Gupta
f21bbd6330 farsync: use generic power management
The .suspend() and .resume() callbacks are not defined for this driver.
Still, their power management structure follows the legacy framework. To
bring it under the generic framework, simply remove the binding of
callbacks from "struct pci_driver".

Change code indentation from space to tab in "struct pci_driver".

Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-28 12:56:52 -07:00
Julia Lawall
22f9d2f4ee net/mlx5: drop unnecessary list_empty
list_for_each_entry is able to handle an empty list.
The only effect of avoiding the loop is not initializing the
index variable.
Drop list_empty tests in cases where these variables are not
used.

Note that list_for_each_entry is defined in terms of list_first_entry,
which indicates that it should not be used on an empty list.  But in
list_for_each_entry, the element obtained by list_first_entry is not
really accessed, only the address of its list_head field is compared
to the address of the list head, so the list_first_entry is safe.

The semantic patch that makes this change is as follows (with another
variant for the no brace case): (http://coccinelle.lip6.fr/)

<smpl>
@@
expression x,e;
iterator name list_for_each_entry;
statement S;
identifier i;
@@

-if (!(list_empty(x))) {
   list_for_each_entry(i,x,...) S
- }
 ... when != i
? i = e
</smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28 02:37:57 -07:00
Gustavo A. R. Silva
c8b838d108 net/mlx5: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28 02:37:55 -07:00
Alex Vesker
ffdc8ec0b7 net/mlx5: DR, Reduce print level for matcher print
There is no need to print on each unsuccessful matcher
ip_version combination since it probably will happen when
trying to create all the possible combinations.
On a real failure we have a print in the calling function.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28 02:37:52 -07:00
Aya Levin
17347d5430 net/mlx5e: Add support for PCI relaxed ordering
The concept of Relaxed Ordering in the PCI Express environment allows
switches in the path between the Requester and Completer to reorder some
transactions just received before others that were previously enqueued.

In ETH driver, there is no question of write integrity since each memory
segment is written only once per cycle. In addition, the driver doesn't
access the memory shared with the hardware until the corresponding CQE
arrives indicating all PCI transactions are done.

Running TCP single stream over ConnectX-4 LX, ARM CPU on remote-numa has
300% improvement in the bandwidth.

With relaxed ordering turned off: BW:10 [GB/s]
With relaxed ordering turned on: BW:40 [GB/s]

The driver turns relaxed ordering with respect to the firmware
capabilities and the return value from pcie_relaxed_ordering_enabled().

Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28 02:37:49 -07:00
Tariq Toukan
5d0b847694 net/mlx5e: Use indirect call wrappers for RX post WQEs functions
Use the indirect call wrapper API macros for declaration and scope
of the RX post WQEs functions.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28 02:37:47 -07:00
Tariq Toukan
b307f7f163 net/mlx5e: Move exposure of datapath function to txrx header
Move them from the generic header file "en.h", to the
datapath header file "txrx.h".

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-07-28 02:37:44 -07:00