Use the newly added pldmfw library to implement device flash update for
the Intel ice networking device driver. This support uses the devlink
flash update interface.
The main parts of the flash include the Option ROM, the netlist module,
and the main NVM data. The PLDM firmware file contains modules for each
of these components.
Using the pldmfw library, the provided firmware file will be scanned for
the three major components, "fw.undi" for the Option ROM, "fw.mgmt" for
the main NVM module containing the primary device firmware, and
"fw.netlist" containing the netlist module.
The flash is separated into two banks, the active bank containing the
running firmware, and the inactive bank which we use for update. Each
module is updated in a staged process. First, the inactive bank is
erased, preparing the device for update. Second, the contents of the
component are copied to the inactive portion of the flash. After all
components are updated, the driver signals the device to switch the
active bank during the next EMP reset (which would usually occur during
the next reboot).
Although the firmware AdminQ interface does report an immediate status
for each command, the NVM erase and NVM write commands receive status
asynchronously. The driver must not continue writing until previous
erase and write commands have finished. The real status of the NVM
commands is returned over the receive AdminQ. Implement a simple
interface that uses a wait queue so that the main update thread can
sleep until the completion status is reported by firmware. For erasing
the inactive banks, this can take quite a while in practice.
To help visualize the process to the devlink application and other
applications based on the devlink netlink interface, status is reported
via the devlink_flash_update_status_notify. While we do report status
after each 4k block when writing, there is no real status we can report
during erasing. We simply must wait for the complete module erasure to
finish.
With this implementation, basic flash update for the ice hardware is
supported.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After a flash update, the pending status of the update can be determined
from the device capabilities.
Read the appropriate device capability and store whether there is
a pending update awaiting a reboot.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add structures, identifiers, and helper functions for several AdminQ
commands related to performing a firmware update for the ice hardware.
These will be used in future code for implementing the devlink
.flash_update handler.
Signed-off-by: Cudzilo, Szymon T <szymon.t.cudzilo@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extends function parsing response from Discover Device
Capability AQC to check if the device supports unified NVM update flow.
Signed-off-by: Jacek Naczyk <jacek.naczyk@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow QSFP-DD transceivers temperature thresholds reading for hardware
monitoring and thermal control.
For this type, the thresholds are located in page 02h according to the
"Module and Lane Thresholds" description from Common Management
Interface Specification.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Quad Small Form Factor Pluggable Double Density (QSFP-DD) hardware
specification defines a form factor that supports up to 400 Gbps in
aggregate over an 8x50-Gbps electrical interface. The QSFP-DD supports
both optical and copper interfaces.
Implementation is based on Common Management Interface Specification;
Rev 4.0 May 8, 2019. Table 8-2 "Identifier and Status Summary (Lower
Page)" from this spec defines "Id and Status" fields located at offsets
00h - 02h. Bit 2 at offset 02h ("Flat_mem") specifies QSFP EEPROM memory
mode, which could be "upper memory flat" or "paged". Flat memory mode is
coded "1", and indicates that only page 00h is implemented in EEPROM.
Paged memory is coded "0" and indicates that pages 00h, 01h, 02h, 10h
and 11h are implemented. Pages 10h and 11h are currently not supported
by the driver.
"Flat" memory mode is used for the passive copper transceivers. For this
type only page 00h (256 bytes) is available. "Paged" memory is used for
the optical transceivers. For this type pages 00h (256 bytes), 01h (128
bytes) and 02h (128 bytes) are available. Upper page 01h contains static
advertising field, while upper page 02h contains the module-defined
thresholds and lane-specific monitors.
Extend enumerator 'mlxsw_reg_mcia_eeprom_module_info_id' with additional
field 'MLXSW_REG_MCIA_EEPROM_MODULE_INFO_TYPE_ID'. This field is used to
indicate for QSFP-DD transceiver type which memory mode is to be used.
Expose 256 bytes buffer for QSFP-DD passive copper transceiver and
512 bytes buffer for optical.
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
list_for_each_entry is able to handle an empty list.
The only effect of avoiding the loop is not initializing the
index variable.
Drop list_empty tests in cases where these variables are not
used.
Note that list_for_each_entry is defined in terms of list_first_entry,
which indicates that it should not be used on an empty list. But in
list_for_each_entry, the element obtained by list_first_entry is not
really accessed, only the address of its list_head field is compared
to the address of the list head, so the list_first_entry is safe.
The semantic patch that makes this change is as follows (with another
variant for the no brace case): (http://coccinelle.lip6.fr/)
<smpl>
@@
expression x,e;
iterator name list_for_each_entry;
statement S;
identifier i;
@@
-if (!(list_empty(x))) {
list_for_each_entry(i,x,...) S
- }
... when != i
? i = e
</smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
There is no need to print on each unsuccessful matcher
ip_version combination since it probably will happen when
trying to create all the possible combinations.
On a real failure we have a print in the calling function.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The concept of Relaxed Ordering in the PCI Express environment allows
switches in the path between the Requester and Completer to reorder some
transactions just received before others that were previously enqueued.
In ETH driver, there is no question of write integrity since each memory
segment is written only once per cycle. In addition, the driver doesn't
access the memory shared with the hardware until the corresponding CQE
arrives indicating all PCI transactions are done.
Running TCP single stream over ConnectX-4 LX, ARM CPU on remote-numa has
300% improvement in the bandwidth.
With relaxed ordering turned off: BW:10 [GB/s]
With relaxed ordering turned on: BW:40 [GB/s]
The driver turns relaxed ordering with respect to the firmware
capabilities and the return value from pcie_relaxed_ordering_enabled().
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Use the indirect call wrapper API macros for declaration and scope
of the RX post WQEs functions.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Instead of exposing the RQ datapath handlers (from en_rx.c) so that
they are set in the control path (in en_main.c), wrap this logic
in a single function in en_rx.c and expose it alone.
Every profile will now have a pointer to the new mlx5e_rx_handlers
structure, instead of directly pointing to the previously-exposed
RQ handlers.
This significantly improves locality and modularity of the driver,
and allows many functions in en_rx.c to become static.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently PF and VF representors are exposed as virtual device.
They are not linked to its parent PCI device like how uplink
representor is linked.
Due to this, PF and VF representors cannot benefit of the
systemd defined naming scheme. This requires special handling
by the users.
Hence, link the PF and VF representors to their parent PCI device
similar to existing uplink representor netdevice.
Example:
udevadm output before linking to PCI device:
$ udevadm test-builtin net_id /sys/class/net/eth6
Load module index
Network interface NamePolicy= disabled on kernel command line, ignoring.
Parsed configuration file /usr/lib/systemd/network/99-default.link
Created link configuration context.
Using default interface naming scheme 'v243'.
ID_NET_NAMING_SCHEME=v243
Unload module index
Unloaded link configuration context.
udevadm output after linking to PCI device:
$ udevadm test-builtin net_id /sys/class/net/eth6
Load module index
Network interface NamePolicy= disabled on kernel command line, ignoring.
Parsed configuration file /usr/lib/systemd/network/99-default.link
Created link configuration context.
Using default interface naming scheme 'v243'.
ID_NET_NAMING_SCHEME=v243
ID_NET_NAME_PATH=enp0s8f0npf0vf0
Unload module index
Unloaded link configuration context.
In past there was little concern over seeing 10,000 lines output
showing up at thread [1] is not applicable as ndo ops for VF
handling is not exposed for all the 100 repesentors for mlx5 devices.
Additionally alternative device naming [2] to overcome shorter device
naming is also part of the latest systemd release v245.
[1] https://marc.info/?l=linux-netdev&m=152657949117904&w=2
[2] https://lwn.net/Articles/814068/
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently steering table and rx group initialization helper
routines works on the total_vports passed as input parameter.
Both eswitch helpers work on the mlx5_eswitch and thereby have access
to esw->total_vports. Hence use it directly instead of passing it
via function input arguments.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Total e-switch vports are already stored in mlx5_eswitch total_vports.
Avoid copy of it in nvports and reuse existing total_vports calculation.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When eswitch is enabled, VFs might not be enabled. Hence, consider
maximum number of VFs.
This further closes the gap between handling VF vports between ECPF and
PF.
Fixes: ea2128fd63 ("net/mlx5: E-switch, Reduce dependency on num_vfs during mode set")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add function ID to reclaim pages debug log for better user visibility.
Signed-off-by: Avihu Hagag <avihuh@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Per page request event, FW request to allocated or release pages for a
single function. Driver maintains FW pages object per function, so there
is no need to hold one global page data-base. Instead, have a page
data-base per function, which will improve performance release flow in all
cases, especially for "release all pages".
As the range of function IDs is large and not sequential, use xarray to
store a per function ID page data-base, where the function ID is the key.
Upon first allocation of a page to a function ID, create the page
data-base per function. This data-base will be released only at pagealloc
mechanism cleanup.
NIC: ConnectX-4 Lx
CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Test case: 32 VFs, measure release pages on one VF as part of FLR
Before: 0.021 Sec
After: 0.014 Sec
The improvement depends on amount of VFs and memory utilization
by them. Time measurements above were taken from idle system.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Tony Nguyen says:
====================
1GbE Intel Wired LAN Driver Updates 2020-07-27
This series contains updates to igc driver only.
Sasha cleans up double definitions, unneeded and non applicable
registers, and removes unused fields in structs. Ensures the Receive
Descriptor Minimum Threshold Count is cleared and fixes a static checker
error.
v2: Remove fields from hw_stats in patches that removed their uses.
Reworded patch descriptions for patches 1, 2, and 4.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In ef100_reset(), make the MCDI call to do the reset.
Also, do a reset at start-of-day during probe, to put the function in
a clean state.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MC_CMD_GET_CAPABILITIES now has a third word of flags; extend the
efx_has_cap() machinery to cover it.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently RX and TX-completion events are unhandled, as neither the RX
nor the TX path has been implemented yet.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Channels are probed, but actual event handling is still stubbed out.
Stub implementation of check_caps is needed because ptp.c will call into
it from efx_ptp_use_mac_tx_timestamps() to decide if it wants TXQs.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We handle everything ourselves in ef100_reset(), rather than relying on
the generic down/up routines.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can't actually do the MCDI to probe it fully until we have working
MCDI, which comes later, but we need efx->phy_data to be allocated so
that when we get MCDI events the link-state change handler doesn't
NULL-dereference.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We don't actually do the efx_mcdi_reset() because we don't have MCDI yet.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No TX or RX path, no MCDI, not even an ifup/down handler.
Besides stubs, the bulk of the patch deals with reading the Xilinx
extended PCIe capability, which tells us where to find our BAR.
Though in the same module, EF100 has its own struct pci_driver,
which is named sfc_ef100.
A small number of additional nic_type methods are added; those in the
TX (tx_enqueue) and RX (rx_packet) paths are called through indirect
call wrappers to minimise the performance impact.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
EF100 adds a few new valid addresses for efx_writed_page(), as well as
a Function Control Window in the BAR whose location is variable.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An MDIO-based n-way restart does not make sense for any of the NICs
supported by this driver, nor for the coming EF100.
Unlike on Falcon (which was already split off into a separate driver),
the PHY on all of Siena, EF10 and EF100 is managed by MC firmware.
While Siena can talk to the PHY over MDIO, doing so for anything other
than debugging purposes (mdio_mii_ioctl) is likely to confuse the
firmware.
(According to the SFC firmware team, this support was originally added
to the Siena driver early in the development of that product, before
it was decided to have firmware manage the PHY.)
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan reports static checker warning:
"The patch 9b6ee3cf95: "qed: sanitize PBL chains allocation" from Jul
23, 2020, leads to the following static checker warning:
drivers/net/ethernet/qlogic/qed/qed_chain.c:299 qed_chain_alloc_pbl()
error: uninitialized symbol 'pbl_virt'.
drivers/net/ethernet/qlogic/qed/qed_chain.c
249 static int qed_chain_alloc_pbl(struct qed_dev *cdev, struct qed_chain *chain)
250 {
251 struct device *dev = &cdev->pdev->dev;
252 struct addr_tbl_entry *addr_tbl;
253 dma_addr_t phys, pbl_phys;
254 __le64 *pbl_virt;
^^^^^^^^^^^^^^^^
[...]
271 if (chain->b_external_pbl)
272 goto alloc_pages;
^^^^^^^^^^^^^^^^ uninitialized
[...]
298 /* Fill the PBL table with the physical address of the page */
299 pbl_virt[i] = cpu_to_le64(phys);
^^^^^^^^^^^
[...]
"
This issue was introduced with commit c3a321b06a ("qed: simplify
initialization of the chains with an external PBL"), when
chain->pbl_sp.table_virt initialization was moved up to
qed_chain_init_params().
Fix it by initializing pbl_virt with an already filled chain struct field.
Fixes: c3a321b06a ("qed: simplify initialization of the chains with an external PBL")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alexander Lobakin <alobakin@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to dump PXP registers and PCIe statistics.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now we can report all the full 64-bit CPU endian software accumulated
counters instead of the hw counters, some of which may be less than
64-bit wide. Define the necessary macros to access the software
counters.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have the infrastructure in place, add the new function
bnxt_accumulate_all_stats() to periodically accumulate and check for
counter rollover of all ring stats and port stats.
A chip bug was also discovered that could cause some ring counters to
become 0 during DMA. Workaround by ignoring zeros on the affected
chips.
Some older frimware will reset port counters during ifdown. We need
to check for that and free the accumulated port counters during ifdown
to prevent bogus counter overflow detection during ifup.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If supported by newer firmware, make the firmware call to query all
the port counter masks. If not supported, assume 40-bit port
counter masks.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Newer firmware has a new call HWRM_FUNC_QSTATS_EXT to retrieve the
masks of all ring counters. Make this call when supported to
initialize the hardware masks of all ring counters. If the call
is not available, assume 48-bit ring counter masks on P5 chips.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of these DMAed hardware counters are not full 64-bit counters and
so we need to accumulate them as they overflow. Allocate copies of these
DMA statistics memory blocks with the same size for accumulation. The
hardware counter widths are also counter specific so we allocate
memory for masks that correspond to each counter.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver manages multiple statistics structures of different sizes.
They are all allocated, freed, and handled practically the same. Define
a new bnxt_stats_mem structure and common allocation and free functions
for all staistics memory blocks.
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The port statistics structures have hard coded padding and offset.
Define macros to make this look cleaner.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Main changes are 200G support and fixing the definitions of discard and
error counters to match the hardware definitions.
Because the HWRM_PORT_PHY_QCFG message size has now exceeded the max.
encapsulated response message size of 96 bytes from the PF to the VF,
we now need to cap this message to 96 bytes for forwarding. The forwarded
response only needs to contain the basic link status and speed information
and can be capped without adding the new information.
v2: Fix bnxt_re compile error.
Cc: Selvin Xavier <selvin.xavier@broadcom.com>
Reviewed-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Edwin Peer <edwin.peer@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove PCIe non-counters display from ethtool statistics, as
they are not simple counters but register dump. The next few
patches will add logic to detect counter roll-over and it won't
work with these PCIe non-counters.
There will be a follow up patch to get PCIe information via
ethtool register dump.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
list_for_each_safe is able to handle an empty list.
The only effect of avoiding the loop is not initializing the
index variable.
Drop list_empty tests in cases where these variables are not
used.
The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)
<smpl>
@@
expression x,e;
iterator name list_for_each_safe;
statement S;
identifier i,j;
@@
-if (!(list_empty(x))) {
list_for_each_safe(i,j,x) S
- }
... when != i
when != j
(
i = e;
|
? j = e;
)
</smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/ethernet/intel/igc/igc_mac.c:424 igc_check_for_copper_link()
error: uninitialized symbol 'link'.
This patch come to fix this warning and initialize the 'link' symbol.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 707abf0695 ("igc: Add initial LTR support")
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>