This structure was mapped but never subsequently unmapped.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
efx_copy_channel() doesn't correctly clear the napi_hash related state.
This means that when napi_hash_add is called for that channel nothing is
done, and we are left with a copy of the napi_hash_node from the old
channel. When we later call napi_hash_del() on this channel we have a
stale napi_hash_node.
Corruption is only seen when there are multiple entries in one of the
napi_hash lists. This is made more likely by having a very large number
of channels. Testing was carried out with 512 channels - 32 channels on
each of 16 ports.
This failure typically appears as protection faults within napi_by_id()
or napi_hash_add(). efx_copy_channel() is only used when tx or rx ring
sizes are changed (ethtool -G).
Fixes: 36763266bb ("sfc: Add support for busy polling")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to stop ctlr if it was already stopped. It can cause timeout
warns. Steps:
- ifconfig eth0 down
- ethtool -l eth0 rx 8 tx 8
- ethtool -l eth0 rx 1 tx 1
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dma ctlr is reseted to 0 while cpdma soft reset, thus cpdma ctlr
cannot be configured after cpdma is stopped. So restoring content
of cpdma ctlr while off/on procedure is needed. The cpdma ctlr off/on
procedure is present while interface down/up and while changing number
of channels with ethtool. In order to not restore content in many
places, move it to cpdma_ctlr_start().
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device's neighbour table is periodically dumped in order to update
the kernel about active neighbours. A single dump session may span
multiple queries, until the response carries less records than requested
or when a record (can contain up to four neighbour entries) is not full.
Current code stops the session when the number of returned records is
zero, which can result in infinite loop in case of high packet rate.
Fix this by stopping the session according to the above logic.
Fixes: c723c735fa ("mlxsw: spectrum_router: Periodically update the kernel's neigh table")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When binding port to a newly created span entry, its refcount is
initialized to zero even though it has a bound port. That leads
to unexpected behaviour when the user tries to delete that port
from the span entry.
Fix this by initializing the reference count to 1.
Also add a warning to put function.
Fixes: 763b4b70af ("mlxsw: spectrum: Add support in matchall mirror TC offloading")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the physical link is down and the VF virtual link is set to "enable",
the current code does not always work. If the link is down but the
cable is attached, the firmware returns LINK_SIGNAL instead of
NO_LINK. The current code is treating LINK_SIGNAL as link up.
The fix is to treat link as down when the link_status != LINK.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The logic is missing the check on whether the tx and rx rings are sharing
completion rings or not.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch provides support for the presence of a KR redriver chip in
between the device PCS and an external PHY. When a redriver chip is
present the device must perform clause 73 auto-negotiation in order to
set the redriver chip for the downstream connection.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the phylib support in the kernel to communicate with and control an
MDIO attached PHY. Use the hardware's MDIO communication mechanism to
communicate with the PHY.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for recognizing and using SFP+ modules directly. This includes
using the I2C support to read and interpret the information returned from
an SFP+ module and configuring things properly.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make phy_aneg_done() available to drivers so that the result of the
auto-negotiation initiated by phy_start_aneg() can be determined.
Remove the local implementation of phy_aneg_done() from the Aeroflex
driver and use the phy library version.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to initialize and use the I2C controller within the hardware
in order to perform sideband communication, e.g. determine the SFP media
type that is installed.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some versions of the amd-xgbe device are capable of reporting ECC error
information back to the driver. Add support to process, track and report
on this information.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current per channel DMA interrupt support is based on an edge
triggered interrupt that is not maskable. This results in having to call
the disable_irq/enable_irq functions in order to prevent interrupts
during napi processing. The hardware now has a way to configure the per
channel DMA interrupt that will allow for masking the interrupt which
prevents calling disable_irq/enable_irq now. This patch makes use of
this support.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the call to netif_get_num_default_rss_queues() and replace it
with num_online_cpus() to allow for the possibility of using all of
the hardware DMA channels available.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for new PCI devices to the driver.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the reading of the Tx timestamp to account for a hardware issue
on how the fields and interrupt are cleared. The "seconds" portion of
the timestamp should be read first, followed by the "nanoseconds" portion.
Reading the "nanoseconds" portion should clear the timestamp data and the
interrupt. Because of an issue with the hardware this order is reversed
and reading the "seconds" portion actually clears the timestamp. The code
currently follows this workaround, but to guard against future versions
where this is fixed add a field to the version data to indicate if the
workaround is required or not.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to a hardware issue, it is possible for interrupt events to be
incorrectly generated when performing a soft reset. To guard against
this, perform the soft reset twice.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 67f8b1dcb9 ("net/mlx4_en: Refactor the XDP forwarding rings
scheme") added a bug in that the prog's reference count is not dropped
in the error path when mlx4_en_try_alloc_resources() is failing from
mlx4_xdp_set().
We previously took bpf_prog_add(prog, priv->rx_ring_num - 1), that we
need to release again. Earlier in the call path, dev_change_xdp_fd()
itself holds a reference to the prog as well (hence the '- 1' in the
bpf_prog_add()), so a simple atomic_sub() is safe to use here. When
an error is propagated, then bpf_prog_put() is called eventually from
dev_change_xdp_fd()
Fixes: 67f8b1dcb9 ("net/mlx4_en: Refactor the XDP forwarding rings scheme")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
While create/destroy channel operation memory is not freed. It was
supposed that memory is freed while driver remove. But a channel
can be created and destroyed many times while changing number of
channels with ethtool.
Based on net-next/master
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since now, the table with same id in multiple netnamespaces were squashed
to a single virtual router. That is not only incorrect, it also causes
error messages when trying to use RALUE register to do double remove
of FIB entries, like this one:
mlxsw_spectrum 0000:03:00.0: EMAD reg access failed (tid=facb831c00007b20,reg_id=8013(ralue),type=write,status=7(bad parameter))
Since we don't allow ports to change namespaces (NETIF_F_NETNS_LOCAL),
and the infrastructure is not yet prepared to handle netnamespaces, just
ignore FIB notification events for non-init namespaces. That is clear to
do since we don't need to offload them.
Fixes: b45f64d16d ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__neigh_create function works in a different way than assumed.
It passes "n" as a parameter to ndo_neigh_construct. But this "n" might
be destroyed right away before __neigh_create() returns in case there is
already another neighbour struct in the hashtable with the same dev and
primary key. That is not expected by mlxsw_sp_router_neigh_construct()
and the stored "n" points to freed memory, eventually leading to crash.
Fix this by doing tight 1:1 coupling between neighbour struct and
internal driver neigh_entry. That allows to narrow down the key in
internal driver hashtable to do lookups by "n" only.
Fixes: 6cf3c971dc ("mlxsw: spectrum_router: Add private neigh table")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previous fix has broken RoCE support as the rdma_pf_params are now
being set into the parameters only after the params are alrady assigned
into the hw-function.
Fixes: 0189efb8f4 ("qed*: Fix Kconfig dependencies with INFINIBAND_QEDR")
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently RoCE v2 won't operate with RDMA CM due to missing setting of
the roce-flavour in the ll2 configuration.
This patch properly sets the flavour, and deletes incorrect HSI
that doesn't [yet] exist.
Fixes: abd49676c7 ("qed: Add RoCE ll2 & GSI support")
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the support to add or remove the unicast entries
to the table and remove from the table.
Reported-by: Daode Huang <huangdaode@hisilicon.com>
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no clear operation before add a new multicast tcam table,
so the tcam table will be overflow when add more entries.
Reported-by: Daode Huang <huangdaode@hisilicon.com>
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The packets of wrong mac address(only the last bit is different) can be
received in Big-endian by current definition of mask_key. Thus it needs
to be modified to support Big-endian and ensure Big-endian normal.
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current definition of mac_mc_entry is only suitable for
Little-endian. Thus it needs to modify tcam table of mac mc-entry
to support both Little-endian and Big-endian.
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Little-endian is only supported by current tcam table to add
or delete mac mc-port. This patch makes it support both
Little-endian and Big-endian.
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Big-endian is not supported by the current definition of table index to get
mac entry. It needs to be modified to support both Little-endian
and Big-endian.
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current definition of mac_uc_entry is only suitable for
Little-endian. Thus it needs to modify tcam table of mac uc-entry
to support both Little-endian and Big-endian.
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current definition of dsaf_drv_tbl_tcam_key is only suitable for
Little-endian. If data is stored in Big-endian, this may lead to
error in data use. Shift operation can make it work normally in both
Big-endian and Little-endian.
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hardware ring buffer data is stored in Little-endian. Thus cpu data
should be modified to Little-endian.
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In current scenario, when the interface is disabled we reset the XGMAC
RX/TX functionality. This operation does not affects the PHY layer/SFP
and which appears UP to the remote end(this behaviour is unlike GMAC).
The result is remote end keeps on sending the packets which gets partly
processed by XMAC and dropped. Since these are partly processed these
appears as errored packets in the packet counter statistics.
This patch fixes this behaviour and adds local-fault and remote-fault
functionality which can be used to intimate the remote peer whenever
the state of the interface changes. This patch also removes the
existing hns_dsaf_xge_core_srst_by_port function which was being used
to reset the RX/TX functionality at XGE Core.
Reported-by: Jun He <hjat2005@huawei.com>
Signed-off-by: Daode Huang <huangdaode@hisilicon.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch modify the gmac_rx_filt_pkt and gmac_rx_octets_total_filt
statistics value. The two statistics is inconsistent with register,
and just the opposite.
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Signed-off-by: Jun He <hjat2005@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch deletes redundant macro definitions in hns drivers.
And change the .h file containing relation to make the layers
more clearly
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Signed-off-by: Weiwei Deng <dengweiwei@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When set auto-negotiation off and duplex half, if run "ethtool -r ethX"
on port with phy, then the port will be failed to work. It should
forbid to start auto-negotiation when auto-negotiate is off. This
patch add the limited condition.
Reported-by: Jinchuang Tian <tianjinchuang1@huawei.com>
Signed-off-by: Daode Huang <huangdaode@hisilicon.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Reviewed-by: lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The default mac pause time set to 0xff which is too short for pausing,
this patch change it to the max value 0xffff.
Signed-off-by: Daode Huang <huangdaode@hisilicon.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Reviewed-by: lipeng <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If set promisc mode when there is some traffic, The service nic will
cause system halted. We reserve the last 6 tcam entry for the 6 ports.
If promisc mode is enabled, we can config the relative tcam as fuzzy
matching and set to be valid, or set the tcam to be invalid
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since there is not enough tcam table entries for vlan and multicast
address, HNSv2 needs to add support of fuzzy matching of TCAM tables.
To add fuzzy match of TCAM, we Add the property to mask the bits to
be fuzzy matched
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Reviewed-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 82580 and related devices offer a frequency resolution of about
0.029 ppb. This patch lets users of the device benefit from the
increased frequency resolution when tuning the clock.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is automatically done from netif_napi_add(), and we want to not
export napi_hash_add() anymore in the following patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the opt_* fields to determine the starting point for negotiating the
number of tx/rx completion queues with the vnic server. These contain the
number of queues that the vnic server estimates that it will be able to
allocate. While renegotiation may still occur, using the opt_* fields will
reduce the number of times this needs to happen and will prevent driver
probe timeout on systems using large numbers of ibmvnic client devices per
vnic port.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the PHY has been configured to allow pause frames, then the MAC
should be configured to generate and/or accept those frames.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pause frames are used to enable flow control. A MAC can send and
receive pause frames in order to throttle traffic. However, the PHY
must be configured to allow those frames to pass through.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This improves UDP spreading, and also slightly improves GRO performance
of encapsulated TCP on 7000 series NICs.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In mlx5 HW, encapsulation is offloaded by the steering rule having
index into an encapsulation table containing the entire set of headers
to be added by the HW. The driver sets these headers in a buffer when we
are offloading the action.
The code maintains mlx5_encap_entry for each encap header it has
encountered when attempted to offload TC tunnel set action.
This entry maintains a linked list of all the flows sharing the same
encap header, when the last flow is removed from the list the encap
entry is removed.
The actual encap_header is allocated by the driver in the hardware only
if we have layer two neighbour info when the encap entry is created.
While the flow is in the driver, the driver holds a reference on the
neighbour.
When a new flow with encap action is inserted, the code first checks if
the required encap entry exists according to the tunnel set parameters.
If it does the encap is shared, otherwise a new mlx5_encap_entry is
created.
TC action parsing implementation in the driver assumes that tunnel set
action is provided in the same order set by the user, e.g before the
mirred_redirect action.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By implementing this ndo, the host stack will set the vxlan udp port
also to VF representor netdevices. This will allow the TC offload code
in the driver when it gets a tunnel key set action to identify the UDP
port as vxlan, and hence the rule will be a candidate for offloading.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enhance the parsing of offloaded TC rules to set HW matching on outer
(encapsulation) headers.
Parse TC tunnel release action and set it as mlx5 decap action when the
required capabilities are supported.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to support steering rules which add encapsulation headers,
encap_id parameter is needed.
Add new mlx5_flow_act struct which holds action related parameter:
action, flow_tag and encap_id. Use mlx5_flow_act struct when adding a new
steering rule.
This patch doesn't change any functionality.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When creating flow tables, allow the caller to specify creation flags.
Currently no flags are used and as such this patch doesn't add any new
functionality.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of comparing to a const value, check the value of max encap
header size capability as reported by the Firmware.
Fixes: 575ddf5888 ('net/mlx5: Introduce alloc_encap and dealloc_encap commands')
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The alloc and dealloc encap commands will be used in the mlx5e driver,
as such, declare them in a common header file.
Also, rename the functions: mlx5_cmd_{de}alloc_encap is replaced with
mlx5_encap_{de}alloc.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes regression introduced by patch adding feature flags. It was
already reported and patch followed (it got accepted) but it appears it
was incorrect. Instead of fixing reversed condition it broke a good one.
This patch was verified to actually fix SoC hanges caused by bgmac on
BCM47186B0.
Fixes: db791eb297 ("net: ethernet: bgmac: convert to feature flags")
Fixes: 4af1474e61 ("net: bgmac: Fix errant feature flag check")
Cc: Jon Mason <jon.mason@broadcom.com>
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
We received two reports of BUG_ON in bnad_txcmpl_process() where
hw_consumer_index appeared to be ahead of producer_index. Out of order
write/read of these variables could explain these reports.
bnad_start_xmit(), as a producer of tx descriptors, has a few memory
barriers sprinkled around writes to producer_index and the device's
doorbell but they're not paired with anything in bnad_txcmpl_process(), a
consumer.
Since we are synchronizing with a device, we must use mandatory barriers,
not smp_*. Also, I didn't see the purpose of the last smp_mb() in
bnad_start_xmit().
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 9d2afba058.
The original issue would possibly exist if an external module
tried calling our "ethtool_ops" without checking if it still
exists.
The right way of solving it is by simply doing the check in
the caller side.
Currently, no action is required as there's no such use case.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver uses a union for copying data to & from management firmware
when interacting with it.
Problem is that the function always copies sizeof(union) while commit
2edbff8dcb ("qed: Learn resources from management firmware") is casting
a union elements which is of smaller size [24-byte instead of 88-bytes].
Also, the union contains some inappropriate elements which increase its
size [should have been 32-bytes]. While this shouldn't corrupt other
PF messages to the MFW [as management firmware enforces permissions so
that each PF is allowed to write only to its own mailbox] we fix this
here as well.
Fixes: 2edbff8dcb ("qed: Learn resources from management firmware")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When moving from typhoon_get_settings to typhoon_getlink_ksettings
in the commit f7a5537cd2 ("net: 3com: typhoon: use new api
ethtool_{get|set}_link_ksettings"), we use a local variable supported
but we forgot to update the struct ethtool_link_ksettings with
this value.
We also initialize advertising to zero, because otherwise it may
be uninitialized if no case of the switch (tp->xcvr_select) is used.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Acked-by: Jes Sorensen <Jes.Sorensen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of adding hooks inside stmmac_platform it is better to just use
the standard PM callbacks within the specific dwmac-driver. This only
used by the dwmac-rk driver.
This reverts commit cecbc5563a ("stmmac: allow to split suspend/resume
from init/exit callbacks").
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the rk_gmac_init() only calls another function move this
function call into probe so rk_gmac_init() can be removed.
Since commit cecbc5563a ("stmmac: allow to split suspend/resume
from init/exit callbacks") the init hook is no longer used in
dwmac-rk so this can be removed.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert the exit hook into a standard driver remove function as
the hook doesn't really buy us anything extra.
Eventually the exit hook will be deprecated in favor of the driver
remove function.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use standard PM resume/suspend callbacks instead of the hooks in
stmmac_platform. This gives the driver more control and flexibility
when implementing PM functionality. The hooks in stmmac_platform
also doesn't buy us anything extra.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure to drop the reference taken by class_find_device() in
hnae_get_handle() on errors and when later releasing the handle.
Fixes: 6fe6611ff2 ("net: add Hisilicon Network Subsystem...")
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure to drop the references taken by bus_find_device() before
returning from emac_dev_open().
Note that phy_connect still takes a reference to the phy device.
Fixes: 5d69e0076a ("net: davinci_emac: switch to new mdio")
Cc: Mugunthan V N <mugunthanvnm@ti.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: linux-omap@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure to drop the references taken by of_get_child_by_name() and
bus_find_device() before returning from cpsw_phy_sel().
Note that holding a reference to the cpsw-phy-sel device does not
prevent the devres-managed private data from going away.
Fixes: 5892cd135e ("drivers: net: cpsw-phy-sel: Add new driver...")
Cc: Mugunthan V N <mugunthanvnm@ti.com>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: linux-omap@vger.kernel.org
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the variant of amac hardware present in the Broadcom
Northstar2 based SoCs. Northstar2 requires an additional register to be
configured with the port speed/duplexity (NICPM). This can be added to
the link callback to hide it from the instances that do not use this.
Also, clearing of the pending interrupts on init is required due to
observed issues on some platforms.
Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the bgmac driver to allow for phy's defined by the device tree
Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Acked-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dev member of struct sti_dwmac is not used anywhere in the driver
so lets just remove it.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename sti_dwmac_init to sti_dwmac_set_mode which is a better
description for what it really does.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add clock error handling to probe and in the process move clock enabling
out of sti_dwmac_init() to make this easier.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sti_dwmac_init() function is called both from probe and resume.
Since DT properties doesn't change between suspend/resume cycles move
parsing of this parameter into sti_dwmac_parse_data() where it belongs.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement PM callbacks and driver remove in the driver instead
of relying on the init/exit hooks in stmmac_platform. This gives
the driver more flexibility in how the code is organized.
Eventually the init/exit callbacks will be deprecated in favor
of the standard PM callbacks and driver remove function.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since sti_dwmac_parse_data() sets dwmac->clk to NULL if not clock was
provided in DT and NULL is a valid clock there is no need to check for
NULL before using this clock.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since dwmac-sti is a DT only driver checking for OF node is not necessary.
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2016-11-04
This series contains updates to ixgbe and ixgbevf only.
Don does cleanup and configuration for our X553 devices, related to LED,
auto-negotiation, flow control and SFP+ setup and config. Adds the
(not secret) sauce for B0 hardware for X553 hardware.
Emil provides several fixes, first replaces the driver specific MDIO
defines for the more preferred equivalent kernel ones. Provides a fix
for auto-negotiaion status, by reading a PHY register twice. Introduces
ixgbe_link_operations structure to allow X550EM_a to override the
methods for MDIO access while X550EM_x provides methods to use I2C
combined access.
Mark fixes an issue where the driver was crashing when msix_entires
were not there because they were freed by a previous suspend or remove.
Sowmini Varadhan fixes an issue where an incorrect check for IPPROTO_UDP
in ixgbe_atr(). Then makes sure that the network and transport headers
in the paged data are available in the headlen bytes to calculate the
l4_proto.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove including <generated/utsrelease.h> that don't need it.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The msix_entries memory can be freed by a previous suspend or
remove, so don't crash on close when it isn't there. Also only
clear the interrupts when the interface is up, because there
aren't any when it is not up.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
For some Tx paths (e.g., tpacket_snd()), ixgbe_atr may be
passed down an sk_buff that has the network and transport
header in the paged data, so it needs to make sure these
headers are available in the headlen bytes to calculate the
l4_proto.
This patch expect that network and transport headers are
already available in the non-paged header dat. The assumption
is that the caller has set this up if l4_proto based Tx
steering is desired.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Commit 9f12df906c ("ixgbe: Store VXLAN port number in network order")
incorrectly checks for hdr.ipv4->protocol != IPPROTO_UDP
in ixgbe_atr(). This check should be for "==" instead.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We were using an old Alpha version of the X550 phy ID. This was leading
to unnecessary queries of the PHY. I removed the old ID (which shouldn't
be on any HW) and add the two that are.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch add X553 FW ALEF support for B0. ALEF is the new unified
FW. This contains updated register defines for ALEF speed
configuration. Likewise it also removes the AN_CNTL_8 usage from
the native SFI flow as it is no longer supported by FW.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fix an issue where set_phy_power was NULL for X550 copper devices
because get_invariants was called before hw->device_id was set.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Introduce ixgbe_link_operations struct with the following changes:
read_i2c_combined => read_link
read_i2c_combined_unlocked => read_link_unlocked
write_i2c_combined => write_link
write_i2c_combined_unlocked => write_link_unlocked
This will allow X550EM_a to override these methods for MDIO access
while X550EM_x provides methods to use I2C combined access. This
also adds a new structure, ixgbe_link_info, to hold information
about the link. Initially this is just method pointers and a bus
address.
The functions involved in combined I2C accesses were moved from
ixgbe_phy.c to ixgbe_x550.c. The underlying functions that carry
out the combined I2C accesses were left in ixgbe_phy.c because
they share some functions with other I2C methods.
v2 - set hw->link.ops in probe.
v3 - check ii->link_ops before setting it since we don't have it
for all devices.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Remove SFP ixfi code since there is no HW that currently supports it.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The msix_entries memory can be freed by a previous suspend or
remove, so don't crash on close when it isn't there.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds X553 flow control auto negotiation for fiber and
backplain. To enable this new function pointers were added as well
as creating a function to dynamically set function pointer we can't
define only on MAC type.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Read the PHY register twice in order to get the correct value for
autoneg_status.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Replace some ixgbe specific MDIO defines with their equivalent
from the kernel.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch updates ixgbe_setup_phy_link_generic to set/unset
auto-negotiation for all speeds. This ensures that unsupported
speeds are unset. This is necessary since the PHY NVM may
advertise unsupported speeds.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds support to get the LED link active via the LEDCTL
register. If the LEDCTL register does not have LED link active
(LED mode field = 0x0100) set then default LED link active returned.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
X553 doesn't need all the initialization that X552 did for iXFI. This
patch will allow native SPI SFP+ to work with X553 devices. Future
patches will add additional configuration as needed.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When prof_sel is invalid, mlx5_core_warn is called but the
mlx5_core_dev is not initialized yet. Solution is moving the prof_sel code
after dev->pdev assignment
Fixes: 2974ab6e8b ('net/mlx5: Improve driver log messages')
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As for the current generation of the mlx5 HW (CX4/CX4-Lx) per flow vlan
push/pop actions are emulated, we must not program them to the firmware.
Fixes: f5f8247609 ('net/mlx5: E-Switch, Support VLAN actions in the offloads mode')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We ignored the vlan priority in offloaded TC rules matching part,
fix that.
Fixes: 095b6cfd69 ('net/mlx5e: Add TC vlan match parsing')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VF reps should be altogether on the same NS as they were created.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In mlx5e_open_channel CQs must be created before napi is enabled.
Here we move the XDP CQ creation to satisfy that fact.
mlx5e_close_channel is already working according to the right order.
Fixes: b5503b994e ("net/mlx5e: XDP TX forwarding support")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of mlx5e_open_rq fails the error handling will jump to
label err_close_xdp_sq and will try to close the xdp_sq unconditionally.
xdp_sq is valid only in case of XDP use cases, i.e priv->xdp_prog is
not null.
To fix this in this patch we test xdp_sq validity prior to closing it.
In addition we now close the xdp_sq.cq as well.
Fixes: b5503b994e ("net/mlx5e: XDP TX forwarding support")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most infrastructure can be reused, provide separate handling
of context offsets and exit codes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nfp_net_bpf_offload() takes all .setup_tc() parameters but it
doesn't use them at the moment. Remove unnecessary ones to make
it possible for XDP to reuse this function.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add XDP support. Separate stack's and XDP's TX rings logically.
Add functions for handling XDP_TX and cleanup of XDP's TX rings.
For XDP allocate all RX buffers as separate pages and map them
with DMA_BIDIRECTIONAL.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calculate packet offsets early in nfp_net_rx() so that we will be
able to use them in upcoming XDP handler. While at it move relevant
variables into the loop scope.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow changing the number of rings via ethtool .set_channels API.
Runtime reconfig needs to be extended to handle number of rings.
We need to be able to activate interrupt vectors before rings are
assigned to them.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We will need to rerun the initialization of the RSS indirection table
after the number of rings is changed. Move the code to a separate
function.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of fixing ring -> vector relations up in ring swap functions
put the reassignment into a helper function which will reinit all
links.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upcoming XDP support will break the assumption that one can iterate
over IRQ vectors to get to all the rings easily. Use nn->.x_ring
arrays directly.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ring allocation helpers encapsulate all ring allocation and
initialization steps nicely. Reuse them on .ndo_open() path.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"Shadow" in ring helpers used to mean that the helper will allocate
rings without touching existing configuration, this was used for
reconfiguration while the device was running. We will soon use
the same helpers for .ndo_open() path, so replace "shadow" with
"ring_set".
No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All functions which need to reallocate ring resources at runtime
look very similar. Centralize that logic into a separate function.
Encapsulate configuration parameters in a structure.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report number of rings via ethtool .get_channels API.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the driver framework to separate out platform/ACPI specific code
from general code during device initialization. This will allow for the
introduction of PCI device support.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tx and Rx DMA channel status determiniation is different depending on the
version of the hardware. Update the channel status processing code to
account for the change. Also, reduce the timeout value used when stopping
the channels.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for reading all management counter registers as 64-bit
values. The indication of whether to read the high 32-bits to form
a 64-bit value is indicated in the version data.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prepare the code to be able to support accessing of the PCS registers
in a new way, while maintaining the current access method. Provide a
version specific field that indicates the method to use.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to be able to use clause 37 auto-negotiation.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prepare for the future introduction of clause 37 auto-negotiation by
updating the current auto-negotiation related functions to identify
them as clause 73 functions. Move interrupt enablement to the
enable/disable auto-negotiation functions. Update what will be common
routines to check for the current type of AN and process accordingly.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prepare the code to be able to work with more than one type of phy by
adding additional callable functions into the phy interface and removing
phy specific settings/functions from non-phy related files.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocate the FIFO across the hardware Rx queues based on the priority
of the queues. Giving more FIFO resources to queues with a higher
priority. If PFC is active but not enabled for a queue, then less
resources can allocated to the queue.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the Rx and Tx fifos are evenly allocated between the hardware
queues of the device. As more queues are instantiated, the fifo memory
needs to be able to be allocated based on queue priority. This allows for
higher priority queues to have more fifo memory than lower priority
queues. Prepare for this by modifying the current fifo calculation to
assign the fifo queue allocation in an array that is then used to program
the hardware.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the length value used for the PCS register dump so that the full
value can be displayed.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the ehea driver is missing a call to netif_carrier_off()
before the interface bring-up; this is necessary in order to
initialize the __LINK_STATE_NOCARRIER bit in the net_device state
field. Otherwise, we observe state UNKNOWN on "ip address" command
output.
This patch adds a call to netif_carrier_off() on ehea's net device
open callback.
Reported-by: Xiong Zhou <zhou@redhat.com>
Reference-ID: IBM bz #137702, Red Hat bz #1089134
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Douglas Miller <dougmill@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The system-status register is actually 16-bit wide and not 8 bit-wide.
Fixes: 233fa44bd6 ("mlxsw: pci: Implement reset done check")
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver allocates replacement buffers before-hand to make
sure whenever an aggregation begins there would be a replacement
for the Rx buffers, as we can't release the buffer until
aggregation is terminated and driver logic assumes the Rx rings
are always full.
For every other Rx page that's being allocated [I.e., regular]
the page is being completely mapped while for the replacement
buffers only the first portion of the page is being mapped.
This means that:
a. Once replacement buffer replenishes the regular Rx ring,
assuming there's more than a single packet on page we'd post unmapped
memory toward HW [assuming mapping is actually done in granularity
smaller than page].
b. Unmaps are being done for the entire page, which is incorrect.
Fixes: 55482edc25 ("qede: Add slowpath/fastpath support and enable hardware GRO")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add Synopsys Designware MAC Glue layer for the Oxford Semiconductor OX820.
Acked-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver sets the skb l4/l3 hash based on NIC_CFG_RSS_HASH_TYPE_*,
which is bit mask. This is wrong. Hw actually provides us enum.
Use CQ_ENET_RQ_DESC_RSS_TYPE_* to set l3 and l4 hash type.
Fixes: bf751ba802 ("driver/net: enic: record q_number and rss_hash for skb")
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Reviewed-by: David Dillow <dave@thedillows.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
XDP statistics are reported in ethtool, in total and per ring,
as follows:
- xdp_drop: the number of packets dropped by xdp.
- xdp_tx: the number of packets forwarded by xdp.
- xdp_tx_full: the number of times an xdp forward failed
due to a full tx xdp ring.
In addition, all packets that are dropped/forwarded by XDP
are no longer accounted in rx_packets/rx_bytes of the ring,
so that they count traffic that is passed to the stack.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Separately manage the two types of TX rings: regular ones, and XDP.
Upon an XDP set, do not borrow regular TX rings and convert them
into XDP ones, but allocate new ones, unless we hit the max number
of rings.
Which means that in systems with smaller #cores we will not consume
the current TX rings for XDP, while we are still in the num TX limit.
XDP TX rings counters are not shown in ethtool statistics.
Instead, XDP counters will be added to the respective RX rings
in a downstream patch.
This has no performance implications.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support XDP CQ type, and refactor the CQ type enum.
Rename the is_tx field to match the change.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The coalesce settings behave badly when changing just one value:
... # ethtool -c eth0
rx-usecs: 249
... # ethtool -C eth0 tx-usecs 250
... # ethtool -c eth0
rx-usecs: 248
This occurs due to rounding errors when calculating the microseconds
value - the divisons round down. This causes (eg) the rx-usecs to
decrease by one every time the tx-usecs value is set as per the above.
Fix this by making the divison round-to-nearest.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
'create_root_ns()' does not return an error pointer, so the test can be
simplified to be more consistent.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The documentation says that SGMII_LN_UCDR_SO_GAIN_MODE0 should be
set to 0, not 6, on the Qualcomm Technologies QDF2432.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing the interrupt trigger region id to 2 and the
corresponding threshold set0/set1 values to 8/16.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since ethernet v1 hardware has a bug related to coalescing, disabling
this feature.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We used to always allocate the same number of TX and RX rings
so the support for having r_vectors without one of the rings
was dropped. That makes us, however, unnecessarily limited
to 8 TX rings (8 is the Linux RSS default) most of the time.
Also we are about to add channel count configuration via
ethtool, so bring that support back. TX rings can now default
to num_online_cpus() and RX rings to netif_get_num_default_rss_queues().
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
num_irqs is not used anywhere, replace it with max_r_vecs which holds
number of allocated RX/TX vectors and is going to be useful soon.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nfp_net_irqs_wanted() doesn't really encapsulate much logic,
remove it and inline the calculations.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use unsigned int consistently for vector/ring counts.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are currently using define for max TX rings to allocate IRQ
vectors. It's OK since the max number of rings for TX and RX
are currently the same, but lets make the code nicer by taking
max of the two.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We already force ring sizes to be power of 2 so replace
modulo operations with AND (size - 1) in index calculations.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a separate buffer allocation function to be called
from NAPI. We can make assumptions about the context and
buffer size.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Speed up RX processing by moving to the alloc_frag()/build_skb()
paradigm. Since we're no longer mapping the entire buffer for
DMA add helpers which take care of calculating offsets and
lengths.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nfp_net_rx() is quite long already and about to get longer.
Move buffer drop/recycle to a helper.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper function to calculate the buffer size at run time.
Buffer lengths will now depend on the FW prepend configuration
instead of assuming the most space consuming configuration and
defaulting to 2k buffers at initialization time.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't declare functions as static inline in .c files and
remove dead code it was hiding.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ether_setup() will be invoked by alloc_etherdev_mqs(), no need
to call it again.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drop all code related to nfp3200. It was never widely deployed
as a NIC.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are few variables in nfp_net_poll() which are used only
once or unused but set. Remove them.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When relaxing the limitation on the number of unicast MAC filters
an interface can configure, qed started passing the MAC quota to
qede. However, the value is initialized only for PFs, causing VFs
to always try and configure themselves as promiscuous
[as they believe they lack the resources to configure the rx-mode].
Fixes: 7b7e70f979 ("qed*: Allow unicast filtering")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver is now setting the ndev's priv_flags instead of adding to it,
causing pktgen failure to utilize various features due to the loss
of the IFF_TX_SKB_SHARING indication.
Fixes: 7b7e70f979 ("qed*: Allow unicast filtering")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current bgmac code initializes some DMA settings in the receive control
register for some hardware and then immediately clears those settings.
Not clearing those settings results in ~420Mbps *improvement* in
throughput; this system can now receive frames at line-rate on Broadcom
5871x hardware compared to ~520Mbps today. I also tested a few other
values but found there to be no discernible difference in CPU
utilization even if burst size and prefetching values are different.
On the hardware tested there was no need to keep the code that cleared
all but bits 16-17, but since there is a wide variety of hardware that
used this driver (I did not look at all hardware docs for hardware using
this IP block), I find it wise to move this call up and clear bits just
after reading the default value from the hardware rather than completely
removing it.
This is a good candidate for -stable >=3.14 since that is when the code
that was supposed to improve performance (but did not) was introduced.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Fixes: 56ceecde1f ("bgmac: initialize the DMA controller of core...")
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removed some of unnecessary if statements and unreachable code found by
static code analysis tool.
The return value of i40e_vsi_control_rings(..., false) is always 0. So,
test for non-zero will never be true. The function has been split into
"int i40e_vsi_start_rings()" and "void i40e_vsi_stop_rings()" for better
understanding.
Similarly, the function i40e_vsi_kill_vlan() never fails. So, checking
for return value is also unnecessary. Function definition changed to void.
The i40e_loopback_test() function is not implemented. The function and
all references to loopback testing were removed.
Change-ID: Id45cf66f6689ce2bc4e887de13f073e30e8431bd
Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds I40E_NVMUPD_STATE_ERROR state for NVM update.
Without this patch driver has no possibility to return NVM image write
failure.This state is being set when ARQ rises error.
arq_last_status is also updated every time when ARQ event comes,
not only on error cases.
Change-ID: I67ce43ba22a240773c2821b436e96054db0b7c81
Signed-off-by: Maciej Sosin <maciej.sosin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
For some cases when reading from device are incorrect or image is
incorrect, this part of code causes crash due to division by zero.
Change-ID: I8961029a7a87b0a479995823ef8fcbf6471405e1
Signed-off-by: Michal Kosiarz <michal.kosiarz@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When a VF is reset, it gets a new VSI, so all of its MAC filters go
away. Correctly set the number of filters to 0 when freeing VF
resources. This corrects a problem with failure to add filters when the
VF driver is reloaded.
Change-ID: I2acbecf734287b67473bb225293e14b5096acbef
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch reorders the logic at the end of i40e_tx_map to address the
fact that the logic was rather convoluted and much larger than it needed
to be.
In order to try and coalesce the code paths I have updated some of the
comments and repurposed some of the variables in order to reduce
unnecessary overhead.
This patch does the following:
1. Quit tracking skb->xmit_more with a flag, just max out packet_stride
2. Drop tail_bump and do_rs and instead just use desc_count and td_cmd
3. Pull comments from ixgbe that make need for wmb() more explicit.
Change-ID: Ic7da85ec75043c634e87fef958109789bcc6317c
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds a common method for finding a VSI by type. The main
motivation for doing this is that the Flow Director path actually had two
ways of handling this, one stopped on first match and one did not. This
patch makes it so that all callers of this function will get the same
approach for finding a VSI.
Change-ID: Ibf25de8acd8466582520694424aa87da66965fbd
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Bimmy Pujari <bimmy.pujari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Remove the second call to msleep outside the loop, and move the msleep
within the loop as the first step. This guarantees that a single loop
will wait the minimum time first, and then after the reset finishes we
no longer need an extra msleep.
Change-ID: Ib2086f0a142402b614f67846bc091754203a0b9a
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The current Rx timestamp hang logic is not very robust because it does
not notice a register is hung until all four timestamps have been
latched and we wait a full 5 seconds. Replace this logic with a newer Rx
hang detection based on storing the jiffies when we first notice
a receive timestamp event. We store each register's time separately,
along with a flag indicating if it is currently latched. Upon first
transitioning to latch, we will update the latch_events[i] jiffies
value. This indicates the time we first noticed this event. The watchdog
routine will simply check that the either the flag has been cleared, or
we have passed at least one second. In this case, it is able to clear
the Rx timestamp register under the assumption that it was for a dropped
frame. The benefit if this strategy is that we should be able to
detect and clear out stalled RXTIME_H registers before we exhaust the
supply of 4, and avoid complete stall of Rx timestamp events.
Change-ID: Id55458c0cd7a5dd0c951ff2b8ac0b2509364131f
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We need a locking mechanism to protect the hardware SYSTIME register
which is split over 2 values, and has internal hardware latching. We
can't allow multiple accesses at the same time. However....
The spinlock_t is overkill here, especially use of spin_lock_irqsave,
since every PTP access will halt hardirqs. Notice that the only places
which need the SYSTIME value are user context and are capable of sleeping.
Thus, it is safe to use a mutex here instead of the spinlock.
Change-ID: I971761a89b58c6aad953590162e85a327fbba232
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When hardware has taken a timestamp for a received packet, it indicates
which RXTIME register the timestamp was placed in by some bits in the
receive descriptor. It uses 3 bits, one to indicate if the descriptor
index is valid (ie: there was a timestamp) and 2 bits to indicate which
of the 4 registers to read. However, the driver currently does not check
the TSYNVALID bit and only checks the index. It assumes a zero index
means no timestamp, and a non zero index means a timestamp occurred.
While this appears to be true, it prevents ever reading a timestamp in
RXTIME[0], and causes the first timestamp the device captures to be
ignored.
Fix this by using the TSYNVALID bit correctly as the true indicator of
whether the packet has an associated timestamp.
Also rename the variable rsyn to tsyn as this is more descriptive and
matches the register names.
Change-ID: I4437e8f3a3df2c2ddb458b0fb61420f3dafc4c12
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We duplicate some code around adding and deleting filters using the
adminq interface. This is prone to errors in case there are bugs. Use
functions which extract the logic to their own portion so that we don't
duplicate it twice in code.
Change-ID: I60d68aeb887976787dec00b23ab386a106e61465
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We determine that a VSI is in vlan_mode whenever it has any filters
with a VLAN other than -1 (I40E_VLAN_ALL). The previous method of doing
so was to perform a loop whenever we needed the check. However, we can
notice that only place where filters are added (i40e_add_filter) can
change the condition from false to true, and the only place we can
return to false is in i40e_vsi_sync_filters_subtask. Thus, we can remove
the loop and use a boolean directly.
Doing this avoids looping over filters repeatedly especially while we're
already inside a loop over all the filters. This should reduce the
latency of filter operations throughout the driver.
Change-ID: Iafde08df588da2a2ea666997d05e11fad8edc338
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently there exists a bug where adding at least one VLAN and then
removing all VLANs leaves the mac filters for the VSI with an incorrect
value for 'vid' which indicates the mac filter's VLAN status.
The current implementation for handling the removal of VLANs is wrong
for a couple reasons. The first is that when i40e_vsi_kill_vlan
iterates through the MAC filters, it fails to account for the MAC filter
status; i.e. it's not accommodating for filters that are about to be
deleted. The second problem is that MAC filters can be deleted in other
places (specifically i40e_set_rx_mode). Thus if it occurs that all the
VLAN MAC filters get deleted we need to switch out of VLAN mode, but the
code path through i40e_vsi_kill_vlan has already been executed and we're
now stuck in VLAN mode.
This patch fixes the issue by removing the check from i40e_vsi_kill_vlan
and puts the check instead in i40e_sync_vsi_filters where we're
guaranteed to see all filter deletions and can properly detect when we
need to switch out of VLAN mode.
Change-ID: Ib38fe6034b356eee9a0e20b8a9eeed5ff2debcd9
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently, we fail to correctly restore filters on the temporary add
list when we fail to allocate memory either for deletion or addition.
Replace calls to "goto out;" with calls to a new location that correctly
handles memory allocation failures.
Note that it is safe for us to call i40e_undo_filter_entries on the
tmp_del_list even after we've deleted filters because at this point it
will be empty, so we don't need to separate the logic for add and
delete failure.
Change-Id: Iee107fd219c6e03e2fd9645c2debf8e8384a8521
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Replace the mac_filter_list with a static size hash table of 8bits. The
primary advantage of this is a decrease in latency of operations related
to searching for specific MAC filters, including .set_rx_mode. Using
a linked list resulted in several locations which were O(n^2). Using
a hash table should give us latency growth closer to O(n*log(n)).
Change-ID: I5330bd04053b880e670210933e35830b95948ebb
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
When inside a loop where we call i40e_del_filter we use an O(n^2)
pattern where i40e_del_filter calls i40e_find_filter for us. We can
avoid this O(n^2) logic by factoring a function, __i40e_del_filter() out
from the i40e_del_filter code. This allows us to re-use the delete logic
where appropriate without having to search for the filter twice.
This new function benefits several functions including i40e_vsi_add_vlan,
i40e_vsi_kill_vlan, i40e_del_mac_vlan_all, and i40e_vsi_release.
Change-ID: I75fabe0f53bf73f56b80d342e5fdcfcc28f4d3eb
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When adding new MAC address filters, the driver determines if it should
behave in VLAN mode (where all MAC addresses get assigned to every
existing VLAN) or in non-VLAN mode where MAC addresses get assigned the
VLAN_ANY identifier. Under some circumstances it is possible that a VLAN
has been marked for removal (such that all filters of that VLAN are set
to I40E_FILTER_REMOVE), and a subsequent call to i40e_put_mac_in_vlan
may occur prior to the driver subtask that syncs filters to the
hardware.
In this case, we may add filters to the new removed VLAN, even though it
should have been removed. This is most obvious when first adding a new
VLAN. We will delete all filters which are in I40E_VLAN_ANY (-1) and
then re-add them as in VLAN 0 (untagged). Then before we sync filters,
we will add new MAC address filter, which will be added to every VLAN
that exists. Unfortunately, this will include I40E_VLAN_ANY, so we will
end up incorrectly adding filters to the -1 VLAN. This can be fixed by
simply skipping all filters which are marked for removal.
A similar check is not necessary in i40e_del_mac_all_vlan, since we are
deleting, and any filter which we find already marked for removal would
simply be deleted again, which doesn't cause any issues.
Change-Id: I7962154013ce02fe950584690aeeb3ed853d0086
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When a PVID has been assigned to a VSI, the function
i40e_put_mac_in_vlan arbitrarily modifies all filters
to have the same VLAN. This is obviously incorrect
because it could be modifying active filters without
putting them into the NEW state. The correct method
is to remove then re-add filters which is already done
in the code where we assign the PVID.
Fix this issue and a few other minor nits at the same
time. First, when we have a PVID don't even bother
looping and simply add the filter with the PVID immediately.
In the case of the loop, we now can remove several checks.
We also don't need to use i40e_find_filter first before
calling i40e_add_filter, since i40e_add_filter implicitly
does a lookup already.
Finally, update the return semantics of this function so
that on failure to add a filter it returns NULL, but on
success, it returns the last filter added. Otherwise,
we're just returning the last filter in the list. An
alternative fix might be to return 0 or an error code,
but this is pretty invasive to every call site.
Change-ID: I2325dfd843aec76d89fb0d7cb0e7c4f290a34840
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
A future patch will be modifying these functions and making a call to
a static function which currently is defined after these functions. Move
them in a separate patch to ease review and ensure the moved code is
correct.
Change-ID: I2ca7fd4e10c0c07ed2291db1ea41bf5987fc6474
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The kernel provides __dev_uc_sync and __dev_mc_sync in order for drivers
which need individual notification of add and delete for each filter.
These functions allow us to vastly simplify our .set_rx_mode handler. We
need to implement two functions for sync and unsync which add and remove
filters respectively.
This change avoids a very complex and inefficient algorithm which
resulted in an abnormal latency for the .set_rx_mode NDO operation. The
resulting code after this change is more readable, more efficient, and
less code.
Due to the callback signature used by these functions we also must
update several other functions to take a const u8 * pointer.
Change-Id: I2ca7fd4e10c0c07ed2291db1ea41bf5987fc6474
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Originally the is_vf and is_netdev fields were added in order to
distinguish between VF and netdev filters in a single VSI. However, it
can be noted that we use separate VSI for SRIOV VFs and for netdev VSI.
Thus, since a single VSI should only ever have one type of filter, we
can simply remove the checks and remove the typing.
In a similar fashion, we can note that the only remaining way to get
multiple filters of a single type is through a debug command that was
added to debugfs. This command is useless in practice, and results in
causing bugs if we keep counter tracking but lose the is_vf and
is_netdev protections as desired above.
Since the only time we'd actually have a counter value besides 0 and
1 is through use of this debugfs hook, we can remove this unnecessary
command, and the entire counter logic it required.
We vastly simplify mac filters by removing
(a) the distinction between VF and netdev filters
(b) counting logic
(c) the ability to add and remove filters bypassing the stack via debugfs
Change-ID: Idf916dd2a1159b1188ddbab5bef6b85ea6bf27d9
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Trival fix, dev_err message is missing a \n, so add it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently, each interfaces assumes it receives an equal portion
of HW/FW resources, but this is wasteful - different partitions
[and specifically, parititions exposing different protocol support]
might require different resources.
Implement a new resource learning scheme where the information is
received directly from the management firmware [which has knowledge
of all of the functions and can serve as arbiter].
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver sets several restrictions about the number of supported VFs
according to available HW/FW resources.
This creates a problem as there are constellations which can't be
supported [as limitation don't accurately describe the resources],
as well as holes where enabling IOV would fail due to supposed
lack of resources.
This introduces a new interal feature - vf-queues, which would
be used to lift some of the restriction and accurately enumerate
the queues that can be used by a given PF's VFs.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Today, RDMA capabilities are learned from management firmware
which provides a per-device indication for all interfaces.
Newer management firmware is capable of providing a per-device
indication [would later be extended to either RoCE/iWARP].
Try using this newer learning mechanism, but fallback in case
management firmware is too old to retain current functionality.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While the qed_lm_maps is closely tied with the QED_LM_* defines,
when iterating over the array use actual size instead of the qed
define to prevent future possible issues.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Management firmware is interested in various tidbits about
the driver - including the driver state & several configuration
related fields [MTU, primtary MAC, etc.].
This adds the necessray logic to update MFW with such configurations,
some of which are passed directly via qed while for others APIs
are provide so that qede would be able to later configure if needed.
This also introduces a new default configuration for MTU which would
replace the default inherited by being an ethernet device.
Signed-off-by: Sudarsana Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the device, a MID entry represents a group of local ports, which can
later be bound to a MDB entry.
The lookup of an existing MID entry is currently done using the provided
MC MAC address and VID, from the Linux bridge. However, this can result
in an incorrect reuse of the same MID index in different VLAN-unaware
bridges (same IP MC group and VID 0).
Fix this by performing the lookup based on FID instead of VID, which is
unique across different bridges.
Fixes: 3a49b4fde2 ("mlxsw: Adding layer 2 multicast support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an interface is configured to use Tx/Rx-only queues,
the length of the statistics would be shortened to accomodate only the
statistics required per-each queue, and the values would be provided
accordingly.
However, the strings provided would still contain both Tx and Rx strings
for each one of the queues [regardless of its configuration], which might
lead to out-of-bound access when filling the buffers as well as incorrect
statistics presented.
Fixes: 9a4d7e86ac ("qede: Add support for Tx/Rx-only queues.")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following patch fixes an issue with the ldmvsw driver where
the network connection of a guest domain becomes non-functional after
the guest domain has panic'd and rebooted.
The root cause was determined to be from the following series of
events:
1. Guest domain panics - resulting in the guest no longer processing
network packets (from ldmvsw driver)
2. The ldmvsw driver (in the control domain) eventually exerts flow
control due to no more available tx drings and stops the tx queue
for the guest domain
3. The LDC of the network connection for the guest is reset when
the guest domain reboots after the panic.
4. The LDC reset event is received by the ldmvsw driver and the ldmvsw
responds by clearing the tx queue for the guest.
5. ldmvsw waits indefinitely for a DATA ACK from the guest - which is
the normal method to re-enable the tx queue. But the ACK never comes
because the tx queue was cleared due to the LDC reset.
To fix this issue, in addition to clearing the tx queue, re-enable the
tx queue on a LDC reset. This prevents the ldmvsw from getting caught in
this deadlocked state of waiting for a DATA ACK which will never come.
Signed-off-by: Aaron Young <Aaron.Young@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The second SET_NETDEV_DEV() in the hunk should be
removed.
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYFfxWAAoJEORje4g2clinr4MQAMUMoO4akLhppyUuLiT2ZZHc
KshFwUF5RnZq6c2qXSxTVhfajyG6q71EwQNGaAMeGsjoCXM+u99Adgp/lYkXxbzY
e4W3gCvjGWIik5d3JRVq07XutVpeJIc/Qvvc5zc1lUYNR5f71iBw538eG9ic4PXi
4/CpRvcsa8Z9sbtKPcjHwQQRd4ewx/KAD6QyOsVz9GgkBeNMYag3SO731DYSkjRC
MYK85arNC1JUE/MHQKIfYQvjiJVfEyt2FvC8v9tW+bhzP6dAzxRY0yd8ZFJtCiYH
GFGy8vdeCA/0dFRD5cYKPKBiFwUbRC8bt2lLC5ZoUic2nZ23LlO67uDkaPDRYckt
oyhErFRX6Q/goqKFCI4tLUoSBF1bhy9EnbWyOWmcW7qpXRD3VCclS0Ctr++yJnv2
bhhlID56f+dX+rnW/OAERrk8MdVHo5xBUzQ8ZAAF3WDP9LqW+qlYVrEvrFqFIeFM
OCGUbW2xsZaHMZRyx0K6068hy8O4EujjgC9PARi65rrZAAwxlDm4ElJEYvzXZVXA
YoMeXiZGrhoj7+h8OorV0TyB+7mUgxFNlq0tCoi193QS+zIuQqf3XYNmuQGCUflF
YdzZs7/9LpANN4e5yDTCq3CIr8yYv9sRrdnSX0iShvbFBAKClLxpjj2xkpayHFVR
8CRvlb5O1v1fIwSn2z/I
=D5Ix
-----END PGP SIGNATURE-----
Merge tag 'shared-for-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma
Saeed Mahameed says:
====================
Mellanox mlx5 core driver updates 2016-10-25
This series contains some updates and fixes of mlx5 core and
IB drivers with the addition of two features that demand
new low level commands and infrastructure updates.
- SRIOV VF max rate limit support
- mlx5e tc support for FWD rules with counter.
Needed for both net and rdma subsystems.
Updates and Fixes:
From Saeed Mahameed (2):
- mlx5 IB: Skip handling unknown mlx5 events
- Add ConnectX-5 PCIe 4.0 VF device ID
From Artemy Kovalyov (2):
- Update struct mlx5_ifc_xrqc_bits
- Ensure SRQ physical address structure endianness
From Eugenia Emantayev (1):
- Fix length of async_event_mask
New Features:
From Mohamad Haj Yahia (3): mlx5 SRIOV VF max rate limit support
- Introduce TSAR manipulation firmware commands
- Introduce E-switch QoS management
- Add SRIOV VF max rate configuration support
From Mark Bloch (7): mlx5e Tc support for FWD rule with counter
- Don't unlock fte while still using it
- Use fte status to decide on firmware command
- Refactor find_flow_rule
- Group similar rules under the same fte
- Add multi dest support
- Add option to add fwd rule with counter
- mlx5e tc support for FWD rule with counter
Mark here fixed two trivial issues with the flow steering core, and did
some refactoring in the flow steering API to support adding mulit destination
rules to the same hardware flow table entry at once. In the last two patches
added the ability to populate a flow rule with a flow counter to the same flow entry.
V2: Dropped some patches that added new structures without adding any usage of them.
Added SRIOV VF max rate configuration support patch that introduces
the usage of the TSAR infrastructure.
Added flow steering fixes and refactoring in addition to mlx5 tc
support for forward rule with counter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
SwitchIB and SwitchIB-2 are Infiniband switches with up to 36 ports. This
driver initialize the hardware and Firmware which implements the IB
management and connection with the SM.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SwitchX-2 is IB capable device. This patch add a support to change the
port type between Ethernet and Infiniband.
When the port is set to IB, the FW implements the Subnet Management Agent
(SMA) manage the port. All port attributes can be control remotely by
the SM.
Usage:
$ devlink port show
pci/0000:03:00.0/1: type eth netdev eth0
pci/0000:03:00.0/3: type eth netdev eth1
pci/0000:03:00.0/5: type eth netdev eth2
pci/0000:03:00.0/6: type eth netdev eth3
pci/0000:03:00.0/8: type eth netdev eth4
$ devlink port set pci/0000:03:00.0/1 type ib
$ devlink port show
pci/0000:03:00.0/1: type ib
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we are about to add Infiniband port remove and create we will add
"eth" prefix to port create and remove APIs.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add "port_type_set" API to mlxsw core. The core layer send the change type
callback to the port along with it's private information.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>