If two consecutive reads of the counter are the same, it is also
not an overflow. "systimel_1 < systimel_2" should be
"systimel_1 <= systimel_2".
Before the patch, we could perform an *erroneous* correction:
Let's say that systimel_1 == systimel_2 == 0xffffffff.
"systimel_1 < systimel_2" is false, we think it's an overflow,
we read "systimeh = er32(SYSTIMH)" which meanwhile had incremented,
and use "(systimeh << 32) + systimel_2" value which is 2^32 too large.
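As an illustration of the corrected check, here is a minimal user-space sketch that takes the register values as plain arguments. The function name systim_combine and its parameters are hypothetical: they stand in for the two low-word reads, the high-word read taken between them, and the high-word re-read performed when an overflow is suspected.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not driver code.
 * low_1, low_2: first and second read of the low 32 bits.
 * high_1:       high 32 bits read between the two low reads.
 * high_2:       high 32 bits re-read after the second low read.
 */
static uint64_t systim_combine(uint32_t low_1, uint32_t high_1,
                               uint32_t low_2, uint32_t high_2)
{
    /* Equal low words mean the counter did not wrap between the reads. */
    if (low_1 <= low_2)
        return ((uint64_t)high_1 << 32) | low_1;

    /* The low word wrapped: pair the re-read high word with low_2. */
    return ((uint64_t)high_2 << 32) | low_2;
}
```

With low_1 == low_2 == 0xffffffff the sketch keeps high_1, so a high word that increments afterwards cannot make the result 2^32 too large.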
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: intel-wired-lan@lists.osuosl.org
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
"incvalue" variable holds a result of "er32(TIMINCA) &
E1000_TIMINCA_INCVALUE_MASK" and used in "do_div(temp, incvalue)"
as a divisor.
Thus, "u64 incvalue" declaration is probably a mistake.
Even though it seems to be a harmless one, let's fix it.
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
For bitshifts, we should make use of the BIT macro when possible, and
ensure that other bitshifts are marked as unsigned. This helps prevent
signed bitshift errors, and ensures similar style.
Make use of GENMASK and the unsigned postfix where BIT() isn't
appropriate.
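To show why unsigned shifts matter, here is a small user-space sketch. BIT_U32, GENMASK_U32 and ex_field_get are hypothetical 32-bit stand-ins written for this example, not the kernel's own BIT() and GENMASK() definitions.

```c
#include <stdint.h>

/* Stand-ins for illustration only; the kernel macros are defined differently. */
#define BIT_U32(n)        (UINT32_C(1) << (n))
#define GENMASK_U32(h, l) ((UINT32_MAX >> (31 - (h))) & (UINT32_MAX << (l)))

/* Extract bits h..l (inclusive) of a register value. */
static uint32_t ex_field_get(uint32_t reg, unsigned int h, unsigned int l)
{
    return (reg & GENMASK_U32(h, l)) >> l;
}
```

A plain `1 << 31` shifts into the sign bit of a 32-bit int, which is undefined behaviour in C; the unsigned forms above avoid that.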
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fixed the file to use a consistent ret_val for return value checking.
Signed-off-by: Brian Walsh <brian@walsh.ws>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes the issues for disabling auto-negotiation and forcing
speed and duplex settings for the non-copper media.
For non-copper media, e1000_get_settings should return ETH_TP_MDI_INVALID for
eth_tp_mdix_ctrl instead of ETH_TP_MDI_AUTO so subsequent e1000_set_settings
call would not fail with -EOPNOTSUPP.
e1000_set_spd_dplx should not automatically turn autoneg back on for forced
1000 Mbps full duplex settings for non-copper media.
Cc: xe-kernel@external.cisco.com
Cc: Daniel Walker <dwalker@fifo99.com>
Signed-off-by: Steve Shih <sshih@cisco.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Patch 562abd39 "xen-netback: support multiple extra info fragments
passed from frontend" contained a mistake which can result in an
incorrect number of responses being generated when handling errors
encountered when processing packets containing extra info fragments.
This patch fixes the problem.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reported-by: Jan Beulich <JBeulich@suse.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I tried to fix this before, but my previous fix was incomplete
and we can still get the same link error in randconfig builds
because of the way that Kconfig treats the
default y if MVNETA=y && MVNETA_BM_ENABLE
line that does not actually trigger when MVNETA_BM_ENABLE=m,
contrary to what I intended.
Changing the line to use MVNETA_BM_ENABLE!=n however has
the desired effect and hopefully makes all configurations
work as expected.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 019ded3aa7 ("net: mvneta: bm: clarify dependencies")
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace the hardcoded mask 0x00fffff0 with MICREL_PHY_ID_MASK for
better readability.
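For readers unfamiliar with masked PHY ID matching, here is a minimal sketch of how such a mask is typically applied. EX_PHY_ID_MASK and phy_id_matches are names invented for this example; only the mask value 0x00fffff0 comes from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as the mask quoted above; the macro name is hypothetical. */
#define EX_PHY_ID_MASK 0x00fffff0u

/* Two IDs match when they agree on every bit selected by the mask. */
static bool phy_id_matches(uint32_t phy_id, uint32_t driver_id, uint32_t mask)
{
    return (phy_id & mask) == (driver_id & mask);
}
```

With this mask the low four bits are ignored, so two IDs that differ only in that nibble still match.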
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a netns leak.
Fixes: 93edb8c7f9 ("gtp: reload GTPv1 header after pskb_may_pull()")
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Device should be configured by default to VEB once VFs are active.
This changes the configuration of both PFs' and VFs' vports to enable
tx-switching once sriov is enabled.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allows the user to view the VF configuration by observing the PF's
device.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support in `ndo_set_vf_spoofchk' for allowing PF control over
its VF spoof-checking configuration.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for two ndos that allow the PF to tweak the VF's view of the
link - `ndo_set_vf_link_state' to allow it a view independent of the PF's,
and `ndo_set_vf_rate' which would allow the PF to limit the VF speed.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allows the PF to enforce the VF's MAC,
i.e., by using `ip link ... vf <x> mac <value>'.
While a MAC is forced, PF would prevent the VF from configuring any other
MAC.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for PF control over the VF vlan configuration.
I.e., `ip link ... vf <x> vlan <vid>' should now be supported.
1. <vid> != 0 => VF receives [unknowingly] only traffic tagged by
<vid> and tags all outgoing traffic sent by VF with <vid>.
2. <vid> == 0 ==> Remove the pvid configuration, reverting to previous.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding a PCI callback for `sriov_configure' and a new PCI device id for
the VF [+ Some minor changes to accommodate differences between PF and VF
at the qede].
Following this, VF creation should be possible and the entire subset of
existing PF functionality that's allowed to VFs should be supported.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the VF infrastructure is supposed to offer backward/forward
compatibility, the various types associated with VF<->PF communication
should be aligned across all various platforms that support IOV
on our family of adapters.
This adds a couple of currently missing values, specifically aligning
the enum for the various TLVs possible in the communication between them.
It then adds the PF implementation for some of those missing VF requests.
This support isn't really necessary for the Linux VF as those VFs don't
require it [at least today], but it is required by VFs running on other
OSes. LRO is an example of one such configuration.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up to this point, VF and PF communication always originates from VF.
As a result, VF cannot be notified of any async changes, and specifically
cannot be informed of the current link state.
This introduces the bulletin board, the mechanism through which the PF
is going to communicate async notifications back to the VF. Basically,
it's a well-defined structure agreed upon by both PF and VF which the VF would
continuously poll and into which the PF would DMA messages when needed.
[Bulletin board is actually allocated and communicated in previous patches
but never before used]
Based on the bulletin infrastructure, the VF can query its link status
and receive said async carrier changes.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds sufficient changes to allow VFs l2-configuration flows to work.
While the fastpath of the VF and the PF are meant to be exactly the same,
the configuration of the VF is done by the PF.
This diverges all VF-related configuration flows that originate from a VF,
making them pass through the VF->PF channel and adding sufficient logic
on the PF side to support them.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While previous patches have already added the necessary logic to probe
VFs as well as enabling them in the HW, this patch adds the ability to
support VF FLR & SRIOV disable.
It then wraps both flows together into the first IOV callback to be
provided to the protocol driver - `configure'. This would later be used
to enable and disable SRIOV in the adapter.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds the qed VFs for the first time -
The VFs are limited functions, with a very different PCI bar structure
[when compared with PFs] to better impose the related security demands
associated with them.
This patch includes the logic necessary to allow VFs to successfully probe
[without actually adding the ability to enable iov].
This includes diverging all the flows that would occur as part of the pci
probe of the driver, preventing the VF from accessing registers/memories it
can't and instead utilizing the VF->PF channel to query the PF for needed
information.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Communication between VF and PF is based on a dedicated HW channel;
VF will prepare a message, and by signaling the HW the PF would get a
notification of that message's existence. The PF would then copy the
message, process it and DMA an answer back to the VF as a response.
The messages themselves are TLV-based - allowing easier backward/forward
compatibility.
This patch adds the infrastructure of the channel on the PF side -
starting with the arrival of the notification and ending with DMAing
the response back to the VF.
It also adds a dummy-response as reference, as it only lays the
groundwork of the communication; it doesn't really add support of any
actual messages.
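To illustrate why a TLV layout eases backward/forward compatibility, here is a minimal sketch of a TLV walker. The header layout, the type values and the ex_tlv_* names are all hypothetical; they are not taken from the qed channel definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type values for this example. */
enum { EX_TLV_ACQUIRE = 1, EX_TLV_LIST_END = 2 };

/* Hypothetical header: length covers the header plus its payload, in bytes. */
struct ex_tlv_hdr {
    uint16_t type;
    uint16_t length;
};

/* Write a header at offset off and return the offset of the next TLV. */
static size_t ex_tlv_put(uint8_t *buf, size_t off, uint16_t type, uint16_t length)
{
    struct ex_tlv_hdr hdr = { type, length };

    memcpy(buf + off, &hdr, sizeof(hdr));
    return off + length;
}

/* Return the offset of the first TLV of the given type, or -1 if absent. */
static long ex_tlv_find(const uint8_t *buf, size_t len, uint16_t type)
{
    size_t off = 0;

    while (off + sizeof(struct ex_tlv_hdr) <= len) {
        struct ex_tlv_hdr hdr;

        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.type == EX_TLV_LIST_END)
            return -1;
        if (hdr.length < sizeof(hdr) || hdr.length > len - off)
            return -1;          /* malformed length, stop parsing */
        if (hdr.type == type)
            return (long)off;
        off += hdr.length;      /* unknown types are skipped by length */
    }
    return -1;
}
```

Because every entry carries its own length, a receiver can step over types it does not recognise, which is what lets older and newer peers interoperate.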
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for a new Kconfig option for qed* driver which would allow
[eventually] the support in VFs.
This patch adds the necessary logic in the PF to learn about the possible
VFs it will have to support [Based on PCI configuration space and HW],
and prepare a database with an entry per-VF as infrastructure for future
interaction with said VFs.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add detection and recovery code when the hardware returned opaque value
does not match the expected consumer index. Once the issue is detected,
we skip the processing of all RX and LRO/GRO packets. These completion
entries are discarded without sending the SKB to the stack and without
producing new buffers. The function will be reset from a workqueue.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a rare hardware bug that can cause a bad opaque value in the RX
or TPA completion. When this happens, the hardware may have used the
same buffer twice for 2 rx packets. In addition, the driver will also
crash later using the bad opaque as the index into the ring.
The rx opaque value is predictable and is always monotonically increasing.
The workaround is to keep track of the expected next opaque value and
compare it with the one returned by hardware during RX and TPA start
completions. If they miscompare, we will not process any more RX and
TPA completions and exit NAPI. We will then schedule a workqueue to
reset the function.
This patch adds the logic to keep track of the next rx consumer index.
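The tracking described above can be sketched roughly as follows. The structure, field names and the wrap-at-ring-size behaviour are assumptions made for this example, not the bnxt driver's actual data layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring state for illustration. */
struct ex_rx_ring {
    uint16_t next_cons;   /* opaque value expected in the next completion */
    uint16_t ring_mask;   /* ring size - 1, ring size assumed a power of two */
    bool     need_reset;  /* set once a mismatch has been seen */
};

/* Return true if the completion carrying this opaque value may be processed. */
static bool ex_rx_check_opaque(struct ex_rx_ring *r, uint16_t opaque)
{
    if (r->need_reset)
        return false;           /* discard everything until the reset */

    if (opaque != r->next_cons) {
        r->need_reset = true;   /* a real driver would schedule reset work here */
        return false;
    }

    r->next_cons = (uint16_t)((r->next_cons + 1) & r->ring_mask);
    return true;
}
```

Once a mismatch is seen, every later completion is rejected, even one that would otherwise match, until the function has been reset.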
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If qlcnic_fw_cmd_get_minidump_temp() fails then "fw_dump->tmpl_hdr" is
NULL or possibly freed. It can lead to an oops later.
Fixes: d01a6d3c8a ('qlcnic: Add support to enable capability to extend minidump for iSCSI')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We turn the feature ON only for servers with PCI BW < MAX LINK BW, as it
helps reduce PCI pressure on weak PCI slots, but it adds some software
overhead.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the MPWQE/Striding RQ default configuration dynamic and not
statically set at compile time. Now at driver load we set
stride size and num strides dynamically.
By default we use same values as before, but when CQE compression
is enabled, we set larger stride size to benefit from CQE
compression for larger packets.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The CQE compression feature is meant to save PCIe bandwidth by
compressing a few CQEs into a smaller number of bytes on PCIe.
CQE compression can be selectively enabled per CQ. By default it
is disabled for now and will be enabled later on.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A switch can export an attached EEPROM using the standard ethtool API.
However the switch itself cannot determine the size of the EEPROM, and
multiple sizes are allowed. Thus a device tree property is supported
to indicate the length of the EEPROM. Parse this property during
device probe, and implement a callback function to retrieve it.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dsa_switch structure contains a dsa_chip_data member called pd.
However in the rest of the code, pd is used for dsa_platform_data.
This is confusing. Rename it cd, which is already often used in dsa.c
and slave.c for this data type.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The switch drivers only use the master_dev member for dev_info()
messages. Now that the device is passed to the old style probe, and
new style drivers are probed as true linux drivers, this is no longer
needed.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Resetting the switch is something the driver does, not the framework.
So move the parsing of this property into the driver.
There are no in kernel users of this property, so moving it does not
break anything. There is however a board which will make use of this
property making its way into the kernel.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow Marvell switches to be mdio devices. Currently the driver just
allocates the private structure and detects what device is on the
bus. Later patches will make them register with the DSA framework.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
All other DSA drivers use _drv_ in their DSA probe function name, thus
allowing for a true linux driver probe function to use the
conventional name. Make mv88e6xxx fit this pattern.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
By initialising it immediately, we don't run the danger of using it
before it is initialised.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some switch models have a STU (per VLAN port state database). Add a new
capability flag to switches info, instead of checking their family.
Also if the 6165 family has an STU, it must have a VTU, so add the
MV88E6XXX_FLAG_VTU to its family flags.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both VTU and STU operations use the same routine to access their
(common) data registers, with a different offset.
Add VTU and STU specific read and write functions to the data registers
to abstract the required offset.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the VRF driver uses the rx_handler to switch the skb device
to the VRF device. Switching the dev prior to the ip / ipv6 layer
means the VRF driver has to duplicate IP/IPv6 processing which adds
overhead and makes features such as retaining the ingress device index
more complicated than necessary.
This patch moves the hook to the L3 layer just after the first NF_HOOK
for PRE_ROUTING. This location makes exposing the original ingress device
trivial (next patch) and allows adding other NF_HOOKs to the VRF driver
in the future.
dev_queue_xmit_nit is exported so that the VRF driver can cycle the skb
with the switched device through the packet taps to maintain current
behavior (tcpdump can be used on either the vrf device or the enslaved
devices).
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the phy was suspended and is being started, the current driver always
enables the phy's interrupts. If the phy works in polling mode, it can raise
an unexpected interrupt which will not be handled, and that interrupt will
block the system from entering suspend again. So interrupts should only be
re-enabled if the phy works in interrupt mode.
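The intended condition can be sketched as below. The ex_phy structure, the EX_PHY_POLL sentinel and ex_phy_resume are hypothetical names for this example and only model the decision, not the phylib state machine.

```c
#include <stdbool.h>

/* Hypothetical sentinel meaning "this phy is polled, it has no IRQ line". */
#define EX_PHY_POLL (-1)

struct ex_phy {
    int  irq;           /* IRQ number, or EX_PHY_POLL */
    bool irq_enabled;   /* models the phy's interrupt enable state */
};

/* On resume, re-enable interrupts only for phys that actually use one. */
static void ex_phy_resume(struct ex_phy *phy)
{
    if (phy->irq != EX_PHY_POLL)
        phy->irq_enabled = true;
}
```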
Signed-off-by: Shaohui Xie <Shaohui.Xie@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The GTPv1 header flags indicate the presence of optional extensions
after this header. Refresh the pointer to the GTPv1 header as skb->head
might have been reallocated via pskb_may_pull().
Fixes: 459aa660eb ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two generic functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The private structure contains a pointer to phydev, but the structure
net_device already contains such a pointer. So we can remove the pointer
phydev in the private structure, and update the driver to use the one
contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ethtool callbacks {get|set}_link_ksettings are often the same, so
we add two generic functions phy_ethtool_{get|set}_link_ksettings
to avoid writing the same function several times.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Acked-By: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tx interrupt is of edge type, and in case such interrupt is triggered
while it is masked it will not be handled even after tx interrupts are
re-enabled in the end of NAPI poll.
This will cause tx network to stop in the following scenario:
* Rx is being handled, hence interrupts are masked.
* Tx interrupt is triggered after checking if there is some tx to handle
and before re-enabling the interrupts.
In this situation only rx transaction will release tx requests.
In order to handle the tx that was missed (if there was one),
a NAPI reschedule was added after enabling the interrupts.
Signed-off-by: Elad Kanfi <eladkan@mellanox.com>
Acked-by: Noam Camus <noamca@mellanox.com>
Acked-by: Gilad Ben-Yossef <giladby@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Below is a description of a possible problematic
sequence. CPU-A is sending a frame and CPU-B handles
the interrupt that indicates the frame was sent. CPU-B
reads an invalid value of tx_packet_sent.
 CPU-A                          CPU-B
 -----                          -----
 nps_enet_send_frame
 .
 .
 tx_skb = skb
 tx_packet_sent = true
 order HW to start tx
 .
 .
 HW complete tx
            ------>             get tx complete interrupt
                                .
                                .
                                if(tx_packet_sent == true)
                                        handle tx_skb
 end memory transaction
 (tx_packet_sent actually
  written)
Furthermore there is a dependency between tx_skb and tx_packet_sent.
There is no assurance that tx_skb contains a valid pointer at CPU B
when it sees tx_packet_sent == true.
Solution:
Initialize tx_skb to NULL and use it to indicate that packet was sent,
in this way tx_packet_sent can be removed.
Add a write memory barrier after setting tx_skb in order to make sure
that it is valid before HW is informed and IRQ is fired.
Fixed sequence will be:
 CPU-A                          CPU-B
 -----                          -----
 tx_skb = skb
 wmb()
 .
 .
 order HW to start tx
 .
 .
 HW complete tx
            ------>             get tx complete interrupt
                                .
                                .
                                if(tx_skb != NULL)
                                        handle tx_skb
                                tx_skb = NULL
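A user-space analogue of the fixed sequence can be sketched with C11 atomics. This is only an illustration: the release store stands in for the wmb() described above, and ex_pkt, ex_tx_skb, ex_send and ex_tx_complete are hypothetical names, not the driver's code.

```c
#include <stdatomic.h>
#include <stddef.h>

struct ex_pkt { int len; };

/* NULL means "no packet pending"; a single pointer replaces the old flag. */
static _Atomic(struct ex_pkt *) ex_tx_skb;

/* Sender side: publish the packet before the hardware is told to start. */
static void ex_send(struct ex_pkt *pkt)
{
    atomic_store_explicit(&ex_tx_skb, pkt, memory_order_release);
    /* ... a real driver would now order the hardware to start tx ... */
}

/* Completion side: take ownership of the pending packet, if any. */
static struct ex_pkt *ex_tx_complete(void)
{
    return atomic_exchange_explicit(&ex_tx_skb, (struct ex_pkt *)NULL,
                                    memory_order_acq_rel);
}
```

Using the pointer itself as the "packet was sent" indicator removes the dependency between two separate variables that the original sequence suffered from.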
Signed-off-by: Elad Kanfi <eladkan@mellanox.com>
Acked-by: Noam Camus <noamca@mellanox.com>
Acked-by: Gilad Ben-Yossef <giladby@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'linux-can-next-for-4.7-20160509' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2016-05-09
this is a pull request of 12 patches for net-next/master.
Alexander Gerasiov and Nikita Edward Baruzdin each contribute a patch
improving the sja1000 driver. Amitoj Kaur Chawla's patch converts the
mcp251x driver to alloc_workqueue(). A patch by Oliver Hartkopp fixes
the handling of CAN config options. Andreas Gröger improves the error
handling in the janz-ican3 driver. The patch by Maximilian Schneider
for the gs_usb improves probing of the USB driver. Finally there are 6
improvement patches by Marek Vasut for the ifi CAN driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This is an initial implementation of a netdev driver for GTP datapath
(GTP-U) v0 and v1, according to the GSM TS 09.60 and 3GPP TS 29.060
standards. This tunneling protocol is used to prevent subscribers from
accessing mobile carrier core network infrastructure.
This implementation requires a GGSN userspace daemon that implements the
signaling protocol (GTP-C), such as OpenGGSN [1]. This userspace daemon
updates the PDP context database that represents active subscriber
sessions through a genetlink interface.
For more context on this tunneling protocol, you can check the slides
that were presented during the NetDev 1.1 [2].
Only IPv4 is supported at this time.
[1] http://git.osmocom.org/openggsn/
[2] http://www.netdevconf.org/1.1/proceedings/slides/schultz-welte-osmocom-gtp.pdf
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reserved fields should be set to zero to avoid exposing
bits from the kernel stack.
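A common way to achieve this is to clear the whole structure before filling in the meaningful fields. The sketch below assumes a hypothetical reply structure, ex_reply, and is not the code touched by this patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical structure handed back to a caller. */
struct ex_reply {
    uint32_t cmd;
    uint32_t value;
    uint32_t reserved[4];
};

static void ex_fill_reply(struct ex_reply *out, uint32_t cmd, uint32_t value)
{
    /* Clear everything first so reserved fields never carry stale bytes. */
    memset(out, 0, sizeof(*out));
    out->cmd = cmd;
    out->value = value;
}
```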
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following sparse warning:
drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c:274:1: warning:
symbol 'socfpga_dwmac_pm_ops' was not declared. Should it be static?
Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow udp and raw sockets to send by oif that is an enslaved interface
versus the l3mdev/VRF device. For example, this allows BFD to use ifindex
from IP_PKTINFO on a receive to send a response without the need to
convert to the VRF index. It also allows ping and ping6 to work when
specifying an enslaved interface (e.g., ping -I swp1 <ip>) which is
a natural use case.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When reopening the network device on r8a7795/salvator-x, e.g. after a
DHCP timeout:
IP-Config: Reopening network devices...
genirq: Flags mismatch irq 139. 00000000 (eth0:ch24:emac) vs. 00000000 (eth0:ch24:emac)
ravb e6800000.ethernet eth0: cannot request IRQ eth0:ch24:emac
IP-Config: Failed to open eth0
IP-Config: No network devices available
The "mismatch" is due to requesting an IRQ that is already in use,
while IRQF_PROBE_SHARED wasn't set.
However, the real cause is that ravb_close() doesn't release the R-Car
Gen3-specific secondary IRQ.
Add the missing free_irq() call to fix this.
Fixes: 22d4df8ff3 ("ravb: Add support for r8a7795 SoC")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit-bd5a256 introduces a deadlock bug in fjes_change_mtu().
This spin_lock_irqsave() is obviously unnecessary.
This patch eliminates unnecessary spin_lock_irqsave() in
fjes_change_mtu()
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I accidentally let Arnd's VXLAN dependency changes slip into net-next,
they are only appropriate for net.
Also the flow steering structural changes to mlx5e_priv got scrambled
during the merge resolution as well.
Fix that all up.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
In netdevice.h we removed the structure in net-next that is being
changed in 'net'. In macsec.c and rtnetlink.c we have overlaps
between fixes in 'net' and the u64 attribute changes in 'net-next'.
The mlx5 conflicts have to do with vxlan support dependencies.
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that all drivers support the same set of functions and the same
setup code, drop every model-specific DSA switch driver and replace them
with a common mv88e6xxx driver.
This merges the info tables into one, removes the function exports, the
model-specific files, and updates the defconfigs.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6131 is the only driver to set the tag protocol to DSA_TAG_PROTO_DSA.
Since it works fine with DSA_TAG_PROTO_EDSA, change its value, like all
other mv88e6xxx drivers.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Provide a shared mv88e6xxx_setup function to the drivers.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6131 is the only driver which sets up the priority of IGMP/MLD snoop
frames and ARP frames to the highest setting. Drop this change until we
figure out a common configuration for all switch models.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All switch models set up the GLOBAL_CONTROL_2 register with slight
differences.
Since the cascade mode is valid even in a single chip setup, factorize
such configuration.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All switch drivers configure the GLOBAL_MONITOR_CONTROL register with
slight changes.
Assume the setup of the upstream port, and configure it as the port to
which ingress and egress and ARP monitor frames are to be sent.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 6131 switch models have a Core Tag Type register. Their setup code
is setting it to 0x8100, which is the reset default.
Drop this specific part which is correctly configured on reset anyway.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All switch models configure the GLOBAL_CONTROL register with slight
differences.
Discarding packets with excessive collisions
(GLOBAL_CONTROL_DISCARD_EXCESS) is specific to 6352 and similar
switches, and setting a maximum frame size
(GLOBAL_CONTROL_MAX_FRAME_1632) is specific to 6185 and similar
switches.
As we are centralizing the chips setup, skip these settings and don't
discard any frames yet, until we find out that such discarding by the
hardware is necessary.
Assume a common setup to enable the PHY Polling Unit if present, don't
discard any packets, and mask all interrupt sources.
Tested on 88E6352 and 88E6185.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Every driver is calling mv88e6xxx_setup_global after
mv88e6xxx_setup_common. Call the former in the latter.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a MV88E6XXX_FLAG_PPU_ACTIVE flag to describe how to reset the
switch, and merge the reset call to the common setup code.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a MV88E6XXX_FLAG_ATU flag to identify switch models with an Address
Translation Unit.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a MV88E6XXX_FLAG_VTU flag to identify switch models with a VLAN
Table Unit.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add MV88E6XXX_FLAG_PORTSTATE and MV88E6XXX_FLAG_VLANTABLE flags to
identify switch models with required 802.1D operations.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only 6131 was not supporting the port registers access yet. Assume such
support and use the unlocked access routines in the meantime.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a MV88E6XXX_FLAG_EEE flag to describe switch models featuring Energy
Efficient Ethernet. Use it to conditionally support such access in the
common code.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some switch models have a dedicated register for Switch MAC/WoF/WoL.
This register, when present, is used to indirectly set the switch MAC
address, instead of a direct write to 3 global registers.
Identify this feature and share a common mv88e6xxx_set_addr function.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add MV88E6XXX_FLAG_TEMP and MV88E6XXX_FLAG_TEMP_LIMIT flags to describe
switch models featuring a temperature access. Use them to centralize the
access to the temperature feature.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a MV88E6XXX_FLAG_EEPROM flag to describe switch models featuring an
EEPROM and distribute the EEPROM access routines to all models.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some switches have dedicated SMI PHY Command and Data registers, used to
indirectly access the PHYs, instead of direct access.
Identify these switch models and make mv88e6xxx_phy_{read,write} generic
enough to support every model.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a MV88E6XXX_FLAG_PPU flag to describe switch models with a PHY
Polling Unit. This allows merging PPU-specific PHY access code into the
shared code.
Make the mv88e6xxx_ppu_disable and mv88e6xxx_phy_{read,write}_ppu
functions use unlocked register accesses in order to call them in
mv88e6xxx_phy_{read,write} in a locked context.
Since the PPU code is shared, also remove NET_DSA_MV88E6XXX_NEED_PPU.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a flags bitmap to the info structure in order to identify features
supported or not by the different switch models.
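A minimal user-space sketch of the flags-bitmap idea follows. The DEMO_* names,
bit values and struct layout are hypothetical and are not the driver's real
definitions; the point is only that shared code can test a per-model bitmap
instead of switching on chip IDs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits; names and values are illustrative only. */
#define DEMO_FLAG_EEE     (1UL << 0)
#define DEMO_FLAG_EEPROM  (1UL << 1)
#define DEMO_FLAG_PPU     (1UL << 2)

/* Per-model description carrying a bitmap of supported features. */
struct demo_info {
	const char *name;
	unsigned long flags;
};

/* True only if every bit in 'flags' is set for this model. */
static bool demo_has(const struct demo_info *info, unsigned long flags)
{
	assert(info);
	return (info->flags & flags) == flags;
}
```

Common code would then guard an optional feature with a call such as
demo_has(info, DEMO_FLAG_EEPROM) before touching the related registers.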
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The updated specification for the IFI CANFD core contains description
of more detailed error reporting capability of the core. Implement
support for this detailed error reporting.
Signed-off-by: Marek Vasut <marex@denx.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Only increment the TX counters in the irq handler if a CAN message
was sent. The current code incremented the counters also if the TX
FIFO empty interrupt happened, which is incorrect.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The CAN_CTRLMODE_FD flag is set for both ISO and BOSCH CANFD mode,
while CAN_CTRLMODE_FD_NON_ISO is an additional flag which is only
set for CANFD-BOSCH mode. Fix the handling of the flags to reflect
this.
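The flag relationship described above can be sketched as a small classifier.
This is an illustration, not the driver's code: the DEMO_* constants are assumed
to mirror CAN_CTRLMODE_FD and CAN_CTRLMODE_FD_NON_ISO, and the enum and function
names are made up.

```c
#include <assert.h>

/* Assumed to mirror CAN_CTRLMODE_FD / CAN_CTRLMODE_FD_NON_ISO. */
#define DEMO_CTRLMODE_FD          0x20u
#define DEMO_CTRLMODE_FD_NON_ISO  0x80u

enum demo_mode { DEMO_MODE_CLASSIC, DEMO_MODE_FD_ISO, DEMO_MODE_FD_BOSCH };

/* FD must be set for any CAN FD mode; NON_ISO only refines it. */
static enum demo_mode demo_classify(unsigned int ctrlmode)
{
	if (!(ctrlmode & DEMO_CTRLMODE_FD))
		return DEMO_MODE_CLASSIC;
	if (ctrlmode & DEMO_CTRLMODE_FD_NON_ISO)
		return DEMO_MODE_FD_BOSCH;
	return DEMO_MODE_FD_ISO;
}

/* Usage example. */
static void demo_classify_examples(void)
{
	assert(demo_classify(DEMO_CTRLMODE_FD) == DEMO_MODE_FD_ISO);
	assert(demo_classify(DEMO_CTRLMODE_FD | DEMO_CTRLMODE_FD_NON_ISO) ==
	       DEMO_MODE_FD_BOSCH);
}
```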
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Reviewed-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
There is no distinction between bittiming constants for the slow and
fast part of the CANFD operation on this controller, so just use one
single bittiming constant set.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The updated documentation regarding the IFI CANFD core from April 2016
adds more details regarding the timing calculation. There is no longer
any distinction in the timing calculation between CANFD and CAN2.0, but
instead there are two timing modes -- 4_12_6_6 and 7_9_8_8 -- where the
numbers mean the width in bits of the SJW/Prescaler/TimeA/TimeB fields.
The code uses 7_9_8_8 mode, which allows more fine-grained control over
the timing.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Start the NAPI polling in case the bus warning interrupt happens,
since it is the poll function which checks and reports the warning.
Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Modified the USB device table to use only the first USB interface, as is
the case with GS USB devices. This allows other GS USB compatible
devices to be more flexible with their remaining interfaces.
Signed-off-by: Maximilian Schneider <max@schneidersoft.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
My patch of May 2015 was missing the changed handling of error
indications. With CAL/CANopen firmware the NMTS-SlaveEventIndication
must be used instead of CAN-EventIndication. An appropriate slave node
must be configured to report the errors.
In our department (about 15 development systems with Janz ICAN3-
modules with firmware 1.48, my system also with firmware ICANOS 1.35)
we have used the driver with this patch for about one year with no known problems.
Signed-off-by: Andreas Gröger <andreas24groeger@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
As described in 'can: m_can: tag current CAN FD controllers as non-ISO'
(6cfda7fbeb) it is possible to define fixed configuration options by
setting the corresponding bit in 'ctrlmode' and clearing it in 'ctrlmode_supported'.
This leads to the inconvenience that the fixed configuration bits cannot be
passed by netlink even when they have the correct values (e.g. non-ISO, FD).
This patch fixes that issue: it not only allows fixed bit values to be set
again but now requires(!) these fixed values to be provided at configuration time.
A valid CAN FD configuration consists of a nominal/arbitration bittiming, a
data bittiming and a control mode with CAN_CTRLMODE_FD set - which is now
enforced by a new can_validate() function. This fix additionally removed the
inconsistency that was prohibiting the support of 'CANFD-only' controller
drivers, like the RCar CAN FD.
For this reason a new helper can_set_static_ctrlmode() has been introduced to
provide a proper interface to handle static enabled CAN controller options.
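The validity rule stated above can be sketched as follows. This is a
simplified stand-in, not the actual can_validate(): the struct, the DEMO_*
constant and the function name are hypothetical, and rejecting a data bittiming
without the FD bit is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_CTRLMODE_FD 0x20u /* assumed to mirror CAN_CTRLMODE_FD */

/* Which pieces of configuration were supplied in one request. */
struct demo_can_cfg {
	bool has_bittiming;       /* nominal/arbitration bittiming */
	bool has_data_bittiming;  /* data-phase bittiming */
	unsigned int ctrlmode;
};

static bool demo_can_validate(const struct demo_can_cfg *cfg)
{
	bool is_fd;

	assert(cfg);
	is_fd = cfg->ctrlmode & DEMO_CTRLMODE_FD;

	/* CAN FD needs both bittimings together with the FD bit. */
	if (is_fd)
		return cfg->has_bittiming && cfg->has_data_bittiming;

	/* Assumption: a data bittiming without FD is rejected. */
	return !cfg->has_data_bittiming;
}
```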
Reported-by: Ramesh Shanmugasundaram <ramesh.shanmugasundaram@bp.renesas.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Reviewed-by: Ramesh Shanmugasundaram <ramesh.shanmugasundaram@bp.renesas.com>
Cc: <stable@vger.kernel.org> # >= 3.18
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Replace create_freezable_workqueue, which is scheduled for removal, with
alloc_workqueue.
priv->wq should be explicitly set as freezable to ensure it is frozen
in the suspend sequence and work items are drained so that no new work
item starts execution until thawed. Thus, use of WQ_FREEZABLE flag
here is required.
WQ_MEM_RECLAIM flag has been set here to ensure forward progress
regardless of memory pressure.
The order of execution is not important so set @max_active as 0.
Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
This patch adds support for the Marathon CAN-bus-PCIe card to the
sja1000 driver. For more information see:
http://can.marathon.ru/page/devices/can-bus-pcie
Signed-off-by: Nikita Edward Baruzdin <nebaruzdin@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
According to SJA1000 documentation the location of error is available
regardless of an error type. Therefore it should always be forwarded to
SocketCAN.
Signed-off-by: Nikita Edward Baruzdin <nebaruzdin@lvk.cs.msu.su>
Signed-off-by: Alexander GQ Gerasiov <gq@cs.msu.su>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
VXLAN can be disabled at compile-time or it can be a loadable
module while mlx5 is built-in, which leads to a link error:
drivers/net/built-in.o: In function `mlx5e_create_netdev':
ntb_netdev.c:(.text+0x106de4): undefined reference to `vxlan_get_rx_port'
This avoids the link error and makes the vxlan code optional,
like the other ethernet drivers do as well.
Link: https://patchwork.ozlabs.org/patch/589296/
Fixes: b3f63c3d5e ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 69976fb104.
We cannot select VXLAN when IPv4 support is disabled, that just gives
us additional build errors, including:
warning: (MLX5_CORE_EN) selects VXLAN which has unmet direct dependencies (NETDEVICES && NET_CORE && INET)
In file included from ../drivers/net/vxlan.c:36:0:
include/net/udp_tunnel.h: In function 'udp_tunnel_handle_offloads':
include/net/udp_tunnel.h:112:9: error: implicit declaration of function 'iptunnel_handle_offloads' [-Werror=implicit-function-declaration]
return iptunnel_handle_offloads(skb, type);
^~~~~~~~~~~~~~~~~~~~~~~~
I'm sending a proper fix for the original bug in a separate patch.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All the chip_reset() methods repeat the code writing to the ARSTR register
and delaying for 1 ms, so that we can reuse sh_eth_chip_reset() twice.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sh_eth_chip_reset_giga() doesn't really need to use direct iowrite32() when
writing to the ARSTR register, it can use sh_eth_tsu_write() as all other
chip_reset() methods.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that mdiobus_scan() doesn't return NULL on failure anymore, this driver
no longer needs to check for it...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MACsec standard mentions a key identifier for each key, but
doesn't specify anything about it, so I arbitrarily chose 64 bits.
IEEE 802.1X-2010 specifies MKA (MACsec Key Agreement), and defines the
key identifier to be 128 bits (96 bits "member identifier" + 32 bits
"key number").
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using ifb+netem on ingress on SIT/IPIP/GRE traffic,
GRO packets are not properly processed.
Segmentation should not be forced, since ifb is already adding
quite a performance hit.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a GSO packet is segmented and its segments are properly queued,
we call consume_skb() instead of kfree_skb() to be drop monitor
friendly.
Fixes: 3e4f8b7873 ("macvtap: Perform GSO on forwarding path.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"data_split" was never set to false. It's just uninitialized.
Fixes: 2950219d87 ('qede: Add basic network device support')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The error handling is broken here. netxen_rom_fast_read() returns zero
on success and -EIO on error. It never returns -1.
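A minimal sketch of the pattern, with a hypothetical demo_rom_read() standing in
for netxen_rom_fast_read(): testing for any non-zero return catches -EIO, whereas
a comparison against -1 would let the error through and leave the output
uninitialized.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in: 0 on success, -EIO on failure, never -1. */
static int demo_rom_read(int addr, int *val)
{
	assert(val);
	if (addr < 0)
		return -EIO;
	*val = addr * 2;
	return 0;
}

static int demo_get_value(int addr, int *out)
{
	int v, ret;

	assert(out);
	ret = demo_rom_read(addr, &v);
	if (ret)		/* not: if (ret == -1) */
		return ret;	/* 'v' is never read when the call failed */

	*out = v;
	return 0;
}
```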
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
My static checker complains that we are using "autoneg" without
initializing it. The problem is the ->phy_read() condition is reversed
so we only set this on error instead of success.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
My static checker complained that "v" can be used uninitialized if
netxen_rom_fast_read() returns -EIO. That function never actually
returns -1.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When cxgb4 is built with CONFIG_CHELSIO_T4_DCB set, the VI enable command
gets called with DCB enabled. In a back-to-back setup with DCB enabled
on one side and disabled on the peer side, however, the firmware doesn't
send any DCB_L2_CFG, and the DCB priority is never set for the Tx queue.
The driver nevertheless resets the queue priority and state machine
whenever the link goes down. This patch fixes it by adding a check to
reset them only if cxgb4_dcb_enabled() returns true.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we fail to set the flooding configuration for the broadcast and
unregistered multicast traffic, we should revert the flooding
configuration of the unknown unicast traffic.
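The rollback described above follows the usual goto-unwind pattern. The sketch
below uses made-up helpers and a made-up port struct rather than the mlxsw API;
the fail_bm field exists only to inject a failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical port state; fail_bm injects a failure for the demo. */
struct demo_port {
	bool uc_flood;
	bool bm_flood;
	bool fail_bm;
};

static int demo_set_uc(struct demo_port *p, bool on)
{
	p->uc_flood = on;
	return 0;
}

static int demo_set_bm(struct demo_port *p, bool on)
{
	if (p->fail_bm)
		return -EIO;
	p->bm_flood = on;
	return 0;
}

static int demo_set_flood(struct demo_port *p, bool on)
{
	int err;

	assert(p);
	err = demo_set_uc(p, on);
	if (err)
		return err;

	err = demo_set_bm(p, on);
	if (err)
		goto err_bm;
	return 0;

err_bm:
	demo_set_uc(p, !on);	/* undo the step that already succeeded */
	return err;
}
```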
Fixes: 0293038e0c ("mlxsw: spectrum: Add support for flood control")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the leave procedure in the error path symmetric to the join
procedure and first remove the port from the collector before
potentially destroying the LAG.
Fixes: 0d65fc1304 ("mlxsw: spectrum: Implement LAG port join/leave")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
UDP tunnel segmentation code relies on the inner offsets being set for
an UDP tunnel GSO packet, but the inner *_complete() functions will
set the inner offsets only if 'encapsulation' is set before calling
them. Currently, udp_gro_complete() sets 'encapsulation' only after
the inner *_complete() functions are done. This causes the inner
offsets having invalid values after udp_gro_complete() returns, which
in turn will make it impossible to properly segment the packet in case
it needs to be forwarded, which would be visible to the user either as
invalid packets being sent or as packet loss.
This patch fixes this by setting skb's 'encapsulation' in
udp_gro_complete() before calling into the inner complete functions,
and by making each possible UDP tunnel gro_complete() callback set the
inner_mac_header to the beginning of the tunnel payload.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Reviewed-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The setting of the UDP tunnel GSO type is already performed by
udp[46]_gro_complete().
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When creating macvtaps that are expected to have the same ifindex
in different network namespaces, only the first one will succeed.
The others will fail with a sysfs_warn_dup warning due to them trying
to create the following sysfs link (with 'NN' the ifindex of macvtapX):
/sys/class/macvtap/tapNN -> /sys/devices/virtual/net/macvtapX/tapNN
This is reproducible by running the following commands:
ip netns add ns1
ip netns add ns2
ip link add veth0 type veth peer name veth1
ip link set veth0 netns ns1
ip link set veth1 netns ns2
ip netns exec ns1 ip l add link veth0 macvtap0 type macvtap
ip netns exec ns2 ip l add link veth1 macvtap1 type macvtap
The last command will fail with "RTNETLINK answers: File exists" (along
with the kernel warning) but retrying it will work because the ifindex
was incremented.
The 'net' device class is isolated between network namespaces so each
one has its own hierarchy of net devices.
This isn't the case for the 'macvtap' device class.
The problem occurs half-way through the netdev registration, when
`macvtap_device_event` is called-back to create the 'tapNN' macvtap
class device under the 'macvtapX' net class device.
This patch adds namespace support to the 'macvtap' device class so
that /sys/class/macvtap is no longer shared between net namespaces.
However, making the macvtap sysfs class namespace-aware has the side
effect of changing /sys/devices/virtual/net/macvtapX/tapNN into
/sys/devices/virtual/net/macvtapX/macvtap/tapNN.
This is due to Commit 24b1442 ("Driver-core: Always create class
directories for classses that support namespaces") and the fact that
class devices supporting namespaces are really not supposed to be placed
directly under other class devices.
To avoid breaking userland, a tapNN symlink pointing to macvtap/tapNN is
created inside the macvtapX directory.
Signed-off-by: Marc Angel <marc@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2016-05-05
This series contains updates to i40e and i40evf.
The theme behind this series is code reduction, yeah! Jesse provides
most of the changes starting with a refactor of the interpretation of
a tunnel which lets us start using the hardware's parsing. Removed
the packet split receive routine and ancillary code in preparation
for the Rx-refactor. The refactor of the receive routine
aligns the receive routine with the one in ixgbe which was highly
optimized. The hardware supports a 16 byte descriptor for receive,
but the driver was never using it in production. There was no performance
benefit to the real driver of 16 byte descriptors, so drop a whole lot
of complexity while getting rid of the code. Fixed a bug where while
changing the number of descriptors using ethtool, the driver did not
test the limits of the system memory before permanently assuming it
would be able to get receive buffer memory.
Mitch fixes a memory leak of one page each time the driver is opened by
allocating the correct number of receive buffers and not fiddling with
next_to_use in the VF driver.
Arnd Bergmann fixed an indentation issue by adding the appropriate
curly braces in i40e_vc_config_promiscuous_mode_msg().
Julia Lawall fixed an issue found by Coccinelle, where i40e_client_ops
structure can be const since it is never modified.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tables have to exist for VRFs to function. Ensure they exist
when VRF device is created.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qede requires qed to provide enough resources to accommodate 16 combined
channels, but that upper-bound isn't actually being enforced by it.
Instead, qed informs qede how many channels can be opened based on
available resources - but that calculation doesn't really take into account
the resources requested by qede; instead it considers other FW/HW available
resources.
As a result, if a user increased the number of channels to more than
16 [e.g., using ethtool] the chip would hang.
This change increments the resources requested by qede to 64 combined
channels instead of 16; this value is an upper bound on the possible
available channels [due to other FW/HW resources].
Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We recently had a system crash in the cnic module. Vmcore analysis confirmed
that "ip link up" was executed which failed due to an allocation failure
because of memory fragmentation. Further analysis revealed that the cnic irq
vector was still allocated after the "ip link up" that failed. When
"ip link down" was executed it called free_msi_irqs() which crashed the system
because the cnic irq was still in use.
PANIC: "kernel BUG at drivers/pci/msi.c:411!"
The code execution was:
cnic_netdev_event()
if (event == NETDEV_UP) {
.
.
if (!cnic_start_hw(dev))
cnic_start_hw()
calls cnic_cm_open() which failed with -ENOMEM
cnic_start_hw() then took the err1 path:
err1:
cp->free_resc(dev); <---- frees resources but not irq vector
pci_dev_put(dev->pcidev);
return err;
}
This returns control back to cnic_netdev_event() but now the cnic irq vector
is still allocated even though cnic_cm_open() failed. The next
"ip link down" will trigger the crash.
The cnic_start_hw() routine is not handling the allocation failure correctly.
Fix this by checking whether CNIC_DRV_STATE_HANDLES_IRQ flag is set indicating
that the hardware has been started in cnic_start_hw(). If it has then call
cp->stop_hw() which frees the cnic irq vector and cnic resources. Otherwise
just maintain the previous behaviour and free cnic resources.
I reproduced this by injecting an ENOMEM error into cnic_cm_alloc_mem()'s return
code.
# ip link set dev enpX down
# ip link set dev enpX up <--- hits allocation failure
# ip link set dev enpX down <--- crashes here
With this patch I confirmed there was no crash in the reproducer.
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The i40e_client_ops structure is never modified, so declare it as const.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Newly added code in i40e_vc_config_promiscuous_mode_msg() is indented
in a way that gcc rightly complains about:
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c: In function 'i40e_vc_config_promiscuous_mode_msg':
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:1543:4: error: this 'if' clause does not guard... [-Werror=misleading-indentation]
if (f->vlan >= 0 && f->vlan <= I40E_MAX_VLANID)
^~
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c:1550:5: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'if'
aq_err = pf->hw.aq.asq_last_status;
From the context, it looks like the aq_err assignment was meant to be
inside of the conditional expression, so I'm adding the appropriate
curly braces now.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 5676a8b9cd ("i40e: Add VF promiscuous mode driver support")
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When testing on systems with very limited amounts of RAM, a bug was
found where, while changing the number of descriptors using ethtool,
the driver didn't test the limits of system memory before permanently
assuming it would be able to get receive buffer memory.
Work around this issue by pre-allocation of the receive buffer
memory, in the "ghost" ring, which is then used during reinit
using the new ring length.
Change-Id: I92d7a5fb59a6c884b2efdd1ec652845f101c3359
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Allocate the correct number of RX buffers, and don't fiddle with
next_to_use. The common RX code handles all of this. This fixes a memory
leak of one page each time the driver is opened.
Change-Id: Id06eca353086e084921f047acad28c14745684ee
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The hardware supports a 16 byte descriptor for receive, but the
driver was never using it in production. There was no performance
benefit to the real driver of 16 byte descriptors, so drop a whole
lot of complexity while getting rid of the code.
Also since the previous patch made us use no-split mode all the
time, drop any support in the driver for any other value in dtype
and assume it is always zero (aka no-split).
Hooray for code removal!
Change-ID: I2257e902e4dad84a07b94db6d2e6f4ce69b27bc0
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This is part 2 of the Rx refactor series, just including
changes to i40evf.
This refactor aligns the receive routine with the one in
ixgbe which was highly optimized. This reduces the code
we have to maintain and allows for (hopefully) more readable
and maintainable RX hot path.
In order to do this:
- consolidate the receive path into a single function that doesn't
use packet split but *does* use pages for Rx buffers.
- remove the old _1buf routine
- consolidate several routines into helper functions
- remove VF ethtool control over packet split
- remove priv_flags interface since it is unused
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
As part of preparation for the rx-refactor, remove the
packet split receive routine and ancillary code.
Some of the split related context set up code stays in
i40e_virtchnl_pf.c in case an older VF driver tries to load
and still wants to use packet split.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This is part 1 of the Rx refactor series, just including
changes to i40e.
This refactor aligns the receive routine with the one in
ixgbe which was highly optimized. This reduces the code
we have to maintain and allows for (hopefully) more readable
and maintainable RX hot path.
In order to do this:
- consolidate the receive path into a single function that doesn't
use packet split but *does* use pages for Rx buffers.
- remove the old _1buf routine
- consolidate several routines into helper functions
- remove ethtool control over packet split
Change-ID: I5ca100721de65992aa0114f8b4bac844b84758e0
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Use htons instead of unconditionally byte swapping nexthdr. On little
endian systems shifting the byte is correct behavior, but it results in
incorrect csums on big endian architectures.
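The difference can be illustrated in user space. The helper names below are
made up for this sketch; it only shows that htons() yields the same in-memory
byte layout on any host, while the open-coded shift produces that layout on
little-endian hosts only.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Open-coded variant: the resulting byte layout depends on host endianness. */
static uint16_t demo_nexthdr_shift(uint8_t nexthdr)
{
	return (uint16_t)(nexthdr << 8);
}

/* Portable variant: memory layout is { 0x00, nexthdr } on any host. */
static uint16_t demo_nexthdr_htons(uint8_t nexthdr)
{
	return htons(nexthdr);
}

/* Copy out the in-memory byte layout of a 16-bit value. */
static void demo_layout(uint16_t v, uint8_t out[2])
{
	assert(out);
	memcpy(out, &v, sizeof(v));
}
```

On a little-endian machine both helpers give the same layout; on a big-endian
machine only demo_nexthdr_htons() places nexthdr in the second byte.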
Fixes: f8c6455bb0 ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Carol Soto <clsoto@us.ibm.com>
Tested-by: Carol Soto <clsoto@us.ibm.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dma_alloc_coherent() function returns a virtual address which can
be used for coherent access to the underlying memory. On some
architectures, like arm64, undefined behavior results if this memory is
also accessed via virtual mappings that are not coherent. Because of
their undefined nature, operations like virt_to_page() return garbage
when passed virtual addresses obtained from dma_alloc_coherent(). Any
subsequent mappings via vmap() of the garbage page values are unusable
and result in bad things like bus errors (synchronous aborts in ARM64
speak).
The mlx4 driver contains code that does the equivalent of:
vmap(virt_to_page(dma_alloc_coherent)), this results in an OOPs when the
device is opened.
Prevent the Ethernet driver from running this problematic code by forcing
it to allocate contiguous memory. As for the Infiniband driver, we first
try to allocate contiguous memory, and in case of failure fall back to
working with fragmented memory.
Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reported-by: David Daney <david.daney@cavium.com>
Tested-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of the rx-refactor, the dtype variable in the i40e_ring
struct is no longer used, so remove it.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
As part of preparation for the rx-refactor, remove the
packet split receive routine and ancillary code.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Refactor the interpretation of a tunnel. This removes
some code and lets us start using the hardware's parsing.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2016-05-04
This series contains updates to ixgbe, ixgbevf and traffic class helpers.
Sridhar adds helper functions to the tc_mirred header to access tcf_mirred
information and then implements them for ixgbe to enable redirection to
a SRIOV VF or an offloaded MACVLAN device queue via tc 'mirred' action.
Amritha adds support to set filters with multiple header fields (L3,L4)
to match on.
KY Srinivasan from Microsoft adds Hyper-V support into ixgbevf.
Emil adds 82599 sub-device IDs that were missing from the list of parts
that support WoL. Then simplified the logic we use to determine WoL
support by reading the EEPROM bits for MACs X540 and newer.
Preethi cleaned up duplicate and unused device IDs. Fixed our ethtool
stat reporting where we were ignoring higher 32 bits of stats registers,
so fill out 64 bit stat values into two 32 bit words.
Babu Moger from Oracle improves VF performance issues on SPARC.
Alex Duyck cleans up some of the Hyper-V implementation from KY so that
we can just use function pointers instead of having to identify if a
given VF is running on a Linux or Windows PF.
Usha makes sure that DCB and FCoE are disabled for X550EM_x/a MACs and
cleans up the DCB initialization in the process.
Tony cleans up the API for ixgbevf_update_xcast_mode() so we do not
have to pass in the netdev parameter, since it was never used in the
function.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The multicast/all-multicast internal flags are not properly restored
after device reset. This could lead to unreliable multicast operations
after an ethtool configuration change for example.
Call bnxt_mc_list_updated() and setup the vnic->mask in bnxt_init_chip()
to fix the issue.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code determines if the next ring entry is valid before proceeding
further to read the rest of the entry. The CPU can re-order and read
the rest of the entry first, possibly reading a stale entry, if DMA
of a new entry happens right after reading it. This issue can be
readily seen on a ppc64 system, causing it to crash.
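The ordering requirement can be sketched in user space with C11 atomics. This
is an analogue, not the driver's code: in the kernel the fix would be a read
barrier between the two reads, while here an acquire load of a hypothetical
"valid" field plays that role. All names are made up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring entry: 'valid' is published after 'payload'. */
struct demo_entry {
	uint32_t payload;
	_Atomic uint32_t valid;
};

/* Producer side (the device, in the real case). */
static void demo_produce(struct demo_entry *e, uint32_t payload)
{
	assert(e);
	e->payload = payload;
	atomic_store_explicit(&e->valid, 1, memory_order_release);
}

/* Consumer: check validity first, read the rest only afterwards. */
static bool demo_consume(struct demo_entry *e, uint32_t *out)
{
	assert(e && out);
	if (!atomic_load_explicit(&e->valid, memory_order_acquire))
		return false;
	/* The acquire load keeps this read from moving above the check. */
	*out = e->payload;
	return true;
}
```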
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kbuild test robot reported a build failure on s390.
While at it, also fix missing conversion in the tilera driver.
Fixes: 9b36627ace ("net: remove dev->trans_start")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the call to fn() fails then "buf" is uninitialized. Just return the
error code in that case.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the fn() calls fail then "buf" is uninitialized. Just return early
in that situation.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I've finally noticed that mdiobus_scan() also returns either NULL or error
value on failure. Return ERR_PTR(-ENODEV) instead of NULL since this is
the error value already filtered out by the callers that want to ignore
the MDIO address scan failure...
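The error-pointer convention can be sketched in user space as below. These are
simplified re-implementations written for illustration, not the kernel's
ERR_PTR()/IS_ERR()/PTR_ERR() macros, and demo_scan() is a made-up stand-in for
a bus scan.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True for pointers in the top DEMO_MAX_ERRNO addresses; NULL is not one. */
static int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

static long demo_ptr_err(const void *p)
{
	assert(demo_is_err(p));
	return (long)(intptr_t)p;
}

struct demo_phy { int addr; };
static struct demo_phy demo_phy_at_1 = { 1 };

/* Hypothetical scan: only address 1 has a device; failure is never NULL. */
static struct demo_phy *demo_scan(int addr)
{
	if (addr != 1)
		return demo_err_ptr(-ENODEV);
	return &demo_phy_at_1;
}
```

Because a NULL return is not caught by an IS_ERR-style check, a function that
mixes NULL and error pointers forces callers to test for both.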
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previous patches removed all direct accesses to dev->trans_start,
so change the netif_trans_update helper to update trans_start of
netdev queue 0 instead and then remove trans_start from struct net_device.
AFAICS a lot of the netif_trans_update() invocations are now useless
because they occur in ndo_start_xmit and driver doesn't set LLTX
(i.e. stack already took care of the update).
As I can't test any of them it seems better to just leave them alone.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
A trans_start struct member exists twice:
- in struct net_device (legacy)
- in struct netdev_queue
Instead of open-coding dev->trans_start usage to obtain the current
trans_start value, use dev_trans_start() instead.
This is not exactly the same, as dev_trans_start also considers
the trans_start values of the netdev queues owned by the device
and provides the most recent one.
For legacy devices this doesn't matter as dev_trans_start can cope
with netdev trans_start values of 0 (they are ignored).
This is a prerequisite to eventual removal of dev->trans_start.
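A rough userspace sketch of the selection described above, not the
kernel implementation. The names stamp_after and latest_trans_start are
hypothetical. It starts from the legacy value, skips queue values of 0
and keeps the most recent stamp, using a wraparound-safe comparison in
the style of time_after().

```c
#include <assert.h>
#include <stddef.h>

/* Wraparound-safe "a is later than b" for jiffies-like counters. */
int stamp_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Returns the most recent of the legacy stamp and the per-queue stamps.
 * Queue stamps of 0 mean "never updated" and are ignored. */
unsigned long latest_trans_start(unsigned long legacy,
				 const unsigned long *queue_stamps,
				 size_t nqueues)
{
	unsigned long res = legacy;
	size_t i;

	assert(queue_stamps != NULL || nqueues == 0);

	for (i = 0; i < nqueues; i++) {
		unsigned long val = queue_stamps[i];

		if (val && stamp_after(val, res))
			res = val;
	}
	return res;
}
```

For a legacy device all queue stamps are 0, so the result is simply the
legacy value.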
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use net_device directly. Compile tested, objdiff shows no changes.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the receive path a queue's work bit was cleared unconditionally even
if fec_enet_rx_queue only read out a part of the available packets from
the hardware. This resulted in not reading any packets in the next napi
turn and so packets were delayed or lost.
The obvious fix is to only clear a queue's bit when the queue was
emptied.
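A simplified userspace model of the fix, with hypothetical names (rxq,
rx_queue, poll_rx) and a plain counter standing in for the hardware
ring. It is a sketch of the idea only: the work bit of a queue survives
a poll round unless that queue was drained.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a hardware rx ring: just a count of waiting packets. */
struct rxq { int pending; };

/* Consumes up to budget packets from q and returns how many were read. */
int rx_queue(struct rxq *q, int budget)
{
	int done = 0;

	while (q->pending > 0 && done < budget) {
		q->pending--;
		done++;
	}
	return done;
}

/* One poll round over all queues flagged in *work. */
int poll_rx(struct rxq *queues, int nqueues, unsigned int *work, int budget)
{
	int total = 0;

	assert(queues != NULL && work != NULL && budget >= 0);

	for (int i = 0; i < nqueues; i++) {
		if (!(*work & (1u << i)))
			continue;

		total += rx_queue(&queues[i], budget - total);

		/* Clear the bit only when the queue was emptied, so a
		 * partially read queue is visited again next round. */
		if (queues[i].pending == 0)
			*work &= ~(1u << i);
	}
	return total;
}
```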
Fixes: 4d494cdc92 ("net: fec: change data structure to support multiqueue")
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Fugang Duan <fugang.duan@nxp.com>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Add support to configure trusted vf attribute through trust_vf_ndo.
- Upon VF trust setting change we update vport context to refresh
allmulti/promisc or any trusted vf attributes that we didn't trust the
VF for before.
- Lock the eswitch state lock on vport event in order to synchronise the
vport context updates. This prevents contention with a vport trust
setting change, which triggers a vport mac list update.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add promisc_change as a trigger to vport context change event.
Add set vport promisc/allmulti functions to add vport to promiscuous
flowtable rules.
Upon promisc/allmulti rx mode vf request add the vport to
the relevant promiscuous group (Allmulti/Promisc group) so the relevant
traffic will be forwarded to it.
Upon allmulti vf request add the vport to each existing multicast fdb
rule.
Upon adding/removing mcast address from a vport, update all other
allmulti vports.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add promiscuous and allmulti steering groups in FDB table.
Besides the full match L2 steering rules group, we added
two more groups to catch the "miss" rules traffic:
* Allmulti group: One rule that forwards any mcast traffic coming from
either uplink or VFs/PF vports
* Promisc group: One rule that forwards all unmatched traffic coming
from uplink.
Needed for downstream privileged VF promisc and allmulti support.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the usage of explicit cleanup function and use existing vport
change handler. Calling vport change handler while vport
is disabled will cleanup the vport resources.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable ingress/egress ACL tables only when we need to configure ACL
rules.
Disable ingress/egress ACL tables once all ACL rules are removed.
All VF outgoing/incoming traffic needs to go through the ingress/egress ACL
tables.
Adding/Removing these tables on demand will save unnecessary hops in the
flow steering when the ACL tables are empty.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Configure ingress and egress vport ACL rules according to spoofchk
admin parameters.
Ingress ACL flow table rules:
if (!spoofchk && !vst) allow all traffic.
else:
1) one of the following rules:
* if (spoofchk && vst) allow only untagged traffic with smac=original
mac sent from the VF.
* if (spoofchk && !vst) allow only traffic with smac=original mac sent
from the VF.
* if (!spoofchk && vst) allow only untagged traffic.
2) drop all traffic that didn't hit #1.
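The rule selection above can be modelled as a small decision function.
This is an illustrative userspace sketch, not the driver's flow table
code. The enum, struct and function names are hypothetical, and a frame
is reduced to two booleans.

```c
#include <assert.h>
#include <stdbool.h>

enum ingress_policy {
	INGRESS_ALLOW_ALL,		/* !spoofchk && !vst */
	INGRESS_UNTAGGED_FROM_OWN_MAC,	/*  spoofchk &&  vst */
	INGRESS_FROM_OWN_MAC,		/*  spoofchk && !vst */
	INGRESS_UNTAGGED_ONLY,		/* !spoofchk &&  vst */
};

/* Simplified frame: is it VLAN tagged, and does smac match the VF's mac? */
struct frame { bool tagged; bool smac_is_vf_mac; };

enum ingress_policy pick_ingress_policy(bool spoofchk, bool vst)
{
	if (!spoofchk && !vst)
		return INGRESS_ALLOW_ALL;
	if (spoofchk && vst)
		return INGRESS_UNTAGGED_FROM_OWN_MAC;
	if (spoofchk)
		return INGRESS_FROM_OWN_MAC;
	return INGRESS_UNTAGGED_ONLY;
}

/* Returns true if the frame hits the allow rule; anything else is
 * dropped, which corresponds to step 2) above. */
bool ingress_allows(enum ingress_policy p, struct frame f)
{
	switch (p) {
	case INGRESS_ALLOW_ALL:
		return true;
	case INGRESS_UNTAGGED_FROM_OWN_MAC:
		return !f.tagged && f.smac_is_vf_mac;
	case INGRESS_FROM_OWN_MAC:
		return f.smac_is_vf_mac;
	case INGRESS_UNTAGGED_ONLY:
		return !f.tagged;
	}
	assert(0 && "unknown policy");
	return false;
}
```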
Add support for set vf spoofchk ndo.
Add non-zero mac validation in case of spoofchk to the set mac ndo:
when setting a new mac we need to validate that the new mac is
not zero while spoofchk is on, because that is an illegal
combination.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Configure ingress and egress vport ACL rules according to
vlan and qos admin parameters.
Ingress ACL flow table rules:
1) drop any tagged packet sent from the VF
2) allow other traffic (default behavior)
Egress ACL flow table rules:
1) allow only tagged traffic with vlan_tag=vst_vid.
2) drop other traffic.
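As an illustration only, the two VST rule sets can be written as
predicates over a simplified frame. The struct and function names below
are hypothetical and the sketch does not touch real flow tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified frame: tagged or not, and the VLAN id if tagged. */
struct vst_frame { bool tagged; uint16_t vid; };

/* Ingress (traffic sent from the VF): drop tagged, allow the rest. */
bool vst_ingress_allows(struct vst_frame f)
{
	return !f.tagged;
}

/* Egress (traffic towards the VF): allow only frames tagged with the
 * configured VST vlan id, drop everything else. */
bool vst_egress_allows(struct vst_frame f, uint16_t vst_vid)
{
	assert(vst_vid < 4096);	/* VLAN ids are 12 bits wide */
	return f.tagged && f.vid == vst_vid;
}
```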
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create egress/ingress ACLs per VF vport at vport enable.
Ingress ACL:
- one flow group to drop all tagged traffic in VST mode.
Egress ACL:
- one flow group that allows only untagged traffic with
smac equal to the original mac (anti-spoofing).
- one flow group that allows only untagged traffic.
- one flow group that allows only smac equal
to the original mac (anti-spoofing).
(note: only one of the above groups has an active rule)
- star rule will be used to drop all other traffic.
By default no rules are generated, unless VST is explicitly requested.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vport spin lock can be replaced with synchronize_irq() in the right
place. This removes the need for locking inside irq context.
Locking in esw_enable_vport is not required since vport events are yet
to be enabled, and at esw_disable_vport it is sufficient to
synchronize_irq() to guarantee no further vport event handlers will be
scheduled.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>