The check drops packets if they need to be routed and their source IP
equals to their destination IP.
Disable the check since the kernel forwards such packets and does not
drop them.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The check drops packets if they need to be routed and their multicast
MAC mismatched to their multicast destination IP.
For IPV4:
DMAC is mismatched if it is different from {01-00-5E-0 (25 bits),
DIP[22:0]}
For IPV6:
DMAC is mismatched if it is different from {33-33-0 (16 bits),
DIP[31:0]}
Disable the check since the kernel forwards such packets and does not
drop them.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The check drops packets if they need to be routed and their source IP is
from class E, i.e., belongs to 240.0.0.0/4 address range, but different
from 255.255.255.255.
Disable the check since the kernel forwards such packets and does not
drop them.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 6390 family uses an extended register to set the port connected to
the CPU. The lower 5 bits indicate the port, the upper three bits are
the priority of the frames as they pass through the switch, what
egress queue they should use, etc. Since frames being set to the CPU
are typically management frames, BPDU, IGMP, ARP, etc set the priority
to 7, the reset default, and the highest.
Fixes: 33641994a6 ("net: dsa: mv88e6xxx: Monitor and Management tables")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Chris Healy <cphealy@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix up inconsistent usage of upper and lowercase letters in "Samsung"
name.
"SAMSUNG" is not an abbreviation but a regular trademarked name.
Therefore it should be written with lowercase letters starting with
capital letter.
Although advertisement materials usually use uppercase "SAMSUNG", the
lowercase version is used in all legal aspects (e.g. on Wikipedia and in
privacy/legal statements on
https://www.samsung.com/semiconductor/privacy-global/).
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows the creation of the /dev/ptpX device for i225, and reading
and writing the time.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Since net_device.mem_start is unsigned long, it should not be cast to
int right before casting to pointer. This fixes warning (compile
testing on alpha architecture):
drivers/net/wan/sdla.c: In function ‘sdla_transmit’:
drivers/net/wan/sdla.c:711:13: warning:
cast to pointer from integer of different size [-Wint-to-pointer-cast]
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to hardware user manual, when hardware reports error
'roc_pkt_without_key_port', the driver should assert function
reset to do the recovery.
So this patch uses HNAE3_FUNC_RESET to replace HNAE3_GLOBAL_RESET.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In hclge_inform_reset_assert_to_vf(), variable reset_type(enum type)
will be copied into msg_data whose size is 2 bytes. Currently, hip08
is a little-endian machine, so the lower two bytes of reset_type will
be copied to msg_data. But when running on a big-endian machine,
msg_data will have a wrong value(the higher two bytes of reset_type).
So this patch modifies the type of reset_type to u16, and adds a
build check in case enum hnae3_reset_type has value larger than
U16_MAX.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some case, the MAC speed get from hardware maybe 0, it should
not be set to mac->speed.
Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The misc IRQ of all the devices have the same name, so it's
hard to find the right misc IRQ of the device.
This patch modifies the misc IRQ names as "hclge/hclgevf"-misc-
"pci name". And now the IRQ name is not related to net device
name anymore, so change the HNAE3_INT_NAME_LEN to 32 bytes, and
that is enough.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the returned vector_id less than 0, the message should print
out the vector who is getting vector index fail.
So this patch replaces vector_id with vector, and re-format the
message.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When rename the net devices, the IRQ number can not be
fetched by the net device name, because the driver request
the IRQ resources only when the vector resource changed, and
the rename operation did not change the vector resources,
so the IRQ name keeps the previous net device name.
So this patch modifies the name of the TQP IRQ as
"pci driver name"-"pci name"-"TxRx"-"index".
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To prevent loss user's IRQ affinity configuration when DOWN,
this patch moves out release/request operation of the vector
handle from net DOWN/UP, just do it when vector resource changes.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds trace support for HNS3 driver. It also declares
some events which could be used to trace the events when a
TX/RX BD is processed, and other events which are related to
the processing of sk_buff, such as TSO, GRO.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Layerscape SoCs traditionally expose the SerDes configuration/status for
Ethernet protocols (PCS for SGMII/USXGMII/10GBase-R etc etc) in a register
format that is compatible with clause 22 or clause 45 (depending on
SerDes protocol). Each MAC has its own internal MDIO bus on which there
is one or more of these PCS's, responding to commands at a configurable
PHY address. The per-port internal MDIO bus (which is just for PCSs) is
totally separate and has nothing to do with the dedicated external MDIO
controller (which is just for PHYs), but the register map for the MDIO
controller is the same.
The VSC9959 (Felix) switch instantiated in the LS1028A is integrated
in hardware with the ENETC PCS of its DSA master, and reuses its MDIO
controller driver, so Felix has been made to depend on it in Kconfig.
+------------------------------------------------------------------------+
| +--------+ GMII (typically disabled via RCW) |
| ENETC PCI | ENETC |--------------------------+ |
| Root Complex | port 3 |-----------------------+ | |
| Integrated +--------+ | | |
| Endpoint | | |
| +--------+ 2.5G GMII | | |
| | ENETC |--------------+ | | |
| | port 2 |-----------+ | | | |
| +--------+ | | | | |
| +--------+ +--------+ |
| | Felix | | Felix | |
| | port 4 | | port 5 | |
| +--------+ +--------+ |
| |
| +--------+ +--------+ +--------+ +--------+ +--------+ +--------+ |
| | ENETC | | ENETC | | Felix | | Felix | | Felix | | Felix | |
| | port 0 | | port 1 | | port 0 | | port 1 | | port 2 | | port 3 | |
+------------------------------------------------------------------------+
| |||| SerDes | |||| |||| |||| |||| |
| +--------+block | +--------------------------------------------+ |
| | ENETC | | | ENETC port 2 internal MDIO bus | |
| | port 0 | | | PCS PCS PCS PCS | |
| | PCS | | | 0 1 2 3 | |
+-----------------|------------------------------------------------------+
v v v v v v
SGMII/ RGMII QSGMII/QSXGMII/4xSGMII/4x1000Base-X/4x2500Base-X
USXGMII/ (bypasses
1000Base-X/ SerDes)
2500Base-X
In the LS1028A SoC described above, the VSC9959 Felix switch is PF5 of
the ENETC root complex, and has 2 BARs:
- BAR 4: the switch's effective registers
- BAR 0: the MDIO controller register map lended from ENETC port 2
(PF2), for accessing its associated PCS's.
This explanation is necessary because the patch does some renaming
"pci_bar" -> "switch_pci_bar" for clarity, which would otherwise appear
a bit obtuse.
The fact that the internal MDIO bus is "borrowed" is relevant because
the register map is found in PF5 (the switch) but it triggers an access
fault if PF2 (the ENETC DSA master) is not enabled. This is not treated
in any way (and I don't think it can be treated).
All of this is so SoC-specific, that it was contained as much as
possible in the platform-integration file felix_vsc9959.c.
We need to parse and pre-validate the device tree because of 2 reasons:
- The PHY mode (SerDes protocol) cannot change at runtime due to SoC
design.
- There is a circular dependency in that we need to know what clause the
PCS speaks in order to find it on the internal MDIO bus. But the
clause of the PCS depends on what phy-mode it is configured for.
The goal of this patch is to make steps towards removing the bootloader
dependency for SGMII PCS pre-configuration, as well as to add support
for monitoring the in-band SGMII AN between the PCS and the system-side
link partner (PHY or other MAC).
In practice the bootloader dependency is not completely removed. U-Boot
pre-programs the PHY address at which each PCS can be found on the
internal MDIO bus (MDEV_PORT). This is needed because the PCS of each
port has the same out-of-reset PHY address of zero. The SerDes register
for changing MDEV_PORT is pretty deep in the SoC (outside the addresses
of the ENETC PCI BARs) and therefore inaccessible to us from here.
Felix VSC9959 and Ocelot VSC7514 are integrated very differently in
their respective SoCs, and for that reason Felix does not use the Ocelot
core library for PHYLINK. On one hand we don't want to impose the
fixed phy-mode limitation to Ocelot, and on the other hand Felix doesn't
need to force the MAC link speed the way Ocelot does, since the MAC is
connected to the PCS through a fixed GMII, and the PCS is the one who
does the rate adaptation at lower link speeds, which the MAC does not
even need to know about. In fact changing the GMII speed for Felix
irrecoverably breaks transmission through that port until a reset.
The pair with ENETC port 3 and Felix port 5 is optional and doesn't
support tagging. When we enable it, swp5 is a regular slave port, albeit
an internal one. The trouble is that it doesn't work, and that is
because the DSA PHYLIB adaptation layer doesn't treat fixed-link slave
ports. So that is yet another reason for wanting to convert Felix to the
native PHYLINK API.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the Felix DSA driver is implementing its own PHYLINK instance due
to SoC differences, it needs access to the few registers that are
common, mainly for flow control.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Ocelot switchdev driver and the Felix DSA one need it for different
reasons. Felix (or at least the VSC9959 instantiation in NXP LS1028A) is
integrated with the traditional NXP Layerscape PCS design which does not
support runtime configuration of SerDes protocol. So it needs to
pre-validate the phy-mode from the device tree and prevent PHYLINK from
attempting to change it. For this, it needs to cache it in a private
variable.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This increases the MDIO hold time to 5 enet_clk cycles from the previous
value of 0. This is actually the out-of-reset value, that the driver was
previously overwriting with 0. Zero worked for the external MDIO, but
breaks communication with the internal MDIO buses on which the PCS of
ENETC SI's and Felix switch are found.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Within the LS1028A SoC, the register map for the ENETC MDIO controller
is instantiated a few times: for the central (external) MDIO controller,
for the internal bus of each standalone ENETC port, and for the internal
bus of the Felix switch.
Refactoring is needed to support multiple MDIO buses from multiple
drivers. The enetc_hw structure is made an opaque type and a smaller
enetc_mdio_priv is created.
'mdio_base' - MDIO registers base address - is being parameterized, to
be able to work with different MDIO register bases.
The ENETC MDIO bus operations are exported from the fsl-enetc-mdio
kernel object, the same that registers the central MDIO controller (the
dedicated PF). The ENETC main driver has been changed to select it, and
use its exported helpers to further register its private MDIO bus. The
DSA Felix driver will do the same.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some MAC PCS blocks are unable to provide interrupts when their status
changes. As we already have support in phylink for polling status, use
this to provide a hook for MACs to enable polling mode.
The patch idea was picked up from Russell King's suggestion on the macb
phylink patch thread here [0] but the implementation was changed.
Instead of introducing a new phylink_start_poll() function, which would
make the implementation cumbersome for common PHYLINK implementations
for multiple types of devices, like DSA, just add a boolean property to
the phylink_config structure, which is just as backwards-compatible.
https://lkml.org/lkml/2019/12/16/603
Suggested-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
QSGMII is a SerDes protocol clocked at 5 Gbaud (4 times higher than
SGMII which is clocked at 1.25 Gbaud), with the same 8b/10b encoding and
some extra symbols for synchronization. Logically it offers 4 SGMII
interfaces multiplexed onto the same physical lanes. Each MAC PCS has
its own in-band AN process with the system side of the QSGMII PHY, which
is identical to the regular SGMII AN process.
So allow QSGMII as a valid in-band AN mode, since it is no different
from software perspective from regular SGMII.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are 3 things that are wrong with the DSA deferred xmit mechanism:
1. Its introduction has made the DSA hotpath ever so slightly more
inefficient for everybody, since DSA_SKB_CB(skb)->deferred_xmit needs
to be initialized to false for every transmitted frame, in order to
figure out whether the driver requested deferral or not (a very rare
occasion, rare even for the only driver that does use this mechanism:
sja1105). That was necessary to avoid kfree_skb from freeing the skb.
2. Because L2 PTP is a link-local protocol like STP, it requires
management routes and deferred xmit with this switch. But as opposed
to STP, the deferred work mechanism needs to schedule the packet
rather quickly for the TX timstamp to be collected in time and sent
to user space. But there is no provision for controlling the
scheduling priority of this deferred xmit workqueue. Too bad this is
a rather specific requirement for a feature that nobody else uses
(more below).
3. Perhaps most importantly, it makes the DSA core adhere a bit too
much to the NXP company-wide policy "Innovate Where It Doesn't
Matter". The sja1105 is probably the only DSA switch that requires
some frames sent from the CPU to be routed to the slave port via an
out-of-band configuration (register write) rather than in-band (DSA
tag). And there are indeed very good reasons to not want to do that:
if that out-of-band register is at the other end of a slow bus such
as SPI, then you limit that Ethernet flow's throughput to effectively
the throughput of the SPI bus. So hardware vendors should definitely
not be encouraged to design this way. We do _not_ want more
widespread use of this mechanism.
Luckily we have a solution for each of the 3 issues:
For 1, we can just remove that variable in the skb->cb and counteract
the effect of kfree_skb with skb_get, much to the same effect. The
advantage, of course, being that anybody who doesn't use deferred xmit
doesn't need to do any extra operation in the hotpath.
For 2, we can create a kernel thread for each port's deferred xmit work.
If the user switch ports are named swp0, swp1, swp2, the kernel threads
will be named swp0_xmit, swp1_xmit, swp2_xmit (there appears to be a 15
character length limit on kernel thread names). With this, the user can
change the scheduling priority with chrt $(pidof swp2_xmit).
For 3, we can actually move the entire implementation to the sja1105
driver.
So this patch deletes the generic implementation from the DSA core and
adds a new one, more adequate to the requirements of PTP TX
timestamping, in sja1105_main.c.
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I finally found out how the 4 management route slots are supposed to
be used, but.. it's not worth it.
The description from the comment I've just deleted in this commit is
still true: when more than 1 management slot is active at the same time,
the switch will match frames incoming [from the CPU port] on the lowest
numbered management slot that matches the frame's DMAC.
My issue was that one was not supposed to statically assign each port a
slot. Yes, there are 4 slots and also 4 non-CPU ports, but that is a
mere coincidence.
Instead, the switch can be used like this: every management frame gets a
slot at the right of the most recently assigned slot:
Send mgmt frame 1 through S0: S0 x x x
Send mgmt frame 2 through S1: S0 S1 x x
Send mgmt frame 3 through S2: S0 S1 S2 x
Send mgmt frame 4 through S3: S0 S1 S2 S3
The difference compared to the old usage is that the transmission of
frames 1-4 doesn't need to wait until the completion of the management
route. It is safe to use a slot to the right of the most recently used
one, because by protocol nobody will program a slot to your left and
"steal" your route towards the correct egress port.
So there is a potential throughput benefit here.
But mgmt frame 5 has no more free slot to use, so it has to wait until
_all_ of S0, S1, S2, S3 are full, in order to use S0 again.
And that's actually exactly the problem: I was looking for something
that would bring more predictable transmission latency, but this is
exactly the opposite: 3 out of 4 frames would be transmitted quicker,
but the 4th would draw the short straw and have a worse worst-case
latency than before.
Useless.
Things are made even worse by PTP TX timestamping, which is something I
won't go deeply into here. Suffice to say that the fact there is a
driver-level lock on the SPI bus offsets any potential throughput gains
that parallelism might bring.
So there's no going back to the multi-slot scheme, remove the
"mgmt_slot" variable from sja1105_port and the dummy static assignment
made at probe time.
While passing by, also remove the assignment to casc_port altogether.
Don't pretend that we support cascaded setups.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only clk init function in this driver that register a clk is
fu540_c000_clk_init(), and thus we need to unregister the clk when this
driver is removed on that platform. Other init functions, for example
macb_clk_init(), don't register clks and therefore we shouldn't
unregister the clks when this driver is removed. Convert this
registration path to devm so it gets auto-unregistered when this driver
is removed and drop the clk_unregister() calls in driver remove (and
error paths) so that we don't erroneously remove a clk from the system
that isn't registered by this driver.
Otherwise we get strange crashes with a use-after-free when the
devm_clk_get() call in macb_clk_init() calls clk_put() on a clk pointer
that has become invalid because it is freed in clk_unregister().
Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
Cc: Yash Shah <yash.shah@sifive.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: c218ad5590 ("macb: Add support for SiFive FU540-C000")
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch network drivers, phy drivers, and SFP/phylink over to use the
more correct 10GBASE-R, rather than 10GBASE-KR. 10GBASE-KR is backplane
ethernet, which is 10GBASE-R with autonegotiation on top, which our
current usage on the affected platforms does not have.
The only remaining user of PHY_INTERFACE_MODE_10GKR is the Aquantia
PHY, which has a separate mode for 10GBASE-KR.
For Marvell mvpp2, we detect 10GBASE-KR, and rewrite it to 10GBASE-R
for compatibility with existing DT - this is the only network driver
at present that makes use of PHY_INTERFACE_MODE_10GKR.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the netdev ops for managing VFs. Since most of the
management work happens in the NIC firmware, the driver becomes
mostly a pass-through for the network stack commands that want
to control and configure the VFs.
We also tweak ionic_station_set() a little to allow for
the VFs that start off with a zero'd mac address.
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adds new AdminQ calls and their related structs for
supporting PF controls on VFs:
CMD_OPCODE_VF_GETATTR
CMD_OPCODE_VF_SETATTR
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix up inconsistent usage of upper and lowercase letters in "Samsung"
name.
"SAMSUNG" is not an abbreviation but a regular trademarked name.
Therefore it should be written with lowercase letters starting with
capital letter.
Although advertisement materials usually use uppercase "SAMSUNG", the
lowercase version is used in all legal aspects (e.g. on Wikipedia and in
privacy/legal statements on
https://www.samsung.com/semiconductor/privacy-global/).
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
gpiod_get_from_of_node() is being retired in favor of
[devm_]fwnode_gpiod_get_index(), that behaves similar to
[devm_]gpiod_get_index(), but can work with arbitrary firmware node. It
will also be able to support secondary software nodes.
Let's switch this driver over.
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we fail to locate GPIO for any reason other than deferral or
not-found-GPIO, we try to print device tree node info, however if might
be freed already as we called of_node_put() on it.
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of fwnode_get_named_gpiod() that I plan to hide away, let's use
the new fwnode_gpiod_get_index() that mimics gpiod_get_index(), but
works with arbitrary firmware node.
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no build time dependency on CONFIG_OF, but we do need to make
sure we gate the initialization of the gpio_chip::of_node member with a
proper check on CONFIG_OF_GPIO. This enables the driver to build on
platforms that do not have CONFIG_OF enabled.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Atomic operations that span cache lines are super-expensive on x86
(not just to the current processor, but also to other processes as all
memory operations are blocked until the operation completes). Upcoming
x86 processors have a switch to cause such operations to generate a #AC
trap. It is expected that some real time systems will enable this mode
in BIOS.
In preparation for this, it is necessary to fix code that may execute
atomic instructions with operands that cross cachelines because the #AC
trap will crash the kernel.
Since "pwol_mask" is local and never exposed to concurrency, there is
no need to set bits in pwol_mask using atomic operations.
Directly operate on the byte which contains the bit instead of using
__set_bit() to avoid any big endian concern due to type cast to
unsigned long in __set_bit().
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Certain drivers will pass gro skbs to udp, at which point the udp driver
simply iterates through them and passes them off to encap_rcv, which is
where we pick up. At the moment, we're not attempting to coalesce these
into bundles, but we also don't want to wind up having cascaded lists of
skbs treated separately. The right behavior here, then, is to just mark
each incoming one as not on a list. This can be seen in practice, for
example, with Qualcomm's rmnet_perf driver.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Yaroslav Furman <yaro330@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before 8b7008620b ("net: Don't copy pfmemalloc flag in __copy_skb_
header()"), the pfmemalloc flag used to be between headers_start and
headers_end, which is a region we clear when preparing the packet for
encryption/decryption. This is a parameter we certainly want to
preserve, which is why 8b7008620b moved it out of there. The code here
was written in a world before 8b7008620b, though, where we had to
manually account for it. This commit brings things up to speed.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We want to avoid forward-declarations of function if possible.
Rearrange the igc_sw_init function implementation.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We want to avoid forward-declarations of function if possible.
Rearrange the igc_write_itr function implementation.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We want to avoid forward-declarations of function if possible.
Rearrange the igc_assign_vector function implementation.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We want to avoid forward-declarations of function if possible.
Rearrange the igc_free_q_vector function implementation.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We want to avoid forward-declarations of function if possible.
Rearrange the igc_free_q_vectors function implementation.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We want to avoid forward-declarations of function if possible.
Rearrange the igc_irq_disable function implementation.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We want to avoid forward-declarations of function if possible.
Rearrange the igc_irq_enable function implementation.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We want to avoid forward-declarations of function if possible.
Rearrange the igc_configure_msix function implementation.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We want to avoid forward-declarations of function if possible.
Rearrange the igc_set_rx_mode function implementation.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We want to avoid forward-declarations of function if possible.
Rearrange the igc_set_interrupt_capability function implementation.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We want to avoid forward-declarations of function if possible.
Rearrange the igc_alloc_mapped_page function implementation.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We want to avoid forward-declarations of function if possible.
Rearrange the igc_configure function implementation.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We want to avoid forward-declarations of function if possible.
Rearrange the igc_set_default_mac_filter function implementation.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We want to avoid forward-declarations of function if possible.
Rearrange the igc_power_down_link function implementation.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We want to avoid forward-declarations of function if possible.
Rearrange the igc_clean_tx_ring function implementation.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add support for E822 devices
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Coverity reports some of the calls to xdp_rxq_info_reg() as potential
issues, because the driver does not check its return value. However,
those calls are wrapped with "if (!xdp_rxq_info_is_reg(&ring->xdp_rxq))"
and this check alone is enough to be sure that the function will never
fail.
All possible states of xdp_rxq_info are:
- NEW,
- REGISTERED,
- UNREGISTERED,
- UNUSED.
The driver won't mark a queue as UNUSED under no circumstance, so the
return value can be ignored safely.
Add comments for Coverity right above calls to xdp_rxq_info_reg() to
suppress the warnings.
Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In ice_xsk_umem(), variable qid which is later used as an array index,
is not validated for a possible boundary exceedance. Because of that,
a calling function might receive an invalid address, which causes
general protection fault when dereferenced.
To address this, add a boundary check to see if qid is greater than the
size of a UMEM array. Also, don't let user change vsi->num_xsk_umems
just by trying to setup a second UMEM if its value is already set up
(i.e. UMEM region has already been allocated for this VSI).
While at it, make sure that ring->zca.free pointer is always zeroed out
if there is no UMEM on a specified ring.
Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In the case where the hardware gives us a null Rx descriptor, it is
theoretically possible that we could call one of our skb-construction
functions with no data pointer, which would cause a panic.
In real life, this will never happen - we only get null RX
descriptors as the final descriptor in a chain of otherwise-valid
descriptors. When this happens, the skb will be extant and we'll just
call ice_add_rx_frag(), which can deal with empty data buffers.
Unfortunately, Coverity does not have intimate knowledge of our
hardware, so we must add a check here.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Coverity reports an error that is not really an error; suppress it.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Following the changes of commit 12299132b3 ("net: ethernet: intel: Demote
MTU change prints to debug"), change the MTU change message to netdev_dbg()
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently when there are SR-IOV VF(s) and the user does "ip link show <pf
interface>" the VF unicast MAC addresses all show 00:00:00:00:00:00
if the unicast MAC was set via VIRTCHNL (i.e. not administratively set
by the host PF).
This is misleading to the host administrator. Fix this by setting the
VF's dflt_lan_addr.addr when the VF's unicast MAC address is
configured via VIRTCHNL. There are a couple cases where we don't allow
the dflt_lan_addr.addr field to be written. First, If the VF's
pf_set_mac field is true and the VF is not trusted, then we don't allow
the dflt_lan_addr.addr to be modified. Second, if the
dflt_lan_addr.addr has already been set (i.e. via VIRTCHNL).
Also a small refactor was done to separate the flow for add and delete
MAC addresses in order to simplify the logic for error conditions
and set/clear the VF's dflt_lan_addr.addr field.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently the flow for ice_set_vf_link_state() is not configuring link
the same as all other VF link configuration flows. Fix this by only
setting the necessary VF members in ice_set_vf_link_state() and then
call ice_vc_notify_link_state() to actually configure link for the
VF. This made ice_set_pfe_link_forced() unnecessary, so it was
deleted. Also, this commonizes the link flows for the VF to all call
ice_vc_notify_link_state().
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Remove Rx flex descriptor metadata and flag programming; per specification
these registers cannot be written to as they are read only.
Signed-off-by: Vignesh Sridhar <vignesh.sridhar@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Check for all unused parameters, if ethtool sent one of them,
print info about that and return error.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
After each rebuild driver deallocates q_vectors, so the interrupt
throttle rate (ITR) settings get lost.
Create a function to save and restore ITR for each queue. If a user
increases the number of queues, restore all the previous queue
settings for each existing queue, and the additional queues will
get the default setting.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When the user sets itr_setting to zero from ethtool -C, the driver changes
this value to default in ice_cfg_itr (for example after changing ring
param). Remove code that sets default value in ice_cfg_itr and move it to
place where the driver allocates q_vectors.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently we do "for (i = 0; i < pf->num_alloc_vfs; i++)" all over the
place. Many other places use macros to contain this repeated for loop,
So create the macro ice_for_each_vf(pf, i) that does the same thing.
There were a couple places we were using one loop variable and a VF
iterator, which were changed to using a local variable within the
ice_for_each_vf() macro.
Also in ice_alloc_vfs() we were setting pf->num_alloc_vfs after doing
"for (i = 0; i < num_alloc_vfs; i++)". Instead assign pf->num_alloc_vfs
right after allocating memory for the pf->vf array.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We can't have more than one default VSI so prevent another VSI from
overwriting the current dflt_vsi. This was achieved by adding the
following functions:
ice_is_dflt_vsi_in_use()
- Used to check if the default VSI is already being used.
ice_is_vsi_dflt_vsi()
- Used to check if VSI passed in is in fact the default VSI.
ice_set_dflt_vsi()
- Used to set the default VSI via a switch rule
ice_clear_dflt_vsi()
- Used to clear the default VSI via a switch rule.
Also, there was no need to introduce any locking because all mailbox
events and synchronization of switch filters for the PF happen in the
service task.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There are many things wrong with the function
ice_set_vf_spoofchk().
1. The VSI being modified is the PF VSI, not the VF VSI.
2. We are enabling Rx VLAN pruning instead of Tx VLAN anti-spoof.
3. The spoofchk setting for each VF is not initialized correctly
or re-initialized correctly on reset.
To fix [1] we need to make sure we are modifying the VF VSI.
This is done by using the vf->lan_vsi_idx to index into the PF's
VSI array.
To fix [2] replace setting Rx VLAN pruning in ice_set_vf_spoofchk()
with setting Tx VLAN anti-spoof.
To Fix [3] we need to make sure the initial VSI settings match what
is done in ice_set_vf_spoofchk() for spoofchk=on. Also make sure
this also works for VF reset. This was done by modifying ice_vsi_init()
to account for the current spoofchk state of the VF VSI.
Because of these changes, Tx VLAN anti-spoof needs to be removed
from ice_cfg_vlan_pruning(). This is okay for the VF because this
is now controlled from the admin enabling/disabling spoofchk. For the
PF, Tx VLAN anti-spoof should not be set. This change requires us to
call ice_set_vf_spoofchk() when configuring promiscuous mode for
the VF which requires ice_set_vf_spoofchk() to move in order to prevent
a forward declaration prototype.
Also, add VLAN 0 by default when allocating a VF since the PF is unaware
if the guest OS is running the 8021q module. Without this, MDD events will
trigger on untagged traffic because spoofcheck is enabled by default. Due
to this change, ignore add/delete messages for VLAN 0 from VIRTCHNL since
this is added/deleted during VF initialization/teardown respectively and
should not be modified.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Based on the work done by Alex Duyck on other Intel drivers, add code to
support UDP segmentation offload (USO) for the ice driver.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Current code use dma_wmb() to ensure Rx/Tx descriptors are visible
to device before writing to doorbell.
However, these dma_wmb() are wrong and unnecessary. Therefore,
they should be removed.
iowrite32be() called from gve_rx_write_doorbell()/gve_tx_put_doorbell()
should guaratee that all previous writes to WB/UC memory is visible to
device before the write done by iowrite32be().
E.g. On ARM64, iowrite32be() calls __iowmb() which expands to dma_wmb()
and only then calls __raw_writel().
Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel test robot reports a boot failure with qemu in 5.5-rc,
referencing commit 2203cbf2c8 ("net: sfp: move fwnode parsing into
sfp-bus layer"). This is caused by phylink_create() being passed a
NULL fwnode, causing fwnode_property_get_reference_args() to return
-EINVAL.
Don't attempt to attach to a SFP bus if we have no fwnode, which
avoids this issue.
Reported-by: kernel test robot <rong.a.chen@intel.com>
Fixes: 2203cbf2c8 ("net: sfp: move fwnode parsing into sfp-bus layer")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/ethernet/brocade/bna/bfa_ioc.c: In function
‘bfa_ioc_fwver_clear’:
drivers/net/ethernet/brocade/bna/bfa_ioc.c:1127:13: warning: variable
‘pgoff’ set but not used [-Wunused-but-set-variable]
It is never used, and so can be removed.
Signed-off-by: yu kuai <yukuai3@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current driver only exists on a non NUMA aware machine.
With 44768decb7 ("page_pool: handle page recycle for NUMA_NO_NODE condition")
applied we can safely change that to NUMA_NO_NODE and accommodate future
NUMA aware hardware using netsec network interface
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEmvEkXzgOfc881GuFWsYho5HknSAFAl4OCPwTHG1rbEBwZW5n
dXRyb25peC5kZQAKCRBaxiGjkeSdIMnWCACpMWqGPtvJPCDyCSqge5ncoWYIIzGX
nncH134TgBpkViYMybYBdHet7RUptJ5ItKVMCYvE9gmK11D1aZ84ylVll8dyz3od
ce9Y1+GK74bF1GXP5DJa+AbeLqFoW6X+iJPUpupCC3VnEnJ418f5R2RoS7LEnlqW
6pxZsylbULlcSxHxuU9Hii5zNtNSrXRZhSfTUsou5bNp3+65XCJ3JVPFc8Kg4iRw
ZrlC2fOKTcDDx53UO/OhPIkfwir9WEHJIVWWw+bm5+yqz8gtdC3hlFXSwK+E0Nuv
5ZQ9Q3adj0xNMRwapFk46GAhOJTPTu5dZm5504AETuFMCSKDRUmVufiU
=faNF
-----END PGP SIGNATURE-----
Merge tag 'linux-can-fixes-for-5.5-20200102' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2020-01-02
this is a pull request of 9 patches for net/master.
The first 5 patches target all the tcan4x5x driver. The first 3 patches
of them are by Dan Murphy and Sean Nyekjaer and improve the device
initialization (power on, reset and get device out of standby before
register access). The next patch is by Dan Murphy and disables the INH
pin device-state if the GPIO is unavailable. The last patch for the
tcan4x5x driver is by Gustavo A. R. Silva and fixes an inconsistent
PTR_ERR check in the tcan4x5x_parse_config() function.
The next patch is by Oliver Hartkopp and targets the generic CAN device
infrastructure. It ensures that an initialized headroom in outgoing CAN
sk_buffs (e.g. if injected by AF_PACKET).
The last 2 patches are by Johan Hovold and fix the kvaser_usb and gs_usb
drivers by always using the current alternate setting not blindly the
first one.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to dump the FECs registers the clocks have to be ticking,
otherwise a data abort occurs. Add calls to runtime PM so they are
enabled and later disabled.
Fixes: e8fcfcd568 ("net: fec: optimize the clock management to save power")
Reported-by: Chris Healy <Chris.Healy@zii.aero>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before ip_tunnel_ecn_encap() and udp_tunnel_xmit_skb() we should filter
tos value by RT_TOS() instead of using config tos directly.
vxlan_get_route() would filter the tos to fl4.flowi4_tos but we didn't
return it back, as geneve_get_v4_rt() did. So we have to use RT_TOS()
directly in function ip_tunnel_ecn_encap().
Fixes: 206aaafcd2 ("VXLAN: Use IP Tunnels tunnel ENC encap API")
Fixes: 1400615d64 ("vxlan: allow setting ipv6 traffic class")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ENETC implement time specific departure capability, which enables
the user to specify when a frame can be transmitted. When this
capability is enabled, the device will delay the transmission of
the frame so that it can be transmitted at the precisely specified time.
The delay departure time up to 0.5 seconds in the future. If the
departure time in the transmit BD has not yet been reached, based
on the current time, the packet will not be transmitted.
This driver was loaded by Qos driver ETF. User could load it by tc
commands. Here are the example commands:
tc qdisc add dev eth0 root handle 1: mqprio \
num_tc 8 map 0 1 2 3 4 5 6 7 hw 1
tc qdisc replace dev eth0 parent 1:8 etf \
clockid CLOCK_TAI delta 30000 offload
These example try to set queue mapping first and then set queue 7
with 30us ahead dequeue time.
Then user send test frame should set SO_TXTIME feature for socket.
There are also some limitations for this feature in hardware:
- Transmit checksum offloads and time specific departure operation
are mutually exclusive.
- Time Aware Shaper feature (Qbv) offload and time specific departure
operation are mutually exclusive.
Signed-off-by: Po Liu <Po.Liu@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use resource_size rather than a verbose computation on
the end and start fields.
The semantic patch that makes these changes is as follows:
(http://coccinelle.lip6.fr/)
<smpl>
@@ struct resource ptr; @@
- (ptr.end + 1 - ptr.start)
+ resource_size(&ptr)
@@ struct resource *ptr; @@
- (ptr->end + 1 - ptr->start)
+ resource_size(ptr)
</smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only the SFC4000 code, now moved to sfc-falcon, needed I2C.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When APP TLV selector 1 (EtherType) is used with PID of 0, the
corresponding entry specifies "default application priority [...] when
application priority is not otherwise specified."
mlxsw currently supports this type of APP entry, but uses it only as a
fallback for unspecified DSCP rules. However non-IP traffic is prioritized
according to port-default priority, not according to the DSCP-to-prio
tables, and thus it's currently not possible to prioritize such traffic
correctly.
Extend the use of the abovementioned APP entry to also set default port
priority.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add QPDP. This register controls the port default Switch Priority and
Color. The default Switch Priority and Color are used for frames where the
trust state uses default values. Currently there are two cases where this
applies: a port is in trust-PCP state, but a packet arrives untagged; and a
port is in trust-DSCP state, but a non-IP packet arrives.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mv88e6xxx_port_set_cmode() relies on cmode stored in struct
mv88e6xxx_port to skip cmode update when the requested value matches the
cached value. It turns out that mv88e6xxx_port_hidden_write() might
change the port cmode setting as a side effect, so we can't rely on the
cached value to determine that cmode update in not necessary.
Force cmode update in mv88e6341_port_set_cmode(), to make
serdes configuration work again. Other mv88e6xxx_port_set_cmode()
callers keep the current behaviour.
This fixes serdes configuration of the 6141 switch on SolidRun Clearfog
GT-8K.
Fixes: 7a3007d22e ("net: dsa: mv88e6xxx: fully support SERDES on Topaz family")
Reported-by: Denis Odintsov <d.odintsov@traviangames.com>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Under load, the RX side of the mscan driver can get stuck while TX still
works. Restarting the interface locks up the system. This behaviour
could be reproduced reliably on a MPC5121e based system.
The patch fixes the return value of the NAPI polling function (should be
the number of processed packets, not constant 1) and the condition under
which IRQs are enabled again after polling is finished.
With this patch, no more lockups were observed over a test period of ten
days.
Fixes: afa17a500a ("net/can: add driver for mscan family & mpc52xx_mscan")
Signed-off-by: Florian Faber <faber@faberman.de>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Make sure to always use the descriptors of the current alternate setting
to avoid future issues when accessing fields that may differ between
settings.
Signed-off-by: Johan Hovold <johan@kernel.org>
Fixes: d08e973a77 ("can: gs_usb: Added support for the GS_USB CAN devices")
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Make sure to use the current alternate setting when verifying the
interface descriptors to avoid binding to an invalid interface.
Failing to do so could cause the driver to misbehave or trigger a WARN()
in usb_submit_urb() that kernels with panic_on_warn set would choke on.
Fixes: aec5fb2268 ("can: kvaser_usb: Add support for Kvaser USB hydra family")
Cc: stable <stable@vger.kernel.org> # 4.19
Cc: Jimmy Assarsson <extja@kvaser.com>
Cc: Christer Beskow <chbe@kvaser.com>
Cc: Nicklas Johansson <extnj@kvaser.com>
Cc: Martin Henriksson <mh@kvaser.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Fix inconsistent IS_ERR and PTR_ERR in tcan4x5x_parse_config().
The proper pointer to be passed as argument is tcan4x5x->device_wake_gpio.
This bug was detected with the help of Coccinelle.
Fixes: 2de4973569 ("can: tcan45x: Make wake-up GPIO an optional GPIO")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
If the device state GPIO is not connected to the host then disable the
INH output from the TCAN device per section 8.3.5 of the data sheet.
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
It's a good idea to reset a ip-block/spi device before using it, this
patch will reset the device.
And a generic reset function if needed elsewhere.
Signed-off-by: Sean Nyekjaer <sean@geanix.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The tcan4x5x_parse_config() function now performs action on the device
either reading or writing and a reset. If the devive has a switchable
power supppy (i.e. regulator is managed) it needs to be turned on.
So turn on the regulator if available. If the parsing fails, turn off
the regulator.
Fixes: 2de4973569 ("can: tcan45x: Make wake-up GPIO an optional GPIO")
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The m_can tries to detect if Non ISO Operation is available while in
standby mode, this function results in the following error:
| tcan4x5x spi2.0 (unnamed net_device) (uninitialized): Failed to init module
| tcan4x5x spi2.0: m_can device registered (irq=84, version=32)
| tcan4x5x spi2.0 can2: TCAN4X5X successfully initialized.
When the tcan device comes out of reset it goes in standby mode. The
m_can driver tries to access the control register but fails due to the
device being in standby mode.
So this patch will put the tcan device in normal mode before the m_can
driver does the initialization.
Fixes: 5443c226ba ("can: tcan4x5x: Add tcan4x5x driver to the kernel")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Nyekjaer <sean@geanix.com>
Acked-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Jeff Kirsher says:
====================
1GbE Intel Wired LAN Driver Updates 2019-12-31
This series contains updates to e1000e, igb and igc only.
Robert Beckett provide an igb change to assist in keeping packets from
being dropped due to receive descriptor ring being full when receive
flow control is enabled. Create a separate function to setup SRRCTL to
ease in reuse and ensure that setting of the drop enable bit only if
receive flow control is not enabled.
Sasha adds support for scatter gather support in igc. Improve the
direct memory address mapping flow by optimizing/simplifying and more
clear. Update igc to use pci_release_mem_regions() instead of
pci_release_selected_regions(). Clean up function header comments to
align with the actual code. Adds support for 64 bit DMA access, to help
handle socket buffer fragments in high memory. Adds legacy power
management support in igc by implementing suspend, resume,
runtime_suspend/resume, and runtime_idle callbacks. Clean up references
to Serdes interface in igc since that interface is not supported for
i225 devices.
Alex replaces the pr_info calls with netdev_info in all cases related to
netdev link state, as suggested by Joe Perches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Serdes interface is not applicable for i225 devices.
Remove this from comments and make comments more clearly.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Replace the pr_info calls with netdev_info in all cases related to the
netdevice link state.
As a result of this patch the link messages will change as shown below.
Before:
e1000e: ens3 NIC Link is Down
e1000e: ens3 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
After:
e1000e 0000:00:03.0 ens3: NIC Link is Down
e1000e 0000:00:03.0 ens3: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
On relevant platforms ndo_start_xmit can handle socket buffer
fragments in high memory
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The function description for igc_alloc_rx_buffers has not reflected
the function meaning. Add meaningful description.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The function description for igc_is_non_eop includes an extra @skb
parameter description. This parameter doesn't exist on the function, so
remove it.
Suggested-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Use the pci_release_mem_regions method instead of the
pci_release_selected_regions method
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Improve the probe flow and set both the DMA mask and the coherent
to the same thing. Make the flow optimized and cleared.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Scatter gather is used to do DMA data transfers of data that is written to
noncontiguous areas of memory.
This patch enables scatter gather support.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If Rx flow control has been enabled (via autoneg or forced), packets
should not be dropped due to Rx descriptor ring exhaustion. Instead
pause frames should be used to apply back pressure. This only applies
if VFs are not in use.
Move SRRCTL setup to its own function for easy reuse and only set drop
enable bit if Rx flow control is not enabled.
Since v1: always enable dropping of packets if VFs in use.
Signed-off-by: Robert Beckett <bob.beckett@collabora.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When disabling PTP timestamping, don't reset the switch with the new
static config until all existing PTP frames have been timestamped on the
RX path or dropped. There's nothing we can do with these afterwards.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And move the queue of skb's waiting for RX timestamps into the ptp_data
structure, since it isn't needed if PTP is not compiled.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For first-generation switches (SJA1105E and SJA1105T):
- TPID means C-Tag (typically 0x8100)
- TPID2 means S-Tag (typically 0x88A8)
While for the second generation switches (SJA1105P, SJA1105Q, SJA1105R,
SJA1105S) it is the other way around:
- TPID means S-Tag (typically 0x88A8)
- TPID2 means C-Tag (typically 0x8100)
In other words, E/T tags untagged traffic with TPID, and P/Q/R/S with
TPID2.
So the patch mentioned below fixed VLAN filtering for P/Q/R/S, but broke
it for E/T.
We strive for a common code path for all switches in the family, so just
lie in the static config packing functions that TPID and TPID2 are at
swapped bit offsets than they actually are, for P/Q/R/S. This will make
both switches understand TPID to be ETH_P_8021Q and TPID2 to be
ETH_P_8021AD. The meaning from the original E/T was chosen over P/Q/R/S
because E/T is actually the one with public documentation available
(UM10944.pdf).
Fixes: f9a1a7646c ("net: dsa: sja1105: Reverse TPID and TPID2")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The check originates from the initial implementation which was not based
on PTP time but on a standalone clock source. In the meantime we can now
program the PTPSCHTM register at runtime with the dynamic base time
(actually with a value that is 200 ns smaller, to avoid writing DELTA=0
in the Schedule Entry Points Parameters Table). And we also have logic
for moving the actual base time in the future of the PHC's current time
base, so the check for zero serves no purpose, since even if the user
will specify zero, that's not what will end up in the static config
table where the limitation is.
Fixes: 86db36a347 ("net: dsa: sja1105: Implement state machine for TAS with PTP clock source")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When activating tc-taprio offload on the switch ports, the TAS state
machine will try to check whether it is running or not, but will find
both the STARTED and STOPPED bits as false in the
sja1105_tas_check_running function. So the function will return -EINVAL
(an abnormal situation) and the kernel will keep printing this from the
TAS FSM workqueue:
[ 37.691971] sja1105 spi0.1: An operation returned -22
The reason is that the underlying function that gets called,
sja1105_ptp_commit, does not actually do a SPI_READ, but a SPI_WRITE. So
the command buffer remains initialized with zeroes instead of retrieving
the hardware state. Fix that.
Fixes: 41603d78b3 ("net: dsa: sja1105: Make the PTP command read-write")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PTP egress timestamp N must be captured from register PTPEGR_TS[n],
where n = 2 * PORT + TSREG. There are 10 PTPEGR_TS registers, 2 per
port. We are only using TSREG=0.
As opposed to the management slots, which are 4 in number
(SJA1105_NUM_PORTS, minus the CPU port). Any management frame (which
includes PTP frames) can be sent to any non-CPU port through any
management slot. When the CPU port is not the last port (#4), there will
be a mismatch between the slot and the port number.
Luckily, the only mainline occurrence with this switch
(arch/arm/boot/dts/ls1021a-tsn.dts) does have the CPU port as #4, so the
issue did not manifest itself thus far.
Fixes: 47ed985e97 ("net: dsa: sja1105: Add logic for TX timestamping")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'eth_zero_addr()' is already called in the error handling path. This is
harmless, but there is no point in calling it twice, so remove one.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
As per 802.3-2005, Section Two, Annex 28B, Table 28B-2 [1], when
_only_ Rx pause is enabled, both symmetric and asymmetric pause
towards local device must be enabled. Also, firmware returns the local
device's flow control pause params as part of advertised capabilities
and negotiated params as part of current link attributes. So, fix up
ethtool's flow control pause params fetch logic to read from acaps,
instead of linkattr.
[1] https://standards.ieee.org/standard/802_3-2005.html
Fixes: c3168cabe1 ("cxgb4/cxgbvf: Handle 32-bit fw port capabilities")
Signed-off-by: Surendra Mobiya <surendra@chelsio.com>
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, VRRP packets and packets that hit exceptions during routing
(e.g., MTU error) are policed using the same policer towards the CPU.
This means, for example, that misconfiguration of the MTU on a routed
interface can prevent VRRP packets from reaching the CPU, which in turn
can cause the VRRP daemon to assume it is the Master router.
Fix this by using a dedicated policer for VRRP packets.
Fixes: 11566d34f8 ("mlxsw: spectrum: Add VRRP traps")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alex Veber <alexve@mellanox.com>
Tested-by: Alex Veber <alexve@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a router interface (RIF) is created the MAC address of the backing
netdev is verified to have the same MSBs as existing RIFs. This is
required in order to avoid changing existing RIF MAC addresses that all
share the same MSBs.
Loopback RIFs are special in this regard as they do not have a MAC
address, given they are only used to loop packets from the overlay to
the underlay.
Without this change, an error is returned when trying to create a RIF
after the creation of a GRE tunnel that is represented by a loopback
RIF. 'rif->dev->dev_addr' points to the GRE device's local IP, which
does not share the same MSBs as physical interfaces. Adding an IP
address to any physical interface results in:
Error: mlxsw_spectrum: All router interface MAC addresses must have the
same prefix.
Fix this by skipping loopback RIFs during MAC validation.
Fixes: 74bc993974 ("mlxsw: spectrum_router: Veto unsupported RIF MAC addresses")
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver wrongly assumes that it is the only entity that can set the
SKBTX_IN_PROGRESS bit of the current skb. Therefore, in the
gfar_clean_tx_ring function, where the TX timestamp is collected if
necessary, the aforementioned bit is used to discriminate whether or not
the TX timestamp should be delivered to the socket's error queue.
But a stacked driver such as a DSA switch can also set the
SKBTX_IN_PROGRESS bit, which is actually exactly what it should do in
order to denote that the hardware timestamping process is undergoing.
Therefore, gianfar would misinterpret the "in progress" bit as being its
own, and deliver a second skb clone in the socket's error queue,
completely throwing off a PTP process which is not expecting to receive
it, _even though_ TX timestamping is not enabled for gianfar.
There have been discussions [0] as to whether non-MAC drivers need or
not to set SKBTX_IN_PROGRESS at all (whose purpose is to avoid sending 2
timestamps, a sw and a hw one, to applications which only expect one).
But as of this patch, there are at least 2 PTP drivers that would break
in conjunction with gianfar: the sja1105 DSA switch and the felix
switch, by way of its ocelot core driver.
So regardless of that conclusion, fix the gianfar driver to not do stuff
based on flags set by others and not intended for it.
[0]: https://www.spinics.net/lists/netdev/msg619699.html
Fixes: f0ee7acfcd ("gianfar: Add hardware TX timestamping support")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/wan/fsl_ucc_hdlc.c: In function ucc_hdlc_irq_handler:
drivers/net/wan/fsl_ucc_hdlc.c:643:23:
warning: variable ut_info set but not used [-Wunused-but-set-variable]
drivers/net/wan/fsl_ucc_hdlc.c: In function uhdlc_suspend:
drivers/net/wan/fsl_ucc_hdlc.c:880:23:
warning: variable ut_info set but not used [-Wunused-but-set-variable]
drivers/net/wan/fsl_ucc_hdlc.c: In function uhdlc_resume:
drivers/net/wan/fsl_ucc_hdlc.c:925:6:
warning: variable ret set but not used [-Wunused-but-set-variable]
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
GXBB and newer SoCs use the fixed FCLK_DIV2 (1GHz) clock as input for
the m250_sel clock. Meson8b and Meson8m2 use MPLL2 instead, whose rate
can be adjusted at runtime.
So far we have been running MPLL2 with ~250MHz (and the internal
m250_div with value 1), which worked enough that we could transfer data
with an TX delay of 4ns. Unfortunately there is high packet loss with
an RGMII PHY when transferring data (receiving data works fine though).
Odroid-C1's u-boot is running with a TX delay of only 2ns as well as
the internal m250_div set to 2 - no lost (TX) packets can be observed
with that setting in u-boot.
Manual testing has shown that the TX packet loss goes away when using
the following settings in Linux (the vendor kernel uses the same
settings):
- MPLL2 clock set to ~500MHz
- m250_div set to 2
- TX delay set to 2ns on the MAC side
Update the m250_div divider settings to only accept dividers greater or
equal 2 to fix the TX delay generated by the MAC.
iperf3 results before the change:
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 182 MBytes 153 Mbits/sec 514 sender
[ 5] 0.00-10.00 sec 182 MBytes 152 Mbits/sec receiver
iperf3 results after the change (including an updated TX delay of 2ns):
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-10.00 sec 927 MBytes 778 Mbits/sec 0 sender
[ 5] 0.00-10.01 sec 927 MBytes 777 Mbits/sec receiver
Fixes: 4f6a71b84e ("net: stmmac: dwmac-meson8b: fix internal RGMII clock configuration")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
If packet checker is enabled in the serdes, then Rx counter registers
start working, and no side effects have been detected.
This patch enables packet checker automatically when powering serdes on,
and exposes Rx counter registers via ethtool statistics interface.
Code partially basded by older attempt by Andrew Lunn.
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/ethernet/amazon/ena/ena_netdev.c: In function ena_xdp_xmit_buff:
drivers/net/ethernet/amazon/ena/ena_netdev.c:316:19: warning:
variable rx_ring set but not used [-Wunused-but-set-variable]
commit 548c4940b9 ("net: ena: Implement XDP_TX action")
left behind this unused variable.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need to set variable 'mbus' static
since new value always be assigned before use it.
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Passing NULL to ppp_pernet causes a crash via BUG_ON.
Dereferencing net in net_generic() also has the same effect.
This patch removes the redundant BUG_ON check on the same parameter.
Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-12-27
The following pull-request contains BPF updates for your *net-next* tree.
We've added 127 non-merge commits during the last 17 day(s) which contain
a total of 110 files changed, 6901 insertions(+), 2721 deletions(-).
There are three merge conflicts. Conflicts and resolution looks as follows:
1) Merge conflict in net/bpf/test_run.c:
There was a tree-wide cleanup c593642c8b ("treewide: Use sizeof_field() macro")
which gets in the way with b590cb5f80 ("bpf: Switch to offsetofend in
BPF_PROG_TEST_RUN"):
<<<<<<< HEAD
if (!range_is_zero(__skb, offsetof(struct __sk_buff, priority) +
sizeof_field(struct __sk_buff, priority),
=======
if (!range_is_zero(__skb, offsetofend(struct __sk_buff, priority),
>>>>>>> 7c8dce4b16
There are a few occasions that look similar to this. Always take the chunk with
offsetofend(). Note that there is one where the fields differ in here:
<<<<<<< HEAD
if (!range_is_zero(__skb, offsetof(struct __sk_buff, tstamp) +
sizeof_field(struct __sk_buff, tstamp),
=======
if (!range_is_zero(__skb, offsetofend(struct __sk_buff, gso_segs),
>>>>>>> 7c8dce4b16
Just take the one with offsetofend() /and/ gso_segs. Latter is correct due to
850a88cc40 ("bpf: Expose __sk_buff wire_len/gso_segs to BPF_PROG_TEST_RUN").
2) Merge conflict in arch/riscv/net/bpf_jit_comp.c:
(I'm keeping Bjorn in Cc here for a double-check in case I got it wrong.)
<<<<<<< HEAD
if (is_13b_check(off, insn))
return -1;
emit(rv_blt(tcc, RV_REG_ZERO, off >> 1), ctx);
=======
emit_branch(BPF_JSLT, RV_REG_T1, RV_REG_ZERO, off, ctx);
>>>>>>> 7c8dce4b16
Result should look like:
emit_branch(BPF_JSLT, tcc, RV_REG_ZERO, off, ctx);
3) Merge conflict in arch/riscv/include/asm/pgtable.h:
<<<<<<< HEAD
=======
#define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
#define VMALLOC_END (PAGE_OFFSET - 1)
#define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE)
#define BPF_JIT_REGION_SIZE (SZ_128M)
#define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
#define BPF_JIT_REGION_END (VMALLOC_END)
/*
* Roughly size the vmemmap space to be large enough to fit enough
* struct pages to map half the virtual address space. Then
* position vmemmap directly below the VMALLOC region.
*/
#define VMEMMAP_SHIFT \
(CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
#define VMEMMAP_SIZE BIT(VMEMMAP_SHIFT)
#define VMEMMAP_END (VMALLOC_START - 1)
#define VMEMMAP_START (VMALLOC_START - VMEMMAP_SIZE)
#define vmemmap ((struct page *)VMEMMAP_START)
>>>>>>> 7c8dce4b16
Only take the BPF_* defines from there and move them higher up in the
same file. Remove the rest from the chunk. The VMALLOC_* etc defines
got moved via 01f52e16b8 ("riscv: define vmemmap before pfn_to_page
calls"). Result:
[...]
#define __S101 PAGE_READ_EXEC
#define __S110 PAGE_SHARED_EXEC
#define __S111 PAGE_SHARED_EXEC
#define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
#define VMALLOC_END (PAGE_OFFSET - 1)
#define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE)
#define BPF_JIT_REGION_SIZE (SZ_128M)
#define BPF_JIT_REGION_START (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
#define BPF_JIT_REGION_END (VMALLOC_END)
/*
* Roughly size the vmemmap space to be large enough to fit enough
* struct pages to map half the virtual address space. Then
* position vmemmap directly below the VMALLOC region.
*/
#define VMEMMAP_SHIFT \
(CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
#define VMEMMAP_SIZE BIT(VMEMMAP_SHIFT)
#define VMEMMAP_END (VMALLOC_START - 1)
#define VMEMMAP_START (VMALLOC_START - VMEMMAP_SIZE)
[...]
Let me know if there are any other issues.
Anyway, the main changes are:
1) Extend bpftool to produce a struct (aka "skeleton") tailored and specific
to a provided BPF object file. This provides an alternative, simplified API
compared to standard libbpf interaction. Also, add libbpf extern variable
resolution for .kconfig section to import Kconfig data, from Andrii Nakryiko.
2) Add BPF dispatcher for XDP which is a mechanism to avoid indirect calls by
generating a branch funnel as discussed back in bpfconf'19 at LSF/MM. Also,
add various BPF riscv JIT improvements, from Björn Töpel.
3) Extend bpftool to allow matching BPF programs and maps by name,
from Paul Chaignon.
4) Support for replacing cgroup BPF programs attached with BPF_F_ALLOW_MULTI
flag for allowing updates without service interruption, from Andrey Ignatov.
5) Cleanup and simplification of ring access functions for AF_XDP with a
bonus of 0-5% performance improvement, from Magnus Karlsson.
6) Enable BPF JITs for x86-64 and arm64 by default. Also, final version of
audit support for BPF, from Daniel Borkmann and latter with Jiri Olsa.
7) Move and extend test_select_reuseport into BPF program tests under
BPF selftests, from Jakub Sitnicki.
8) Various BPF sample improvements for xdpsock for customizing parameters
to set up and benchmark AF_XDP, from Jay Jayatheerthan.
9) Improve libbpf to provide a ulimit hint on permission denied errors.
Also change XDP sample programs to attach in driver mode by default,
from Toke Høiland-Jørgensen.
10) Extend BPF test infrastructure to allow changing skb mark from tc BPF
programs, from Nikita V. Shirokov.
11) Optimize prologue code sequence in BPF arm32 JIT, from Russell King.
12) Fix xdp_redirect_cpu BPF sample to manually attach to tracepoints after
libbpf conversion, from Jesper Dangaard Brouer.
13) Minor misc improvements from various others.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
While testing max vlan configuration on the PF, firmware gets
assert as driver was configuring number of vlans more than what
is supported per port/engine, it was figured out that there is an
implicit vlan (hidden default vlan consuming hardware cam entry resource)
which is configured default for all the clients (PF/VFs) on client_init
ramrod by the adapter implicitly, so when allocating resources among the
PFs this implicit vlan should be considered or total vlan entries should
be reduced by one to accommodate that default/implicit vlan entry.
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Although it has same value as MAX_MAC_CREDIT_E2,
use MAX_VLAN_CREDIT_E2 appropriately.
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By re-attaching RX, TX, and CTL rings during connect() rather than
assuming they are freshly allocated (i.e. assuming the counters are zero),
and avoiding forcing state to Closed in netback_remove() it is possible
for vif instances to be unbound and re-bound from and to (respectively) a
running guest.
Dynamic unbind/bind is a highly useful feature for a backend module as it
allows it to be unloaded and re-loaded (i.e. updated) without requiring
domUs to be halted.
This has been tested by running iperf as a server in the test VM and
then running a client against it in a continuous loop, whilst also
running:
while true;
do echo vif-$DOMID-$VIF >unbind;
echo down;
rmmod xen-netback;
echo unloaded;
modprobe xen-netback;
cd $(pwd);
brctl addif xenbr0 vif$DOMID.$VIF;
ip link set vif$DOMID.$VIF up;
echo up;
sleep 5;
done
in dom0 from /sys/bus/xen-backend/drivers/vif to continuously unbind,
unload, re-load, re-bind and re-plumb the backend.
Clearly a performance drop was seen but no TCP connection resets were
observed during this test and moreover a parallel SSH connection into the
guest remained perfectly usable throughout.
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The suspend/resume code for AQR107 works on AQR105 too.
This patch fixes issues with the partner not seeing the link down
when the interface using AQR105 is brought down.
Fixes: bee8259dd3 ("net: phy: add driver for aquantia phy")
Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the error path some fragments remain DMA mapped. Adding a fix
that unmaps all the fragments. Rework cleanup path to be simpler.
Fixes: 8151ee88ba ("dpaa_eth: use page backed rx buffers")
Signed-off-by: Madalin Bucur <madalin.bucur@oss.nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On RTL8211F the RX and TX delays (2ns) can be configured in two ways:
- pin strapping (RXD1 for the TX delay and RXD0 for the RX delay, LOW
means "off" and HIGH means "on") which is read during PHY reset
- using software to configure the TX and RX delay registers
So far only the configuration using pin strapping has been supported.
Add support for enabling or disabling the RGMII RX delay based on the
phy-mode to be able to get the RX delay into a known state. This is
important because the RX delay has to be coordinated between the PHY,
MAC and the PCB design (trace length). With an invalid RX delay applied
(for example if both PHY and MAC add a 2ns RX delay) Ethernet may not
work at all.
Also add debug logging when configuring the RX delay (just like the TX
delay) because this is a common source of problems.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RGMII requires a delay of 2ns between the data and the clock signal.
There are at least three ways this can happen. One possibility is by
having the PHY generate this delay.
This is a common source for problems (for example with slow TX speeds or
packet loss when sending data). The TX delay configuration of the
RTL8211F PHY can be set either by pin-strappping the RXD1 pin (HIGH
means enabled, LOW means disabled) or through configuring a paged
register. The setting from the RXD1 pin is also reflected in the
register.
Add debug logging to the TX delay configuration on RTL8211F so it's
easier to spot these issues (for example if the TX delay is enabled for
both, the RTL8211F PHY and the MAC).
This is especially helpful because there is no public datasheet for the
RTL8211F PHY available with all the RX/TX delay specifics.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As explained in previous patches, the driver no longer needs to maintain
a list of identical FIB entries (i.e, same {tb_id, prefix, prefix
length}) and therefore each FIB node can only store one FIB entry.
Remove the FIB entry list and simplify the code.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the last patch mlxsw_sp_fib{4,6}_node_entry_link() and
mlxsw_sp_fib{4,6}_node_entry_unlink() are identical and can therefore be
consolidated into the same common function.
Perform the consolidation.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Host routes that perform decapsulation of IP in IP tunnels have a
special adjacency entry linked to them. This entry stores information
such as the expected underlay source IP. When the route is deleted this
entry needs to be freed.
The allocation of the adjacency entry happens in
mlxsw_sp_fib4_entry_type_set(), but it is freed in
mlxsw_sp_fib4_node_entry_unlink().
Create a new function - mlxsw_sp_fib4_entry_type_unset() - and free the
adjacency entry there.
This will allow us to consolidate mlxsw_sp_fib{4,6}_node_entry_unlink()
in the next patch.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the driver no longer maintains a list of identical routes there is
no route to promote when a route is deleted.
Remove that code that took care of it.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that the networking stack takes care of only notifying the routes of
interest, we do not need to maintain a list of identical routes.
Remove the check that tests if the route is the first route in the FIB
node.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the LACP actor/partner state is now part of the uapi, rename the
3ad state defines with LACP prefix. The LACP prefix is preferred over
BOND_3AD as the LACP standard moved to 802.1AX.
Fixes: 826f66b30c ("bonding: move 802.3ad port state flags to uapi")
Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The burning process requires to perform internal allocations of large
chunks of memory. This memory doesn't need to be contiguous and can be
safely allocated by vzalloc() instead of kzalloc(). This patch changes
such allocation to avoid possible out-of-memory failure.
Fixes: 410ed13cae ("Add the mlxfw module for Mellanox firmware flash process")
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 1588 standard defines one step operation for both Sync and
PDelay_Resp messages. Up until now, hardware with P2P one step has
been rare, and kernel support was lacking. This patch adds support of
the mode in anticipation of new hardware developments.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When parsing a PHY node, register its time stamper, if any, and attach
the instance to the PHY device.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
While PHY time stamping drivers can simply attach their interface
directly to the PHY instance, stand alone drivers require support in
order to manage their services. Non-PHY MII time stamping drivers
have a control interface over another bus like I2C, SPI, UART, or via
a memory mapped peripheral. The controller device will be associated
with one or more time stamping channels, each of which sits snoops in
on a MII bus.
This patch provides a glue layer that will enable time stamping
channels to find their controlling device.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the stack supports time stamping in PHY devices. However,
there are newer, non-PHY devices that can snoop an MII bus and provide
time stamps. In order to support such devices, this patch introduces
a new interface to be used by both PHY and non-PHY devices.
In addition, the one and only user of the old PHY time stamping API is
converted to the new interface.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
An upcoming patch will change how the PHY time stamping functions are
registered with the networking stack, and adapting this driver would
entail adding forward declarations for four time stamping methods.
However, forward declarations are considered to be stylistic defects.
This patch avoids the issue by moving the probe and remove methods
immediately above the phy_driver interface structure.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netcp_ethss driver tests fields of the phy_device in order to
determine whether to defer to the PHY's time stamping functionality.
This patch replaces the open coded logic with an invocation of the
proper methods.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The macvlan layer tests fields of the phy_device in order to determine
whether to invoke the PHY's tsinfo ethtool callback. This patch
replaces the open coded logic with an invocation of the proper
methods.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that mlxsw is converted to use the new FIB notifications it is
possible to delete the old ones and use the new replace / append /
delete notifications.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the new notifications mlxsw does not need to handle identical
routes itself, as this is taken care of by the core IPv6 code.
Instead, mlxsw only needs to take care of inserting and removing routes
from the device.
Convert mlxsw to use the new IPv6 route notifications and simplify the
code.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When do IPv6 tunnel PMTU update and calls __ip6_rt_update_pmtu() in the end,
we should not call dst_confirm_neigh() as there is no two-way communication.
Although GTP only support ipv4 right now, and __ip_rt_update_pmtu() does not
call dst_confirm_neigh(), we still set it to false to keep consistency with
IPv6 code.
v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
dst_ops.update_pmtu to control whether we should do neighbor confirm.
Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MTU update code is supposed to be invoked in response to real
networking events that update the PMTU. In IPv6 PMTU update function
__ip6_rt_update_pmtu() we called dst_confirm_neigh() to update neighbor
confirmed time.
But for tunnel code, it will call pmtu before xmit, like:
- tnl_update_pmtu()
- skb_dst_update_pmtu()
- ip6_rt_update_pmtu()
- __ip6_rt_update_pmtu()
- dst_confirm_neigh()
If the tunnel remote dst mac address changed and we still do the neigh
confirm, we will not be able to update neigh cache and ping6 remote
will failed.
So for this ip_tunnel_xmit() case, _EVEN_ if the MTU is changed, we
should not be invoking dst_confirm_neigh() as we have no evidence
of successful two-way communication at this point.
On the other hand it is also important to keep the neigh reachability fresh
for TCP flows, so we cannot remove this dst_confirm_neigh() call.
To fix the issue, we have to add a new bool parameter for dst_ops.update_pmtu
to choose whether we should do neigh update or not. I will add the parameter
in this patch and set all the callers to true to comply with the previous
way, and fix the tunnel code one by one on later patches.
v5: No change.
v4: No change.
v3: Do not remove dst_confirm_neigh, but add a new bool parameter in
dst_ops.update_pmtu to control whether we should do neighbor confirm.
Also split the big patch to small ones for each area.
v2: Remove dst_confirm_neigh in __ip6_rt_update_pmtu.
Suggested-by: David Miller <davem@davemloft.net>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify the code by moving the call to rtl_enable_eee() from the
individual PHY configs to rtl8169_init_phy().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to recent changes we don't need the call to rtl_rar_exgmac_set()
and longer at this place. It's called from rtl_rar_set() which is
called in rtl_init_mac_address() and rtl8169_resume().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify and factor out this magic from rtl8168h_2_hw_phy_config()
and name it based on the vendor driver.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IP fragment is specified through user-defined field as the first
bit of the first user-defined word. We were previously trying to extract
it from the user-defined mask which could not possibly work. The ip_frag
is also supposed to be a boolean, if we do not cast it as such, we risk
overwriting the next fields in CFP_DATA(6) which would render the rule
inoperative.
Fixes: 7318166cac ("net: dsa: bcm_sf2: Add support for ethtool::rxnfc")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>