Inside m_can_chip_config(), when setting up the new value of the CCCR,
the CCCR_NISO bit is not cleared like the others, CCCR_TEST, CCCR_MON,
CCCR_BRSE and CCCR_FDOE, before checking the can.ctrlmode bits for
CAN_CTRLMODE_FD_NON_ISO.
This way once the controller was configured for CAN_CTRLMODE_FD_NON_ISO,
this mode could never be cleared again.
This fix is only relevant for controllers with version 3.1.x or 3.2.x.
Older versions do not support NISO.
Signed-off-by: Roman Fietze <roman.fietze@telemotive.de>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The DMA logic in firmwares < v3.3.0 embedded in the PCAN-PCIe FD cards
family is not capable of handling a mix of 32-bit and 64-bit logical
addresses. If the board is equipped with 2 or 4 CAN ports, then such a
situation might lead to a PCIe Bus Error "Malformed TLP" packet
as well as "irq xx: nobody cared" issue.
This patch adds a workaround that requests only 32-bit DMA addresses
when these might be allocated outside of the 4 GB area.
This issue has been fixed in firmware v3.3.0 and next.
Signed-off-by: Stephane Grosjean <s.grosjean@peak-system.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Use dma_zalloc_coherent instead of dma_alloc_coherent
followed by memset 0.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prevent drivers from building on PPC32 if they use isa_bus_to_virt(),
isa_virt_to_bus(), or isa_page_to_bus(), which are not available and
thus cause build errors.
../drivers/net/ethernet/3com/3c515.c: In function 'corkscrew_open':
../drivers/net/ethernet/3com/3c515.c:824:9: error: implicit declaration of function 'isa_virt_to_bus'; did you mean 'virt_to_bus'? [-Werror=implicit-function-declaration]
../drivers/net/ethernet/amd/lance.c: In function 'lance_rx':
../drivers/net/ethernet/amd/lance.c:1203:23: error: implicit declaration of function 'isa_bus_to_virt'; did you mean 'bus_to_virt'? [-Werror=implicit-function-declaration]
../drivers/net/ethernet/amd/ni65.c: In function 'ni65_init_lance':
../drivers/net/ethernet/amd/ni65.c:585:20: error: implicit declaration of function 'isa_virt_to_bus'; did you mean 'virt_to_bus'? [-Werror=implicit-function-declaration]
../drivers/net/ethernet/cirrus/cs89x0.c: In function 'net_open':
../drivers/net/ethernet/cirrus/cs89x0.c:897:20: error: implicit declaration of function 'isa_virt_to_bus'; did you mean 'virt_to_bus'? [-Werror=implicit-function-declaration]
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
After device is stopped we reset the rings by moving all free buffers
to positions [0, cnt - 2], and clear the position cnt - 1 in the ring.
We then proceed to clear the read/write pointers. This means that if
we try to reset the ring again the code will assume that the next to
fill buffer is at position 0 and swap it with cnt - 1. Since we
previously cleared position cnt - 1 it will lead to leaking the first
buffer and leaving ring in a bad state.
This scenario can only happen if FW communication fails, in which case
the ring will never be used again, so the fact it's in a bad state will
not be noticed. Buffer leak is the only problem. Don't try to move
buffers in the ring if the read/write pointers indicate the ring was
never used or have already been reset.
nfp_net_clear_config_and_disable() is now fully idempotent.
Found by code inspection, FW communication failures are very rare,
and reconfiguring a live device is not common either, so it's unlikely
anyone has ever noticed the leak.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we have offload replay infrastructure added by
commit 326367427c ("net: sched: call reoffload op on block callback reg")
and flows are guaranteed to be removed correctly, we can revert
commit 951a8ee6de ("nfp: reject binding to shared blocks").
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously only the neighbour state was checked to decide if an offloaded
entry should be removed. However, there can be situations when the entry
is dead but still marked as valid. This can lead to dead entries not
being removed from fw tables or even incorrect data being added.
Check the entry dead bit before deciding if it should be added to or
removed from fw neighbour tables.
Fixes: 8e6a9046b6 ("nfp: flower vxlan neighbour offload")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Problem:
In vxlan_newlink, a default fdb entry is added before register_netdev.
The default fdb creation function also notifies user-space of the
fdb entry on the vxlan device which user-space does not know about yet.
(RTM_NEWNEIGH goes before RTM_NEWLINK for the same ifindex).
This patch fixes the user-space netlink notification ordering issue
with the following changes:
- decouple fdb notify from fdb create.
- Move fdb notify after register_netdev.
- Call rtnl_configure_link in vxlan newlink handler to notify
userspace about the newlink before fdb notify and
hence avoiding the user-space race.
Fixes: afbd8bae9c ("vxlan: add implicit fdb entry for default destination")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new option do_notify to vxlan_fdb_destroy to make
sending netlink notify optional. Used by a later patch.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Add new vxlan_fdb_alloc helper
- rename existing vxlan_fdb_create into vxlan_fdb_update:
because it really creates or updates an existing
fdb entry
- move new fdb creation into a separate vxlan_fdb_create
Main motivation for this change is to introduce the ability
to decouple vxlan fdb creation and notify, used in a later patch.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Got crash report with following backtrace:
BUG: unable to handle kernel paging request at ffff8801869daffe
RIP: 0010:[<ffffffff816429c4>] [<ffffffff816429c4>] ip6_finish_output2+0x394/0x4c0
RSP: 0018:ffff880186c83a98 EFLAGS: 00010283
RAX: ffff8801869db00e ...
[<ffffffff81644cdc>] ip6_finish_output+0x8c/0xf0
[<ffffffff81644d97>] ip6_output+0x57/0x100
[<ffffffff81643dc9>] ip6_forward+0x4b9/0x840
[<ffffffff81645566>] ip6_rcv_finish+0x66/0xc0
[<ffffffff81645db9>] ipv6_rcv+0x319/0x530
[<ffffffff815892ac>] netif_receive_skb+0x1c/0x70
[<ffffffffc0060bec>] atl1c_clean+0x1ec/0x310 [atl1c]
...
The bad access is in neigh_hh_output(), at skb->data - 16 (HH_DATA_MOD).
atl1c driver provided skb with no headroom, so 14 bytes (ethernet
header) got pulled, but then 16 are copied.
Reserve NET_SKB_PAD bytes headroom, like netdev_alloc_skb().
Compile tested only; I lack hardware.
Fixes: 7b70176421 ("atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f599c64fdf ("xen-netfront: Fix race between device setup and
open") changed the initialization order: xennet_create_queues() now
happens before we do register_netdev() so using netdev->name in
xennet_init_queue() is incorrect, we end up with the following in
/proc/interrupts:
60: 139 0 xen-dyn -event eth%d-q0-tx
61: 265 0 xen-dyn -event eth%d-q0-rx
62: 234 0 xen-dyn -event eth%d-q1-tx
63: 1 0 xen-dyn -event eth%d-q1-rx
and this looks ugly. Actually, using early netdev name (even when it's
already set) is also not ideal: nowadays we tend to rename eth devices
and queue name may end up not corresponding to the netdev name.
Use nodename from xenbus device for queue naming: this can't change in VM's
lifetime. Now /proc/interrupts looks like
62: 202 0 xen-dyn -event device/vif/0-q0-tx
63: 317 0 xen-dyn -event device/vif/0-q0-rx
64: 262 0 xen-dyn -event device/vif/0-q1-tx
65: 17 0 xen-dyn -event device/vif/0-q1-rx
Fixes: f599c64fdf ("xen-netfront: Fix race between device setup and open")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add MODULE_LICENSE() to net/dsa/realtek.o to fix build warning message.
WARNING: modpost: missing MODULE_LICENSE() in drivers/net/dsa/realtek.o
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As was recently discussed [1], let's avoid casting the const buf in
bonding_sysfs_store_option and use kstrndup/kfree instead.
[1] http://lists.openwall.net/netdev/2018/07/22/25
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
free_irq() waits until all handlers for this IRQ have completed. As the
relevant handler (mv88e6xxx_g1_irq_thread_fn()) takes the chip's reg_lock
it might never return if the thread calling free_irq() holds this lock.
For the same reason kthread_cancel_delayed_work_sync() in the polling case
must not hold this lock.
Also first free the irq (or stop the worker respectively) such that
mv88e6xxx_g1_irq_thread_work() isn't called any more before the irq
mappings are dropped in mv88e6xxx_g1_irq_free_common() to prevent the
worker thread to call handle_nested_irq(0) which results in a NULL-pointer
exception.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
"imply HWMON" was supposed to ensure that the SFP phy code can be built
with HWMON enabled or disabled while at the same time ensuring that
HWMON is not built as module if SFP is built into the kernel.
Unfortunately, that does not work as intended. With "allmodconfig", it
results in several unrelated HWMON drivers to be disabled instead of
being built as module as expected.
Let's use the old "depends on HWMON || HWMON=n" instead. This is slightly
different (it enforces SFP to be built as module if HWMON is built as
module), but it is better than the alternative of using "IS_REACHABLE()"
in the driver since that would disable sensor support if HWMON is built
as module and SFP is built into the kernel.
Fixes: 1323061a01 ("net: phy: sfp: Add HWMON support for module sensors")
Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use vzalloc instead of the vmalloc, memset combo
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use dma_zalloc_coherent instead of dma_alloc_coherent
followed by memset 0.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The situation described in the comment can occur also with
PHY_IGNORE_INTERRUPT, therefore change the condition to include it.
Fixes: f555f34fdc ("net: phy: fix auto-negotiation stall due to unavailable interrupt")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
FW hsi contains 256 approximation buckets which are split in ramrod into
eight u32 values, but driver is using eight 'unsigned long' variables.
This patch fixes the mcast logic by making the API utilize u32.
Fixes: 83aeb933 ("qed*: Trivial modifications")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a possible race where driver can read link status in mid-transition
and see that virtual-link is up yet speed is 0. Since in this
mid-transition we're guaranteed to see a mailbox from MFW soon, we can
afford to treat this as link down.
Fixes: cc875c2e ("qed: Add link support")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Apparently, MFW publishes EEE capabilities even for Fiber-boards that don't
support them, and later since qed internally sets adv_caps it would cause
link-flap avoidance (LFA) to fail when driver would initiate the link.
This in turn delays the link, causing traffic to fail.
Driver has been modified to not to ask MFW for any EEE config if EEE isn't
to be enabled.
Fixes: 645874e5 ("qed: Add support for Energy efficient ethernet.")
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For some time now, if you load the bonding driver and configure bond
parameters via sysfs using minimal config options, such as specifying
nothing but the mode, relying on defaults for everything else, modes
that cannot use arp monitoring (802.3ad, balance-tlb, balance-alb) all
wind up with both arp_interval=0 (as it should be) and miimon=0, which
means the miimon monitor thread never actually runs. This is particularly
problematic for 802.3ad.
For example, from an LNST recipe I've set up:
$ modprobe bonding max_bonds=0"
$ echo "+t_bond0" > /sys/class/net/bonding_masters"
$ ip link set t_bond0 down"
$ echo "802.3ad" > /sys/class/net/t_bond0/bonding/mode"
$ ip link set ens1f1 down"
$ echo "+ens1f1" > /sys/class/net/t_bond0/bonding/slaves"
$ ip link set ens1f0 down"
$ echo "+ens1f0" > /sys/class/net/t_bond0/bonding/slaves"
$ ethtool -i t_bond0"
$ ip link set ens1f1 up"
$ ip link set ens1f0 up"
$ ip link set t_bond0 up"
$ ip addr add 192.168.9.1/24 dev t_bond0"
$ ip addr add 2002::1/64 dev t_bond0"
This bond comes up okay, but things look slightly suspect in
/proc/net/bonding/t_bond0 output:
$ grep -i mii /proc/net/bonding/t_bond0
MII Status: up
MII Polling Interval (ms): 0
MII Status: up
MII Status: up
Now, pull a cable on one of the ports in the bond, then reconnect it, and
you'll see:
Slave Interface: ens1f0
MII Status: down
Speed: 1000 Mbps
Duplex: full
I believe this became a major issue as of commit 4d2c0cda07, which for
802.3ad bonds, sets slave->link = BOND_LINK_DOWN, with a comment about
relying on link monitoring via miimon to set it correctly, but since the
miimon work queue never runs, the link just stays marked down.
If we simply tweak bond_option_mode_set() slightly, we can check for the
non-arp modes having no miimon value set, and insert BOND_DEFAULT_MIIMON,
which gets things back in full working order. This problem exists as far
back as 4.14, and might be worth fixing in all stable trees since, though
the work-around is to simply specify an miimon value yourself.
Reported-by: Bob Ball <ball@umich.edu>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJbT+cLAAoJEEg/ir3gV/o+I5QH/3LQemGzH33iNsg4khpPeNA+
Q4mGd2jqbwfL17FTSGpTsPje6rpwzR+j8W1fGTx1vzYmE79ZyDu4EwHS7YZJcGyz
q8P0HgrUe4NrJV8mlOpbIRbTuSwfqultw2qRpmCfLf5kK1nqSIPpUHIfBUMqwy0o
O7GJrytUI4Av+r5Px/6bjb5kBaVe5YBe0tg8nSrN2vtzHVQWm+5/uaNRW2SrCN+4
5SI2AsWyMwfGCC+IE8i9OlIFCy6Iu2vwcUabK+6EeGKP4Wb6rukyG01TkQPSd7gy
ozcAjvj+ppHmVFath1uzLCFU3RbKt6GbVRGaFQg5jO5vvK3uzFJnm59Vqw/WzNs=
=UXsy
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2018-07-18' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2018-07-18
The following series provides fixes to mlx5 core and net device driver.
Please pull and let me know if there's any problem.
For -stable v4.7
net/mlx5e: Don't allow aRFS for encapsulated packets
net/mlx5e: Fix quota counting in aRFS expire flow
For -stable v4.15
net/mlx5e: Only allow offloading decap egress (egdev) flows
net/mlx5e: Refine ets validation function
net/mlx5: Adjust clock overflow work period
For -stable v4.17
net/mlx5: E-Switch, UBSAN fix undefined behavior in mlx5_eswitch_mode
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the SPDX identifiers to HNS3 PF driver.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The struct hclge_desc_cb and hclge_desc_cb are never used in
anywhere. This patch removes them.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The input parameter "dev" of hns3_irq_handle() is indeed
used as a tqp vector, it is misleadin.
The struct member "flag" is used to indicate ring type,
so rename it.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use BIT() and GENMASK() to convert the bit mask, modify
the inconsistent ones, and remove useless ones.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using hex for bit offsets is inconsistent with the rest
of the file. Change them to decimal.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes some comment spelling errors, removes
redundant comments, rewrites misleading comments, and
adds some necessary comments.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove extra space and brackets.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Apply the standard minor cleanup by returning ret outside
the brackets.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove some redundant assignments, because they have
been set to zero when allocate hdev.
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-07-20
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add sharing of BPF objects within one ASIC: this allows for reuse of
the same program on multiple ports of a device, and therefore gains
better code store utilization. On top of that, this now also enables
sharing of maps between programs attached to different ports of a
device, from Jakub.
2) Cleanup in libbpf and bpftool's Makefile to reduce unneeded feature
detections and unused variable exports, also from Jakub.
3) First batch of RCU annotation fixes in prog array handling, i.e.
there are several __rcu markers which are not correct as well as
some of the RCU handling, from Roman.
4) Two fixes in BPF sample files related to checking of the prog_cnt
upper limit from sample loader, from Dan.
5) Minor cleanup in sockmap to remove a set but not used variable,
from Colin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The netlink policy structure can be constant like other
drivers.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the ethtool callbacks for querying sfp/eeprom module.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds qed APIs for reading the PHY module.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The calculation of "wqe_size" is not correct when the tx queue is busy in
hinic_xmit_frame().
When there are no free WQEs, the tx flow will unmap the skb buffer, then
ring the doobell for the pending packets. But the "wqe_size" which used
to calculate the doorbell address is not correct. The wqe size should be
cleared to 0, otherwise, it will cause a doorbell error.
This patch fixes the problem.
Reported-by: Zhou Wang <wangzhou1@hisilicon.com>
Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
"Lots of fixes, here goes:
1) NULL deref in qtnfmac, from Gustavo A. R. Silva.
2) Kernel oops when fw download fails in rtlwifi, from Ping-Ke Shih.
3) Lost completion messages in AF_XDP, from Magnus Karlsson.
4) Correct bogus self-assignment in rhashtable, from Rishabh
Bhatnagar.
5) Fix regression in ipv6 route append handling, from David Ahern.
6) Fix masking in __set_phy_supported(), from Heiner Kallweit.
7) Missing module owner set in x_tables icmp, from Florian Westphal.
8) liquidio's timeouts are HZ dependent, fix from Nicholas Mc Guire.
9) Link setting fixes for sh_eth and ravb, from Vladimir Zapolskiy.
10) Fix NULL deref when using chains in act_csum, from Davide Caratti.
11) XDP_REDIRECT needs to check if the interface is up and whether the
MTU is sufficient. From Toshiaki Makita.
12) Net diag can do a double free when killing TCP_NEW_SYN_RECV
connections, from Lorenzo Colitti.
13) nf_defrag in ipv6 can unnecessarily hold onto dst entries for a
full minute, delaying device unregister. From Eric Dumazet.
14) Update MAC entries in the correct order in ixgbe, from Alexander
Duyck.
15) Don't leave partial mangles bpf program in jit_subprogs, from
Daniel Borkmann.
16) Fix pfmemalloc SKB state propagation, from Stefano Brivio.
17) Fix ACK handling in DCTCP congestion control, from Yuchung Cheng.
18) Use after free in tun XDP_TX, from Toshiaki Makita.
19) Stale ipv6 header pointer in ipv6 gre code, from Prashant Bhole.
20) Don't reuse remainder of RX page when XDP is set in mlx4, from
Saeed Mahameed.
21) Fix window probe handling of TCP rapair sockets, from Stefan
Baranoff.
22) Missing socket locking in smc_ioctl(), from Ursula Braun.
23) IPV6_ILA needs DST_CACHE, from Arnd Bergmann.
24) Spectre v1 fix in cxgb3, from Gustavo A. R. Silva.
25) Two spots in ipv6 do a rol32() on a hash value but ignore the
result. Fixes from Colin Ian King"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (176 commits)
tcp: identify cryptic messages as TCP seq # bugs
ptp: fix missing break in switch
hv_netvsc: Fix napi reschedule while receive completion is busy
MAINTAINERS: Drop inactive Vitaly Bordug's email
net: cavium: Add fine-granular dependencies on PCI
net: qca_spi: Fix log level if probe fails
net: qca_spi: Make sure the QCA7000 reset is triggered
net: qca_spi: Avoid packet drop during initial sync
ipv6: fix useless rol32 call on hash
ipv6: sr: fix useless rol32 call on hash
net: sched: Using NULL instead of plain integer
net: usb: asix: replace mii_nway_restart in resume path
net: cxgb3_main: fix potential Spectre v1
lib/rhashtable: consider param->min_size when setting initial table size
net/smc: reset recv timeout after clc handshake
net/smc: add error handling for get_user()
net/smc: optimize consumer cursor updates
net/nfc: Avoid stalls when nfc_alloc_send_skb() returned NULL.
ipv6: ila: select CONFIG_DST_CACHE
net: usb: rtl8150: demote allmulti message to dev_dbg()
...
We get egress rules through the egdev mechanism when the ingress device
is not supporting offload, with the expected use-case of tunnel decap
ingress rule set on shared tunnel device.
Make sure to offload egress/egdev rules only if decap action (tunnel key
unset) exists there and err otherwise.
Fixes: 717503b9cf ("net: sched: convert cls_flower->egress_dev users to tc_setup_cb_egdev infra")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Fix bad alignment of SQ buffer in fragmented QP allocation.
It should start directly after RQ buffer ends.
Take special care of the end case where the RQ buffer does not occupy
a whole page. RQ size is a power of two, so would be the case only for
small RQ sizes (RQ size < PAGE_SIZE).
Fix wrong assignments for sqb->size (mistakenly assigned RQ size),
and for npages value of RQ and SQ.
Fixes: 3a2f703312 ("net/mlx5: Use order-0 allocations for all WQ types")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The flow counters binding support commit introduced a code change where
none NULL 'rule_dest' is always passed to mlx5_add_flow_rules, this breaks
'DON'T_TRAP' rules insertion.
The fix uses the equivalent 'dest_num' value instead of dest pointer
at the failed check.
fixes: 3b3233fbf0 ('IB/mlx5: Add flow counters binding support')
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Driver is yet to support aRFS for encapsulated packets, return early
error in such case.
Fixes: 18c908e477 ("net/mlx5e: Add accelerated RFS support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Quota should follow the amount of rules which do expire, and not the
number of rules that were examined, fixed that.
Fixes: 18c908e477 ("net/mlx5e: Add accelerated RFS support")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When driver converts HW timestamp to wall clock time it subtracts
the last saved cycle counter from the HW timestamp and converts the
difference to nanoseconds.
The conversion is done by multiplying the cycles difference with the
clock multiplier value as a first step and therefore the cycles
difference should be small enough so that the multiplication product
doesn't exceed 64bit.
The overflow handling routine is in charge of updating the last saved
cycle counter in driver and it is called periodically using kernel
delayed workqueue.
The delay period for this work is calculated using the max HW cycle
counter value (a 41 bit mask) as a base which doesn't take the 64bit
limit into account so the delay period may be incorrect and too
long to prevent a large difference between the HW counter and the last
saved counter in SW.
This change adjusts the work period for the HW clock overflow work by
taking the minimum between the previous value and the quotient of max
u64 value and the clock multiplier value.
Fixes: ef9814deaf ("net/mlx5e: Add HW timestamping (TS) support")
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Removed an error message received when configuring ETS total
bandwidth to be zero.
Our hardware doesn't support such configuration, so we shall
reject it in the driver. Nevertheless, we removed the error message
in order to eliminate error messages caused by old userspace tools
who try to pass such configuration.
Fixes: ff0891915c ("net/mlx5e: Fix ETS BW check")
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
If out ring is full temporarily and receive completion cannot go out,
we may still need to reschedule napi if certain conditions are met.
Otherwise the napi poll might be stopped forever, and cause network
disconnect.
Fixes: 7426b1a518 ("netvsc: optimize receive completions")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add dependencies on PCI where necessary.
Fixes: 7e2bc7fb65 ("net: cavium: Drop dependency of NET_VENDOR_CAVIUM on PCI")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In cases the probing fails the log level of the messages should
be an error.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case the SPI thread is not running, a simple reset of sync
state won't fix the transmit timeout. We also need to wake up the kernel
thread.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Fixes: ed7d42e24e ("net: qca_spi: fix transmit queue timeout handling")
Signed-off-by: David S. Miller <davem@davemloft.net>
As long as the synchronization with the QCA7000 isn't finished, we
cannot accept packets from the upper layers. So let the SPI thread
enable the TX queue after sync and avoid unwanted packet drop.
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Fixes: 291ab06ecf ("net: qualcomm: new Ethernet over SPI driver for QCA7000")
Signed-off-by: David S. Miller <davem@davemloft.net>
For slow processors using bit-banging MDIO, 20ms can be too short a
timeout when waiting for the transmit timestamp to become
available. Double it to 40ms.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the 6352 and newer switches, the PTP Ethertype defaults to
ETH_P_1588. Hence it was not explicitly set. The 6165 however defaults
to 0. So explicitly set the EtherType.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 6165 family supports a more restricted version of hardware time
stamps. Only L2 PTP is supported. All ports have to use the same
EtherType, and transport spec configuration. PTP can only be
enabled/disabled globally, not per port.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 6165 only supports layer L2 PTP, where as the more modern devices
also support UDP and UDPv6, i.e. L4. Abstract the supported receive
filters.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 6165 family does not have per port PTP control registers. Also, it
places the timestamp data in different registers. Abstract the current
implementation of 6352 compatible PTP devices so that 6165 can be
added.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mv88e6165 family has its global clock in the PTP global
registers. It does not support any form of PTP events. Add a function
to read the clock, fill in an ops structure, and register it with the
two members of the family.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MV88E6165 PTP registers are all in AVB bank F, unlike newer
generations which spread them over AVB bank E and F. Implement AVB ops
for the MV88E6165 which hides this difference.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mv88e6165 family supports PTP, but its registers use a different
layout to the currently supported devices. Abstract accessing the PTP
registers into a set of ops, so making space for a second
implementation.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current description did not include new devices. Fix that by proving the
correct generic description.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This makes it more readable that rule is being used to return an err.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When translating command opcodes to a string, SET_DRIVER_VERSION
command was missing.
Fixes: 42ca502e17 ('net/mlx5_core: Use a macro in mlx5_command_str()')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Update mlx5 command list and error return function to handle XRQ
commands.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
As newer firmware supports double push/pop in a single FTE, we add
core bits and extend vlan action logic for it.
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
NET_DSA_BCM_SF2 does not need to depend on CONFIG_OF anymore since we have
stubs when that option is disabled.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both BCMGENET and SYSTEMPORT build just fine with CONFIG_OF=n, we do have a
dependency on HAS_IOMEM that was not being reflected for SYSTEMPORT so add
that.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver builds fine even with CONFIG_OF=n since we now have stubs that are
provided.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While some of the cavium drivers don't require PCI support, most
others do, as shown by these build failures:
WARNING: unmet direct dependencies detected for MDIO_THUNDER
Depends on [n]: NETDEVICES [=y] && MDIO_BUS [=y] && 64BIT [=y] && PCI [=n]
Selected by [y]:
- THUNDER_NIC_BGX [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_CAVIUM [=y] && 64BIT [=y]
- THUNDER_NIC_RGX [=y] && NETDEVICES [=y] && ETHERNET [=y] && NET_VENDOR_CAVIUM [=y] && 64BIT [=y]
drivers/net/ethernet/cavium/thunder/nicvf_main.c: In function 'nicvf_set_irq_affinity':
drivers/net/ethernet/cavium/thunder/nicvf_main.c:1095:25: error: implicit declaration of function 'pci_irq_vector'; did you mean 'rcu_irq_enter'? [-Werror=implicit-function-declaration]
drivers/net/ethernet/cavium/thunder/nic_main.c: In function 'nic_mbx_intr_handler':
drivers/net/ethernet/cavium/thunder/nic_main.c:1135:13: error: implicit declaration of function 'pci_irq_vector'; did you mean 'rcu_irq_enter'? [-Werror=implicit-function-declaration]
In file included from drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c:27:
drivers/net/ethernet/cavium/liquidio/octeon_main.h: In function 'octeon_unmap_pci_barx':
drivers/net/ethernet/cavium/liquidio/octeon_main.h:97:3: error: implicit declaration of function 'pci_release_region'; did you mean 'pci_release_regions'? [-Werror=implicit-function-declaration]
drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c: In function 'octeon_mbox_process_cmd':
drivers/net/ethernet/cavium/liquidio/octeon_mailbox.c:263:3: error: implicit declaration of function 'pcie_capability_set_word'; did you mean 'has_capability_noaudit'? [-Werror=implicit-function-declaration]
drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c: In function 'setup_cn23xx_octeon_pf_device':
drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c:1315:22: error: 'data32' is used uninitialized in this function [-Werror=uninitialized]
drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c: In function 'cn23xx_dump_vf_iq_regs':
include/linux/dynamic_debug.h:135:3: error: 'regval' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/net/ethernet/cavium/liquidio/lio_core.c: In function 'octeon_setup_interrupt':
drivers/net/ethernet/cavium/liquidio/lio_core.c:1067:17: error: invalid application of 'sizeof' to incomplete type 'struct msix_entry'
drivers/net/ethernet/cavium/liquidio/octeon_main.h: In function 'octeon_unmap_pci_barx':
drivers/net/ethernet/cavium/liquidio/octeon_main.h:97:3: error: implicit declaration of function 'pci_release_region'; did you mean 'pci_release_regions'? [-Werror=implicit-function-declaration]
This adds back the minimum set of dependencies to get everything to
build cleanly again, but leaving the ones that build cleanly.
Fixes: 7e2bc7fb65 ("net: cavium: Drop dependency of NET_VENDOR_CAVIUM on PCI")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
mii_nway_restart is not pm aware which results in a rtnl deadlock.
Implement mii_nway_restart manual by setting BMCR_ANRESTART if
BMCR_ANENABLE is set.
To reproduce:
* plug an asix based usb network interface
* wait until the device enters PM (~5 sec)
* `ip link set eth1 up` will never return
Fixes: d9fe64e511 ("net: asix: Add in_pm parameter")
Signed-off-by: Alexander Couzens <lynxis@fe80.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following sparse warnings:
drivers/net/ethernet/cavium/liquidio/lio_main.c:3068:23: warning:
Using plain integer as NULL pointer
drivers/net/ethernet/cavium/liquidio/lio_main.c:2909:23: warning:
Using plain integer as NULL pointer
drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c:385:27: warning:
Using plain integer as NULL pointer
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These dummy helpers are all intended to be inline functions,
but one of them by accident came without the 'inline' keyword,
causing a harmless warning:
In file included from drivers/net/ethernet/mellanox/mlx5/core/main.c:63:
drivers/net/ethernet/mellanox/mlx5/core/accel/tls.h:79:1: error: 'mlx5_accel_tls_add_flow' defined but not used [-Werror=unused-function]
mlx5_accel_tls_add_flow(struct mlx5_core_dev *mdev, void *flow,
Fixes: ab412e1dd7 ("net/mlx5: Accel, add TLS rx offload routines")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
t.qset_idx can be indirectly controlled by user-space, hence leading to
a potential exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c:2286 cxgb_extension_ioctl()
warn: potential spectre issue 'adapter->msix_info'
Fix this by sanitizing t.qset_idx before using it to index
adapter->msix_info
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The use of the | operator always leads to true, which looks rather
suspect in this case.
Fix this by using & instead.
Addresses-Coverity-ID: 1471903 ("Wrong operator used")
Fixes: dba1d918da ("net: mvpp2: debugfs: add entries for classifier flows")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
display free rx and tx page count in the meminfo of
an adapter.
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend existing driver for Spectrum ASIC to support Spectrum-2 ASIC.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Utilize only C-TCAM for now. Do very minimal A-TCAM initialization in
order to make C-TCAM work.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In Spectrum-2, ACL regions that use 8 or 12 key blocks require several
consecutive hardware regions.
In order to allow defragmentation, the device stores a mapping from a
logical region ID to an hardware region ID, which is similar to the page
table that is used to translate virtual addresses to physical addresses.
Add the region association callback to the region create sequence and
implement it as a NOP in Spectrum which does not require it.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Encode each flexible key block in the general block scheme according its
block index.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In Spectrum the key (and mask) block layout is very straight forward and
every block is 16 bytes aligned.
However, in Spectrum-2 the blocks are not even byte aligned, which makes
it difficult to encode them using current method.
Instead, first encode each block and then encode the block in the
general blocks layout.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PGCR register configures general Policy-Engine settings.
Specifically, we are going to use it in order to set the default action
base pointer, which determines where the default action (when there is
no hit) is located for each region.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PERERP register configures the region eRPs. It can be used, for
example, to enable lookup in the C-TCAM in addition to the A-TCAM.
To be able to perform a lookup in the C-TCAM we need to "use" the eRP
table. This is done by marking the pointer as valid, but zeroing the eRP
table vector.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PERCR register configures the region parameters such as whether to
consult the bloom filter before performing a lookup using a specific
eRP.
For C-TCAM only usage we don't need to accurately set the master mask.
Instead, we can set all of its bits to make sure all the extracted keys
are actually used.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PERAR register is used to associate a hw region for region_id's.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In Spectrum-2, activity cannot be find out by TCAM rule (PTCEv2 register),
but rather by associated action set. For that purpose, extend action ops
to allow query activity from PEFA register. Block activity is decided
according to activity of the first set.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In Spectrum-2, the PEFA register is extend to report if the action set
was hit during processing of packets. Introduce this extension and
adjust the code around this accordingly.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce key blocks for Spectrum-2 that contains the same elements used
already for Spectrum1. Along with that, introduce encoder stub.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In Spectrum-2, no action set is stored directly in TCAM, all are located
in KVD linear. So ask core to treat the first set as dummy empty one,
to be just used for PTCEV2 purposes.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add dummy ops for now. The ops are going to be implemented later on.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In Spectrum-2, KVD linear indexes are hashed into KVD hash. Therefore it
is possible for multiple resource types to use same indexes. There are
multiple index spaces. Also, the index space is bigger than the actual
KVD hash area, which allows to have holes in the index space without any
penalization. The HW has to be told in case the index for particular
resource type is no longer used so it can be freed from KVD hash. IEDR
register is used for that.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IEDR register is used for deleting entries from the entry tables.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow program sharing between netdevs of the same NFP ASIC.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Allow program sharing between devices which were linked together.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Create a higher-level entity to represent a device/ASIC to allow
programs and maps to be shared between device ports. The extra
work is required to make sure we don't destroy BPF objects as
soon as the netdev for which they were loaded gets destroyed,
as other ports may still be using them. When netdev goes away
all of its BPF objects will be moved to other netdevs of the
device, and only destroyed when last netdev is unregistered.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Currently we have two lists of offloaded objects - programs and maps.
Netdevice unregister notifier scans those lists to orphan objects
associated with device being unregistered. This puts unnecessary
(even if negligible) burden on all netdev unregister calls in BPF-
-enabled kernel. The lists of objects may potentially get long
making the linear scan even more problematic. There haven't been
complaints about this mechanisms so far, but it is suboptimal.
Instead of relying on notifiers, make the few BPF-capable drivers
register explicitly for BPF offloads. The programs and maps will
now be collected per-device not on a global list, and only scanned
for removal when driver unregisters from BPF offloads.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
BPF code should unregister the offload capabilities from .ndo_uninit(),
to make sure the operation is atomic with unlist_netdevice(). Plumb
the init/uninit NDOs for vNICs and representors.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Move bound program information from netdevsim to shared sub-object,
as programs will soon be shared between netdevs of the same ASIC.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Factor out sharable netdevsim sub-object and use IFLA_LINK to link
netdevsims together at creation time. Sharable object will have
its own DebugFS directory.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Grouping netdevsim devices into "ASICs" will soon be supported.
Add switch_id attribute to all netdevsims. For now each netdevsim
will have its switch_id matching the device id.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This driver can spam the kernel log with multiple messages of:
net eth0: eth0: allmulti set
Usually 4 or 8 at a time (probably because of using ConnMan).
This message doesn't seem useful, so let's demote it from dev_info()
to dev_dbg().
Signed-off-by: David Lechner <david@lechnology.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/dsa/rtl8366.c: In function ‘rtl8366_reset_vlan’:
drivers/net/dsa/rtl8366.c:234:25: warning: unused variable ‘vlan4k’ [-Wunused-variable]
Signed-off-by: David S. Miller <davem@davemloft.net>
The wrong helper is used to swap the bytes when adding the lower bits of
the TX descriptors tag field in the shared ds_tagl variable. The
variable contains the DS[11:0] field and then the TAG[3:0] bits.
The mistake was highlighted by the sparse warning:
ravb_main.c:1622:31: left side has type restricted __le16
ravb_main.c:1622:31: right side has type unsigned short
ravb_main.c:1622:31: warning: invalid assignment: |=
ravb_main.c:1622:34: warning: cast to restricted __le16
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes sparse warning:
ravb_main.c:1257 ravb_get_strings() error: memcpy() '*ravb_gstrings_stats' too small (32 vs 960)
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Inside a loop in ravb_get_ethtool_stats() a variable 'stats' is declared
resulting in the argument also named 'stats' to be shadowed. Fix this
warning by renaming the unused argument 'stats' to 'estats'.
This fixes the sparse warning:
ravb_main.c:1225:36: originally declared here
ravb_main.c:1233:41: warning: symbol 'stats' shadows an earlier one
Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds a driver core for the Realtek SMI chips and a
subdriver for the RTL8366RB. I just added this chip simply
because it is all I can test.
The code is a massaged variant of the code that has been
sitting out-of-tree in OpenWRT for years in the absence of
a proper switch subsystem. This creates a DSA driver for it.
I have tried to credit the original authors wherever
possible.
The main changes I've done from the OpenWRT code:
- Added an IRQ chip inside the RTL8366RB switch to demux and
handle the line state IRQs.
- Distributed the phy handling out to the PHY driver.
- Added some RTL8366RB code that was missing in the driver at
the time, such as setting up "green ethernet" with a funny
jam table and forcing MAC5 (the CPU port) into 1 GBit.
- Select jam table and add the default jam table from the
vendor driver, also for ASIC "version 0" if need be.
- Do not store jam tables in the device tree, store them
in the driver.
- Pick in the "initvals" jam tables from OpenWRT's driver
and make those get selected per compatible for the
whole system. It's apparently about electrical settings
for this system and whatnot, not really configuration
from device tree.
- Implemented LED control: beware of bugs because there are
no LEDs on the device I am using!
We do not implement custom DSA tags. This is explained in
a comment in the driver as well: this "tagging protocol" is
not simply a few extra bytes tagged on to the ethernet
frame as DSA is used to. Instead, enabling the CPU tags
will make the switch start talking Realtek RRCP internally.
For example a simple ping will make this kind of packets
appear inside the switch:
0000 ff ff ff ff ff ff bc ae c5 6b a8 3d 88 99 a2 00
0010 08 06 00 01 08 00 06 04 00 01 bc ae c5 6b a8 3d
0020 a9 fe 01 01 00 00 00 00 00 00 a9 fe 01 02 00 00
0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
As you can see a custom "8899" tagged packet using the
protocol 0xa2. Norm RRCP appears to always have this
protocol set to 0x01 according to OpenRRCP. You can also
see that this is not a ping packet at all, instead the
switch is starting to talk network management issues
with the CPU port.
So for now custom "tagging" is disabled.
This was tested on the D-Link DIR-685 with initramfs and
OpenWRT userspaces and works fine on all the LAN ports
(lan0 .. lan3). The WAN port is yet not working.
Cc: Antti Seppälä <a.seppala@gmail.com>
Cc: Roman Yeryomin <roman@advem.lv>
Cc: Colin Leitner <colin.leitner@googlemail.com>
Cc: Gabor Juhos <juhosg@openwrt.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RTL8366RB is an ASIC with five internal PHYs for
LAN0..LAN3 and WAN. The PHYs are spawn off the main
device so they can be handled in a distributed manner
by the Realtek PHY driver. All that is really needed
is the power save feature enablement and letting the
PHY driver core pick up the IRQ from the switch chip.
Cc: Antti Seppälä <a.seppala@gmail.com>
Cc: Roman Yeryomin <roman@advem.lv>
Cc: Colin Leitner <colin.leitner@googlemail.com>
Cc: Gabor Juhos <juhosg@openwrt.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
octeon_mgmt driver doesn't drop RX frames that are 1-4 bytes bigger than
MTU set for the corresponding interface. The problem is in the
AGL_GMX_RX0/1_FRM_MAX register setting, which should not account for VLAN
tagging.
According to Octeon HW manual:
"For tagged frames, MAX increases by four bytes for each VLAN found up to a
maximum of two VLANs, or MAX + 8 bytes."
OCTEON_FRAME_HEADER_LEN "define" is fine for ring buffer management, but
should not be used for AGL_GMX_RX0/1_FRM_MAX.
The problem could be easily reproduced using "ping" command. If affected
system has default MTU 1500, other host (having MTU >= 1504) can
successfully "ping" the affected system with payload size 1473-1476,
resulting in IP packets of size 1501-1504 accepted by the mgmt driver.
Fixed system still accepts IP packets of 1500 bytes even with VLAN tagging,
because the limits are lifted in HW as expected, for every VLAN tag.
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The removed code would be called in two situations:
1. interface is brought up never or >10s after driver load
2. after close()
Case 1 we can handle cleaner by ensuring chip is powered down when
leaving probe(). open() callback will power up the chip.
In case 2 we call rtl_pll_power_down() twice currently, from the
close() callback and 10s later when entering runtime-suspend.
This is avoided by this patch.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SFP modules can contain a number of sensors. The EEPROM also contains
recommended alarm and critical values for each sensor, and indications
of if these have been exceeded. Export this information via
HWMON. Currently temperature, VCC, bias current, transmit power, and
possibly receiver power is supported.
The sensors in the modules can either return calibrate or uncalibrated
values. Uncalibrated values need to be manipulated, using coefficients
provided in the SFP EEPROM. Uncalibrated receive power values require
floating point maths in order to calibrate them. Performing this in
the kernel is hard. So if the SFP module indicates it uses
uncalibrated values, RX power is not made available.
With this hwmon device, it is possible to view the sensor values using
lm-sensors programs:
in0: +3.29 V (crit min = +2.90 V, min = +3.00 V)
(max = +3.60 V, crit max = +3.70 V)
temp1: +33.0°C (low = -5.0°C, high = +80.0°C)
(crit low = -10.0°C, crit = +85.0°C)
power1: 1000.00 nW (max = 794.00 uW, min = 50.00 uW) ALARM (LCRIT)
(lcrit = 40.00 uW, crit = 1000.00 uW)
curr1: +0.00 A (crit min = +0.00 A, min = +0.00 A) ALARM (LCRIT, MIN)
(max = +0.01 A, crit max = +0.01 A)
The scaling sensors performs on the bias current is not particularly
good. The raw values are more useful:
curr1:
curr1_input: 0.000
curr1_min: 0.002
curr1_max: 0.010
curr1_lcrit: 0.000
curr1_crit: 0.011
curr1_min_alarm: 1.000
curr1_max_alarm: 0.000
curr1_lcrit_alarm: 1.000
curr1_crit_alarm: 0.000
In order to keep the I2C overhead to a minimum, the constant values,
such as limits and calibration coefficients are read once at module
insertion time. Thus only reading *_input and *_alarm properties
requires i2c read operations.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of accessing the PHYstatus register we can use the information
phylib stores in the phy_device structure.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only remaining usage of the struct mii_if_info member is to store the
information whether the chip is GMII-capable. So we can replace it with
a simple flag.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can remove rtl8169_set_speed_xmii() now that phylib handles all this.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use new phylib functions phy_speed_down() and phy_speed_up().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch to using phy_mii_ioctl().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch to using phy_ethtool_nway_reset().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use phy_ethtool_(g|s)et_link_ksettings() for the respective ethtool_ops
callbacks.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use genphy_soft_reset() instead of open-coding a PHY soft reset. We have
to do an explicit PHY soft reset because some chips use the genphy driver
which uses a no-op as soft_reset callback.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use phy_resume() / phy_suspend() instead of open coding this functionality.
The chip version specific differences are handled by the respective PHY
drivers.
The call to r8168_phy_power_down() in r8168_pll_power_down() can be
removed because phylib takes care now. The relevant scenarios are:
- rtl8169_close(): phy_disconnect() powers down PHY
- suspend: mdio_bus_phy_suspend() takes care
- runtime-suspend: WoL is active, don't suspend PHY
- rtl_shutdown(): no need to power down PHY
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add basic phylib support to r8169. All now unneeded old PHY handling code
will be removed in subsequent patches.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch has fix for TX timeout while running bi-directional
traffic with 100 Mbps using 5762.
Signed-off-by: Sanjeev Bansal <sanjeevb.bansal@broadcom.com>
Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Testing has uncovered a failure case that is not handled properly. In the
event that a login fails and we are not able to recover on the spot, we
return 0 from do_reset, preventing any error recovery code from being
triggered. Additionally, the state is set to "probed" meaning that when we
are able to trigger the error recovery, the driver always comes up in the
probed state. To handle the case properly, we need to return a failure code
here and set the adapter state to the state that we entered the reset in
indicating the state that we would like to come out of the recovery reset
in.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The skb size calculation in lan78xx_tx_bh is in race with the start_xmit,
which could lead to rare kernel oopses. So protect the whole skb walk with
a spin lock. As a benefit we can unlink the skb directly.
This patch was tested on Raspberry Pi 3B+
Link: https://github.com/raspberrypi/linux/issues/2608
Fixes: 55d7de9de6 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Floris Bos <bos@je-eigen-domein.nl>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a new rx packet arrives, the rx path will decide whether to reuse
the remainder of the page or not according to one of the below conditions:
1. frag_info->frag_stride == PAGE_SIZE / 2
2. frags->page_offset + frag_info->frag_size > PAGE_SIZE;
The first condition is no met for when XDP is set.
For XDP, page_offset is always set to priv->rx_headroom which is
XDP_PACKET_HEADROOM and frag_info->frag_size is around mtu size + some
padding, still the 2nd release condition will hold since
XDP_PACKET_HEADROOM + 1536 < PAGE_SIZE, as a result the page will not
be released and will be _wrongly_ reused for next free rx descriptor.
In XDP there is an assumption to have a page per packet and reuse can
break such assumption and might cause packet data corruptions.
Fix this by adding an extra condition (!priv->rx_headroom) to the 2nd
case to avoid page reuse when XDP is set, since rx_headroom is set to 0
for non XDP setup and set to XDP_PACKET_HEADROOM for XDP setup.
No additional cache line is required for the new condition.
Fixes: 34db548bfb ("mlx4: add page recycling in receive path")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Suggested-by: Martin KaFai Lau <kafai@fb.com>
CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Expose counters ASIC has in the group of RFC 2819 counters that count
number of packets within specific size range.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When configuring SLI_PKTn_OUTPUT_CONTROL, VF driver was assuming that IPTR
mode was disabled by reset, which was not true. Since DPDK driver had
set IPTR mode previously, the VF driver (which uses buf-ptr-only mode) was
not properly handling DROQ packets (i.e. it saw zero-length packets).
This represented an invalid hardware configuration which the driver could
not handle.
Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netvsc device may need to fallback to running in single queue
mode if host side only wants to support single queue.
Recent change for handling mtu broke this in setup logic.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 3ffe64f1a6 ("hv_netvsc: split sub-channel setup into async and sync")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During a device failover, there may be latency between the loss
of the current backing device and a notification from firmware that
a failover has occurred. This latency can result in a large amount of
error printouts as firmware returns outgoing traffic with a generic
error code. These are not necessarily errors in this case as the
firmware is busy swapping in a new backing adapter and is not ready
to send packets yet. This patch reclassifies those error codes as
warnings with an explanation that a failover may be pending. All
other return codes will be considered errors.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Octeon Ethernet drivers work perfectly without PCI.
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tag type in the frame extraction header is only a bit wide. There's
no need to use GENMASK when retrieving the information. This patch
simplify the code by dropping GENMASK and using BIT instead.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We were returning DUPLEX_UNKNOWN in get_link_ksettings() when
the link was down. Unfortunately, this causes a problem when
"ethtool -s autoneg on" is issued for a link which is down because
the ethtool code first reads the settings and then reapplies them
with only the changes provided on the command line. Which results
in us diving into set_link_ksettings() with DUPLEX_UNKNOWN which is
not DUPLEX_FULL, so set_link_ksettings() throws an -EINVAL error.
do not return DUPLEX_UNKNOWN to fix the issue.
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch remove the following documentation warning
drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c:103: warning: Excess function parameter 'priv' description in 'stmmac_axi_setup'
It was introduced in commit afea03656a ("stmmac: rework DMA bus setting and introduce new platform AXI structure")
Signed-off-by: Corentin Labbe <clabbe@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On XDP_TX we need to free up the frame only when tun_xdp_tx() returns a
negative value. A positive value indicates that the packet is
successfully enqueued to the ptr_ring, so freeing the page causes
use-after-free.
Fixes: 735fc4054b ("xdp: change ndo_xdp_xmit API to support bulking")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hwrm_dbg_resp_addr and hwrm_dbg_resp_dma_addr are never used
and can be removed.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the existing %pad printk format to print dma_addr_t values.
This avoids the following warnings when compiling on the parisc platform:
warning: format '%llx' expects argument of type 'long long unsigned int', but argument 2 has type 'dma_addr_t {aka unsigned int}' [-Wformat=]
Signed-off-by: Helge Deller <deller@gmx.de>
Acked-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing entry for RTL8211C to mdio_device_id table.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Fixes: cf87915cb9 ("net: phy: realtek: add support for RTL8211C")
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of the swap macro and remove unnecessary variable *temp*.
This makes the code easier to read and maintain. Also, slightly
refactor some code due to the removal of *temp*.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some network drivers include functionality to speed down the PHY when
suspending and just waiting for a WoL packet because this saves energy.
This functionality is quite generic, therefore let's factor it out to
phylib.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This functionality will also be needed in subsequent patches of this
series, therefore factor it out to a helper.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Actually, hclge_get_ring_chain_from_mbx is used to get ring type, tqp id,
and int_gl index from mailbox message. So the comments is incorrect. This
patch fixes it.
Fixes: dde1a86e93 ("net: hns3: Add mailbox support to PF driver")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
HCLGE_INT_GL_IDX_M and HCLGE_INT_GL_IDX_S are used to set fireware
cmd. When getting int_gl value from mailbox message, we should use
HNAE3_RING_GL_IDX_M and HNAE3_RING_GL_IDX_S.
Fixes: 79eee41085 ("net: hns3: add int_gl_idx setup for VF")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
handle->reset_level is assigned to HNAE3_NONE_RESET when client is
initialized, if a tx timeout happens right after initialization,
then handle->reset_level is not resetted to HNAE3_FUNC_RESET in
hclge_reset_event, which will cause reset event not properly
handled problem.
This patch fixes it by setting handle->reset_level properly when
client is initialized.
Fixes: 6d4c3981a8 ("net: hns3: Changes to make enet watchdog timeout func common for PF/VF")
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The configuration of the ring will be used to reinitialize the
ring after the hardware reset is completed. So we should not
release and reacquire this configuration during reset.
Fixes: bb6b94a896 ("net: hns3: Add reset interface implementation in client")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When doing reset, netdev has not been brought up is not an error,
it means that we do not need do the stop operation, so just return
zero.
Fixes: 76ad4f0ee7 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to hardware's description, driver should get reset event
from VECTOR0_PF_OTHER_INT_ST(0x20800) instead of
VECTOR0_PF_OTHER_INT_SRC(0x20700).
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netdevice reset should not be requested frequently, a new one
must wait a moment since there may be some work not completed.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since current locking was not covering certain code where
netdev was being accessed or manipulated, this patch fixes
it.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to hardware's description, the head pointer register should
be written before the tail pointer register while doing command queue
initialization.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the byte count indication in CQE for processed IPsec
packets that contain a metadata header.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds common functions to handle mellanox metadata headers.
These functions are used by IPsec and TLS to process FPGA metadata.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enables TLS Rx based on available HW capabilities.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds software statistics for TLS to count important
events.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement the TLS rx offload data path according to the
requirements of the TLS generic NIC offload infrastructure.
Special metadata ethertype is used to pass information to
the hardware.
When hardware loses synchronization a special resync request
metadata message is used to request resync.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the mlx5 implementation of the TLS Rx routines to add/del TLS
contexts, also add the tls_dev_resync_rx routine
to work with the TLS inline Rx crypto offload infrastructure.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In Innova TLS, TLS contexts are added or deleted
via a command message over the SBU connection.
The HW then sends a response message over the same connection.
Complete the implementation for Innova TLS (FPGA-based) hardware by
adding support for rx inline crypto offload.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For symmetry, we rename mlx5e_tls_offload_context to
mlx5e_tls_offload_context_tx before we add mlx5e_tls_offload_context_rx.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Reviewed-by: Aviad Yehezkel <aviadye@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For symmetry, we rename tls_offload_context to
tls_offload_context_tx before we add tls_offload_context_rx.
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The classification operations that are used for RSS make use of several
lookup tables. Having hit counters for these tables is really helpful
to determine what flows were matched by ingress traffic, and see the
path of packets among all the classifier tables.
This commit adds hit counters for the 3 tables used at the moment :
- The decoding table (also called lookup_id table), that links flows
identified by the Header Parser to the flow table.
There's one entry per flow, located at :
.../mvpp2/<controller>/flows/XX/dec_hits
Note that there are 21 flows in the decoding table, whereas there are
52 flows in the Header Parser. That's because there are several kind
of traffic that will match a given flow. Reading the hit counter from
one sub-flow will clear all hit counter that have the same flow_id.
This also applies to the flow_hits.
- The flow table, that contains all the different lookups to be
performed by the classifier for each packet of a given flow. The match
is done on the first entry of the flow sequence.
- The C2 engine entries, that are used to assign the default rx queue,
and enable or disable RSS for a given port.
There's one entry per flow, located at:
.../mvpp2/<controller>/flows/XX/flow_hits
There is one C2 entry per port, so the c2 hit counter is located at :
.../mvpp2/<controller>/ethX/c2_hits
All hit counter values are 16-bits clear-on-read values.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The classifier configuration for RSS is quite complex, with several
lookup tables being used. This commit adds useful info in debugfs to
see how the different tables are configured :
Added 2 new entries in the per-port directory :
- .../eth0/default_rxq : The default rx queue on that port
- .../eth0/rss_enable : Indicates if RSS is enabled in the C2 entry
Added the 'flows' directory :
It contains one entry per sub-flow. a 'sub-flow' is a unique path from
Header Parser to the flow table. Multiple sub-flows can point to the
same 'flow' (each flow has an id from 8 to 29, which is its index in the
Lookup Id table) :
- .../flows/00/...
/01/...
...
/51/id : The flow id. There are 21 unique flows. There's one
flow per combination of the following parameters :
- L4 protocol (TCP, UDP, none)
- L3 protocol (IPv4, IPv6)
- L3 parameters (Fragmented or not)
- L2 parameters (Vlan tag presence or not)
.../type : The flow type. This is an even higher level flow,
that we manipulate with ethtool. It can be :
"udp4" "tcp4" "udp6" "tcp6" "ipv4" "ipv6" "other".
.../eth0/...
.../eth1/engine : The hash generation engine used for this
flow on the given port
.../hash_opts : The hash generation options indicating on
what data we base the hash (vlan tag, src
IP, src port, etc.)
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One helpful feature to help debug the Header Parser TCAM filter in PPv2
is to be able to see if the entries did match something when a packet
comes in. This can be done by using the built-in hit counter for TCAM
entries.
This commit implements reading the counter, and exposing its value on
debugfs for each filter entry.
The counter is a 16-bits clear-on-read value, located at:
.../mvpp2/<controller>/parser/XXX/hits
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marvell PPv2 Packer Header Parser has a TCAM based filter, that is not
trivial to configure and debug. Being able to dump TCAM entries from
userspace can be really helpful to help development of new features
and debug existing ones.
This commit adds a basic debugfs interface for the PPv2 driver, focusing
on TCAM related features.
<mnt>/mvpp2/ --- f2000000.ethernet
\- f4000000.ethernet --- parser --- 000 ...
| \- 001
| \- ...
| \- 255 --- ai
| \- header_data
| \- lookup_id
| \- sram
| \- valid
\- eth1 ...
\- eth2 --- mac_filter
\- parser_entries
\- vid_filter
There's one directory per PPv2 instance, named after pdev->name to make
sure names are uniques. In each of these directories, there's :
- one directory per interface on the controller, each containing :
- "mac_filter", which lists all filtered addresses for this port
(based on TCAM, not on the kernel's uc / mc lists)
- "parser_entries", which lists the indices of all valid TCAM
entries that have this port in their port map
- "vid_filter", which lists the vids allowed on this port, based on
TCAM
- one "parser" directory (the parser is common to all ports), containing :
- one directory per TCAM entry (256 of them, from 0 to 255), each
containing :
- "ai" : Contains the 1 byte Additional Info field from TCAM, and
- "header_data" : Contains the 8 bytes Header Data extracted from
the packet
- "lookup_id" : Contains the 4 bits LU_ID
- "sram" : contains the raw SRAM data, which is the result of the TCAM
lookup. This readonly at the moment.
- "valid" : Indicates if the entry is valid of not.
All entries are read-only, and everything is output in hex form.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the appropriate SPDX license identifiers and drop the license text.
This patch is only cosmetic.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-07-15
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Various different arm32 JIT improvements in order to optimize code emission
and make the JIT code itself more robust, from Russell.
2) Support simultaneous driver and offloaded XDP in order to allow for advanced
use-cases where some work is offloaded to the NIC and some to the host. Also
add ability for bpftool to load programs and maps beyond just the cgroup case,
from Jakub.
3) Add BPF JIT support in nfp for multiplication as well as division. For the
latter in particular, it uses the reciprocal algorithm to emulate it, from Jiong.
4) Add BTF pretty print functionality to bpftool in plain and JSON output
format, from Okash.
5) Add build and installation to the BPF helper man page into bpftool, from Quentin.
6) Add a TCP BPF callback for listening sockets which is triggered right after
the socket transitions to TCP_LISTEN state, from Andrey.
7) Add a new cgroup tree command to bpftool which iterates over the whole cgroup
tree and prints all attached programs, from Roman.
8) Improve xdp_redirect_cpu sample to support parsing of double VLAN tagged
packets, from Jesper.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hosts using a VRRP router send their packets with a destination MAC of
the VRRP router which is of the following form [1]:
IPv4 - 00-00-5E-00-01-{VRID}
IPv6 - 00-00-5E-00-02-{VRID}
Where VRID is the ID of the virtual router. Such packets are directed to
the router block in the ASIC by an FDB entry that was added in the
previous patch.
However, in certain cases it is possible to skip this FDB lookup and
send such packets directly to the router. This is accomplished by adding
these special MAC addresses to the RIF cache. If the cache is hit, the
packet will skip the L2 lookup and ingress the router with the RIF
specified in the cache entry.
1. https://tools.ietf.org/html/rfc5798#section-7.3
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Virtual Router Redundancy Protocol packets are used to communicate the
state of the Master router associated with the virtual router ID (VRID).
These are link-local multicast packets sent with IP protocol 112 that
are trapped in the router block in the ASIC.
Add a trap for these packets and mark the trapped packets to prevent
them from potentially being re-flooded by the bridge driver.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An IP packet received on a netdev with a macvlan upper whose MAC matches
the packet's destination MAC will be re-injected to the Rx path as if it
was received by the macvlan, and perform an L3 lookup.
Reflect this functionality to the ASIC by programming FDB entries that
will direct MACs of macvlan uppers to the router.
In a similar fashion to router interfaces (RIFs) that are programmed
upon the addition of the first IP address on an interface and destroyed
upon the removal of the last IP address, the FDB entries for the macvlan
are added and destroyed based on the addition of the first and removal
of the last IP address on the macvlan.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to allow more unicast MAC addresses (e.g., VRRP virtual MAC) to
be directed to the router we need to enable macvlan uppers on top of
mlxsw netdevs.
Allow macvlan upper devices on top of mlxsw netdevs and sanitize
configurations that can't work. For example, a macvlan can't be enslaved
to a bridge as without ACLs the device doesn't take the destination MAC
into account when classifying a packet to a bridge instance (i.e., a
FID).
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: f9358e12a0 ("net: mvpp2: split ingress traffic into multiple flows")
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We accidentally left out the error handling for kstrtoul().
Fixes: a520030e32 ("qlcnic: Implement flash sysfs callback for 83xx adapter")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Split handling of offloaded and driver programs completely. Since
offloaded programs always come with XDP_FLAGS_HW_MODE set in reality
there could be no sharing, anyway, programs would only be installed
in driver or in hardware. Splitting the handling allows us to install
programs in HW and in driver at the same time.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Allow netdevsim to accept driver and offload attachment of XDP
BPF programs at the same time.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Split the query of HW-attached program from the software one.
Introduce new .ndo_bpf command to query HW-attached program.
This will allow drivers to install different programs in HW
and SW at the same time. Netlink can now also carry multiple
programs on dump (in which case mode will be set to
XDP_ATTACHED_MULTI and user has to check per-attachment point
attributes, IFLA_XDP_PROG_ID will not be present). We reuse
IFLA_XDP_PROG_ID skb space for second mode, so rtnl_xdp_size()
doesn't need to be updated.
Note that the installation side is still not there, since all
drivers currently reject installing more than one program at
the time.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Basic operations drivers perform during xdp setup and query can
be moved to helpers in the core. Encapsulate program and flags
into a structure and add helpers. Note that the structure is
intended as the "main" program information source in the driver.
Most drivers will additionally place the program pointer in their
fast path or ring structures.
The helpers don't have a huge impact now, but they will
decrease the code duplication when programs can be installed
in HW and driver at the same time. Encapsulating the basic
operations in helpers will hopefully also reduce the number
of changes to drivers which adopt them.
Helpers could really be static inline, but they depend on
definition of struct netdev_bpf which means they'd have
to be placed in netdevice.h, an already 4500 line header.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
prog_attached of struct netdev_bpf should have been superseded
by simply setting prog_id long time ago, but we kept it around
to allow offloading drivers to communicate attachment mode (drv
vs hw). Subsequently drivers were also allowed to report back
attachment flags (prog_flags), and since nowadays only programs
attached will XDP_FLAGS_HW_MODE can get offloaded, we can tell
the attachment mode from the flags driver reports. Remove
prog_attached member.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The hardware supposedly handles frames up to 10236 bytes and
implements .ndo_change_mtu() so accept 10236 minus the ethernet
header for a VLAN tagged frame on the netdevices. Use
ETH_MIN_MTU as minimum MTU.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The initialization sequence for the ethernet, setting up
interrupt routing and such things, need to be done after
both the ports are clocked and reset. Before this the
config will not "take". Move the initialization to the
port probe function and keep track of init status in
the state.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code was not tested with two ports actually in use at
the same time. (I blame this on lack of actual hardware using
that feature.) Now after locating a system using both ports,
add necessary fix to make both ports come up.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch over to using a module parameter and debug prints
that can be controlled by this or ethtool like everyone
else. Depromote all other prints to debug messages.
The phy_print_status() was already in place, albeit never
really used because the debuglevel hiding it had to be
set up using ethtool.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code to calculate the hardware register enumerator
for the maximum L3 length isn't entirely simple to read.
Use the existing defines and rewrite the function into a
table look-up.
Acked-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This parameter enables capturing region snapshot of the crspace
during critical errors. The default value of this parameter is
disabled, it can be enabled using devlink param commands.
It is possible to configure during runtime and also driver init.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Crdump allows the driver to create a snapshot of the FW PCI
crspace and health buffer during a critical FW issue.
In case of a FW command timeout, FW getting stuck or a non zero
value on the catastrophic buffer, a snapshot will be taken.
The snapshot is exposed using devlink, cr-space, fw-health
address regions are registered on init and snapshots are attached
once a new snapshot is collected by the driver.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Health buffer address is a 32 bit PCI address offset provided by
the FW. This offset is used for reading FW health debug data
located on the shared CR space. Cr space is accessible in both
driver and FW and allows for different queries and configurations.
Health buffer size is always 64B of readable data followed by a
lock which is used to block volatile CR space access.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit allows setting the RSS hash generation parameters from
ethtool. When setting parameters for a given flow type from ethtool
(e.g. tcp4), all the corresponding flows in the flow table are updated,
according to the supported hash parameters.
For example, when configuring TCP over IPv4 hash parameters to be
src/dst IP + src/dst port ("ethtool -N eth0 rx-flow-hash tcp4 sdfn"),
we only set the "src/dst port" hash parameters on the non-fragmented TCP
over IPv4 flows.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the classification action that can be performed is to compute a
hash of the packet header based on some header fields, and lookup a RSS
table based on this hash to determine the final RxQ.
This is done by adding one lookup entry per flow per port, so that we
can configure the hash generation parameters for each flow and each
port.
There are 2 possible engines that can be used for RSS hash generation :
- C3HA, that generates a hash based on up to 4 header-extracted fields
- C3HB, that does the same as c3HA, but also includes L4 info in the hash
There are a lot of fields that can be extracted from the header. For now,
we only use the ones that we can configure using ethtool :
- DST MAC address
- L3 info
- Source IP
- Destination IP
- Source port
- Destination port
The C3HB engine is selected when we use L4 fields (src/dst port).
Header parser Dec table
Ingress pkt +-------------+ flow id +----------------------------+
------------->| TCAM + SRAM |-------->|TCP IPv4 w/ VLAN, not frag |
+-------------+ |TCP IPv4 w/o VLAN, not frag |
|TCP IPv4 w/ VLAN, frag |--+
|etc. | |
+----------------------------+ |
|
Flow table |
+---------+ +------------+ +--------------------------+ |
| RSS tbl |<--| Classifier |<--------| flow 0: C2 lookup | |
+---------+ +------------+ | C3 lookup port 0 | |
| | | C3 lookup port 1 | |
+-----------+ +-------------+ | ... | |
| C2 engine | | C3H engines | | flow 1: C2 lookup |<--+
+-----------+ +-------------+ | C3 lookup port 0 |
| ... |
| ... |
| flow 51 : C2 lookup |
| ... |
+--------------------------+
The C2 engine also gains the role of enabling and disabling the RSS
table lookup for this packet.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PPv2 classifier allows to perform classification operations on each
ingress packet, based on the flow the packet is assigned to.
The current code uses only 1 flow per port, and the only classification
action consists of assigning the rx queue to the packet, depending on the
port.
In preparation for adding RSS support, we have to split all incoming
traffic into different flows. Since RSS assigns a rx queue depending on
the hash of some header fields, we have to make sure that the hash is
generated in a consistent way for all packets in the same flow.
What we call a "flow" is actually a set of attributes attached to a
packet that depends on various L2/L3/L4 info.
This patch introduces 52 flows, wich are a combination of various L2, L3
and L4 attributes :
- Whether or not the packet has a VLAN tag
- Whether the packet is IPv4, IPv6 or something else
- Whether the packet is TCP, UDP or something else
- Whether or not the packet is fragmented at L3 level.
The flow is associated to a packet by the Header Parser. Each flow
corresponds to an entry in the decoding table. This entry then points to
the sequence of classification lookups to be performed by the
classifier, represented in the flow table.
For now, the only lookup we perform is a C2 lookup to set the default
rx queue.
Header parser Dec table
Ingress pkt +-------------+ flow id +----------------------------+
------------->| TCAM + SRAM |-------->|TCP IPv4 w/ VLAN, not frag |
+-------------+ |TCP IPv4 w/o VLAN, not frag |
|TCP IPv4 w/ VLAN, frag |--+
|etc. | |
+----------------------------+ |
|
Flow table |
+------------+ +---------------------+ |
To RxQ <---| Classifier |<-------| flow 0: C2 lookup |<--------+
+------------+ | flow 1: C2 lookup |
| | ... |
+------------+ | flow 51 : C2 lookup |
| C2 engine | +---------------------+
+------------+
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PPv2 Controller has a classifier, that can perform multiple lookup
operations for each packet, using different engines.
One of these engines is the C2 engine, which performs TCAM based lookups
on data extracted from the packet header. When a packet matches an
entry, the engine sets various attributes, used to perform
classification operations.
One of these attributes is the rx queue in which the packet should be sent.
The current code uses the lookup_id table (also called decoding table)
to assign the rx queue. However, this only works if we use one entry per
port in the decoding table, which won't be the case once we add RSS
lookups.
This patch uses the C2 engine to assign the rx queue to each packet.
The C2 engine is used through the flow table, which dictates what
classification operations are done for a given flow.
Right now, we have one flow per port, which contains every ingress
packet for this port.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mvpp22_init_rss function configures the RSS parameters for each port, so
rename it accordingly. Since this function relies on classifier
configuration, move its call right after the classifier config.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When filling the RSS table, we have to make sure that the rx queue is
attached to an online CPU.
This patch is not a full support for cpu_hotplug, but rather a way to
make sure that we don't break network on system booted with the maxcpus
parameter.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds an extra indirection when setting the indirection table
into the RSS hardware table to improve the packets distribution across
CPUs. For example, if 2 queues are used on a multi-core system this new
indirection will choose two queues on two different CPUs instead of the
two first queues which are on the same first CPU.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the RSS indirection table support, allowing to use the
ethtool -x and -X options to dump and set this table.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
[Maxime: Small warning fixes, use one table per port]
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PPv2 Controller has 8 RSS Tables, of 32 entries each. A lookup in the
RXQ2RSS_TABLE is performed for each incoming packet, and the RSS Table
to be used is chosen according to the default rx queue that would be
used for the packet.
This default rx queue is set in the Lookup_id Table (also called
Decoding Table), and is equal to the port->first_rxq.
Since the Classifier itself isn't active at any time for the moment,
this doesn't have a direct effect, the default rx queue at the moment is
the one where all packets end-up into.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no RSS_TABLE register in PPv2 Controller. The register 0x1510
which was specified is actually named "RSS_HASH_SEL", but isn't used by
this driver at all.
Based on how this register was used, it should have been the
RXQ2RSS_TABLE register, which allows to select the RSS table that will
be used for the incoming packet.
The RSS_TABLE_POINTER is actually a field of this RXQ2RSS_TABLE
register.
Since RSS tables are actually not used by the driver for now, this
commit does not fix a runtime bug.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cosmetic patch fixing a typo in one of the RSS comments.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The number of receive queue per port is :
- MVPP2_DEFAULT_RXQ if in single queue mode
- MVPP2_DEFAULT_RXQ * num_possible_cpus if in multi queue mode
with MVPP2_DEFAULT_RXQ = 4.
However, we don't use the extra rx queues at the moment, we really only
need one per port per CPU, until some more advanced classification rules
are implemented.
Suggested-by: Stefan Chulski <stefanc@marvell.com>
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a dedicated #define that indicates the number of rx queues per
port per cpu, this commit removes a harcoded use of that value
This doesn't fix any runtime bugs since the harcoded value matches the
expected value.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since RSS only applies when we have per-cpu rx queues, it should only
be enabled when the driver is configured to make use of multi-queue
mode.
Signed-off-by: Yan Markman <ymarkman@marvell.com>
[Maxime: Commit message]
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The multi queue mode is needed to have RSS available, and offers some
nice advantages, being able to have one rx queue vector per CPU.
This mode has been usable through the use of a module parameter, this
commit makes it the default value.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PPv2 driver defines 2 "queue_modes" :
- QDIST_SINGLE_MODE, where each port share one rx queue vector
between all CPUs
- QDIST_MULTI_MODE, where each port has one rx queue vector per CPU.
Multi queue mode isn't available on PPv2.1, make sure we fallback to
single mode when running on this revision.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The size of the the RSS indirection tables should be defined in mvpp2.h,
so that we can use it in all files of the PPv2 driver.
This commit moves the define in mvpp2.h, and adds the missing #include
in mvpp2_cls.h.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Include guards should be put before #includes. This doesn't fix any bug,
but prevent future compilation issues when adding new files in the mvpp2
driver
The Header Parser init function needs the platform_device definition,
and with the fixed include guards we need to add the missing include.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
getnstimeofday64 is deprecated in favor of the ktime_get() family of
functions. The direct replacement would be ktime_get_real_ts64(),
but I'm picking the basic ktime_get() instead:
- using a ktime_t simplifies the code compared to timespec64
- using monotonic time instead of real time avoids issues caused
by a concurrent settimeofday() or during a leap second adjustment.
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The two do the same thing, but we want to have a consistent
naming in the kernel.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should take and release the filter_sem consistently during the
reset process, in the same manner as the mac_lock and reset_lock.
For lockdep consistency we also take the filter_sem for write around
other calls to efx->type->init().
Fixes: c2bebe37c6 ("sfc: give ef10 its own rwsem in the filter table instead of filter_lock")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some situations we may end up calling down_read while already
holding the semaphore for write, thus hanging. This has been seen
when setting the MAC address for the interface. The hung task log
in this situation includes this stack:
down_read
efx_ef10_filter_insert
efx_ef10_filter_insert_addr_list
efx_ef10_filter_vlan_sync_rx_mode
efx_ef10_filter_add_vlan
efx_ef10_filter_table_probe
efx_ef10_set_mac_address
efx_set_mac_address
dev_set_mac_address
In addition, lockdep rightly points out that nested calling of
down_read is incorrect.
Fixes: c2bebe37c6 ("sfc: give ef10 its own rwsem in the filter table instead of filter_lock")
Tested-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SYSTEMPORT Lite reversed the logic compared to SYSTEMPORT, the
GIB_FCS_STRIP bit is set when the Ethernet FCS is stripped, and that bit
is not set by default. Fix the logic such that we properly check whether
that bit is set or not and we don't forward an extra 4 bytes to the
network stack.
Fixes: 44a4524c54 ("net: systemport: Add support for SYSTEMPORT Lite")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2018-07-12
This series contains updates to ixgbe and e100/e1000 kernel documentation.
Alex fixes ixgbe to ensure that we are more explicit about the ordering
of updates to the receive address register (RAR) table.
Dan Carpenter fixes an issue where we were reading one element beyond
the end of the array.
Mauro Carvalho Chehab fixes formatting issues in the e100.rst and
e1000.rst that were causing errors during 'make htmldocs'.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The ipsec->tx_tbl[] has IXGBE_IPSEC_MAX_SA_COUNT elements so the > needs
to be changed to >= so we don't read one element beyond the end of the
array.
Fixes: 5925947047 ("ixgbe: process the Tx ipsec offload")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This change makes it so that we are much more explicit about the ordering
of updates to the receive address register (RAR) table. Prior to this patch
I believe we may have been updating the table while entries were still
active, or possibly allowing for reordering of things since we weren't
explicitly flushing writes to either the lower or upper portion of the
register prior to accessing the other half.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2018-07-11
An update from ieee802154 for your *net* tree.
Build system fix for a missing include from Arnd Bergmann.
Setting the IFLA_LINK for the lowpan parent from Lubomir Rintel.
Fixes for some RX corner cases in adf7242 driver by Michael Hennerich.
And some small patches to cleanup our BUG_ON vs WARN_ON usage.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
- Unionize two u8 fields where only one of them is used depending on NIC
chipset.
- Move recovery_supported field after that union
These changes eliminate 7-bytes hole in the struct and makes it smaller
by 8 bytes.
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Re-order fields in struct be_eq_obj to ensure that .napi field begins
at start of cache-line. Also the .adapter field is moved to the first
cache-line next to .q field and 3 fields (idx,msi_idx,spurious_intr)
and the 4-bytes hole to 3rd cache-line.
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
The event queue description (be_eq_obj.desc) field is used only to format
string for IRQ name and it is not really needed to hold this value.
Remove it and use local variable to format string for IRQ name.
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit fb6113e688 ("be2net: get rid of custom busy poll code")
replaced custom busy-poll code by the generic one but left several
macros and fields in struct be_eq_obj that are currently unused.
Remove this stuff.
Fixes: fb6113e688 ("be2net: get rid of custom busy poll code")
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 2632bafd74 ("be2net: fix adaptive interrupt coalescing")
introduced a separate struct be_aic_obj to hold AIC information but
unfortunately left the old stuff in be_eq_obj. So remove it.
Fixes: 2632bafd74 ("be2net: fix adaptive interrupt coalescing")
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial fix to spelling mistake in qed_probe message.
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The late ts queue can contain a bunch of skbs while hi rate testing,
no need to check all of them if timestamp is already matched.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It was been observed that with a particular order of initialisation,
the netdev can be up, but the SFP module still has its TX_DISABLE
signal asserted. This occurs when the network device brought up before
the SFP kernel module has been inserted by userspace.
This occurs because sfp-bus layer does not hear about the change in
network device state, and so assumes that it is still down. Set
netdev->sfp when the upstream is registered to work around this problem.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
We fail to correctly clean up after a bus registration failure, which
can lead to an incorrect assumption about the registration state of
the upstream or sfp cage.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
When offloading mirror-to-gretap, mlxsw needs to preroute the path that
the encapsulated packet will take. That path may include a LAG device
above a front panel port. So far, mlxsw resolved the path to the first
up front panel slave of the LAG interface, but that only reflects
administrative state of the port. It neglects to consider whether the
port actually has a carrier, and what the LACP state is.
So instead of checking upness of the device, check carrier state and
txability.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A follow-up patch adds a new entry point, team_port_dev_txable(). Making
it an ordinary exported function would mean that any module that may
need the service in one of the supported configurations also
unconditionally needs to pull in the team module, whether or not the
user actually intends to create team interfaces.
To prevent that, team_port_dev_txable() is defined in if_team.h, and
therefore all dependencies of that function also need to be
publicly-visible.
Therefore move team_port_get_rcu() from team.c to if_team.h.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Today macvlan ignores the notification when a lower device goes
administratively down, preventing the lack of connectivity from
bubbling up.
Processing NETDEV_DOWN results in a macvlan state of LOWERLAYERDOWN
with NO-CARRIER which should be easy to interpret in userspace.
2: lower: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
3: macvlan@lower: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
Signed-off-by: Suresh Krishnan <skrishnan@arista.com>
Signed-off-by: Travis Brown <travisb@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
L2 Fwd Offload & 10GbE Intel Driver Updates 2018-07-09
This patch series is meant to allow support for the L2 forward offload, aka
MACVLAN offload without the need for using ndo_select_queue.
The existing solution currently requires that we use ndo_select_queue in
the transmit path if we want to associate specific Tx queues with a given
MACVLAN interface. In order to get away from this we need to repurpose the
tc_to_txq array and XPS pointer for the MACVLAN interface and use those as
a means of accessing the queues on the lower device. As a result we cannot
offload a device that is configured as multiqueue, however it doesn't
really make sense to configure a macvlan interfaced as being multiqueue
anyway since it doesn't really have a qdisc of its own in the first place.
The big changes in this set are:
Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN
Disable XPS for single queue devices
Replace accel_priv with sb_dev in ndo_select_queue
Add sb_dev parameter to fallback function for ndo_select_queue
Consolidated ndo_select_queue functions that appeared to be duplicates
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Expose stats obtained from firmware via debugfs. These stats can't
be part of ethtool -S because the slow firmware mailbox can cause
packet drops under heavy load.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When running ethtool -S, some stats are requested from firmware.
Since getting these stats via firmware mailbox is slow, some packets
get dropped under heavy load while running ethtool -S.
So, remove these stats from ethtool -S.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Marvell PPv2 driver uses interrupts and tasklet but does not
explicitly include linux/interrupt.h, relying on implicit includes. This
one particularly is included by chance after a long unlogical chain of
inclusions. Fix this so we do not get future build breaks.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Size of csk_tbl is about 58K, which means 3rd order page allocation.
kvzalloc provides a fallback if no high order memory is available.
Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Variables ack_status, bcf and protocol are being assigned but are
never used hence they are redundant and can be removed.
Also declare ack_type as unsigned int rather than unsigned to clean
up a checkpatch warning.
Cleans up clang warnings:
warning: variable 'ack_status' set but not used [-Wunused-but-set-variable]
warning: variable 'bcf' set but not used [-Wunused-but-set-variable]
warning: variable 'protocol' set but not used [-Wunused-but-set-variable]
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
congestion argument passed to t4_sge_alloc_rxq() is used
to differentiate between nic/ofld queues.
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In general, accessing userspace memory beyond the length of the supplied
buffer in VFS read/write handlers can lead to both kernel memory corruption
(via kernel_read()/kernel_write(), which can e.g. be triggered via
sys_splice()) and privilege escalation inside userspace.
In this case, the affected files are in debugfs (and should therefore only
be accessible to root) and check that *pos is zero (which prevents the
sys_splice() trick). Therefore, this is not a security fix, but rather a
small cleanup.
For the read handlers, fix it by using simple_read_from_buffer() instead of
custom logic.
For the write handler, add a check.
changed in v2:
- also fix dbg_write()
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Fix bug in the error code path when bnxt_request_irq() returns failure.
bnxt_disable_napi() should not be called in this error path because
NAPI has not been enabled yet.
Fixes: c0c050c58d ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calling bnxt_set_max_func_irqs() to modify the max IRQ count requested or
freed by the RDMA driver is flawed. The max IRQ count is checked when
re-initializing the IRQ vectors and this can happen multiple times
during ifup or ethtool -L. If the max IRQ is reduced and the RDMA
driver is operational, we may not initailize IRQs correctly. This
problem shows up on VFs with very small number of MSIX.
There is no other logic that relies on the IRQ count excluding the ones
used by RDMA. So we fix it by just removing the call to subtract or
add the IRQs used by RDMA.
Fixes: a588e4580a ("bnxt_en: Add interface to support RDMA driver.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the driver assumes IFF_BROADCAST is always set and always sets
the broadcast filter. Modify the code to set or clear the broadcast
filter according to the IFF_BROADCAST flag.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code returns -ENOMEM and does not bother to set the output
parameters to 0 when no rings are available. Some callers, such as
bnxt_get_channels() will display garbage ring numbers when that happens.
Fix it by always setting the output parameters.
Fixes: 6e6c5a57fb ("bnxt_en: Modify bnxt_get_max_rings() to support shared or non shared rings.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If there aren't enough RX rings available, the driver will attempt to
use a single RX ring without the aggregation ring. If that also
fails, the BNXT_FLAG_AGG_RINGS flag is cleared but the other ring
parameters are not set consistently to reflect that. If more RX
rings become available at the next open, the RX rings will be in
an inconsistent state and may crash when freeing the RX rings.
Fix it by restoring the BNXT_FLAG_AGG_RINGS if not enough RX rings are
available to run without aggregation rings.
Fixes: bdbd1eb59c ("bnxt_en: Handle no aggregation ring gracefully.")
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is possible that OVS may set don’t care for DEI/CFI bit in
vlan_tci mask. Hence, checking for vlan_tci exact match will endup
in a vlan flow rejection.
This patch fixes the problem by checking for vlan_pcp and vid
separately, instead of checking for the entire vlan_tci.
Fixes: e85a9be93c (bnxt_en: do not allow wildcard matches for L2 flows)
Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These resources are needed for Spectrum-2 KVD linear management
implementation.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prepare for Spectrum-2 FW version checking and
make mlxsw_sp_fw_rev_validate() per-ASIC as well as required FW revision
and FW filename.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For Spectrum-2, we need to insert priority to C-TCAM because HW
needs that info in order to correctly process scenarios where rules
are in both C-TCAM and A-TCAM.
So extend the mlxsw_sp_acl_ctcam_entry_add() args to accept indication
if priority needs to be filled up and implement the priority
computation and fill-up.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is going to be needed for Spectrum-2 C-TCAM implementation.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since Spectrum-2 encodes blocks into different HW layout, push this
code into Spectrum-specific op.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the flex keys for Spectrum-2 differ not only in blocks definitions
but also in encoding layout, prepare for the implementation and pass
Spectrum/Spectrum-2 specific ops down to mlxsw_afk_create.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ops to be called on driver instance init and fini.
This is needed in order to be possible to do Spectrum-2 specific init
and fini work.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To allow easy and clean Spectrum-2 implementation for things that differ
from Spectrum, split the existing ACL TCAM code 3 ways:
1) common code that calls Spectrum/Spectrum-2 specific ops
2) Spectrum ops implementations
3) common C-TCAM code that is going to be shared between Spectrum and
Spectrum-2 implementations
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>