Implement and export UDP segmentation offload (USO) support for both
NIC and MQPRIO QoS offload Tx path. Update appropriate logic in Tx to
parse GSO info in skb and configure FW_ETH_TX_EO_WR request needed to
perform USO.
v2:
- Remove inline keyword from write_eo_udp_wr() in sge.c. Let the
compiler decide.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
The FW_ETH_TX_EO_WR used for sending UDP Segmentation Offload (USO)
requests expects the headers to be part of the descriptor and the
payload to be part of the SGL containing the DMA mapped addresses.
Hence, the DMA address in the first entry of the SGL can start after
the packet headers. Currently, unmap_sgl() tries to unmap from this
wrong offset, instead of the originally mapped DMA address.
So, use existing unmap_skb() instead, which takes originally saved DMA
addresses as input. Update all necessary Tx paths to save the original
DMA addresses, so that unmap_skb() can unmap them properly.
v2:
- No change.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Minor conflict in drivers/s390/net/qeth_l2_main.c, kept the lock
from commit c8183f5489 ("s390/qeth: fix potential deadlock on
workqueue flush"), removed the code which was removed by commit
9897d583b0 ("s390/qeth: consolidate some duplicated HW cmd code").
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Fix following sparse warnings:
drivers/net/dsa/ocelot/felix.c:351:6: warning: symbol 'felix_txtstamp' was not declared. Should it be static?
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call napi_disable() twice would cause dead lock. There are three situations
may result in the issue.
1. rtl8152_pre_reset() and set_carrier() are run at the same time.
2. Call rtl8152_set_tunable() after rtl8152_close().
3. Call rtl8152_set_ringparam() after rtl8152_close().
For #1, use the same solution as commit 8481141246 ("r8152: Re-order
napi_disable in rtl8152_close"). For #2 and #3, add checking the flag
of IFF_UP and using napi_disable/napi_enable during mutex.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function m_can_runtime_resume() is getting recursively called from
m_can_class_resume(). This results in a lock up.
We need not call m_can_class_resume() during m_can_runtime_resume().
Fixes: f524f829b7 ("can: m_can: Create a m_can platform framework")
Signed-off-by: Pankaj Sharma <pankj.sharma@samsung.com>
Signed-off-by: Sriram Dash <sriram.dash@samsung.com>
Acked-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
When virt_wifi interface is created, virt_wifi_newlink() is called and
it calls register_netdevice().
if register_netdevice() fails, it internally would call
->priv_destructor(), which is virt_wifi_net_device_destructor() and
it frees netdev. but virt_wifi_newlink() still use netdev.
So, use-after-free would occur in virt_wifi_newlink().
Test commands:
ip link add dummy0 type dummy
modprobe bonding
ip link add bonding_masters link dummy0 type virt_wifi
Splat looks like:
[ 202.220554] BUG: KASAN: use-after-free in virt_wifi_newlink+0x88b/0x9a0 [virt_wifi]
[ 202.221659] Read of size 8 at addr ffff888061629cb8 by task ip/852
[ 202.222896] CPU: 1 PID: 852 Comm: ip Not tainted 5.4.0-rc5 #3
[ 202.223765] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 202.225073] Call Trace:
[ 202.225532] dump_stack+0x7c/0xbb
[ 202.226869] print_address_description.constprop.5+0x1be/0x360
[ 202.229362] __kasan_report+0x12a/0x16f
[ 202.230714] kasan_report+0xe/0x20
[ 202.232595] virt_wifi_newlink+0x88b/0x9a0 [virt_wifi]
[ 202.233370] __rtnl_newlink+0xb9f/0x11b0
[ 202.244909] rtnl_newlink+0x65/0x90
[ ... ]
Cc: stable@vger.kernel.org
Fixes: c7cdba31ed ("mac80211-next: rtnetlink wifi simulation device")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://lore.kernel.org/r/20191121122645.9355-1-ap420073@gmail.com
[trim stack dump a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
If negotiated NVSP version <= NVSP_PROTOCOL_VERSION_6, the offset may
be wrong (too small) due to a host bug. This can cause missing the
end of the send indirection table, and add multiple zero entries from
leading zeros before the data region. This bug adds extra burden on
channel 0.
So fix the offset by computing it from the data structure sizes. This
will ensure netvsc driver runs normally on unfixed hosts, and future
fixed hosts.
Fixes: 5b54dac856 ("hyperv: Add support for virtual Receive Side Scaling (vRSS)")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To reach the data region, the existing code adds offset in struct
nvsp_5_send_indirect_table on the beginning of this struct. But the
offset should be based on the beginning of its container,
struct nvsp_message. This bug causes the first table entry missing,
and adds an extra zero from the zero pad after the data region.
This can put extra burden on the channel 0.
So, correct the offset usage. Also add a boundary check to ensure
not reading beyond data region.
Fixes: 5b54dac856 ("hyperv: Add support for virtual Receive Side Scaling (vRSS)")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While using ARCH=mips CROSS_COMPILE=mips-linux-gnu- command to compile,
make C=2 drivers/net/ethernet/freescale/enetc/enetc.o
one warning can be found:
drivers/net/ethernet/freescale/enetc/enetc.c:1439:5:
warning: symbol 'enetc_setup_tc_mqprio' was not declared.
Should it be static?
This patch make symbol enetc_setup_tc_mqprio static.
Fixes: 34c6adf197 ("enetc: Configure the Time-Aware Scheduler via tc-taprio offload")
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to reuse ocelot functions as possible to enable PTP
clock and to support hardware timestamping on Felix.
On TX path, timestamping works on packet which requires timestamp.
The injection header will be configured accordingly, and skb clone
requires timestamp will be added into a list. The TX timestamp
is final handled in threaded interrupt handler when PTP timestamp
FIFO is ready.
On RX path, timestamping is always working. The RX timestamp could
be got from extraction header.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to define PTP registers for felix_vsc9959.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert to use ocelot_port_add_txtstamp_skb() for adding skbs which
require TX timestamp into list. Export it so that DSA Felix driver
could reuse it too.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The method getting TX timestamp by reading timestamp FIFO and
matching skbs list is common for DSA Felix driver too.
So move code out of ocelot_board.c, convert to use
ocelot_get_txtstamp() function and export it.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Export ocelot_hwstamp_get/set functions so that DSA driver
is able to reuse them.
Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The workqueue only exists for the primary PF. For other functions
we hit a WARN_ON in kernel/workqueue.c.
Fixes: 7c236c43b8 ("sfc: Add support for IEEE-1588 PTP")
Signed-off-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adjust indentation from spaces to tab (+optional two spaces) as in
coding style. This fixes various indentation mixups (seven spaces,
tab+one space, etc).
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If rvu_get_blkaddr() fails, then this rvu_cgx_nix_cuml_stats() returns
zero and we write some uninitialized data into the debugfs output.
On the error paths, the use of the uninitialized "*stat" is harmless,
but it will lead to a Smatch warning (static analysis) and a UBSan
warning (runtime analysis) so we should prevent that as well.
Fixes: f967488d09 ("octeontx2-af: Add per CGX port level NIX Rx/Tx counters")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the soft status and control register, which allows
TX_FAULT and RX_LOS to be monitored and TX_DISABLE to be set. We
make use of this when the board does not support GPIOs for these
signals.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Micalizzi reports that Huawei MA5671A and Alcatel/Lucent G-010S-P
modules are capable of 2500base-X, but incorrectly report their
capabilities in the EEPROM. It seems rather common that GPON modules
mis-report.
Let's fix these modules by adding some quirks.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for applying module quirks to the list of supported
ethtool link modes.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-11-20
The following pull-request contains BPF updates for your *net-next* tree.
We've added 81 non-merge commits during the last 17 day(s) which contain
a total of 120 files changed, 4958 insertions(+), 1081 deletions(-).
There are 3 trivial conflicts, resolve it by always taking the chunk from
196e8ca748:
<<<<<<< HEAD
=======
void *bpf_map_area_mmapable_alloc(u64 size, int numa_node);
>>>>>>> 196e8ca748
<<<<<<< HEAD
void *bpf_map_area_alloc(u64 size, int numa_node)
=======
static void *__bpf_map_area_alloc(u64 size, int numa_node, bool mmapable)
>>>>>>> 196e8ca748
<<<<<<< HEAD
if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
=======
/* kmalloc()'ed memory can't be mmap()'ed */
if (!mmapable && size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
>>>>>>> 196e8ca748
The main changes are:
1) Addition of BPF trampoline which works as a bridge between kernel functions,
BPF programs and other BPF programs along with two new use cases: i) fentry/fexit
BPF programs for tracing with practically zero overhead to call into BPF (as
opposed to k[ret]probes) and ii) attachment of the former to networking related
programs to see input/output of networking programs (covering xdpdump use case),
from Alexei Starovoitov.
2) BPF array map mmap support and use in libbpf for global data maps; also a big
batch of libbpf improvements, among others, support for reading bitfields in a
relocatable manner (via libbpf's CO-RE helper API), from Andrii Nakryiko.
3) Extend s390x JIT with usage of relative long jumps and loads in order to lift
the current 64/512k size limits on JITed BPF programs there, from Ilya Leoshkevich.
4) Add BPF audit support and emit messages upon successful prog load and unload in
order to have a timeline of events, from Daniel Borkmann and Jiri Olsa.
5) Extension to libbpf and xdpsock sample programs to demo the shared umem mode
(XDP_SHARED_UMEM) as well as RX-only and TX-only sockets, from Magnus Karlsson.
6) Several follow-up bug fixes for libbpf's auto-pinning code and a new API
call named bpf_get_link_xdp_info() for retrieving the full set of prog
IDs attached to XDP, from Toke Høiland-Jørgensen.
7) Add BTF support for array of int, array of struct and multidimensional arrays
and enable it for skb->cb[] access in kfree_skb test, from Martin KaFai Lau.
8) Fix AF_XDP by using the correct number of channels from ethtool, from Luigi Rizzo.
9) Two fixes for BPF selftest to get rid of a hang in test_tc_tunnel and to avoid
xdping to be run as standalone, from Jiri Benc.
10) Various BPF selftest fixes when run with latest LLVM trunk, from Yonghong Song.
11) Fix a memory leak in BPF fentry test run data, from Colin Ian King.
12) Various smaller misc cleanups and improvements mostly all over BPF selftests and
samples, from Daniel T. Lee, Andre Guedes, Anders Roxell, Mao Wenan, Yue Haibing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl3VowIACgkQSD+KveBX
+j7cZgf/aEWFOw6e9oaELHqsWYWaqBabAh/celXLIVx7in1JR5oCGSQHRAH/5JB2
HHfvXvN5Yk9YNga5HtT4mqZS6NsgYksG3pneuJApLcbY627pAzlw2i90yqIKz8In
svulz/BBv22msxk/F2ZyQ04zltiNplrHI1ESbxmMhLuhRA5M9AwRTjfa2Uk6kbLj
pmwmAEymNsxNfRfL4/sVMXgbUcTOkf38h4qAWKTnUZeCFCVk2pbIjBkNTQ6eG+kY
rbYYqMyhqusGhvkiP083rdnu9aKmVUG55jIyd00PhrrVow9HSTKwNqlcq/+qtYpE
H5dhgxJexrXJd02m+KK2h+DZxPsHDA==
=t0rV
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2019-11-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-11-20
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
For -stable v4.9:
('net/mlx5e: Fix set vf link state error flow')
For -stable v4.14
('net/mlxfw: Verify FSM error code translation doesn't exceed array size')
For -stable v4.19
('net/mlx5: Fix auto group size calculation')
For -stable v5.3
('net/mlx5e: Fix error flow cleanup in mlx5e_tc_tun_create_header_ipv4/6')
('net/mlx5e: Do not use non-EXT link modes in EXT mode')
('net/mlx5: Update the list of the PCI supported devices')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Only values 0 and 1 are currently defined as parameters for
PHY_MDIO_CHG. Instead of silently ignoring unknown values and
misinterpreting the firmware code let's explicitly check.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using macro FIELD_SIZEOF makes this define easier understandable.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We're not in atomic context here, therefore switch to msleep.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both rtl_work_func_t() and rtl8152_close() call napi_disable().
Since the two calls aren't protected by a lock, if the close
function starts executing before the work function, we can get into a
situation where the napi_disable() function is called twice in
succession (first by rtl8152_close(), then by set_carrier()).
In such a situation, the second call would loop indefinitely, since
rtl8152_close() doesn't call napi_enable() to clear the NAPI_STATE_SCHED
bit.
The rtl8152_close() function in turn issues a
cancel_delayed_work_sync(), and so it would wait indefinitely for the
rtl_work_func_t() to complete. Since rtl8152_close() is called by a
process holding rtnl_lock() which is requested by other processes, this
eventually leads to a system deadlock and crash.
Re-order the napi_disable() call to occur after the work function
disabling and urb cancellation calls are issued.
Change-Id: I6ef0b703fc214998a037a68f722f784e1d07815e
Reported-by: http://crbug.com/1017928
Signed-off-by: Prashant Malani <pmalani@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The reset counter is specific for every QCA700x chip. So move this
into the private driver struct. Otherwise we get unpredictable reset
behavior in setups with multiple QCA700x chips.
Fixes: 291ab06ecf (net: qualcomm: new Ethernet over SPI driver for QCA7000)
Signed-off-by: Stefan Wahren <stefan.wahren@in-tech.com>
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When receiving many or larger packets, e.g. when doing a file download,
it was observed that the read buffer size register reports up to 4 bytes
more than the current define allows in the check.
If this is the case, then no data transfer is initiated to receive the
packets (and thus to empty the buffer) which results in a stall of the
interface.
These 4 bytes are a hardware generated frame length which is prepended
to the actual frame, thus we have to respect it during our check.
Fixes: 026b907d58 ("net: qca_spi: Add available buffer space verification")
Signed-off-by: Michael Heimpold <michael.heimpold@in-tech.com>
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reversion of commit 11d49ce9f7
(“net/ibmvnic: Fix EOI when running in XIVE mode.”) leaves us
calling H_EOI even in XIVE mode. That will fail with H_FUNCTION
because H_EOI is not supported in that mode. That failure is
harmless. Ignore it so we can use common code for both XICS and
XIVE.
Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 11d49ce9f7
(“net/ibmvnic: Fix EOI when running in XIVE mode.”) since that
has the unintended effect of changing the interrupt priority
and emits warning when running in legacy XICS mode.
Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of costly dma_sync_single_for_device in mvneta_rx_refill
since now the driver can let page_pool API to manage needed DMA
sync with a proper size.
- XDP_DROP DMA sync managed by mvneta driver: ~420Kpps
- XDP_DROP DMA sync managed by page_pool API: ~585Kpps
Tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rely on page_pool_recycle_direct and not on xdp_return_buff in
mvneta_run_xdp. This is a preliminary patch to limit the dma sync len
to the one strictly necessary
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Array mlxfw_fsm_state_err_str contains value to string translation, when
values are provided by mlxfw_dev. If value is larger than
MLXFW_FSM_STATE_ERR_MAX, return "unknown error" as expected instead of
reading an address than exceed array size.
Fixes: 410ed13cae ("Add the mlxfw module for Mellanox firmware flash process")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add the upcoming ConnectX-6 LX device ID.
Fixes: 85327a9c41 ("net/mlx5: Update the list of the PCI supported devices")
Signed-off-by: Shani Shapp <shanish@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Once all the large flow groups (defined by the user when the flow table
is created - max_num_groups) were created, then all the following new
flow groups will have only one flow table entry, even though the flow table
has place to larger groups.
Fix the condition to prefer large flow group.
Fixes: f0d22d1874 ("net/mlx5_core: Introduce flow steering autogrouped flow table")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Device that doesn't support IP-in-IP offloads has to filter csum and gso
offload support, otherwise kernel will conclude that device is capable of
offloading csum and gso for IP-in-IP tunnels and that might result in
IP-in-IP tunnel not functioning.
Fixes: 25948b87dd ("net/mlx5e: Support TSO and TX checksum offloads for IP-in-IP")
Signed-off-by: Marina Varshaver <marinav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
On some old Firmwares, connector type value was not supported, and value
read from FW was 0. For those, driver used link mode in order to set
connector type in link_ksetting.
After FW exposed the connector type, driver translated the value to ethtool
definitions. However, as 0 is a valid value, before returning PORT_OTHER,
driver run the check of link mode in order to maintain backward
compatibility.
Cited patch added support to EXT mode. With both features (connector type
and EXT link modes) ,if connector_type read from FW is 0 and EXT mode is
set, driver mistakenly compare EXT link modes to non-EXT link mode.
Fixed that by skipping this comparison if we are in EXT mode, as connector
type value is valid in this scenario.
Fixes: 6a89737241 ("net/mlx5: ethtool, Add ethtool support for 50Gbps per lane link modes")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Before this commit the ndo always returned success.
Fix that.
Fixes: 1ab2068a4c ("net/mlx5: Implement vports admin state backup/restore")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When an ste hash table has too many collision we enlarge it
to a bigger hash table (rehash). Rehashing collision improvement
depends on the bytemask value. The more 1 bits we have in bytemask
means better spreading in the table.
Without this fix tables can grow in size without providing any
improvement which can lead to memory depletion and failures.
This patch will limit table rehash to reduce memory and improve
the performance.
Fixes: 41d0707415 ("net/mlx5: DR, Expose steering rule functionality")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The byte mask fields affect on the hash index distribution,
when the byte mask is zero, the hash calculation will always
be equal to the same index.
To avoid unneeded rehash of hash tables mark the table to skip
rehash.
This is needed by the next patch which will limit table rehash
to reduce memory consumption.
Fixes: 41d0707415 ("net/mlx5: DR, Expose steering rule functionality")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When creating a CQ, the CPU id is used for the vector value.
This would fail in-case the CPU id was higher than the maximum
vector value.
Fixes: 297cccebdc ("net/mlx5: DR, Expose an internal API to issue RDMA operations")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Mirred action parsing code in parse_tc_fdb_actions() first checks if
out_dev has same parent id, and only verifies that there is a pending encap
action that was parsed before. Recent change in vxlan module made function
netdev_port_same_parent_id() to return true when called for mlx5 eswitch
representor and vxlan device created explicitly on mlx5 representor
device (vxlan devices created with "external" flag without explicitly
specifying parent interface are not affected). With call to
netdev_port_same_parent_id() returning true, incorrect code path is chosen
and encap rules fail to offload because vxlan dev is not a valid eswitch
forwarding dev. Dmesg log of error:
[ 1784.389797] devices ens1f0_0 vxlan1 not on same switch HW, can't offload forwarding
In order to fix the issue, rearrange conditional in parse_tc_fdb_actions()
to check for pending encap action before checking if out_dev has the same
parent id.
Fixes: 0ce1822c2a ("vxlan: add adjacent link to limit depth level")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Current code uses the old method of prio encoding in
flow_cls_common_offload. Fix to follow the changes introduced in
commit ef01adae0e ("net: sched: use major priority number as hardware priority").
Fixes: fcb64c0f56 ("net/mlx5: E-Switch, add ingress rate support")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Be sure to release the neighbour in case of failures after successful
route lookup.
Fixes: 101f4de9dd ("net/mlx5e: Move TC tunnel offloading code to separate source file")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add TC-MATCHALL classifier ingress offload support. The same actions
supported by existing TC-FLOWER offload can be applied to all incoming
traffic on the underlying interface.
Ensure the rule priority doesn't conflict with existing rules in the
TCAM. Only 1 ingress matchall rule can be active at a time on the
underlying interface.
v5:
- No change.
v4:
- Added check to ensure the matchall rule's prio doesn't conflict with
other rules in TCAM.
- Added logic to fill default mask for VIID, if none has been
provided, to prevent conflict with duplicate VIID rules.
- Used existing variables in private structure to fill VIID info,
instead of extracting the info manually.
v3:
- No change.
v2:
- Removed logic to fetch free index from end of TCAM. Must maintain
same ordering as in kernel.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only offload rule if it satisfies both of the following conditions:
1. The immediate previous rule has priority <= current rule's priority.
2. The immediate next rule has priority >= current rule's priority.
Also rework free entry fetch logic to search from end of TCAM, instead
of beginning, because higher indices have lower priority than lower
indices. This is similar to how TC auto generates priority values.
v5:
- Fixed commit message and comment to include comparison for equal
priority.
v4:
- Patch added in this version.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TC-MATCHALL classifier offload with TC-POLICE action applied for
all outgoing traffic on the underlying interface. Split flow block
offload to support both egress and ingress classification.
For example, to rate limit all outgoing traffic to 1 Gbps:
$ tc qdisc add dev enp2s0f4 clsact
$ tc filter add dev enp2s0f4 egress matchall skip_sw \
action police rate 1Gbit burst 8Kbit
Note that skip_sw is important. Otherwise, both stack and hardware
will end up doing policing. Policing can't be shared across flow
blocks. Only 1 egress matchall rule can be active at a time on the
underlying interface.
v5:
- No change.
v4:
- Removed check to reject police offload if prio is not 1.
- Moved TC_SETUP_BLOCK code to separate function.
v3:
- Added check to reject police offload if prio is not 1.
- Assign block_shared variable only for TC_SETUP_BLOCK.
v2:
- Added check to reject flow block sharing for policers.
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>