We need to push the chain index down to the drivers, so they have the
information to which chain the rule belongs. For now, no driver supports
multichain offload, so only chain 0 is supported. This is needed to
prevent chain squashes during offload for now. Later this will be used
to implement multichain offload.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Include HWTSTAMP_FILTER_NTP_ALL in net_hwtstamp_validate() as a valid
filter and update drivers which can timestamp all packets, or which
explicitly list unsupported filters instead of using a default case, to
handle the filter.
CC: Richard Cochran <richardcochran@gmail.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The revision enum values (eg EFX_REV_HUNT_A0) form part of our API,
and are included in ethtool. If these are inconsistent then ethtool
will print garbage for a register dump (ethtool -d).
Fixes: 5a6681e22c ("sfc: separate out SFC4000 ("Falcon") support into new sfc-falcon driver")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: dd248f1bc6 ("sfc: Add PCI ID for Solarflare 8000 series 10/40G NIC")
Reported-by: Patrick Talbert <ptalbert@redhat.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A function in kernel/bpf/syscall.c which got a bug fix in 'net'
was moved to kernel/bpf/verifier.c in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
The number of rx queues is determined by the rss_cpus parameter
or the cpu topology. If that is higher than EFX_MAX_RX_QUEUES the
driver can corrupt state.
Fixes: 8ceee660aa ("New driver "sfc" for Solarstorm SFC4000 controller.")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the mc_list is longer than 256 addresses, we enter mc_promisc mode.
If we're in mc_promisc mode and the firmware doesn't support cascaded
multicast, normally we also insert our mc_list, to prevent stealing by
another VI. However, if the mc_list was too long, this isn't really
helpful - the MC groups that didn't fit in the list can still get
stolen, and having only some of them stealable will probably cause
more confusing behaviour than having them all stealable. Since
inserting 256 multicast filters takes a long time and can lead to MCDI
state machine timeouts, just skip the mc_list insert in this overflow
condition.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/broadcom/genet/bcmmii.c
drivers/net/hyperv/netvsc.c
kernel/bpf/hashtab.c
Almost entirely overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Presumably if there is an "add" function, there is also a "del"
function. But it causes a static checker warning because it looks like
a common cut and paste bug.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The configurable priority to traffic class mapping and the user specified
queue ranges are used to configure the traffic class, overriding the
hardware defaults when the 'hw' option is set to 0. However, when the 'hw'
option is non-zero, the hardware QOS defaults are used.
This patch makes it so that we can pass the data the user provided to
ndo_setup_tc. This allows us to pull in the queue configuration if the
user requested it as well as any additional hardware offload type
requested by using a value other than 1 for the hw value.
Finally it also provides a means for the device driver to return the level
supported for the offload type via the qopt->hw value. Previously we were
just always assuming the value to be 1, in the future values beyond just 1
may be supported.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking fixes from David Miller:
1) Fix double-free in batman-adv, from Sven Eckelmann.
2) Fix packet stats for fast-RX path, from Joannes Berg.
3) Netfilter's ip_route_me_harder() doesn't handle request sockets
properly, fix from Florian Westphal.
4) Fix sendmsg deadlock in rxrpc, from David Howells.
5) Add missing RCU locking to transport hashtable scan, from Xin Long.
6) Fix potential packet loss in mlxsw driver, from Ido Schimmel.
7) Fix race in NAPI handling between poll handlers and busy polling,
from Eric Dumazet.
8) TX path in vxlan and geneve need proper RCU locking, from Jakub
Kicinski.
9) SYN processing in DCCP and TCP need to disable BH, from Eric
Dumazet.
10) Properly handle net_enable_timestamp() being invoked from IRQ
context, also from Eric Dumazet.
11) Fix crash on device-tree systems in xgene driver, from Alban Bedel.
12) Do not call sk_free() on a locked socket, from Arnaldo Carvalho de
Melo.
13) Fix use-after-free in netvsc driver, from Dexuan Cui.
14) Fix max MTU setting in bonding driver, from WANG Cong.
15) xen-netback hash table can be allocated from softirq context, so use
GFP_ATOMIC. From Anoob Soman.
16) Fix MAC address change bug in bgmac driver, from Hari Vyas.
17) strparser needs to destroy strp_wq on module exit, from WANG Cong.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits)
strparser: destroy workqueue on module exit
sfc: fix IPID endianness in TSOv2
sfc: avoid max() in array size
rds: remove unnecessary returned value check
rxrpc: Fix potential NULL-pointer exception
nfp: correct DMA direction in XDP DMA sync
nfp: don't tell FW about the reserved buffer space
net: ethernet: bgmac: mac address change bug
net: ethernet: bgmac: init sequence bug
xen-netback: don't vfree() queues under spinlock
xen-netback: keep a local pointer for vif in backend_disconnect()
netfilter: nf_tables: don't call nfnetlink_set_err() if nfnetlink_send() fails
netfilter: nft_set_rbtree: incorrect assumption on lower interval lookups
netfilter: nf_conntrack_sip: fix wrong memory initialisation
can: flexcan: fix typo in comment
can: usb_8dev: Fix memory leak of priv->cmd_msg_buffer
can: gs_usb: fix coding style
can: gs_usb: Don't use stack memory for USB transfers
ixgbe: Limit use of 2K buffers on architectures with 256B or larger cache lines
ixgbe: update the rss key on h/w, when ethtool ask for it
...
The value we read from the header is in network byte order, whereas
EFX_POPULATE_QWORD_* takes values in host byte order (which it then
converts to little-endian, as MCDI is little-endian).
Fixes: e9117e5099 ("sfc: Firmware-Assisted TSO version 2")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It confuses sparse, which thinks the size isn't constant. Let's achieve
the same thing with a BUILD_BUG_ON, since we know which one should be
bigger and don't expect them ever to change.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix up affected files that include this signal functionality via sched.h.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fix typos and add the following to the scripts/spelling.txt:
an one||a one
I dropped the "an" before "one or more" in
drivers/net/ethernet/sfc/mcdi_pcol.h.
Link: http://lkml.kernel.org/r/1481573103-11329-6-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
efx_start_all can return without initialising queues as a reset is pending.
This means that when netif_device_attach is called, the kernel can start
sending traffic without having an initialised TX queue to send to.
This patch avoids this by not calling netif_device_attach if there is a
pending reset.
Fixes: e283546c04 ("sfc:On MCDI timeout, issue an FLR (and mark MCDI to fail-fast)")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the hw doesn't think they exist, we should defer to its authority.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On EF10, hardware filter IDs are 13 bits, but in some places we store
32-bit "full filter IDs" in which higher order bits encode the filter
match-priority. This could cause a filter to have a full filter ID of
0xffff, which is also the value EFX_EF10_FILTER_ID_INVALID which we use
in 16-bit "short" filter IDs (without match-priority bits). This would
occur if the hardware filter ID was 0x1fff and the match-priority was 7.
Unfortunately, some code that checks for EFX_EF10_FILTER_ID_INVALID can
be called on full filter IDs, and will WARN_ON if this ever happens.
So, since we have plenty of spare bits in the full filter ID, this patch
shifts the priority bits left one bit when constructing the full filter
IDs, ensuring that the 0x2000 bit of a full filter ID will always be 0
and thus no full filter ID can ever equal EFX_EF10_FILTER_ID_INVALID.
This patch also replaces open-coded full<->short filter ID conversions
with calls to functions, thus keeping the definition of the full filter
ID format in one place.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we fail to probe interrupts with our minimum mode, return that error.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add min_interrupt_mode specification per NIC type.
It is a bit confusing because of "highest interrupt mode is less capable".
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: a0ee354148 ("sfc: process RX event inner checksum flags")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement ndo_udp_tunnel_{add,del} to update the NIC's list of VXLAN and
GENEVE UDP ports. Also reset the port list to empty on driver load and
on driver unload, with appropriate flag set on the unload case.
These port numbers are used for RX inner checksum offload, and in future
will also be used for TX inner checksum offload and encapsulated TSO.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function wasn't being called in this particular case when the MC
reboots. This caused resource reallocations to not be handled properly
and often ended up disabling the interface.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is mainly to prepare for a future overlay networking patch that
could cause an MC reset at probe time if the UDP tunnel port list is
set immediately upon driver load.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the csum_level for encapsulated packets where the encapsulation
type, l3 class and l4 class are sets that need it.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for RX checksum offload of encapsulated packets. This
essentially just means paying attention to the inner checksum flags
in the RX event, and if *either* checksum flag indicates a fail then
don't tell the kernel that checksum offload was successful.
Also, count these checksum errors and export the counts to ethtool -S.
Test the most common "good" case of RX events with a single bitmask
instead of a series of ifs. Move the more specific error checking
in to a separate function for clarity, and don't use unlikely() there
since we know at least one of the bits is bad.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This bug is harmless because it's just a sanity check and we always
pass valid values for "encap_type" but the test is off by one.
Fixes: 9b41080125 ("sfc: insert catch-all filters for encapsulated traffic")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 364b605573 ("net: busy-poll: return busypolling status
to drivers"), napi_complete_done() returns a boolean that can be used
by drivers to conditionally rearm interrupts.
Testing with a 7142 shows a small latency improvement of ~100 ns.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In linux-4.5, busy polling was implemented in core
NAPI stack, meaning that all custom implementation can
be removed from drivers.
Not only we remove lot's of tricky code, we also remove
one lock operation in fast path.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In linux-4.5, busy polling was implemented in core
NAPI stack, meaning that all custom implementation can
be removed from drivers.
Not only we remove lot's of tricky code, we also remove
one lock operation in fast path.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
encap_type should be checked to see if it is greater or equal to
the size of array map to fix an off-by-one array size check. This
fixes an array overrun read as detected by static analysis by
CoverityScan, CID#1398883 ("Out-of-bounds-read")
Fixes: 9b41080125 ("sfc: insert catch-all filters for encapsulated traffic")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
napi_complete_done() allows to opt-in for gro_flush_timeout,
added back in linux-3.19, commit 3b47d30396
("net: gro: add a per device gro flush timer")
This allows for more efficient GRO aggregation without
sacrifying latencies.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
8000 series adapters support filtering VXLAN, NVGRE and GENEVE traffic
based on inner fields, and when the NIC recognises such traffic, it
does not match unencapsulated traffic filters any more. So add catch-
all filters for encapsulated traffic on supporting platforms.
Although recognition of VXLAN and GENEVE is based on UDP ports, and thus
will not occur until the driver (on the primary PF) notifies the
firmware of UDP ports to use, NVGRE will always be recognised, hence
without this patch 8000 series adapters will drop all NVGRE traffic.
Partly based on patches by Jon Cooper <jcooper@solarflare.com>.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rationalise several debug-or-warnings printks using netif_cond_dbg
to make output more consistent.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the NIC is switched from full-featured to low-latency, encapsulated
filters are no longer available, and this causes errors. This patch
removes those filters from the filter table on restore.
Also, if filters which are removed by the above, or which we fail to
insert when restoring filters, were UC, MC or broadcast default
filters, invalidate the corresponding vlan->default_filters entry.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
PIO buffer allocation can fail for two valid reasons:
- we've run out of them (results in -ENOSPC)
- the NIC configuration doesn't support them (results in -EPERM)
Since both these failures are expected netif_err is excessive.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ensures that we report the key and indirection table the NIC is using,
rather than (if setting them failed earlier) what we wanted it to use.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 8000 series SFC NICs have 4K PIO buffers, rather than the 2K of
the 7000 series. Rather than having a hard-coded PIO buffer size
(ER_DZ_TX_PIOBUF_SIZE), read it from the GET_CAPABILITIES_V2 MCDI
response.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an option descriptor has been sent on a queue but not followed by a
packet, there will have been no completion event, so the read and write
counts won't match and we'll think we can't do PIO. This combines with
the fact that we have two TX queues (for en/disable checksum offload),
and that both must be empty for PIO to happen.
This patch adds a separate "packet_write_count" that tracks the most
recent write_count we expect to see a completion event for; this excludes
option descriptors but _includes_ PIO descriptors (even though they look
like option descriptors). This is then used, rather than write_count,
in efx_nic_tx_is_empty().
We only bother to maintain packet_write_count on EF10, since on Siena
(a) there are no option descriptors and it always equals write_count, and
(b) there's no PIO, so we don't need it anyway.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use eth_zero_addr to assign zero address to the given address array
instead of memset when the second argument in memset is address
of zero which makes the code clearer and also add header
file linux/etherdevice.h
Signed-off-by: Shyam Saini <mayhs11saini@gmail.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following sparse warning:
drivers/net/ethernet/sfc/efx.c:2337:5: warning:
symbol 'efx_get_phys_port_id' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Setting dev_port changes the device names allocated by systemd. Any devices
with a dev_port >0 will (in default distro configurations) have a suffix of
"d<port-number>" appended.
This is not something done by other drivers, and causes confusion for users.
Fixes: 8be41320f3 ("sfc: Add code to export port_num in netdev->dev_port")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Output is of the form p<port-number>.
Note that the port numbers don't necessarily map one-to-one to physical
cages, partly because of 4x10G port modes on QSFP+ and partly because
of hw/fw implementation details.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no good reason why this should be an SRIOV-only thing.
Thus, also move it out of SRIOV-specific files.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The network device operation for reading statistics is only called
in one place, and it ignores the return value. Having a structure
return value is potentially confusing because some future driver could
incorrectly assume that the return value was used.
Fix all drivers with ndo_get_stats64 to have a void function.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we failed to set up RSS on EF10 (e.g. because firmware declared
RX_RSS_LIMITED), ethtool --show-nfc $dev rx-flow-hash ... should report
no fields, rather than confusingly reporting what fields we _would_ be
hashing on if RSS was working.
Fixes: dcb4123cbe ("sfc: disable RSS when unsupported")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>