Associate an arbitrary ID with each ARFS filter, allowing to properly query
for expiry. The association is maintained in a hash table, which is
protected by a spinlock.
v3: fix build warnings when CONFIG_RFS_ACCEL is disabled (thanks lkp-robot).
v2: fixed uninitialised variable (thanks davem and lkp-robot).
Fixes: 3af0f34290 ("sfc: replace asynchronous filter operations")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise races are possible between ethtool ops and
efx_ef10_rx_restore_rss_contexts().
Also, don't try to perform the restore on every reset, only after an MC
reboot, otherwise we'll leak RSS contexts on the NIC.
Fixes: 42356d9a13 ("sfc: support RSS spreading of ethtool ntuple filters")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
efx->filter_lock remains in place for use on farch, but EF10 now ignores it.
EFX_EF10_FILTER_FLAG_BUSY is no longer needed, hence it is removed.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of having an efx->type->filter_rfs_insert() method, just use
workitems with a worker function that calls efx->type->filter_insert().
The only user of this is efx_filter_rfs(), which now queues a call to
efx_filter_rfs_work().
Similarly, efx_filter_rfs_expire() is now a worker function called on a
new channel->filter_work work_struct, so the method
efx->type->filter_rfs_expire_one() is no longer called in atomic context.
We also add a new mutex efx->rps_mutex to protect the RPS state (efx->
rps_expire_channel, efx->rps_expire_index, and channel->rps_flow_id) so
that the taking of efx->filter_lock can be moved to
efx->type->filter_rfs_expire_one().
Thus, all filter table functions are now called in a sleepable context,
allowing them to use sleeping locks in a future patch.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use a linked list to associate user-facing context IDs with FW-facing
context IDs (since the latter can change after an MC reset).
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
efx_default_channel_want_txqs() is only used in efx.c, while
efx_ptp_want_txqs() and efx_ptp_channel_type (a struct) are only used
in ptp.c. In all cases these symbols should be static.
Fixes: 2935e3c382 ("sfc: on 8000 series use TX queues for TX timestamps")
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
[ecree@solarflare.com: rewrote commit message]
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For this we create and use one or more new TX queues on the PTP channel,
and enable sync events for it.
Based on a patch by Martin Habets <mhabets@solarflare.com>.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Store and handle ethtool link mode masks within the driver instead of
just a single u32. However, quite a significant amount of existing code
wants to manipulate the masks directly, and thus now uses the first
unsigned long (i.e. mask[0]) as though it were a legacy u32 mask. This
is ok because all the bits that code is interested in are in the first
32 bits of the mask; but it might be a good idea to change them in
future to use the proper bitmap API.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Medford2 NICs support more than MC_CMD_MAC_NSTATS stats, and report the new
count in a field of MC_CMD_GET_CAPABILITIES_V4. This also means that the
end generation count moves (it is, as before, the last 64 bits of the DMA
buffer, but that is no longer MC_CMD_MAC_GENERATION_END).
So read num_mac_stats from the GET_CAPABILITIES response, if present;
otherwise assume MC_CMD_MAC_NSTATS; and always use num_mac_stats - 1 rather
than MC_CMD_MAC_GENERATION_END.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Medford2 can also have 16k or 64k VI stride. This is reported by MCDI in
GET_CAPABILITIES, which fortunately is called before the driver does
anything sensitive to the VI stride (such as accessing or even allocating
VIs past the zeroth).
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support using BAR 0 on SFC9250, even though the driver doesn't bind to such
devices yet.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
"Highlights:
1) Maintain the TCP retransmit queue using an rbtree, with 1GB
windows at 100Gb this really has become necessary. From Eric
Dumazet.
2) Multi-program support for cgroup+bpf, from Alexei Starovoitov.
3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew
Lunn.
4) Add meter action support to openvswitch, from Andy Zhou.
5) Add a data meta pointer for BPF accessible packets, from Daniel
Borkmann.
6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet.
7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli.
8) More work to move the RTNL mutex down, from Florian Westphal.
9) Add 'bpftool' utility, to help with bpf program introspection.
From Jakub Kicinski.
10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper
Dangaard Brouer.
11) Support 'blocks' of transformations in the packet scheduler which
can span multiple network devices, from Jiri Pirko.
12) TC flower offload support in cxgb4, from Kumar Sanghvi.
13) Priority based stream scheduler for SCTP, from Marcelo Ricardo
Leitner.
14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg.
15) Add RED qdisc offloadability, and use it in mlxsw driver. From
Nogah Frankel.
16) eBPF based device controller for cgroup v2, from Roman Gushchin.
17) Add some fundamental tracepoints for TCP, from Song Liu.
18) Remove garbage collection from ipv6 route layer, this is a
significant accomplishment. From Wei Wang.
19) Add multicast route offload support to mlxsw, from Yotam Gigi"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits)
tcp: highest_sack fix
geneve: fix fill_info when link down
bpf: fix lockdep splat
net: cdc_ncm: GetNtbFormat endian fix
openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start
netem: remove unnecessary 64 bit modulus
netem: use 64 bit divide by rate
tcp: Namespace-ify sysctl_tcp_default_congestion_control
net: Protect iterations over net::fib_notifier_ops in fib_seq_sum()
ipv6: set all.accept_dad to 0 by default
uapi: fix linux/tls.h userspace compilation error
usbnet: ipheth: prevent TX queue timeouts when device not ready
vhost_net: conditionally enable tx polling
uapi: fix linux/rxrpc.h userspace compilation errors
net: stmmac: fix LPI transitioning for dwmac4
atm: horizon: Fix irq release error
net-sysfs: trigger netlink notification on ifalias change via sysfs
openvswitch: Using kfree_rcu() to simplify the code
openvswitch: Make local function ovs_nsh_key_attr_size() static
openvswitch: Fix return value check in ovs_meter_cmd_features()
...
Ethernet FCS inclusion (rx-fcs) is supported on EF10 NICs, conditional on
a firmware capability bit (MC_CMD_GET_CAPABILITIES_OUT_RX_INCLUDE_FCS).
To receive frames with bad FCS (rx-all) we just don't return the discard
flag EFX_RX_PKT_DISCARD from efx_ef10_handle_rx_event_errors() or
efx_farch_handle_rx_not_ok().
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.
However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:
----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()
// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
virtual patch
@ depends on patch @
expression E1, E2;
@@
- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)
@ depends on patch @
expression E;
@@
- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The number of rx queues is determined by the rss_cpus parameter
or the cpu topology. If that is higher than EFX_MAX_RX_QUEUES the
driver can corrupt state.
Fixes: 8ceee660aa ("New driver "sfc" for Solarstorm SFC4000 controller.")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Presumably if there is an "add" function, there is also a "del"
function. But it causes a static checker warning because it looks like
a common cut and paste bug.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
efx_start_all can return without initialising queues as a reset is pending.
This means that when netif_device_attach is called, the kernel can start
sending traffic without having an initialised TX queue to send to.
This patch avoids this by not calling netif_device_attach if there is a
pending reset.
Fixes: e283546c04 ("sfc:On MCDI timeout, issue an FLR (and mark MCDI to fail-fast)")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we fail to probe interrupts with our minimum mode, return that error.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add min_interrupt_mode specification per NIC type.
It is a bit confusing because of "highest interrupt mode is less capable".
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement ndo_udp_tunnel_{add,del} to update the NIC's list of VXLAN and
GENEVE UDP ports. Also reset the port list to empty on driver load and
on driver unload, with appropriate flag set on the unload case.
These port numbers are used for RX inner checksum offload, and in future
will also be used for TX inner checksum offload and encapsulated TSO.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is mainly to prepare for a future overlay networking patch that
could cause an MC reset at probe time if the UDP tunnel port list is
set immediately upon driver load.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 364b605573 ("net: busy-poll: return busypolling status
to drivers"), napi_complete_done() returns a boolean that can be used
by drivers to conditionally rearm interrupts.
Testing with a 7142 shows a small latency improvement of ~100 ns.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In linux-4.5, busy polling was implemented in core
NAPI stack, meaning that all custom implementation can
be removed from drivers.
Not only we remove lot's of tricky code, we also remove
one lock operation in fast path.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
napi_complete_done() allows to opt-in for gro_flush_timeout,
added back in linux-3.19, commit 3b47d30396
("net: gro: add a per device gro flush timer")
This allows for more efficient GRO aggregation without
sacrifying latencies.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following sparse warning:
drivers/net/ethernet/sfc/efx.c:2337:5: warning:
symbol 'efx_get_phys_port_id' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Output is of the form p<port-number>.
Note that the port numbers don't necessarily map one-to-one to physical
cages, partly because of 4x10G port modes on QSFP+ and partly because
of hw/fw implementation details.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's no good reason why this should be an SRIOV-only thing.
Thus, also move it out of SRIOV-specific files.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The network device operation for reading statistics is only called
in one place, and it ignores the return value. Having a structure
return value is potentially confusing because some future driver could
incorrectly assume that the return value was used.
Fix all drivers with ndo_get_stats64 to have a void function.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Logically, EFX_BUG_ON_PARANOID can never be correct. For, BUG_ON should
only be used if it is not possible to continue without potential harm;
and since the non-DEBUG driver will continue regardless (as the BUG_ON is
compiled out), clearly the BUG_ON cannot be needed in the DEBUG driver.
So, replace every EFX_BUG_ON_PARANOID with either an EFX_WARN_ON_PARANOID
or the newly defined EFX_WARN_ON_ONCE_PARANOID.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's no longer used now that Falcon is gone.
Also remove a reference in a comment to an ioctl that doesn't exist.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rationale: The differences between Falcon and Siena are in many ways larger
than those between Siena and EF10 (despite Siena being nominally "Falcon-
architecture"); for instance, Falcon has no MCPU, so there is no MCDI.
Removing Falcon support from the sfc driver should simplify the latter,
and avoid the possibility of Falcon support being broken by changes to sfc
(which are rarely if ever tested on Falcon, it being end-of-lifed hardware).
The sfc-falcon driver created in this changeset is essentially a copy of the
sfc driver, but with Siena- and EF10-specific code, including MCDI, removed
and with the "efx_" identifier prefix changed to "ef4_" (for "EFX 4000-
series") to avoid collisions when both drivers are built-in.
This changeset removes Falcon from the sfc driver's PCI ID table; then in
sfc I've removed obvious Falcon-related code: I removed the Falcon NIC
functions, Falcon PHY code, and EFX_REV_FALCON_*, then fixed up everything
that referenced them.
Also, increment minor version of both drivers (to 4.1).
For now, CONFIG_SFC selects CONFIG_SFC_FALCON, so that updating old configs
doesn't cause Falcon support to disappear; but that should be undone at
some point in the future.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It gives no advantage over GSO now that xmit_more exists. If we find
ourselves unable to handle a TSO skb (because our TXQ doesn't have a
TSOv2 context and the NIC doesn't support TSOv1), hand it back to GSO.
Also do that if the TSO handler fails with EINVAL for any other reason.
As Falcon-architecture NICs don't support any firmware-assisted TSO,
they no longer advertise TSO feature flags at all.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calling napi_hash_del() after netif_napi_del() is pointless.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Edward Cree <ecree@solarflare.com>
Cc: Bert Kenward <bkenward@solarflare.com>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
efx_copy_channel() doesn't correctly clear the napi_hash related state.
This means that when napi_hash_add is called for that channel nothing is
done, and we are left with a copy of the napi_hash_node from the old
channel. When we later call napi_hash_del() on this channel we have a
stale napi_hash_node.
Corruption is only seen when there are multiple entries in one of the
napi_hash lists. This is made more likely by having a very large number
of channels. Testing was carried out with 512 channels - 32 channels on
each of 16 ports.
This failure typically appears as protection faults within napi_by_id()
or napi_hash_add(). efx_copy_channel() is only used when tx or rx ring
sizes are changed (ethtool -G).
Fixes: 36763266bb ("sfc: Add support for busy polling")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 61e84623ac ("net: centralize net_device min/max MTU checking")
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Ma Yuying <yuma@redhat.com>
Suggested-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Reviewed-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On SFN8000 series adapters the MC provides a method to get the timer
quantum and the maximum timer setting. We revert to the old values if the
new call is unavailable.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SFN8000-series NICs require a new method of setting interrupt moderation,
via MCDI. This is indicated by a workaround flag. This new MCDI command
takes an explicit time value rather than a number of ticks. It therefore
makes sense to also store the moderation values in terms of time, since
that is what the ethtool interface is interested in.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If should be done after net_dev->hw_features initialization, to keep the
feature there to be able to enable it later using ethtool.
VLAN filtering is enforced and fixed if vPort requires usage of VLAN
filters to receive tagged traffic.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When trying to enslave an SFC interface to a bond the following BUG_ON was
hit:
kernel BUG [in ef10.c]!
CPU: 0 PID: 4383 Comm: ifenslave Tainted: G
...
Call Trace:
efx_ef10_filter_add_vlan+0x121/0x180 [sfc]
efx_ef10_filter_table_probe+0x2a2/0x4f0 [sfc]
efx_ef10_set_mac_address+0x370/0x6d0 [sfc]
efx_set_mac_address+0x7d/0x120 [sfc]
dev_set_mac_address+0x43/0xa0
bond_enslave+0x337/0xea0 [bonding]
This comes from function efx_ef10_filter_vlan_sync_rx_mode.
To solve the bug we ensure the mac_lock is taken before calling
efx_ef10_filter_add_vlan. But to avoid a priority inversion mac_lock must
be taken before filter_sem.
To satisfy these requirements we end up taking mac_lock in
efx_ef10_vport_set_mac_address, efx_ef10_set_mac_address,
efx_ef10_sriov_set_vf_vlan and efx_probe_filters.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Supports HW VLAN filtering, en/disabled using ethtool.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It allows to change set of fixed features on datapath reset.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Otherwise we get confused when two flows on different channels get the
same flow ID.
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Depending on configuration the NIC may return errors for unprivileged
functions and/or VFs. Where these are expected and handled, reduce the
level of any output.
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These netif flags are unnecessary convolutions. It is more
straightforward to just use NETIF_F_HW_CSUM, NETIF_F_IP_CSUM,
and NETIF_F_IPV6_CSUM directly.
This patch also:
- Cleans up can_checksum_protocol
- Simplifies netdev_intersect_features
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The name NETIF_F_ALL_CSUM is a misnomer. This does not correspond to the
set of features for offloading all checksums. This is a mask of the
checksum offload related features bits. It is incorrect to set both
NETIF_F_HW_CSUM and NETIF_F_IP_CSUM or NETIF_F_IPV6 at the same time for
features of a device.
This patch:
- Changes instances of NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK (where
NETIF_F_ALL_CSUM is being used as a mask).
- Changes bonding, sfc/efx, ipvlan, macvlan, vlan, and team drivers to
use NEITF_F_HW_CSUM in features list instead of NETIF_F_ALL_CSUM.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>