Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- fix the s2ram regression related to confusion around segment
register restoration, plus related cleanups that make the code more
robust
- a guess-unwinder Kconfig dependency fix
- an isoimage build target fix for certain tool chain combinations
- instruction decoder opcode map fixes+updates, and the syncing of
the kernel decoder headers to the objtool headers
- a kmmio tracing fix
- two 5-level paging related fixes
- a topology enumeration fix on certain SMP systems"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Resync objtool's instruction decoder source code copy with the kernel's latest version
x86/decoder: Fix and update the opcodes map
x86/power: Make restore_processor_context() sane
x86/power/32: Move SYSENTER MSR restoration to fix_processor_context()
x86/power/64: Use struct desc_ptr for the IDT in struct saved_context
x86/unwinder/guess: Prevent using CONFIG_UNWINDER_GUESS=y with CONFIG_STACKDEPOT=y
x86/build: Don't verify mtools configuration file for isoimage
x86/mm/kmmio: Fix mmiotrace for page unaligned addresses
x86/boot/compressed/64: Print error if 5-level paging is not supported
x86/boot/compressed/64: Detect and handle 5-level paging at boot-time
x86/smpboot: Do not use smp_num_siblings in __max_logical_packages calculation
Pull locking fixes from Ingo Molnar:
"Misc fixes:
- Fix a S390 boot hang that was caused by the lock-break logic.
Remove lock-break to begin with, as review suggested it was
unreasonably fragile and our confidence in its continued good
health is lower than our confidence in its removal.
- Remove the lockdep cross-release checking code for now, because of
unresolved false positive warnings. This should make lockdep work
well everywhere again.
- Get rid of the final (and single) ACCESS_ONCE() straggler and
remove the API from v4.15.
- Fix a liblockdep build warning"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools/lib/lockdep: Add missing declaration of 'pr_cont()'
checkpatch: Remove ACCESS_ONCE() warning
compiler.h: Remove ACCESS_ONCE()
tools/include: Remove ACCESS_ONCE()
tools/perf: Convert ACCESS_ONCE() to READ_ONCE()
locking/lockdep: Remove the cross-release locking checks
locking/core: Remove break_lock field when CONFIG_GENERIC_LOCKBREAK=y
locking/core: Fix deadlock during boot on systems with GENERIC_LOCKBREAK
Pull scheduler fixes from Ingo Molnar:
"Two fixes: a crash fix for an ARM SoC platform, and kernel-doc
warnings fixes"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/rt: Do not pull from current CPU if only one CPU to pull
sched/core: Fix kernel-doc warnings after code movement
Pull early_ioremap fix from Ingo Molnar:
"A boot hang fix when the EFI earlyprintk driver is enabled"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABAgAGBQJaM17CAAoJELDendYovxMvZs0H/3YIAN3wXEqcaZuAoMKdTU3C
gSipBUJKwcwo/1qJCtE5CzQnl7U6qC4BPnwNfed1AJUULFBYtpEMw6KFMhK2KELn
ADWoJzVbBzouZnj0GHuky4dQnsC5lPL3RjlWqKkzlEA3KebrxLsO7inw5Pru9zPZ
y1Isl/nLxV04f8SYITjgThCCdo9pnQOdx275YAcbQprqsOK45j+4d1Mn0fYb1Rjk
7Fk4i9UAZlY/VYycpgyC/m3PKFdawl0uqHXwu4yTKXQJGZ3BwED7ifTvHknaSOhA
DVxktnEOTg5IrrqDrr4EdZIgXQcqQF19oIOtEWCKCLvW8j/sBlSAQYkgtn5OpxY=
=gqWy
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.15-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Two minor fixes for running as Xen dom0:
- when built as 32 bit kernel on large machines the Xen LAPIC
emulation should report a rather modern LAPIC in order to support
enough APIC-Ids
- The Xen LAPIC emulation is needed for dom0 only, so build it only
for kernels supporting to run as Xen dom0"
* tag 'for-linus-4.15-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: XEN_ACPI_PROCESSOR is Dom0-only
x86/Xen: don't report ancient LAPIC version
We must ensure that the call to rpc_sleep_on() in xprt_transmit() cannot
race with the call to xprt_complete_rqst().
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=317
Fixes: ce7c252a8c ("SUNRPC: Add a separate spinlock to protect..")
Cc: stable@vger.kernel.org # 4.14+
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Commit d8f532d20e ("xprtrdma: Invoke rpcrdma_reply_handler
directly from RECV completion") introduced a performance regression
for NFS I/O small enough to not need memory registration. In multi-
threaded benchmarks that generate primarily small I/O requests,
IOPS throughput is reduced by nearly a third. This patch restores
the previous level of throughput.
Because workqueues are typically BOUND (in particular ib_comp_wq,
nfsiod_workqueue, and rpciod_workqueue), NFS/RDMA workloads tend
to aggregate on the CPU that is handling Receive completions.
The usual approach to addressing this problem is to create a QP
and CQ for each CPU, and then schedule transactions on the QP
for the CPU where you want the transaction to complete. The
transaction then does not require an extra context switch during
completion to end up on the same CPU where the transaction was
started.
This approach doesn't work for the Linux NFS/RDMA client because
currently the Linux NFS client does not support multiple connections
per client-server pair, and the RDMA core API does not make it
straightforward for ULPs to determine which CPU is responsible for
handling Receive completions for a CQ.
So for the moment, record the CPU number in the rpcrdma_req before
the transport sends each RPC Call. Then during Receive completion,
queue the RPC completion on that same CPU.
Additionally, move all RPC completion processing to the deferred
handler so that even RPCs with simple small replies complete on
the CPU that sent the corresponding RPC Call.
Fixes: d8f532d20e ("xprtrdma: Invoke rpcrdma_reply_handler ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The following deadlock can occur between a process waiting for a client
to initialize in while walking the client list during nfsv4 server trunking
detection and another process waiting for the nfs_clid_init_mutex so it
can initialize that client:
Process 1 Process 2
--------- ---------
spin_lock(&nn->nfs_client_lock);
list_add_tail(&CLIENTA->cl_share_link,
&nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
spin_lock(&nn->nfs_client_lock);
list_add_tail(&CLIENTB->cl_share_link,
&nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
mutex_lock(&nfs_clid_init_mutex);
nfs41_walk_client_list(clp, result, cred);
nfs_wait_client_init_complete(CLIENTA);
(waiting for nfs_clid_init_mutex)
Make sure nfs_match_client() only evaluates clients that have completed
initialization in order to prevent that deadlock.
This patch also fixes v4.0 trunking behavior by not marking the client
NFS_CS_READY until the clientid has been confirmed.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
If pskb_may_pull return failed, return PACKET_REJECT instead of -ENOMEM.
Fixes: 84e54fe0a5 ("gre: introduce native tunnel support for ERSPAN")
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long says:
====================
sctp: Implement Stream Interleave: Interaction with Other SCTP Extensions
Stream Interleave would be implemented in two Parts:
1. The I-DATA Chunk Supporting User Message Interleaving
2. Interaction with Other SCTP Extensions
Overview in section 2.3 of RFC8260 for Part 2:
The usage of the I-DATA chunk might interfere with other SCTP
extensions. Future SCTP extensions MUST describe if and how they
interfere with the usage of I-DATA chunks. For the SCTP extensions
already defined when this document was published, the details are
given in the following subsections.
As the 2nd part of Stream Interleave Implementation, this patchset mostly
adds the support for SCTP Partial Reliability Extension with I-FORWARD-TSN
chunk. Then adjusts stream scheduler and stream reconfig to make them work
properly with I-DATA chunks.
In the last patch, all stream interleave codes will be enabled by adding
sysctl to allow users to use this feature.
v1 -> v2:
- removed the intl_enable check from sctp_chunk_event_lookup, as Marcelo's
suggestion.
- fixed a typo in changelog.
====================
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the last patch for support of stream interleave, after this patch,
users could enable stream interleave by systcl -w net.sctp.intl_enable=1.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using idata and doing stream and asoc reset, setting ssn with
0 could only clear the 1st 16 bits of mid.
So to make this work for both data and idata, it sets mid with 0
instead of ssn, and also mid_uo for unordered idata also need to
be cleared, as said in section 2.3.2 of RFC8260.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As Marcelo said in the stream scheduler patch:
Support for I-DATA chunks, also described in RFC8260, with user message
interleaving is straightforward as it just requires the schedulers to
probe for the feature and ignore datamsg boundaries when dequeueing.
All needs to do is just to ignore datamsg boundaries when dequeueing.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
handle_ftsn is added as a member of sctp_stream_interleave, used to skip
ssn for data or mid for idata, called for SCTP_CMD_PROCESS_FWDTSN cmd.
sctp_handle_iftsn works for ifwdtsn, and sctp_handle_fwdtsn works for
fwdtsn. Note that different from sctp_handle_fwdtsn, sctp_handle_iftsn
could do stream abort pd.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
report_ftsn is added as a member of sctp_stream_interleave, used to
skip tsn from tsnmap, remove old events from reasm or lobby queue,
and abort pd for data or idata, called for SCTP_CMD_REPORT_FWDTSN
cmd and asoc reset.
sctp_report_iftsn works for ifwdtsn, and sctp_report_fwdtsn works
for fwdtsn. Note that sctp_report_iftsn doesn't do asoc abort_pd,
as stream abort_pd will be done when handling ifwdtsn. But when
ftsn is equal with ftsn, which means asoc reset, asoc abort_pd has
to be done.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
validate_ftsn is added as a member of sctp_stream_interleave, used to
validate ssn/chunk type for fwdtsn or mid (message id)/chunk type for
ifwdtsn, called in sctp_sf_eat_fwd_tsn, just as validate_data.
If this check fails, an abort packet will be sent, as said in section
2.3.1 of RFC8260.
As ifwdtsn and fwdtsn chunks have different length, it also defines
ftsn_chunk_len for sctp_stream_interleave to describe the chunk size.
Then it replaces all sizeof(struct sctp_fwdtsn_chunk) with
sctp_ftsnchk_len.
It also adds the process for ifwdtsn in rx path. As Marcelo pointed
out, there's no need to add event table for ifwdtsn, but just share
prsctp_chunk_event_table with fwdtsn's. It would drop fwdtsn chunk
for ifwdtsn and drop ifwdtsn chunk for fwdtsn by calling validate_ftsn
in sctp_sf_eat_fwd_tsn.
After this patch, the ifwdtsn can be accepted.
Note that this patch also removes the sctp.intl_enable check for
idata chunks in sctp_chunk_event_lookup, as it will do this check
in validate_data later.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
generate_ftsn is added as a member of sctp_stream_interleave, used to
create fwdtsn or ifwdtsn chunk according to abandoned chunks, called
in sctp_retransmit and sctp_outq_sack.
sctp_generate_iftsn works for ifwdtsn, and sctp_generate_fwdtsn is
still used for making fwdtsn.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_ifwdtsn_skip, sctp_ifwdtsn_hdr and sctp_ifwdtsn_chunk are used to
define and parse I-FWD TSN chunk format, and sctp_make_ifwdtsn is a
function to build the chunk.
The I-FORWARD-TSN Chunk Format is defined in section 2.3.1 of RFC8260.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike the various of_* routines to fetch properties, fwnode_* routines can
have an early check against a NULL fwnode_handle reference which makes them
return -EINVAL (see fwnode_call_int_op), thus making it virtually impossible to
differentiate what type of error is going on.
Have an early check in phylink_register_sfp() so we can keep proceeding with
the initialization, there is not much we can do without a valid fwnode_handle
except return early and treat this similarly to -ENOENT.
Fixes: 8fa7b9b6af ("phylink: convert to fwnode")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
It has been reported that the dummy byte we add to avoid
ZLPs can be forwarded by the modem to the PGW/GGSN, and that
some operators will drop the connection if this happens.
In theory, QMI devices are based on CDC ECM and should as such
both support ZLPs and silently ignore the dummy byte. The latter
assumption failed. Let's test out the first.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for Telit ME910 PID 0x1101.
Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz says:
====================
net: sched: Make qdisc offload uapi uniform
Several qdiscs can already be offloaded to hardware, but there's an
inconsistecy in regard to the uapi through which they indicate such
an offload is taking place - indication is passed to the user via
TCA_OPTIONS where each qdisc retains private logic for setting it.
The recent addition of offloading to RED in
602f3baf22 ("net_sch: red: Add offload ability to RED qdisc") caused
the addition of yet another uapi field for this purpose -
TC_RED_OFFLOADED.
For clarity and prevention of bloat in the uapi we want to eliminate
said added uapi, replacing it with a common mechanism that can be used
to reflect offload status of the various qdiscs.
The first patch introduces TCA_HW_OFFLOAD as the generic message meant
for this purpose. The second changes the current RED implementation into
setting the internal bits necessary for passing it, and the third removes
TC_RED_OFFLOADED as its no longer needed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Following the previous patch, RED is now using the new uniform uapi
for indicating it's offloaded. As a result, TC_RED_OFFLOADED is no
longer utilized by kernel and can be removed [as it's still not
part of any stable release].
Fixes: 602f3baf22 ("net_sch: red: Add offload ability to RED qdisc")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Let RED utilize the new internal flag, TCQ_F_OFFLOADED,
to mark a given qdisc as offloaded instead of using a dedicated
indication.
Also, change internal logic into looking at said flag when possible.
Fixes: 602f3baf22 ("net_sch: red: Add offload ability to RED qdisc")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qdiscs can be offloaded to HW, but current implementation isn't uniform.
Instead, qdiscs either pass information about offload status via their
TCA_OPTIONS or omit it altogether.
Introduce a new attribute - TCA_HW_OFFLOAD that would form a uniform
uAPI for the offloading status of qdiscs.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a hunk of code that is incorrectly indented with spaces
and rather than a tab. Clean this up.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King says:
====================
Add SFF module support
Add support for SFF modules. SFF modules are similar to SFP modules,
but they have fewer control signals, and are soldered down rather than
pluggable.
They also have different IDs in the EEPROM to identify as soldered down
SFF modules.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for SFF modules, which are soldered down SFP modules.
These have a different phys_id value, and also have the present and
rate select signals omitted compared with their socketed counter-parts.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add "sff,sff" for SFF module support with SFP. These have a different
phys_id value, and also have the present and rate select signals omitted
compared with their socketed counter-parts.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman says:
====================
nfp: fix rtsym and XPB register handling in debug dump
this series resolves two problems in the recently added debug dump facility.
* Correctly handle reading absolute rtysms
* Correctly handle special-case PB register reads
These fixes are for code only present in net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
For XPB registers reads, some island IDs require special handling (e.g.
ARM island), which is already taken care of in nfp_xpb_readl(), so use
that instead of a straight CPP read.
Without this fix all "xpbm:ArmIsldXpbmMap.*" registers are reported as
0xffffffff. It has also been observed to cause a system reboot.
With this fix correct values are reported, none of which are 0xffffffff.
The values may be read using ethtool debug level 2.
# ethtool -W <netdev> 2
# ethtool -w <netdev> data dump.dat
Fixes: 0e6c4955e1 ("nfp: dump CPP, XPB and direct ME CSRs")
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In TLV-based ethtool debug dumps, don't do a CPP read for absolute
rtsyms, use the addr field in the symbol table directly as the value.
Without this fix rtsym gro_release_ring_0 is 4 bytes of zeros.
With this fix the correct value, 0x0000004a 0x00000000 is reported.
The values may be read using ethtool debug level 2.
# ethtool -W <netdev> 2
# ethtool -w <netdev> data dump.dat
Fixes: e1e798e3fd ("nfp: dump rtsyms")
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh says:
====================
net: aquantia: Atlantic driver 12/2017 updates
The patchset contains important hardware fix for machines with large MRRS
and couple of improvement in stats and capabilities reporting
patch v3:
- Fixed patch #7 after Andrew's finding. NIC level stats actually
have to be cleaned only on hw struct creation (and this is done
in kzalloc). On each hwinit we only have to reset link state
to make sure hw stats update will not increment nic stats during init.
patch v2:
- split into more detailed commits
Comment from David on wrong defines case will be submitted separately later
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a suffix to distinguish kernel mainline version and aquantia releases
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On very first start we should read out current HW counter values
to make diff based calculations later.
This also should be done each time NIC gets down/up or wakes up
after sleep state. We reset link state explicitly to prevent diffs
from being summed this first time.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce timeout from 2 secs to 1 sec. If link is down,
reduce it to 500msec. This speeds up link detection.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This metric comes from HW and is also diff-calculated, like other counters
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Originally they were filled from ring sw counters.
These sometimes incorrectly calculate byte and packet amounts
when using LRO/LSO and jumboframes. Filling ndev counters from
hardware makes them precise.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Device hardware provides only 32bit counters. Using these directly
causes byte counters to overflow soon. A separate nic level structure
with 64 bit counters is now used to collect incrementally all the stats
and report these counters to ethtool stats and ndev stats.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Systems with large MRRS on device (2K, 4K) with high data rates and/or
large MTU, atlantic observes DMA packet buffer overflow. On some systems
that causes PCIe transaction errors, hardware NMIs or datapath freeze.
This patch
1) Limits MRRS from device side to 2K (thats maximum our hardware supports)
2) Limit maximum size of outstanding TX DMA data read requests. This makes
hardware buffers running fine.
Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Different hardware device Ids correspond to different maximum speed
available. Extra checks were added for devices D108 and D109 to
remove unsupported speeds from these device capabilities list.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
William Tu says:
====================
ERSPAN version 2 (type III) support
ERSPAN has two versions, v1 (type II) and v2 (type III). This patch
series add support for erspan v2 based on existing erspan v1
implementation. The first patch refactors the existing erspan v1's
header structure, making it extensible to put additional v2's header.
The second and third patch introduces erspan v2's implementation to
ipv4 and ipv6 erspan, for both native mode and collect metadata mode.
Finally, test cases are added under the samples/bpf.
Note:
ERSPAN version 2 has many features and this patch does not implement
all. One major use case of version 2 over version 1 is its timestamp
and direction. So the traffic collector is able to distinguish the
mirrorred traffic better. Other features such as SGT (security group
tag), FT (frame type) for carrying non-ethernet packet, and optional
subheader are not implemented yet.
Example commandline for ERSPAN version 2:
ip link add dev ip6erspan11 type ip6erspan seq key 102 \
local fc00💯:2 remote fc00💯:1 \
erspan_ver 2 erspan_dir 1 erspan_hwid 17
The corresponding iproute2 patch:
https://marc.info/?l=linux-netdev&m=151321141525106&w=2
William Tu (4):
net: erspan: refactor existing erspan code
net: erspan: introduce erspan v2 for ip_gre
ip6_gre: add erspan v2 support
samples/bpf: add erspan v2 sample code
include/net/erspan.h | 152 ++++++++++++++++++++++++++++++++++++++---
include/net/ip6_tunnel.h | 3 +
include/net/ip_tunnels.h | 5 +-
include/uapi/linux/if_ether.h | 1 +
include/uapi/linux/if_tunnel.h | 3 +
net/ipv4/ip_gre.c | 124 +++++++++++++++++++++++++++------
net/ipv6/ip6_gre.c | 139 +++++++++++++++++++++++++++++++------
net/openvswitch/flow_netlink.c | 8 +--
samples/bpf/tcbpf2_kern.c | 77 ++++++++++++++++++---
samples/bpf/test_tunnel_bpf.sh | 38 ++++++++---
10 files changed, 472 insertions(+), 78 deletions(-)
--
A simple script to test it:
set -ex
function cleanup() {
set +ex
ip netns del ns0
ip link del ip6erspan11
ip link del veth1
}
function main() {
trap cleanup 0 2 3 9
ip netns add ns0
ip link add veth0 type veth peer name veth1
ip link set veth0 netns ns0
# non-namespace
ip addr add dev veth1 fc00💯:2/96
if [ "$1" == "v1" ]; then
echo "create IP6 ERSPAN v1 tunnel"
ip link add dev ip6erspan11 type ip6erspan seq key 102 \
local fc00💯:2 remote fc00💯:1 \
erspan 123 erspan_ver 1
else
echo "create IP6 ERSPAN v2 tunnel"
ip link add dev ip6erspan11 type ip6erspan seq key 102 \
local fc00💯:2 remote fc00💯:1 \
erspan_ver 2 erspan_dir 1 erspan_hwid 17
fi
ip addr add dev ip6erspan11 fc00:200::2/96
ip addr add dev ip6erspan11 10.10.200.2/24
# namespace: ns0
ip netns exec ns0 ip addr add fc00💯:1/96 dev veth0
if [ "$1" == "v1" ]; then
ip netns exec ns0 \
ip link add dev ip6erspan00 type ip6erspan seq key 102 \
local fc00💯:1 remote fc00💯:2 \
erspan 123 erspan_ver 1
else
ip netns exec ns0 \
ip link add dev ip6erspan00 type ip6erspan seq key 102 \
local fc00💯:1 remote fc00💯:2 \
erspan_ver 2 erspan_dir 1 erspan_hwid 7
fi
ip netns exec ns0 ip addr add dev ip6erspan00 fc00:200::1/96
ip netns exec ns0 ip addr add dev ip6erspan00 10.10.200.1/24
ip link set dev veth1 up
ip link set dev ip6erspan11 up
ip netns exec ns0 ip link set dev ip6erspan00 up
ip netns exec ns0 ip link set dev veth0 up
}
main $1
ping6 -c 1 fc00💯:1 || true
ping -c 3 10.10.200.1
exit 0
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the existing tests for ipv4 ipv6 erspan version 2.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to support for ipv4 erspan, this patch adds
erspan v2 to ip6erspan tunnel.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch adds support for erspan version 2. Not all features are
supported in this patch. The SGT (security group tag), GRA (timestamp
granularity), FT (frame type) are set to fixed value. Only hardware
ID and direction are configurable. Optional subheader is also not
supported.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch refactors the existing erspan implementation in order
to support erspan version 2, which has additional metadata. So, in
stead of having one 'struct erspanhdr' holding erspan version 1,
breaks it into 'struct erspan_base_hdr' and 'struct erspan_metadata'.
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
nfp: ethtool flash updates
Dirk says:
This series adds the ability to update the control FW with ethtool.
It should be noted that the locking scheme here is to release the RTNL
lock before the flashing operation and to take it again afterwards to
ensure consistent state from the core code point of view. In this time,
we take a reference to the device to prevent the device being freed
while its being flashed.
This provides protection for the device being flashed while at the same
time not holding up any networking related functions which would
otherwise be locked out due to RTNL being held.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>