In referenced fix we removed the RTL8168e-specific jumbo config for
RTL8168evl in rtl_hw_jumbo_enable(). We have to do the same in
rtl_hw_jumbo_disable().
v2: fix referenced commit id
Fixes: 14012c9f3b ("r8169: fix jumbo configuration for RTL8168evl")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pipe_wait() may be simple, but since it relies on the pipe lock, it
means that we have to do the wakeup while holding the lock. That's
unfortunate, because the very first thing the waked entity will want to
do is to get the pipe lock for itself.
So get rid of the pipe_wait() usage by simply releasing the pipe lock,
doing the wakeup (if required) and then using wait_event_interruptible()
to wait on the right condition instead.
wait_event_interruptible() handles races on its own by comparing the
wakeup condition before and after adding itself to the wait queue, so
you can use an optimistic unlocked condition for it.
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This code is ancient, and goes back to when we only had a single page
for the pipe buffers. The exact history is hidden in the mists of time
(ie "before git", and in fact predates the BK repository too).
At that long-ago point in time, it actually helped to try to merge big
back-and-forth pipe reads and writes, and not limit pipe reads to the
single pipe buffer in length just because that was all we had at a time.
However, since then we've expanded the pipe buffers to multiple pages,
and this logic really doesn't seem to make sense. And a lot of it is
somewhat questionable (ie "hmm, the user asked for a non-blocking read,
but we see that there's a writer pending, so let's wait anyway to get
the extra data that the writer will have").
But more importantly, it makes the "go to sleep" logic much less
obvious, and considering the wakeup issues we've had, I want to make for
less of those kinds of things.
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the read side version of the previous commit: it simplifies the
logic to only wake up waiting writers when necessary, and makes sure to
use a synchronous wakeup. This time not so much for GNU make jobserver
reasons (that pipe never fills up), but simply to get the writer going
quickly again.
A bit less verbose commentary this time, if only because I assume that
the write side commentary isn't going to be ignored if you touch this
code.
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The pipe rework ends up having been extra painful, partly becaused of
actual bugs with ordering and caching of the pipe state, but also
because of subtle performance issues.
In particular, the pipe rework caused the kernel build to inexplicably
slow down.
The reason turns out to be that the GNU make jobserver (which limits the
parallelism of the build) uses a pipe to implement a "token" system: a
parallel submake will read a character from the pipe to get the job
token before starting a new job, and will write a character back to the
pipe when it is done. The overall job limit is thus easily controlled
by just writing the appropriate number of initial token characters into
the pipe.
But to work well, that really means that the old behavior of write
wakeups being synchronous (WF_SYNC) is very important - when the pipe
writer wakes up a reader, we want the reader to actually get scheduled
immediately. Otherwise you lose the parallelism of the build.
The pipe rework lost that synchronous wakeup on write, and we had
clearly all forgotten the reasons and rules for it.
This rewrites the pipe write wakeup logic to do the required Wsync
wakeups, but also clarifies the logic and avoids extraneous wakeups.
It also ends up addign a number of comments about what oit does and why,
so that we hopefully don't end up forgetting about this next time we
change this code.
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
RTL8125 also requires to enable RX for WoL.
v2: add missing Fixes tag
Fixes: f1bce4ad2f ("r8169: add support for RTL8125")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we receive a new packet from the guest, we check if the
src_cid is correct, but we forgot to check the dst_cid.
The host should accept only packets where dst_cid is
equal to the host CID.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit ef87f7da6b ("net: phy: dp83867: move dt parsing to probe")
causes regression on TI dra71x-evm and dra72x-evm, where DP83867 PHY is
used in "rgmii-id" mode - the networking stops working.
Unfortunately, it's not enough to just move DT parsing code to .probe() as
it depends on phydev->interface value, which is set to correct value abter
the .probe() is completed and before calling .config_init(). So, RGMII
configuration can't be loaded from DT.
To fix and issue
- move RGMII validation code to .config_init()
- parse RGMII parameters in dp83867_of_init(), but consider them as
optional.
Fixes: ef87f7da6b ("net: phy: dp83867: move dt parsing to probe")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now RX interrupt is triggered twice every time, because in
cpsw_rx_interrupt() it is asked first and then disabled. So there will be
pending interrupt always, when RX interrupt is enabled again in NAPI
handler.
Fix it by first disabling IRQ and then do ask.
Fixes: 870915feab ("drivers: net: cpsw: remove disable_irq/enable_irq as irq can be masked from cpsw itself")
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After pskb_may_pull() we should always refetch the header
pointers from the skb->data in case it got reallocated.
In gre_parse_header(), the erspan header is still fetched
from the 'options' pointer which is fetched before
pskb_may_pull().
Found this during code review of a KMSAN bug report.
Fixes: cb73ee40b1 ("net: ip_gre: use erspan key field for tunnel lookup")
Cc: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Acked-by: William Tu <u9012063@gmail.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Passing NULL to pppoe_pernet causes a crash via BUG_ON.
Dereferencing net in net_generici() also has the same effect. This patch
removes the redundant BUG_ON check on the same parameter.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adjust indentation from spaces to tab (+optional two spaces) as in
coding style with command like:
$ sed -e 's/^ / /' -i */Kconfig
Link: http://lkml.kernel.org/r/20191120140140.19148-1-krzk@kernel.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
DEBUG_FS does not belong to 'Compile-time checks and compiler options'.
Link: http://lkml.kernel.org/r/20190909144453.3520-10-changbin.du@gmail.com
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I think DEBUG_BUGVERBOSE is a dmesg option which gives more debug info
to dmesg.
Link: http://lkml.kernel.org/r/20190909144453.3520-9-changbin.du@gmail.com
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
They are both memory debug options to debug kernel stack issues.
Link: http://lkml.kernel.org/r/20190909144453.3520-7-changbin.du@gmail.com
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
They are similar options so place them together.
Link: http://lkml.kernel.org/r/20190909144453.3520-6-changbin.du@gmail.com
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move error injection, coverage, testing options to a new top level
submenu 'Kernel Testing and Coverage'. They are all for test purpose.
Link: http://lkml.kernel.org/r/20190909144453.3520-5-changbin.du@gmail.com
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Group these similar runtime data structures verification options
together.
Link: http://lkml.kernel.org/r/20190909144453.3520-4-changbin.du@gmail.com
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The arch special options are a little long, so create a submenu for
them.
Link: http://lkml.kernel.org/r/20190909144453.3520-3-changbin.du@gmail.com
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "hacking: make 'kernel hacking' menu better structurized", v3.
This series is a trivial improvment for the layout of 'kernel hacking'
configuration menu. Now we have many items in it which makes takes a
little time to look up them since they are not well structurized yet.
Early discussion is here:
https://lkml.org/lkml/2019/9/1/39
This patch (of 9):
Group generic kernel debugging instruments sysrq/kgdb/ubsan together
into a new submenu.
Link: http://lkml.kernel.org/r/20190909144453.3520-2-changbin.du@gmail.com
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kernel wait queues have a basic rule to them: you add yourself to
the wait-queue first, and then you check the things that you're going to
wait on. That avoids the races with the event you're waiting for.
The same goes for poll/select logic: the "poll_wait()" goes first, and
then you check the things you're polling for.
Of course, if you use locking, the ordering doesn't matter since the
lock will serialize with anything that changes the state you're looking
at. That's not the case here, though.
So move the poll_wait() first in pipe_poll(), before you start looking
at the pipe state.
Fixes: 8cefc107ca ("pipe: Use head and tail pointers for the ring, not cursor and length")
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The legacy client tracking infrastructure of nfsd makes use of MD5 to
derive a client's recovery directory name. As the nfsd module doesn't
declare any dependency on CRYPTO_MD5, though, it may fail to allocate
the hash if the kernel was compiled without it. As a result, generation
of client recovery directories will fail with the following error:
NFSD: unable to generate recoverydir name
The explicit dependency on CRYPTO_MD5 was removed as redundant back in
6aaa67b5f3 (NFSD: Remove redundant "select" clauses in fs/Kconfig
2008-02-11) as it was already implicitly selected via RPCSEC_GSS_KRB5.
This broke when RPCSEC_GSS_KRB5 was made optional for NFSv4 in commit
df486a2590 (NFS: Fix the selection of security flavours in Kconfig) at
a later point.
Fix the issue by adding back an explicit dependency on CRYPTO_MD5.
Fixes: df486a2590 (NFS: Fix the selection of security flavours in Kconfig)
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Static checker revealed possible error path leading to possible
NULL pointer dereferencing.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e0639dc580: ("NFSD introduce async copy feature")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The thermal trees were merged into a single one shared with the
maintainer of the subsystem.
Update the location of this group git tree.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20191205121227.19203-1-daniel.lezcano@linaro.org
When do randbuiding, we got this:
WARNING: unmet direct dependencies detected for THERMAL_GOV_POWER_ALLOCATOR
Depends on [n]: THERMAL [=y] && ENERGY_MODEL [=n]
Selected by [y]:
- THERMAL_DEFAULT_GOV_POWER_ALLOCATOR [=y] && <choice>
The Kconfig option THERMAL_DEFAULT_GOV_POWER_ALLOCATOR selects the
THERMAL_GOV_POWER_ALLOCATOR but this one depends on the ENERGY_MODEL
which is not enabled.
Make THERMAL_DEFAULT_GOV_POWER_ALLOCATOR depend on THERMAL_GOV_POWER_ALLOCATOR
to fix this warning.
Suggested-by: Quentin Perret <qperret@google.com>
Fixes: a4e893e802 ("thermal: cpu_cooling: Migrate to using the EM framework")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20191113105313.41616-1-yuehaibing@huawei.com
Guillaume Nault says:
====================
tcp: fix handling of stale syncookies timestamps
The synflood timestamps (->ts_recent_stamp and ->synq_overflow_ts) are
only refreshed when the syncookie protection triggers. Therefore, their
value can become very far apart from jiffies if no synflood happens for
a long time.
If jiffies grows too much and wraps while the synflood timestamp isn't
refreshed, then time_after32() might consider the later to be in the
future. This can trick tcp_synq_no_recent_overflow() into returning
erroneous values and rejecting valid ACKs.
Patch 1 handles the case of ACKs using legitimate syncookies.
Patch 2 handles the case of stray ACKs.
Patch 3 annotates lockless timestamp operations with READ_ONCE() and
WRITE_ONCE().
Changes from v3:
- Fix description of time_between32() (found by Eric Dumazet).
- Use more accurate Fixes tag in patch 3 (suggested by Eric Dumazet).
Changes from v2:
- Define and use time_between32() instead of a pair of
time_before32/time_after32 (suggested by Eric Dumazet).
- Use 'last_overflow - HZ' as lower bound in
tcp_synq_no_recent_overflow(), to accommodate for concurrent
timestamp updates (found by Eric Dumazet).
- Add a third patch to annotate lockless accesses to .ts_recent_stamp.
Changes from v1:
- Initialising timestamps at socket creation time is not enough
because jiffies wraps in 24 days with HZ=1000 (Eric Dumazet).
Handle stale timestamps in tcp_synq_overflow() and
tcp_synq_no_recent_overflow() instead.
- Rework commit description.
- Add a second patch to handle the case of stray ACKs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Syncookies borrow the ->rx_opt.ts_recent_stamp field to store the
timestamp of the last synflood. Protect them with READ_ONCE() and
WRITE_ONCE() since reads and writes aren't serialised.
Use of .rx_opt.ts_recent_stamp for storing the synflood timestamp was
introduced by a0f82f64e2 ("syncookies: remove last_synq_overflow from
struct tcp_sock"). But unprotected accesses were already there when
timestamp was stored in .last_synq_overflow.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When no synflood occurs, the synflood timestamp isn't updated.
Therefore it can be so old that time_after32() can consider it to be
in the future.
That's a problem for tcp_synq_no_recent_overflow() as it may report
that a recent overflow occurred while, in fact, it's just that jiffies
has grown past 'last_overflow' + TCP_SYNCOOKIE_VALID + 2^31.
Spurious detection of recent overflows lead to extra syncookie
verification in cookie_v[46]_check(). At that point, the verification
should fail and the packet dropped. But we should have dropped the
packet earlier as we didn't even send a syncookie.
Let's refine tcp_synq_no_recent_overflow() to report a recent overflow
only if jiffies is within the
[last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval. This
way, no spurious recent overflow is reported when jiffies wraps and
'last_overflow' becomes in the future from the point of view of
time_after32().
However, if jiffies wraps and enters the
[last_overflow, last_overflow + TCP_SYNCOOKIE_VALID] interval (with
'last_overflow' being a stale synflood timestamp), then
tcp_synq_no_recent_overflow() still erroneously reports an
overflow. In such cases, we have to rely on syncookie verification
to drop the packet. We unfortunately have no way to differentiate
between a fresh and a stale syncookie timestamp.
In practice, using last_overflow as lower bound is problematic.
If the synflood timestamp is concurrently updated between the time
we read jiffies and the moment we store the timestamp in
'last_overflow', then 'now' becomes smaller than 'last_overflow' and
tcp_synq_no_recent_overflow() returns true, potentially dropping a
valid syncookie.
Reading jiffies after loading the timestamp could fix the problem,
but that'd require a memory barrier. Let's just accommodate for
potential timestamp growth instead and extend the interval using
'last_overflow - HZ' as lower bound.
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If no synflood happens for a long enough period of time, then the
synflood timestamp isn't refreshed and jiffies can advance so much
that time_after32() can't accurately compare them any more.
Therefore, we can end up in a situation where time_after32(now,
last_overflow + HZ) returns false, just because these two values are
too far apart. In that case, the synflood timestamp isn't updated as
it should be, which can trick tcp_synq_no_recent_overflow() into
rejecting valid syncookies.
For example, let's consider the following scenario on a system
with HZ=1000:
* The synflood timestamp is 0, either because that's the timestamp
of the last synflood or, more commonly, because we're working with
a freshly created socket.
* We receive a new SYN, which triggers synflood protection. Let's say
that this happens when jiffies == 2147484649 (that is,
'synflood timestamp' + HZ + 2^31 + 1).
* Then tcp_synq_overflow() doesn't update the synflood timestamp,
because time_after32(2147484649, 1000) returns false.
With:
- 2147484649: the value of jiffies, aka. 'now'.
- 1000: the value of 'last_overflow' + HZ.
* A bit later, we receive the ACK completing the 3WHS. But
cookie_v[46]_check() rejects it because tcp_synq_no_recent_overflow()
says that we're not under synflood. That's because
time_after32(2147484649, 120000) returns false.
With:
- 2147484649: the value of jiffies, aka. 'now'.
- 120000: the value of 'last_overflow' + TCP_SYNCOOKIE_VALID.
Of course, in reality jiffies would have increased a bit, but this
condition will last for the next 119 seconds, which is far enough
to accommodate for jiffie's growth.
Fix this by updating the overflow timestamp whenever jiffies isn't
within the [last_overflow, last_overflow + HZ] range. That shouldn't
have any performance impact since the update still happens at most once
per second.
Now we're guaranteed to have fresh timestamps while under synflood, so
tcp_synq_no_recent_overflow() can safely use it with time_after32() in
such situations.
Stale timestamps can still make tcp_synq_no_recent_overflow() return
the wrong verdict when not under synflood. This will be handled in the
next patch.
For 64 bits architectures, the problem was introduced with the
conversion of ->tw_ts_recent_stamp to 32 bits integer by commit
cca9bab1b7 ("tcp: use monotonic timestamps for PAWS").
The problem has always been there on 32 bits architectures.
Fixes: cca9bab1b7 ("tcp: use monotonic timestamps for PAWS")
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl3pcFUACgkQSD+KveBX
+j4tNgf/VJ1FgeYn4INueeliFebX+8sgk7pA8zI5Wf3s0gzmCLulNeF8sK3sV7o4
U94dHauU1CZxTpI9P2AVJVQ5CnTi2GBjuEKHh1Rz9bVwU3N5D4miShtHUEI1PLna
VhLJiMSU1ffUerL66zgW+QYbPAKSlp+INysr+KeBBOwxoco4C17pGwJ9idpaHYnL
sdVsHVcWmD1fen+eSN/8i4RzqVFTjM3cCsMg1OJn8KuDH+z6bQZ9cOWWLn+hz9lc
uBZHZHvRFRTg1NeN1BFcMEL0RBoG0sTtfJLkVeyQxmLbmTeBPHDBIAJysqjAe1kY
OFvEnq5t3HScuZIxTd5yXQ3rHy1EIg==
=InW3
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2019-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-12-05
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
For -stable v4.19:
('net/mlx5e: Query global pause state before setting prio2buffer')
For -stable v5.3
('net/mlx5e: Fix SFF 8472 eeprom length')
('net/mlx5e: Fix translation of link mode into speed')
('net/mlx5e: Fix freeing flow with kfree() and not kvfree()')
('net/mlx5e: ethtool, Fix analysis of speed setting')
('net/mlx5e: Fix TXQ indices to be sequential')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We may have found a bug in the nxp/lpc_eth.c driver. The function
platform_set_drvdata() is called twice, the second time it is called,
in lpc_mii_init(), it overwrites the struct net_device which should be
at pdev->dev->driver_data with pldat->mii_bus. When trying to remove
the driver, in lpc_eth_drv_remove(), platform_get_drvdata() will
return the pldat->mii_bus pointer and try to use it as a struct
net_device pointer. This causes unregister_netdev to segfault and
generate a kernel BUG. Is this reproducible?
Signed-off-by: Daniel Martinez <linux@danielsmartinez.com>
Signed-off-by: Bruno Carneiro da Cunha <brunocarneirodacunha@usp.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
Back in 2008, Adam Langley fixed the corner case of packets for flows
having all of the following options : MD5 TS SACK
Since MD5 needs 20 bytes, and TS needs 12 bytes, no sack block
can be cooked from the remaining 8 bytes.
tcp_established_options() correctly sets opts->num_sack_blocks
to zero, but returns 36 instead of 32.
This means TCP cooks packets with 4 extra bytes at the end
of options, containing unitialized bytes.
Fixes: 33ad798c92 ("tcp: options clean up")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley says:
====================
Ensure egress un/bind are relayed with indirect blocks
On register and unregister for indirect blocks, a command is called that
sends a bind/unbind event to the registering driver. This command assumes
that the bind to indirect block will be on ingress. However, drivers such
as NFP have allowed binding to clsact qdiscs as well as ingress qdiscs
from mainline Linux 5.2. A clsact qdisc binds to an ingress and an egress
block.
Rather than assuming that an indirect bind is always ingress, modify the
function names to remove the ingress tag (patch 1). In cls_api, which is
used by NFP to offload TC flower, generate bind/unbind message for both
ingress and egress blocks on the event of indirectly
registering/unregistering from that block. Doing so mimics the behaviour
of both ingress and clsact qdiscs on initialise and destroy.
This now ensures that drivers such as NFP receive the correct binder type
for the indirect block registration.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When a device is bound to a clsact qdisc, bind events are triggered to
registered drivers for both ingress and egress. However, if a driver
registers to such a device using the indirect block routines then it is
assumed that it is only interested in ingress offload and so only replays
ingress bind/unbind messages.
The NFP driver supports the offload of some egress filters when
registering to a block with qdisc of type clsact. However, on unregister,
if the block is still active, it will not receive an unbind egress
notification which can prevent proper cleanup of other registered
callbacks.
Modify the indirect block callback command in TC to send messages of
ingress and/or egress bind depending on the qdisc in use. NFP currently
supports egress offload for TC flower offload so the changes are only
added to TC.
Fixes: 4d12ba4278 ("nfp: flower: allow offloading of matches on 'internal' ports")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With indirect blocks, a driver can register for callbacks from a device
that is does not 'own', for example, a tunnel device. When registering to
or unregistering from a new device, a callback is triggered to generate
a bind/unbind event. This, in turn, allows the driver to receive any
existing rules or to properly clean up installed rules.
When first added, it was assumed that all indirect block registrations
would be for ingress offloads. However, the NFP driver can, in some
instances, support clsact qdisc binds for egress offload.
Change the name of the indirect block callback command in flow_offload to
remove the 'ingress' identifier from it. While this does not change
functionality, a follow up patch will implement a more more generic
callback than just those currently just supporting ingress offload.
Fixes: 4d12ba4278 ("nfp: flower: allow offloading of matches on 'internal' ports")
Signed-off-by: John Hurley <john.hurley@netronome.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dev_hold has to be called always in netdev_queue_add_kobject.
Otherwise usage count drops below 0 in case of failure in
kobject_init_and_add.
Fixes: b8eb718348 ("net-sysfs: Fix reference count leak in rx|netdev_queue_add_kobject")
Reported-by: Hulk Robot <hulkci@huawei.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Miller <davem@davemloft.net>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 43e665287f ("net-next: dsa: fix flow dissection") added an
ability to override protocol and network offset during flow dissection
for DSA-enabled devices (i.e. controllers shipped as switch CPU ports)
in order to fix skb hashing for RPS on Rx path.
However, skb_hash() and added part of code can be invoked not only on
Rx, but also on Tx path if we have a multi-queued device and:
- kernel is running on UP system or
- XPS is not configured.
The call stack in this two cases will be like: dev_queue_xmit() ->
__dev_queue_xmit() -> netdev_core_pick_tx() -> netdev_pick_tx() ->
skb_tx_hash() -> skb_get_hash().
The problem is that skbs queued for Tx have both network offset and
correct protocol already set up even after inserting a CPU tag by DSA
tagger, so calling tag_ops->flow_dissect() on this path actually only
breaks flow dissection and hashing.
This can be observed by adding debug prints just before and right after
tag_ops->flow_dissect() call to the related block of code:
Before the patch:
Rx path (RPS):
[ 19.240001] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
[ 19.244271] tag_ops->flow_dissect()
[ 19.247811] Rx: proto: 0x0800, nhoff: 8 /* ETH_P_IP */
[ 19.215435] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
[ 19.219746] tag_ops->flow_dissect()
[ 19.223241] Rx: proto: 0x0806, nhoff: 8 /* ETH_P_ARP */
[ 18.654057] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
[ 18.658332] tag_ops->flow_dissect()
[ 18.661826] Rx: proto: 0x8100, nhoff: 8 /* ETH_P_8021Q */
Tx path (UP system):
[ 18.759560] Tx: proto: 0x0800, nhoff: 26 /* ETH_P_IP */
[ 18.763933] tag_ops->flow_dissect()
[ 18.767485] Tx: proto: 0x920b, nhoff: 34 /* junk */
[ 22.800020] Tx: proto: 0x0806, nhoff: 26 /* ETH_P_ARP */
[ 22.804392] tag_ops->flow_dissect()
[ 22.807921] Tx: proto: 0x920b, nhoff: 34 /* junk */
[ 16.898342] Tx: proto: 0x86dd, nhoff: 26 /* ETH_P_IPV6 */
[ 16.902705] tag_ops->flow_dissect()
[ 16.906227] Tx: proto: 0x920b, nhoff: 34 /* junk */
After:
Rx path (RPS):
[ 16.520993] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
[ 16.525260] tag_ops->flow_dissect()
[ 16.528808] Rx: proto: 0x0800, nhoff: 8 /* ETH_P_IP */
[ 15.484807] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
[ 15.490417] tag_ops->flow_dissect()
[ 15.495223] Rx: proto: 0x0806, nhoff: 8 /* ETH_P_ARP */
[ 17.134621] Rx: proto: 0x00f8, nhoff: 0 /* ETH_P_XDSA */
[ 17.138895] tag_ops->flow_dissect()
[ 17.142388] Rx: proto: 0x8100, nhoff: 8 /* ETH_P_8021Q */
Tx path (UP system):
[ 15.499558] Tx: proto: 0x0800, nhoff: 26 /* ETH_P_IP */
[ 20.664689] Tx: proto: 0x0806, nhoff: 26 /* ETH_P_ARP */
[ 18.565782] Tx: proto: 0x86dd, nhoff: 26 /* ETH_P_IPV6 */
In order to fix that we can add the check 'proto == htons(ETH_P_XDSA)'
to prevent code from calling tag_ops->flow_dissect() on Tx.
I also decided to initialize 'offset' variable so tagger callbacks can
now safely leave it untouched without provoking a chaos.
Fixes: 43e665287f ("net-next: dsa: fix flow dissection")
Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ENOTSUPP is not available in userspace, for example:
setsockopt failed, 524, Unknown error 524
Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull NVMe fixes from Keith
* 'nvme/for-5.5' of git://git.infradead.org/nvme:
nvme/pci: Fix read queue count
nvme/pci Limit write queue sizes to possible cpus
nvme/pci: Fix write and poll queue types
nvme/pci: Remove last_cq_head
nvme: Namepace identification descriptor list is optional
nvme-fc: fix double-free scenarios on hw queues
nvme: else following return is not needed
nvme: add error message on mismatching controller ids
nvme_fc: add module to ops template to allow module references
nvmet-loop: Avoid preallocating big SGL for data
nvme-fc: Avoid preallocating big SGL for data
nvme-rdma: Avoid preallocating big SGL for data
- fix CPU topology setup for SCHED_MC case
- fix VDSO regression
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEuNNh8scc2k/wOAE+9OeQG+StrGQFAl3qQUYACgkQ9OeQG+St
rGTg0g/+LThc7XoEkGzXbswGtAZU0Tv+pcNXi4G+KAlePIutHpcf9ogY//mfXyVp
SHLQBb0fNKzygt5e1Aq1X+99RsGbI7oMgKTiQgLrb6MPrhOLWwCWymnbyaTAcKgD
QXHghta7tEGunvnnRWJTKWG24qoTvM1JFoUnHnrArYYyJoyRzQE2kMK2S3Z6X6xp
Rcx152s/7lPdwPB5i6hyIlYVYVQN6JAz0w1nyrqmwjGTZU1NvPQaAemKYO3Ti4pD
8HjMGtLa4AhgngkKiZt0gFI75kKp2Uf/rcowC4Li6ZQua1kYCankCYXoc9soCkHu
1Qn0WyEZXVtMGMenEOGeYo/neTfhcBEkY3CMQO/QMfcpsFFn2fmiBmwMLvE8RMJ7
UDiNBqVW8eZL45h6BF6QefWUzbzFQjfe2SXk5C58WrvlaZPldTucSmNLA1GoFn/p
Ni+KqDHUACTNMb87kEDAfJGx3znS2EtdiwFfb3lqHnQ5GO6GdKrMPC1GfK32LgfO
TN4nVEYu71ZMINidxuK0wuSRK2eadY6M08lKlrikGbkQJlo0R6YGuwzKiRmlHq5H
QldDMKhPRuxLMTViphqRdHwoWRPpvbwVbJDjwea0run0lgCzEq4J+s2X/CXUZIAM
b1y4APz/UgGyiYzmstZCYvfYZ0Tq8AU52757jFWjBoiEr+xEppE=
=gTck
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
- fix CPU topology setup for SCHED_MC case
- fix VDSO regression
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8947/1: Fix __arch_get_hw_counter() access to CNTVCT
ARM: 8943/1: Fix topology setup in case of CPU hotplug for CONFIG_SCHED_MC
A set of fixes that we've merged late, but for the most part that have
been sitting in -next for a while through platform maintainer trees.
+ Fixes to suspend/resume on Tegra, caused by the added features
this merge window
+ Cleanups and minor fixes to TI additions this merge window
+ Tee fixes queued up late before the merge window, included here.
+ A handful of other fixlets
There's also a refresh of the shareed config files (multi_v* on 32-bit,
and defconfig on 64-bit), to avoid conflicts when we get new
contributions.
-----BEGIN PGP SIGNATURE-----
iQJDBAABCAAtFiEElf+HevZ4QCAJmMQ+jBrnPN6EHHcFAl3qvzwPHG9sb2ZAbGl4
b20ubmV0AAoJEIwa5zzehBx3ZNoP/Rpd4KSoE4YdKgvn2MJZhJ6an6O9fcGxPj9z
8Bos9LrYkfE9ai80H/EOfMWOq2XE0H6Wkahv1qk9R/YtXzg/RYtpFFhAeXDcnCXK
OcVwD7c8RLmFo1pJLo2aBlRm56CCxxvrvKPIeUM12+6/P7GYb2vxHCE3J3bTpPKC
yps5g/dKSp2sKIoDe4a7EEXO8VLobQZh7khiwcUKK23Mjax4Z/4TXRAxZu5rCqZm
oyORX/QEx1uGZuJvUNMWJOQaDVbW9LkYqWtktQL0t8jvUwjSLItUtu63/OdgqNV0
F9mDH+PLySaCvCjL31DmKDFAkOelY8DRL2hbJqmU31FCm+E6i4mBS8EEzvk0QCTY
cjYDWHcr7RpMBwWzwtiywWTmXq82Cxw9CMQzIiZfxTMthQp1Pe08a1GTr49NXs0b
sW5rrzXSmQ3vM7hYIoUnGiuPHJQoBcRV2iCySqVekesgHRHCsnGiykPc6Yz+igqO
BU9u8PBAyaFEDAZBMCuah8YeI6Gsdy35ptNdy1zWTMGCY6vi9aTJdBCzlLs4my0a
YeQNDe1xzApLE5/gF1JReFSf7g7JopvTN3pSigfHO+mLdbtVY51U76XYngogEIRr
q1MjO9m9VQ93vSrlXIhgVMHRC8OpxawbB6cQCtlY5ey/AdJRZeX3HSCa32hCui2S
f6YJ3mBT
=ozla
-----END PGP SIGNATURE-----
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Olof Johansson:
"A set of fixes that we've merged late, but for the most part that have
been sitting in -next for a while through platform maintainer trees:
- Fixes to suspend/resume on Tegra, caused by the added features this
merge window
- Cleanups and minor fixes to TI additions this merge window
- Tee fixes queued up late before the merge window, included here.
- A handful of other fixlets
There's also a refresh of the shareed config files (multi_v* on
32-bit, and defconfig on 64-bit), to avoid conflicts when we get new
contributions"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (32 commits)
ARM: multi_v7_defconfig: Restore debugfs support
ARM: defconfig: re-run savedefconfig on multi_v* configs
arm64: defconfig: re-run savedefconfig
ARM: pxa: Fix resource properties
soc: mediatek: cmdq: fixup wrong input order of write api
soc: aspeed: Fix snoop_file_poll()'s return type
MAINTAINERS: Switch to Marvell addresses
MAINTAINERS: update Cavium ThunderX drivers
Revert "arm64: dts: juno: add dma-ranges property"
MAINTAINERS: Make Nicolas Saenz Julienne the new bcm2835 maintainer
firmware: arm_scmi: Avoid double free in error flow
arm64: dts: juno: Fix UART frequency
ARM: dts: Fix sgx sysconfig register for omap4
arm: socfpga: execute cold reboot by default
ARM: dts: Fix vcsi regulator to be always-on for droid4 to prevent hangs
ARM: dts: dra7: fix cpsw mdio fck clock
ARM: dts: am57xx-beagle-x15: Update pinmux name to ddr_3_3v
ARM: dts: omap3-tao3530: Fix incorrect MMC card detection GPIO polarity
soc/tegra: pmc: Add reset sources and levels on Tegra194
soc/tegra: pmc: Add missing IRQ callbacks on Tegra194
...
- ZONE_DMA32 initialisation fix when memblocks fall entirely within the
first GB (used by ZONE_DMA in 5.5 for Raspberry Pi 4).
- Couple of ftrace fixes following the FTRACE_WITH_REGS patchset.
- access_ok() fix for the Tagged Address ABI when called from from a
kernel thread (asynchronous I/O): the kthread does not have the TIF
flags of the mm owner, so untag the user address unconditionally.
- KVM compute_layout() called before the alternatives code patching.
- Minor clean-ups.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl3qdr8ACgkQa9axLQDI
XvEKNA/9FWyqesV612qaFkhTFCg0f6byLP4wQf0ZqcZf74L5J6zeUCIc7Mqgdno3
jWROZNAW69gOYfwHxcBNyPVBpbAcx3k3WBBRWRJPnnJmeQk0oBLVdRziqdGKLxfw
GhhWGsndnpxDOmyBJOWxZLemZjKFqYX3dhQ9Zi6pvkZgAeFEtIESw6S/iqf42nWL
1o/LgyQ5kjR6eto1OVW9QOQ83/TlXXlbsvoNwNFGghX1yHjQ6mZ3LITbiFdrbFlT
FLLwsqlsPzDQcKagszTEHZTbXBf0RojKXWh3HKSEnmPwpacestvLJcKHTD9mtDmY
Z+rLfyiolZmXoNU9LT6uGTzVD4cRWfzz6eHSYHiufM1UztGSV+dOhIqfPuEOLEu3
2Xf8sKmQMtuICgbol6Q6ISrjXKH/UNvK2CuuuJSNmOHDlyHKvNfJtoyEhZ5rHUpT
HQy0qxDCEU7rFCP7clOD/94EGA8gYrV8j5NauY8/VsLpRCMBwoLNglI049qJydaZ
jL9dPxo+GG7kh7S8VmYwBKtPhqlDNFCzw/HmBBURFhkM1j0nCNt5dKHx+kdLNurg
nbzRvJ+W42eDze2lmVf33eOfrAy2MfcGr+VuJ5QdmL30bQENCemPrreIy+VnVVR8
ydeK3lyknJjmX4q8a5o/URsAKvk13crwimNPa0OSoYzDKmWd8SA=
=vhnZ
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- ZONE_DMA32 initialisation fix when memblocks fall entirely within the
first GB (used by ZONE_DMA in 5.5 for Raspberry Pi 4).
- Couple of ftrace fixes following the FTRACE_WITH_REGS patchset.
- access_ok() fix for the Tagged Address ABI when called from from a
kernel thread (asynchronous I/O): the kthread does not have the TIF
flags of the mm owner, so untag the user address unconditionally.
- KVM compute_layout() called before the alternatives code patching.
- Minor clean-ups.
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: entry: refine comment of stack overflow check
arm64: ftrace: fix ifdeffery
arm64: KVM: Invoke compute_layout() before alternatives are applied
arm64: Validate tagged addresses in access_ok() called from kernel threads
arm64: mm: Fix column alignment for UXN in kernel_page_tables
arm64: insn: consistently handle exit text
arm64: mm: Fix initialisation of DMA zones on non-NUMA systems
Fix the iteration end check in fuse_dev_splice_write(). The iterator
position can only be compared with == or != since wrappage may be involved.
Fixes: 8cefc107ca ("pipe: Use head and tail pointers for the ring, not cursor and length")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A few commits splitting the KASAN instrumented bitops header in
three, to match the split of the asm-generic bitops headers.
This is needed on powerpc because we use asm-generic/bitops/non-atomic.h,
for the non-atomic bitops, whereas the existing KASAN instrumented
bitops assume all the underlying operations are provided by the arch
as arch_foo() versions.
Thanks to:
Daniel Axtens & Christophe Leroy.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl3qN0wTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgLWFD/4/e3CO1oWjQdDGjgvhydhmhjMQqI9m
EINQjg3hl0ceig1HzsgjVdWI4TOf7bXbaQRf8m2pFWpA+iSx4KpjlnXzNjfUDapO
JqzKYfVSdi0o6OAFigDYuN1V5F2jAgrM7/w5PKpVuiAABcgJNcEY59tgEMTdj9r0
9H3OekYv0UnZ5ZNsUhCibKVvVLdbtys3ALrm1YETAauCkY/lpNk6afcr33t0iw2l
WAdd2sfWvw4tBn+35ZrNnv7z4hQ423Imd+lyuI5zhMBOOGspgMxlGGeIn370WyAi
00Jl66TRot6EtWGDVzV10bjB53qDhHrtNIk0NG2QMBB8vdTjBMtXfJnVc3Q8iVZ9
GkpdvMLNAlmxa4AuzpdHAOUQK48aggQzDmJkGp/JdYT6+WwTa5SLZcsy+njHGyjU
It+728FnStilM1XvjnaN9pljqANEcN4/YIycK55XGDsZS9fVRMY/QMAQZbdLHfzc
nB54Q/8vtPc9H69ws3U3g0ogVtYi5ca5RpTPiYU6WUQfEe9mjZdEgglsHC1y8ef2
9J3Muv4ASDGLjKN+G4dvKLCIgF+QKtsvwrtSztQq6wVnXUxuUQBEQSCaMPN9PN5g
Hlkjk95WevXcHyXeE2r8cXndUTboIgWRMZp0AHao44du3EziPMCvE0tFAHOEWfV0
8Oty9Fo5QSb1Mw==
=FCVX
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull more powerpc updates from Michael Ellerman:
"A few commits splitting the KASAN instrumented bitops header in three,
to match the split of the asm-generic bitops headers.
This is needed on powerpc because we use the generic bitops for the
non-atomic case only, whereas the existing KASAN instrumented bitops
assume all the underlying operations are provided by the arch as
arch_foo() versions.
Thanks to: Daniel Axtens & Christophe Leroy"
* tag 'powerpc-5.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
docs/core-api: Remove possibly confusing sub-headings from Bit Operations
powerpc: support KASAN instrumentation of bitops
kasan: support instrumented bitops combined with generic bitops